Text File

3.2 Text File

Delimited text files (comma, space, or tab delimited), typically saved with a .csv, .txt, or .dat extension, can be imported into SVS using the Import Text File dialog. The dialog has two tabs: Input File, which specifies general dataset import options, and Advanced Options. See Figure 6.


[Picture]

Figure 6: Text File Import Window

  • Input File: After opening the Import Text File dialog, click the Browse button to navigate your file system and select a file.

    You must specify how a file is delimited in order to properly import the file. If the wrong delimiter is specified, a warning message will indicate the file may be using a different delimiter.

    The dataset name may be given at this time. This name is applied to both the dataset node and the spreadsheet viewer node. The spreadsheet and its parent node can be renamed after import.

    If the text file has a row label column, you can specify that column; otherwise, generic row labels can be created. A row label is generally a sample name or other information that identifies a row, and it is not generally used for analysis. By default, the first column of data is used as row labels. If a text file is imported with the wrong column specified for row labels, this can be corrected in the Spreadsheet Editor (see Editing the Row Label Header and Row Labels) without re-importing the file.
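To make the delimiter and row label choices concrete, here is a minimal Python sketch of the kind of check described above. It is not SVS code and does not reflect how SVS is implemented: the function names, the "same field count on every line" heuristic, and the numbering used for generic row labels are all assumptions made for illustration.

```python
def guess_delimiter(lines, candidates=(",", "\t", " ")):
    """Return the candidate that splits every line into the same number
    of fields (more than one), preferring the one giving the most fields."""
    best, best_fields = None, 1
    for delim in candidates:
        counts = {len(line.split(delim)) for line in lines}
        if len(counts) == 1:
            n = counts.pop()
            if n > best_fields:
                best, best_fields = delim, n
    return best


def check_delimiter(lines, chosen):
    """Return a warning string if another delimiter fits better, else None."""
    guessed = guess_delimiter(lines)
    if guessed is not None and guessed != chosen:
        return f"file may be using {guessed!r} rather than {chosen!r}"
    return None


def split_row_labels(rows, label_col=0):
    """Separate the row label column from the data columns.
    label_col=None generates generic labels (hypothetical format: 1, 2, ...)."""
    if label_col is None:
        return [str(i + 1) for i in range(len(rows))], [list(r) for r in rows]
    labels = [r[label_col] for r in rows]
    data = [r[:label_col] + r[label_col + 1:] for r in rows]
    return labels, data
```

Running a check like this on the first few lines of a file before opening the dialog can help you pick the right delimiter and confirm which column holds the sample names.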

  • Advanced Options: The Advanced Options tab allows you to specify a custom list of missing data encodings if your text file marks missing values with something other than an empty field, period, comma, ?, - (dash), or --- (three dashes). In the custom encoding box, enter a whitespace-delimited list of missing value encodings. This list replaces the built-in missing encoding list, except that an empty field is still treated as missing.

    [Picture]
    Figure 7: Text File Import Window - Advanced Options

    If the text file contains genotype data, you can specify whether the program should read it as genotypic. If you un-check the “Read Genotypic Data” box, all genotype data will be read as categorical. The allele delimiter character can be chosen from the drop-down menu, or by choosing “Other ->” and entering the character in the text box to the right of the menu. The default behavior is to read all fields containing an underscore as genotype data, so non-genotype columns whose fields contain an underscore will also be read as genotypic. These columns can be changed to categorical using the Spreadsheet Editor after the file is imported into the project. Columns with all missing values can be encoded as genotypic by checking “Encode columns with all missing data as genotypic.”

    Header lines can be skipped by checking the “Skip” box and selecting the number of rows to skip. The default is to not skip any header lines.

    The default Base numeric type is Boolean. This means that if a column of all 0’s is detected, it is encoded as Boolean. You can instead choose Integer, or single or double precision float, as the default type. There is also an option to encode real-valued columns as single precision floats (as opposed to double precision), in which case each value is stored in 4 bytes rather than 8 bytes.
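The way the Advanced Options interact can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SVS implementation: the function names are hypothetical, a column is treated as genotypic only when every non-missing field contains the allele delimiter, Boolean columns are assumed to hold only 0 and 1, and the result for an all-missing column with the option unchecked is left as "undetermined" because the text above does not specify it.

```python
import struct

# Built-in missing encodings listed above:
# empty field, period, comma, ?, dash, three dashes.
BUILT_IN_MISSING = {"", ".", ",", "?", "-", "---"}
NUMERIC_ORDER = ["boolean", "integer", "float"]  # narrowest to widest


def missing_codes(custom=""):
    """A custom whitespace-delimited list replaces the built-ins,
    except that the empty field always counts as missing."""
    codes = custom.split()
    return {""} | set(codes) if codes else set(BUILT_IN_MISSING)


def numeric_type(value):
    """Narrowest numeric type for one field; raises ValueError if not a number."""
    if "_" in value:
        # Python's int()/float() accept "1_2" as 12; reject that here.
        raise ValueError(value)
    if value in ("0", "1"):  # assumption: Boolean columns hold only 0/1
        return "boolean"
    try:
        int(value)
        return "integer"
    except ValueError:
        float(value)  # raises ValueError for non-numeric text
        return "float"


def classify_column(values, missing, allele_delim="_", read_genotypic=True,
                    all_missing_as_genotypic=False, base="boolean"):
    present = [v for v in values if v not in missing]
    if not present:
        # Behavior with the option unchecked is not stated in the text.
        return "genotypic" if all_missing_as_genotypic else "undetermined"
    if read_genotypic and all(allele_delim in v for v in present):
        return "genotypic"
    try:
        widest = max(NUMERIC_ORDER.index(numeric_type(v)) for v in present)
    except ValueError:
        return "categorical"
    # The base numeric type is the narrowest type a column may receive.
    return NUMERIC_ORDER[max(widest, NUMERIC_ORDER.index(base))]


# Storage per value for single vs double precision floats.
SINGLE_BYTES = struct.calcsize("<f")  # 4
DOUBLE_BYTES = struct.calcsize("<d")  # 8
```

In this sketch a column such as case_1, control_2 is classified as genotypic under the default underscore delimiter, which is the situation the text above says to correct in the Spreadsheet Editor after import.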