General Considerations

HelixTree can import data from a wide variety of sources, including spreadsheets, relational databases, and data files generated by other software packages.

In addition to spreadsheets and databases, HelixTree imports family-based data in the FBAT/PBAT Pedigree and Phenotype formats and in text pedigree format.

Except for FBAT/PBAT-format data, all imported data must ultimately conform to a tabular layout: the first row contains labels identifying the columns, and each subsequent row contains the data to be analyzed.

All values in a column are expected to be of the same type. A data file may optionally include one column that contains a label identifying each row. This column is referred to as the label column.
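The tabular layout described above can be illustrated with a short Python sketch. This is not HelixTree's importer. `read_table` and its `label_column` parameter are hypothetical names, and the sketch assumes a comma-delimited file with unique column labels. It shows how the first row supplies the column labels and how an optional label column is separated from the data columns.

```python
import csv
import io

def read_table(text, delimiter=",", label_column=None):
    """Split delimited text into row labels and a dict of data columns.

    Hypothetical helper: the first row supplies the column labels, and
    label_column (a column index, or None) marks the optional label column.
    Assumes column labels are unique.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    row_labels = []
    columns = {name: [] for i, name in enumerate(header) if i != label_column}
    for row in body:
        # Pad short rows so trailing empty cells stay aligned with their column
        row = row + [""] * (len(header) - len(row))
        for i, name in enumerate(header):
            if i == label_column:
                row_labels.append(row[i])
            else:
                columns[name].append(row[i])
    return row_labels, columns

labels, cols = read_table("id,age,genotype\nS1,34,A/G\nS2,?,G/G\n", label_column=0)
print(labels)  # ['S1', 'S2']
print(cols)    # {'age': ['34', '?'], 'genotype': ['A/G', 'G/G']}
```

Keeping each column as its own list makes the "one type per column" rule easy to apply afterwards, since type detection can then look at one column at a time.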

Not every data cell needs to contain a value, because HelixTree handles missing data. Missing data may be represented by a question mark (?), a period (.), a single minus sign (-), or three minus signs (---). The exceptions are FBAT/PBAT-format data and the pedigree columns of text pedigree format, where a zero (0) denotes missing data. In files that are not whitespace-delimited, missing data may also be represented by a blank space or an empty field.
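The missing-data rules above can be summarized as a small predicate. This is a minimal sketch, not HelixTree code. The function name `is_missing` and its two flags are hypothetical, and it assumes that spaces surrounding a marker are not significant. It also assumes that in FBAT/PBAT data and text-pedigree columns only 0 marks a missing value.

```python
# Markers the text lists as missing data for ordinary columns
MISSING_TOKENS = {"?", ".", "-", "---"}

def is_missing(cell, whitespace_delimited=False, pedigree_column=False):
    """Return True if a raw cell should be treated as missing (hypothetical helper).

    pedigree_column: True for FBAT/PBAT data or the pedigree columns of
    text pedigree format, where 0 denotes missing data.
    whitespace_delimited: True if the file is delimited by whitespace, in
    which case blank or empty fields are not treated as missing markers.
    """
    token = cell.strip()  # assumption: surrounding spaces are not significant
    if pedigree_column:
        # FBAT/PBAT data and text-pedigree columns: zero marks missing data
        return token == "0"
    if token in MISSING_TOKENS:
        return True
    # Blank or empty fields count as missing only in files that are
    # not whitespace-delimited (for example, comma-delimited files)
    return not whitespace_delimited and token == ""

print(is_missing("---"))                      # True
print(is_missing("0"))                        # False
print(is_missing("0", pedigree_column=True))  # True
```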

For all formats other than the FBAT/PBAT format and the pedigree columns of text pedigree format, HelixTree automatically determines the type of each column as the data is imported, using the rules in the following table:


Data Type | Description
Binary | True or false (Boolean) data, stored as 0 for false and 1 for true. No other value, including missing data, is allowed.
Integer | Whole numbers such as 1, 7 or 256. Missing values are allowed. If a column consists entirely of 0s and 1s, it is imported as binary instead.
Double or Real | Floating-point numbers, which may have fractional parts, such as 3.3, 1.2e10 or 9.0. Missing values are allowed. If integer values are mixed with floating-point values, the entire column is treated as double/real.
Categorical | Non-numeric data, usually text.
Genetic | Represents genetic marker genotypes in the form "U<sep>V", where U and V may be sequences of any characters except the separator character <sep>. Within a given data set, the separator can be only one of the following: underscore (_), forward slash (/), comma (,), or blank ( ). Missing genotypes are represented as "?<sep>?", "?", or a period (.). In a file that is not comma-delimited, genotypes may also be written as U,V. If all entries in a column conform to one of these formats, the column is treated as genetic. Linkage disequilibrium and Hardy-Weinberg equilibrium plots can be computed when genetic data is imported in this format, and haplotype analysis is only possible on data of the genetic type.
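The type rules in the table can be approximated by a per-column classifier. This is a minimal sketch under stated assumptions, not HelixTree's actual detection logic. The function name `infer_column_type` is hypothetical, and so is the order in which the rules are tried (binary, integer, double, genetic, then categorical as the fallback). For the genetic type, it accepts only the missing markers listed in the table ("?<sep>?", "?", or a period). It also requires at least one cell to contain the separator, so a column of only "?" is not classified as genetic.

```python
import re

# Separators permitted between the two alleles of a genotype
GENOTYPE_SEPARATORS = ["_", "/", ",", " "]
# General missing-data markers; the empty string covers empty fields
MISSING = {"?", ".", "-", "---", ""}

INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")

def _is_genotype(token, sep):
    # Missing-genotype markers listed for the genetic type
    if token in ("?", ".", "?" + sep + "?"):
        return True
    # Otherwise exactly two non-empty alleles around one separator
    parts = token.split(sep)
    return len(parts) == 2 and all(parts)

def infer_column_type(values):
    """Classify one column of raw strings (hypothetical helper)."""
    tokens = [v.strip() for v in values]
    present = [t for t in tokens if t not in MISSING]
    # Binary: every cell is 0 or 1, so no missing values at all
    if tokens and all(t in ("0", "1") for t in tokens):
        return "binary"
    # Integer: whole numbers, missing values allowed
    if present and all(INT_RE.fullmatch(t) for t in present):
        return "integer"
    # Double/real: any mix of integers and floating-point numbers
    if present and all(FLOAT_RE.fullmatch(t) for t in present):
        return "double"
    # Genetic: every cell is a genotype that uses one common separator
    for sep in GENOTYPE_SEPARATORS:
        if any(sep in t for t in tokens) and all(_is_genotype(t, sep) for t in tokens):
            return "genetic"
    return "categorical"

print(infer_column_type(["0", "1", "?"]))           # integer
print(infer_column_type(["3.3", "1.2e10", "9"]))    # double
print(infer_column_type(["A/G", "G/G", "?/?"]))     # genetic
```

The first example shows why the binary check comes first and also excludes missing values. A column of 0s and 1s with a single "?" falls through to integer, which matches the table's statement that binary data allows no missing values.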