4.1 General Considerations
ChemTree can import data from a wide variety of sources, including spreadsheets, relational databases, and data files generated by other software packages.
All imported data must ultimately conform to a tabular layout in which the first row contains labels identifying all of the columns and the subsequent rows contain the data to be analyzed.
Each column is expected to contain data of a single type. A data file may optionally have one column that contains labels identifying each of the rows. This column is referred to as the label column.
Not every data cell is required to contain data; ChemTree handles missing data. Missing data may be represented by a question mark (?), a period (.), one minus sign (-), or three minus signs (---). In files that are not white-space-delimited, missing data may also be represented by a blank space or an empty field.
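The missing-data conventions above can be illustrated with a minimal Python sketch. This is not ChemTree's own code; the names `MISSING_TOKENS` and `is_missing` and the `whitespace_delimited` parameter are hypothetical and exist only to make the rule concrete.

```python
# Illustrative sketch only; ChemTree's internal handling is not documented here.
# MISSING_TOKENS and is_missing are hypothetical names.
MISSING_TOKENS = {"?", ".", "-", "---"}


def is_missing(cell: str, whitespace_delimited: bool = False) -> bool:
    """Return True if a raw cell should be treated as missing data."""
    stripped = cell.strip()
    if stripped in MISSING_TOKENS:
        return True
    # A blank or empty field counts as missing only when the file is
    # not white-space-delimited.
    return stripped == "" and not whitespace_delimited
```

For example, `is_missing("?")` and `is_missing("")` both return `True`, while `is_missing("", whitespace_delimited=True)` returns `False`.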
ChemTree automatically determines the type of the data as it is imported, using the rules listed in the following tables:
| Data Type | Description |
|---|---|
| Binary | Boolean (true or false) data. This must be stored as a 0 for false and a 1 for true. No other value, including missing data, is allowed. |
| Integer | Whole numbers such as 1, 7 or 256. Missing values are allowed. Note that if the entire data set consists of 1s and 0s, it will be converted to the binary data type. |
| Double or Real | These numbers can have fractional parts and may look like 3.3, 1.2e10 or 9.0. If integer values are mixed with floating point values, the entire column will be considered as double or real. Missing values are allowed. |
| Categorical | Non-numeric data, usually text. |
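The rules in the table can be approximated by the following Python sketch, which classifies a single column of raw text cells. It is an illustration under stated assumptions, not ChemTree's actual algorithm: the function name `infer_column_type`, the regular expressions, the treatment of a 0/1 column containing missing values as integer, and the treatment of an all-missing column as categorical are all assumptions made for this example.

```python
import re

# Illustrative sketch only; all names and patterns here are hypothetical.
MISSING_TOKENS = {"?", ".", "-", "---", ""}
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def infer_column_type(cells):
    """Classify a column as 'binary', 'integer', 'double' or 'categorical'."""
    values = [c.strip() for c in cells]
    present = [v for v in values if v not in MISSING_TOKENS]
    has_missing = len(present) < len(values)

    if not present:
        # Assumption: the table does not say how an all-missing column is typed.
        return "categorical"
    # Binary: only 0s and 1s, and no missing data.
    if not has_missing and all(v in ("0", "1") for v in present):
        return "binary"
    # Integer: whole numbers; missing values are allowed.
    if all(INT_RE.fullmatch(v) for v in present):
        return "integer"
    # Double: integers mixed with floating-point values promote the column.
    if all(FLOAT_RE.fullmatch(v) for v in present):
        return "double"
    # Anything non-numeric is categorical.
    return "categorical"
```

For example, `infer_column_type(["1", "3.3", "1.2e10"])` returns `"double"` because one floating-point value is enough to promote the whole column, while `infer_column_type(["0", "1", "?"])` returns `"integer"` under the assumption noted above.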