Affymetrix CNT File Format
The Affymetrix CNT file format is a tab-separated ASCII file format in which each file represents data for one sample. Within each CNT file, each row contains all data for a given marker. Each row must contain the marker name, chromosome, and position, and may contain any other information that should be associated with that marker. Currently, HelixTree also requires a normalized copy number intensity column named Log2Ratio, in addition to the name, chromosome, and position columns that are present in all CNT files.
If you have copy number data that you would like to import into HelixTree, you can use the CNT format to do so. For instructions on how to convert the CNT files to Copy Number DSF files, see section 4.4.1.2.
The CNT format consists of a header section, a column name section, and a data section. The beginning of each of these sections is marked with a specific token, and all sections are required. Each section is briefly described below.
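As an illustration of the three-section layout, the following is a minimal Python sketch, not part of HelixTree, that splits the text of a CNT file at its section tokens. The function name split_cnt_sections is hypothetical, and the sketch assumes that each token appears alone on its own line and that blank lines can be ignored.

```python
SECTION_TOKENS = ("[Header]", "[ColumnName]", "[Data]")


def split_cnt_sections(text):
    """Split CNT file text into lists of lines keyed by section token."""
    sections = {}
    current = None
    for line in text.splitlines():
        token = line.strip()
        if token in SECTION_TOKENS:
            if token in sections:
                raise ValueError(f"duplicate section token {token}")
            current = token
            sections[current] = []
        elif current is not None and token:
            # Keep the line unstripped so that a trailing tab
            # (a missing value in the last column) is preserved.
            sections[current].append(line)
    # All three sections are required, in this order.
    if list(sections) != list(SECTION_TOKENS):
        raise ValueError(f"expected sections {SECTION_TOKENS}, found {tuple(sections)}")
    return sections
```

The returned dictionary holds the raw lines of each section, which can then be checked individually.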
C.4.1 Header Section
The header section can contain meta-data for the file and need not be tab-delimited. Any meta-information about the file can be listed here; however, the header must specify a value for the variable ChipType1, e.g., ChipType1=MappingK_Hind240. This assignment must appear on its own line in the header section of each file, and the value must match across all CNT files that you will be importing together. HelixTree uses this value to ensure that the CNT files you import are of the same type. If you are converting your data into this format for use with HelixTree, you can set ChipType1 to any ASCII string of your own, so long as the value is consistent across all the files that will be imported together. The start of the header section is indicated by the ’[Header]’ section token, and the header continues until the column names section.
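Before importing a batch of files, the ChipType1 requirement can be checked with a short script. The following Python sketch is only an illustration under stated assumptions: the function names read_chip_type and check_same_chip_type are hypothetical, and the header lines of each file are assumed to have already been read into a list of strings.

```python
def read_chip_type(header_lines):
    """Return the ChipType1 value found in the lines of a CNT header section."""
    for line in header_lines:
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == "ChipType1" and value.strip():
            return value.strip()
    raise ValueError("header section does not specify a value for ChipType1")


def check_same_chip_type(headers_by_file):
    """Raise ValueError unless every file in the batch has the same ChipType1."""
    chip_types = {
        name: read_chip_type(lines) for name, lines in headers_by_file.items()
    }
    if len(set(chip_types.values())) > 1:
        raise ValueError(f"ChipType1 differs across files: {chip_types}")
    return chip_types
```

Other header lines are ignored, since the header may hold arbitrary meta-information that is not in key=value form.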
C.4.2 Column Names Section
The column names section contains a tab-separated list of the names of the columns contained in the file. This list should be in the same order as the data columns themselves and each file must contain the following columns:
- ProbeSet – the marker name, which will become the column name in any HelixTree data sets. This column must be listed first in all files.
- Chromosome – the chromosome associated with the marker.
- Position – the marker map position of the marker.
- Log2Ratio – this represents the normalized copy number intensities, and is required for use with HelixTree.
The beginning of the column names section is marked by the ’[ColumnName]’ section token and continues until the data section.
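The column requirements above can be sketched as a small check. This is a hedged example rather than HelixTree's own validation: the name parse_column_names is hypothetical, and the column names are assumed to sit on a single tab-separated line.

```python
REQUIRED_COLUMNS = ("ProbeSet", "Chromosome", "Position", "Log2Ratio")


def parse_column_names(column_line):
    """Split the column names line on tabs and check the required columns."""
    columns = column_line.rstrip("\r\n").split("\t")
    if columns[0] != "ProbeSet":
        raise ValueError("ProbeSet must be the first column")
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    return columns
```

Any additional columns are accepted and returned in file order, so they can be matched against the values in the data section.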
C.4.3 Data Section
The data section contains the actual data for each marker. The data should appear as a tab-separated list of values where each line represents the values for one marker. The order of the values must match the order of the columns listed in the column names section. Missing values are indicated by empty strings, i.e., two consecutive tab characters, or a tab followed by the end of the line if the missing value is in the last column. Note: The markers listed in the data section must be in marker map order, and the markers must appear in the same order across all input files. The start of the data section is indicated by the ’[Data]’ section token and continues until the end of the file.
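The missing-value convention and the ordering requirement can be illustrated with the following Python sketch. The function names parse_data_row and check_same_marker_order are hypothetical, missing values are represented as None by assumption, and the sketch does not verify marker map order within a file; it only compares the marker sequence between files.

```python
def parse_data_row(line, columns):
    """Map one tab-separated data line onto the column names.

    Empty fields (two consecutive tabs, or a trailing tab) become None.
    """
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(columns):
        raise ValueError(
            f"expected {len(columns)} fields, found {len(values)}: {line!r}"
        )
    return {
        name: (value if value != "" else None)
        for name, value in zip(columns, values)
    }


def check_same_marker_order(rows_by_file):
    """Raise ValueError if the ProbeSet sequence differs between files."""
    reference = None
    for name, rows in rows_by_file.items():
        markers = [row["ProbeSet"] for row in rows]
        if reference is None:
            reference = markers
        elif markers != reference:
            raise ValueError(f"marker order in {name} differs from the first file")
```

Values are kept as strings here; converting Position and Log2Ratio to numbers is left to the caller.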
C.4.4 Example File
An example file might look like the following:
[Header]
ChipType1=MyType
[ColumnName]
ProbeSet	Chromosome	Position	Log2Ratio
[Data]
Marker1	1	2224111	0.054294
Marker2	1	3084986	0.051188
Marker3	2	53452	0.288990
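If you are converting your own copy number data into this format, a file like the example can be generated with a few lines of Python. The sketch below is an assumption-laden illustration, not a HelixTree utility: the function name format_cnt is hypothetical, missing values are passed as None, and the file is assumed to end with a newline.

```python
def format_cnt(chip_type, columns, rows):
    """Render rows as CNT file text; None in a row marks a missing value."""
    lines = [
        "[Header]",
        f"ChipType1={chip_type}",
        "[ColumnName]",
        "\t".join(columns),
        "[Data]",
    ]
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length does not match the column list")
        # A missing value is written as an empty string between tabs.
        lines.append("\t".join("" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"
```

Passing Log2Ratio values as pre-formatted strings keeps their precision exactly as supplied. The caller is responsible for listing the rows in marker map order.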