Affymetrix CNT File Format

18.4 Affymetrix CNT File Format

The Affymetrix CNT file format is a tab-separated ASCII file format in which each file holds the data for one sample. Within each CNT file, each row contains all data for a given marker. Each row must contain the marker name, chromosome, and position, and may contain any other information that should be associated with that marker. Currently, SVS also requires a normalized copy number intensity column named Log2Ratio, in addition to the name, chromosome, and position columns that are present in all CNT files.

If you have copy number data that you would like to import into SVS, you can use the CNT format to do so.

The CNT format consists of a header section, a column name section, and a data section. The beginning of each of these sections is marked with a specific token, and all sections are required. Each section is briefly described below.
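
As a rough illustration of this three-section layout, the following Python sketch splits the text of a CNT file into its sections using the section tokens. The function name split_cnt_sections is hypothetical and not part of SVS. The sketch also assumes that each token sits on its own line and that blank lines can be ignored, which the format description does not state explicitly.

```python
SECTION_TOKENS = ("[Header]", "[ColumnName]", "[Data]")


def split_cnt_sections(text):
    """Return a dict mapping each section token to the lines that follow it.

    Hypothetical helper (not an SVS function). Assumes tokens appear on
    their own lines and that blank lines carry no meaning.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        token = line.strip()
        if token in SECTION_TOKENS:
            # A section token starts a new section.
            current = token
            sections[current] = []
        elif current is not None and token:
            # Keep the line unstripped so that tabs marking a missing
            # value at the end of a data row survive.
            sections[current].append(line)
    # All three sections are required, in this order.
    if tuple(sections) != SECTION_TOKENS:
        raise ValueError("expected sections [Header], [ColumnName], [Data] in order")
    return sections
```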

Header Section

The header section can contain metadata for the file and need not be tab-delimited. Any meta information about the file can be listed here; however, a value must be specified for the variable ChipType1, e.g., ChipType1=MappingK_Hind240. This assignment must appear on its own line in the header section of each file, and the value must match across all CNT files that you import together. The value is used to ensure that the CNT files that you import into HelixTree are of the same type. If you are converting your data into this format for use with HelixTree, you can set ChipType1 to any ASCII string of your own, so long as the value is consistent across all the files that will be imported together. The start of the header section is indicated by the [Header] section token, and the header continues until the column names section.
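
The ChipType1 requirement can be checked before import with a few lines of Python. This is only a sketch: read_chip_type and check_chip_types are hypothetical names, the header of each file is assumed to be available as a list of lines, and the assignment is assumed to be written exactly as ChipType1=value with no spaces around the equals sign.

```python
def read_chip_type(header_lines):
    """Return the ChipType1 value found in a list of header lines.

    Hypothetical helper. Free-form header lines are skipped, since the
    header need not follow any fixed layout.
    """
    for line in header_lines:
        key, sep, value = line.strip().partition("=")
        if sep and key == "ChipType1" and value:
            return value
    raise ValueError("header has no ChipType1=<value> line")


def check_chip_types(headers_by_file):
    """Verify that every file declares the same ChipType1 value.

    headers_by_file maps a file name to that file's header lines.
    """
    chip_types = {name: read_chip_type(lines)
                  for name, lines in headers_by_file.items()}
    if len(set(chip_types.values())) > 1:
        raise ValueError(f"ChipType1 differs across files: {chip_types}")
    return chip_types
```

Calling check_chip_types on the headers of all files in a batch raises an error as soon as two files disagree, which mirrors the consistency rule described above.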

Column Names Section

The column names section contains a tab-separated list of the names of the columns contained in the file. The names must be listed in the same order as the data columns themselves, and each file must contain the following columns:

  • ProbeSet: The marker name. This will become the column name in a dataset. This column must be listed first in all files.
  • Chromosome: Chromosome associated with the marker.
  • Position: Position of the marker in the genetic marker map.
  • Log2Ratio: The normalized copy number intensity. This column is required for copy number analysis in SVS.

The beginning of the column names section is marked by the ‘[ColumnName]’ section token and continues until the data section.
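
The column requirements above can be expressed as a small validation step. The Python sketch below uses a hypothetical function, parse_column_names, that takes the single tab-separated line from the column names section. It assumes column names are case-sensitive, which the description does not confirm.

```python
REQUIRED_COLUMNS = ("ProbeSet", "Chromosome", "Position", "Log2Ratio")


def parse_column_names(line):
    """Split the column-name line on tabs and check the required columns.

    Hypothetical helper; treats column names as case-sensitive (an assumption).
    """
    names = [name.strip() for name in line.split("\t")]
    # ProbeSet must be the first column in every file.
    if names[0] != "ProbeSet":
        raise ValueError("ProbeSet must be the first column")
    missing = [col for col in REQUIRED_COLUMNS if col not in names]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return names
```

Additional columns beyond the four required ones pass through unchanged, in line with the rule that a row may carry other marker information.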

Data Section

The data section contains the actual data for each marker. The data should appear as a tab-separated list of values, where each line holds the values for one marker. The order of the values must match the order of the columns listed in the column names section. A missing value is indicated by an empty string, i.e., two consecutive tab characters, or a tab followed by the end of the line if the missing value is in the last column.
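
To make the missing-value rule concrete, the following Python sketch turns one data line into a dictionary keyed by column name, mapping empty fields to None. The name parse_data_line is hypothetical, and values are left as strings rather than converted to numbers.

```python
def parse_data_line(line, column_names):
    """Map one tab-separated data line onto the given column names.

    Hypothetical helper. Empty fields (two consecutive tabs, or a tab at
    the end of the line) become None; other values stay as strings.
    """
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(column_names):
        raise ValueError(
            f"expected {len(column_names)} values, found {len(values)}")
    return {name: (value if value != "" else None)
            for name, value in zip(column_names, values)}
```

Splitting on the tab character alone, rather than on arbitrary whitespace, is what preserves empty fields: a trailing tab yields a final empty string, which is then reported as a missing last column.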

NOTE:

  • The markers listed in the data section must be in marker map order, and the markers must appear in the same order across all input files.

The start of the data section is indicated by the ‘[Data]’ section token, and the section continues until the end of the file.

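
The requirement that markers appear in the same order across all input files can be checked by comparing the ProbeSet columns of two files. The Python sketch below uses a hypothetical function, first_order_mismatch, and assumes that every file in a batch lists exactly the same markers. It does not verify marker map order itself.

```python
def first_order_mismatch(reference, other):
    """Return the index of the first differing marker name, or None.

    Hypothetical helper. reference and other are lists of ProbeSet
    values in file order. A length difference counts as a mismatch at
    the end of the shorter list (assumes all files share one marker set).
    """
    for index, (ref_name, other_name) in enumerate(zip(reference, other)):
        if ref_name != other_name:
            return index
    if len(reference) != len(other):
        return min(len(reference), len(other))
    return None
```
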
Example File

An example file might look like the following (columns are separated by tab characters, which appear here as spaces):

[Header]  
ChipType1=MyType  
[ColumnName]  
ProbeSet Chromosome Position Log2Ratio  
[Data]  
Marker1 1 2224111 0.054294  
Marker2 1 3084986 0.051188  
Marker3 2 53452 0.288990
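
For converting existing copy number data into this layout, a minimal Python sketch such as the following could produce the text of a CNT file. The function format_cnt is hypothetical, it writes only the four required columns, and it uses newline line endings, which is an assumption since the format description does not specify them.

```python
COLUMNS = ("ProbeSet", "Chromosome", "Position", "Log2Ratio")


def format_cnt(chip_type, rows):
    """Build CNT file text from (probe_set, chromosome, position, log2_ratio) rows.

    Hypothetical helper. A value of None is written as an empty field,
    i.e., a missing value. Rows are assumed to be in marker map order already.
    """
    lines = [
        "[Header]",
        f"ChipType1={chip_type}",
        "[ColumnName]",
        "\t".join(COLUMNS),
        "[Data]",
    ]
    for row in rows:
        lines.append("\t".join("" if value is None else str(value)
                               for value in row))
    return "\n".join(lines) + "\n"
```

The returned string can be written to one file per sample, using the same chip_type value for every file in the batch.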