18.4 Affymetrix CNT File Format
The Affymetrix CNT file format is a tab-separated ASCII file format in which each file represents the data for one sample.
Within each CNT file, each row contains all of the data for a given marker. Each row
must contain the marker name, chromosome, and position, and may contain any other information that
should be associated with that marker. Currently, SVS requires a normalized copy number intensity column
named Log2Ratio in addition to the name, chromosome, and position columns that are present in all CNT
files.
If you have copy number data that you would like to import into SVS, you can use the CNT format to do
so.
The CNT format consists of a header section, a column name section, and a data section. The beginning of each of
these sections is marked with a specific token, and all sections are required. Each section is briefly described
below.
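As a rough illustration of this three-section layout, the following Python sketch splits the text of a CNT file into its sections by looking for the section tokens. The function name split_sections and the choice to return a dictionary are assumptions made for this example; they are not part of SVS.

```python
# Section tokens as described in this document.
SECTION_TOKENS = ("[Header]", "[ColumnName]", "[Data]")


def split_sections(text):
    """Split CNT file text into {token: [lines]} (hypothetical helper)."""
    sections = {}
    current = None
    for line in text.splitlines():
        token = line.strip()
        if token in SECTION_TOKENS:
            # A section token starts a new section.
            current = token
            sections[current] = []
        elif current is not None:
            # Keep the line unstripped so trailing tabs (missing values) survive.
            sections[current].append(line)
    missing = [t for t in SECTION_TOKENS if t not in sections]
    if missing:
        # All three sections are required.
        raise ValueError(f"missing required section(s): {missing}")
    return sections
```

Lines are stored without stripping so that a trailing tab, which marks a missing value in the last column, is preserved for later parsing.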
Header Section
The header section of the file can contain metadata for the file and need not be in tab-delimited form. Any meta
information about the file can be listed here; however, a value must be specified for the variable ChipType1, e.g.,
ChipType1=MappingK_Hind240. This value must appear on its own line in the header section of each file, and must match
across all CNT files that you will be importing together. This value is used to ensure that the CNT files that you import into
HelixTree are of the same type. If you are converting your data into this format for use with HelixTree, you can set the value
for ChipType1 to be your own ASCII string so long as the value is consistent over all the files that will be imported together.
The start of the header section is indicated by the [Header] section token and the header continues until the column names
section.
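Before importing a batch of files, it may be useful to confirm that they all carry the same ChipType1 value. The sketch below shows one way this could be checked in Python. The helper names read_chip_type and check_same_chip_type are hypothetical, and the check only approximates what the importer is described as doing.

```python
def read_chip_type(text):
    """Return the ChipType1 value from the [Header] section of CNT text."""
    in_header = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Header]":
            in_header = True
        elif line == "[ColumnName]":
            break  # the header ends where the column names section begins
        elif in_header and line.startswith("ChipType1="):
            # The value is everything after the first '='.
            return line.split("=", 1)[1]
    raise ValueError("no ChipType1 line found in the [Header] section")


def check_same_chip_type(texts):
    """Return the shared ChipType1 value, or raise if the files disagree."""
    values = {read_chip_type(t) for t in texts}
    if len(values) != 1:
        raise ValueError(f"ChipType1 differs across files: {sorted(values)}")
    return values.pop()
```

Other free-form header lines are ignored, since only the ChipType1 line is required.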
Column Names Section
The column names section contains a tab-separated list of the names of the columns contained in the file. This list should be in the same order as the data columns themselves and each file must contain the following columns:
- ProbeSet: The marker name, which will become the column name in a dataset. This column must be listed first in all files.
- Chromosome: Chromosome associated with the marker.
- Position: Position of the marker in the genetic marker map.
- Log2Ratio: This represents the normalized copy number intensities, and is required for use with copy number analysis in SVS.
The beginning of the column names section is marked by the ‘[ColumnName]’ section token, and the section continues until the data
section.
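The column requirements above can be checked with a few lines of Python. This is a minimal sketch; the function name parse_column_names and the REQUIRED_COLUMNS constant are assumptions made for illustration.

```python
# Columns that this document says every file must contain.
REQUIRED_COLUMNS = ("ProbeSet", "Chromosome", "Position", "Log2Ratio")


def parse_column_names(line):
    """Split the column names line on tabs and validate it (hypothetical helper)."""
    names = line.rstrip("\r\n").split("\t")
    if names[0] != "ProbeSet":
        raise ValueError("ProbeSet must be the first column")
    missing = [name for name in REQUIRED_COLUMNS if name not in names]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    return names
```

Additional columns are accepted, because a row may carry any other information associated with the marker.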
Data Section
The data section contains the actual data for each marker. The data should appear as a tab-separated list of values, where each line represents the values for one marker. The order of the values must match the order of the columns listed in the column names section. Missing values are indicated by empty strings, i.e., two consecutive tab characters, or a tab followed by the end of the line if the missing value is in the last column.
NOTE:
- The markers listed in the data section must be in marker map order, and the markers must appear in the same order across all input files.
The start of the data section is indicated by the ‘[Data]’ section token, and the section continues until the end of the file.
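To make the missing-value and ordering rules concrete, here is a small Python sketch. It maps empty fields to None and compares marker order between two files. The names parse_data_row and same_marker_order are hypothetical, and the sketch does not verify marker map order itself, which would require the marker map.

```python
def parse_data_row(line, columns):
    """Map one data line onto the column names; empty fields become None."""
    # Strip only the line ending so a trailing tab still yields an empty field.
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} values, found {len(values)}")
    return {name: (value if value else None) for name, value in zip(columns, values)}


def same_marker_order(rows_a, rows_b):
    """Return True if two files list the same markers in the same order."""
    return [r["ProbeSet"] for r in rows_a] == [r["ProbeSet"] for r in rows_b]
```

Values are left as strings here; converting Position or Log2Ratio to numbers is a separate step that a real converter would likely add.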
Example File
An example file might look like the following:
[Header]
ChipType1=MyType
[ColumnName]
ProbeSet	Chromosome	Position	Log2Ratio
[Data]
Marker1	1	2224111	0.054294
Marker2	1	3084986	0.051188
Marker3	2	53452	0.288990
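If you are converting your own copy number data, a file with this layout could be produced with a short Python function such as the one below. This is a sketch under stated assumptions: the function name write_cnt, the tuple layout of the input, and the six-decimal formatting of Log2Ratio are choices made for this example and are not mandated by the format.

```python
import io


def write_cnt(out, chip_type, markers):
    """Write CNT text to a file-like object (hypothetical helper).

    markers is an iterable of (name, chromosome, position, log2ratio) tuples,
    already sorted in marker map order; log2ratio may be None if missing.
    """
    out.write("[Header]\n")
    out.write(f"ChipType1={chip_type}\n")
    out.write("[ColumnName]\n")
    out.write("\t".join(("ProbeSet", "Chromosome", "Position", "Log2Ratio")) + "\n")
    out.write("[Data]\n")
    for name, chrom, pos, ratio in markers:
        # A missing value is written as an empty string after the last tab.
        ratio_text = "" if ratio is None else f"{ratio:.6f}"
        out.write(f"{name}\t{chrom}\t{pos}\t{ratio_text}\n")


# Usage: build the text in memory; a real script would open a file instead.
buf = io.StringIO()
write_cnt(buf, "MyType", [("Marker1", "1", 2224111, 0.054294),
                          ("Marker2", "1", 3084986, None)])
```

The same chip_type string should be passed for every sample that will be imported together.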