Affymetrix Files

3.8 Affymetrix Files

Both SNP and CNV data from Affymetrix chips can be imported into an SVS project for analysis. SNP data can be imported in either the CHP or CNT formats. CNV data can be imported in the CEL, CNT and CNCHP formats.

Analysis results generated by the Affymetrix Chromosome Analysis Suite (ChAS) in the CYCHP format and CEL files generated by processing the Cytogenetics Whole-Genome Arrays can also be imported into an SVS project for analysis.

CNV Data

For the Affymetrix 500k, SNP 5.0, and SNP 6.0 arrays, the Copy Number Analysis Module (CNAM) supports reading CEL intensity files and calculating normalized log2 ratios for copy number segmentation and association analysis. For the Affymetrix 10k, 100k, and 500k arrays, you may use the Affymetrix CNAT Batch Analysis tool to create CNT files; for the 100k, 500k, and SNP 6.0 arrays, you may use the Genotyping Console to create CNCHP files. Both file types contain normalized log2 ratios and can be imported into a dataset for analysis in SVS with the CNAM module. See Extracting Affymetrix Copy Number Data for use in SVS for instructions on creating CNT or CNCHP files using Affymetrix tools. Affymetrix CEL, CNT, or CNCHP files can be imported directly into SVS versions 7.0 and higher without any additional steps.

Affymetrix CHP File

SVS can directly import CHP files from the Affymetrix 10k, 100k, 500k, SNP 5.0, and SNP 6.0 GeneChip® mapping arrays.

Affymetrix Files Installation

For mapping arrays prior to the SNP 5.0 and SNP 6.0 arrays, you must have the corresponding library file installed for each type of mapping array you want to import. SNP 5.0 and 6.0 mapping arrays do not require library files. If you have GCOS installed, it is likely the library files are already installed in either C:/GeneChip/Library or C:/GeneChip/Affy_Data/Library (on Windows).

If you do not have GCOS installed, or need other Library files, they can be downloaded from Affymetrix through the NetAffx service. See Genetic Marker Maps and Affymetrix Library Files for importing library files through NetAffx.

By default, SVS will look for mapping array library files in the C:/Program Files/Golden Helix SVS/AffyLibraryFiles directory, or in the last directory used for library files. An option for specifying the directory is available whenever a library file is needed for importing Affymetrix files.

The library files available from the NetAffx service are for the final versions of these mapping arrays. If you were using an experimental early access array, you will need to get the appropriate library files from Affymetrix. All that is needed for SVS is the CDF file for the array.

Affymetrix Mapping Array Import


[Picture]

Figure 10: Affymetrix CHP File Import Window

To import Affymetrix CHP files you can either select all of the files to import using Add Files from the Import CHP Files... dialog, or add an entire directory by choosing Add Directory. If you wish to remove CHP files from the list in the File box, select the files to remove and click Remove. Multiple selection is allowed by <Shift>-left-click to select a block of files or by <Ctrl>-left-click to select individual files. See Figure 10.

NOTE:

  • The “100k” and “500k” arrays are composed of two “50k” and two “250k” chips, respectively, with their corresponding library files. The two chips need to be imported separately and then joined together from the spreadsheet File menu. See Joining or Merging Spreadsheets for instructions on how to join the datasets from the two chips.

The other options in this import window include specifying a dataset name, changing the library path, and filtering calls based on confidence score (p-value). When importing CHP files from SNP 5.0 or 6.0 arrays, the library file location can be ignored, as it is not needed for the import process.

If you wish to use a different threshold for the confidence score, check the box and fill in the desired confidence score (a number between 0 and 1). Changing the confidence score is only valid for certain, more recent file types such as 100k or 500k CHP files. During the import process SVS will screen whether changing the confidence score is valid for your particular files.

NOTE:

  1. The Affymetrix CHP files do not contain phenotypic information about individuals. This data must be imported separately. When doing so, make sure that the label column for those individuals matches the CHP file identifier in the spreadsheet. From either spreadsheet, you can join on column labels to get a combined spreadsheet. See Joining or Merging Spreadsheets for more information on joining spreadsheets.
  2. You may wish to import marker map information for the mapping array dataset. Annotation data can be retrieved from Affymetrix NetAffx service, with appropriate login privileges. See Genetic Marker Maps and Affymetrix Library Files for instructions on how to obtain annotation data from NetAffx.

Affymetrix CEL Files

The Affymetrix CEL import tool reads CEL intensity files, normalizes the intensity values against the chosen or default reference samples, and imports the normalized log2 ratios into SVS. The methodology for calculating and normalizing log2 ratios from the CEL files is described in the Quantile Normalization of Affymetrix CEL Files section.


[Picture]

Figure 11: Affymetrix CEL File Import Window

From the Import Affymetrix CEL file dialog (see Figure 11), first select the CEL files you want to include in the dataset. For Mapping 500k data, you must select files from both the NSP and STY arrays for each sample. To select CEL files, click the Add Files button and use the file browser to select multiple CEL files. The CEL files you selected will appear in the CEL import dialog window. You may add all of the CEL files in a directory by using the Add Directory button. To remove CEL files from the window, select the unwanted samples and click Remove. You may continue adding CEL files by clicking the Add Files or the Add Directory buttons again. Multiple selection is allowed by <Shift>-left-click to select a block of files or by <Ctrl>-left-click to select several individual files.

In the next window (see Figure 12), you can set the import options described below.


[Picture]

Figure 12: Affymetrix CEL File Import Window

For the import of Mapping 500k CEL files, a matching spreadsheet containing the file names must be available in SVS. This spreadsheet will tell the CEL import tool how to join the NSP and STY samples together to create one sample per patient. The matching spreadsheet should have a row label column and at least two data columns. The row labels should be the sample names. The first and second columns should be the NSP and STY file names. Other columns in the dataset are optional but may contain the reference status for the sample. For Mapping 500k CEL file import, check the 500k NSP/STY Matching box and select the matching spreadsheet by clicking on Select Sheet.
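As an illustration of the matching spreadsheet layout described above, the following Python sketch pairs NSP and STY file names by sample and writes them out as CSV text that could then be imported as a spreadsheet. The naming convention (`<sample>_NSP.CEL` and `<sample>_STY.CEL`), the function names, and the column headers are assumptions made for this example only; they are not part of SVS.

```python
import csv
import io
import re

# Hypothetical naming convention: "<sample>_NSP.CEL" / "<sample>_STY.CEL".
PATTERN = re.compile(r"^(?P<sample>.+)_(?P<enzyme>NSP|STY)\.CEL$", re.IGNORECASE)


def build_matching_rows(file_names):
    """Pair NSP and STY CEL file names by sample name.

    Returns (rows, unpaired): rows are (sample, nsp_file, sty_file) tuples,
    and unpaired lists the samples that are missing one of the two files.
    """
    by_sample = {}
    for name in file_names:
        match = PATTERN.match(name)
        if match is None:
            continue  # not named according to the assumed convention
        enzyme = match.group("enzyme").upper()
        by_sample.setdefault(match.group("sample"), {})[enzyme] = name

    rows, unpaired = [], []
    for sample in sorted(by_sample):
        files = by_sample[sample]
        if "NSP" in files and "STY" in files:
            rows.append((sample, files["NSP"], files["STY"]))
        else:
            unpaired.append(sample)
    return rows, unpaired


def to_csv(rows):
    """Render rows as CSV: row label first, then the NSP and STY columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Sample", "NSP File", "STY File"])  # hypothetical headers
    writer.writerows(rows)
    return buffer.getvalue()
```

Samples reported in `unpaired` have only one of the two files and would need to be resolved before a paired 500k import.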

The default reference set includes all samples. Another option is to select a subset from a spreadsheet containing the Reference Status for the samples. The row labels should match the sample names. For the SNP 5.0 and SNP 6.0 Array, the row labels should be the file names of the CEL files with the CEL extension removed. The reference status column should contain 0’s and 1’s where 0 denotes reference and 1 denotes non-reference status. All of the samples will be normalized against the reference samples. When a spreadsheet is selected, the 0=Ref 1=Non-Ref Column drop down box will contain the names of columns of binary data in the selected spreadsheet. Select the name of the column to be used as the reference status.
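A reference status sheet of this form can be prepared outside SVS. The sketch below is a minimal example for SNP 5.0 or SNP 6.0 data, where the row label is the CEL file name with the extension removed and the status column uses 0 for reference and 1 for non-reference samples. The function names and the column headers are hypothetical.

```python
import csv
import io
from pathlib import PureWindowsPath


def reference_status_rows(cel_files, reference_files):
    """Return (row_label, status) pairs; 0 = reference, 1 = non-reference."""
    # PureWindowsPath accepts both "/" and "\\" as path separators.
    reference_labels = {PureWindowsPath(f).stem for f in reference_files}
    rows = []
    for f in cel_files:
        label = PureWindowsPath(f).stem  # file name with the CEL extension removed
        rows.append((label, 0 if label in reference_labels else 1))
    return rows


def reference_status_csv(rows, status_header="Reference Status"):
    """Render the rows as CSV text with a row label column and one binary column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Sample", status_header])  # hypothetical headers
    writer.writerows(rows)
    return buffer.getvalue()
```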

You also have the option to omit samples with the reference designation from the final output spreadsheet. To do this, check the appropriately named box. If this option is selected, reference samples will be used in normalization of data and calculation of LogR values, but will not be included in the output spreadsheet.

Another reference set option is to use HapMap precomputed populations. All 270 samples or an ethnic subset can be used.

A Marker Map needs to be selected for use in the analysis. Probes that are not contained in the marker map will not be imported. In other words, if the marker map does not contain copy number probes and the CEL files do, those probes will not be in the resulting spreadsheet. The CEL files are scanned prior to this dialog, and the appropriate marker map will be detected and auto-downloaded. If the marker map has already been downloaded, navigate to it and select it by clicking Select Marker Map.

The Library Path where the CDF library files are located is also automatically detected. The directory can be changed by clicking on View Library Folder. The library files should contain both SNP and CN Probes.

You may optionally select a temporary directory where intermediate DSF files will be stored. If your project is located on a shared network drive, for performance reasons you should specify a Temp Directory on a local disk.

Output options include the A and B allele values before quantile normalization, the A and B allele values after quantile normalization but before the log ratios are computed, and the log ratios with samples arranged either column-wise or row-wise.

The name of the dataset can be specified at this time. The default name is “Affy CEL Dataset”.

NOTE:

  1. Affymetrix recommends using at least 25 samples as references in un-paired copy number analysis.
  2. The gender of the reference samples should be considered for copy number analysis of the X and/or Y chromosomes.
  3. The CEL conversion process will take several hours to complete.
  4. NSP files can be imported without the corresponding STY files (or vice versa). To do so, select only the NSP files, do not use a matching spreadsheet, and make sure that the row labels in the reference sheet match the CEL file names exactly.

Affymetrix CNT Files

The Affymetrix CNT import tool converts multiple CNT files into one aggregate spreadsheet that contains the log2 ratio values in a format ready to be used for analysis. CNT files can be created for the Mapping 10k, 100k, and 500k arrays or for any copy number data that can be converted into a text file. See Creating CNT Files using the Affymetrix CNAT Batch Analysis Tool and Affymetrix CNT File Format for information on creating Affymetrix CNT files.


[Picture]

Figure 13: CNT File Import Window

From the Import CNT Files... dialog (see Figure 13), you can click Add Files to select CNT files to convert. This will open a file chooser where you can select one or more CNT files. The CNT files you selected will appear in the CNT file convert window. You can add all of the files in a directory by clicking Add Directory. To remove CNT files from the window, select the unwanted files and click Remove. You may continue adding CNT files by clicking the Add Files button again. Files cannot be added more than once, but files with the same name stored in different locations may be added to the same import.

You can also change the name of the dataset at this time.

NOTE:

  • Row labels in the output spreadsheet will be determined by the file names, so files with the same name stored in different locations will have the same row labels.
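Because row labels come from the file names, it can help to check a file list for repeated names before starting an import. The following Python sketch groups paths by file name and reports any name that occurs more than once. The function name is hypothetical, and the comparison is on the exact file name including its extension, which is an assumption about how labels are derived.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def duplicate_row_labels(paths):
    """Map each file name that occurs more than once to the paths sharing it."""
    groups = defaultdict(list)
    for path in paths:
        # PureWindowsPath accepts both "/" and "\\" as path separators.
        groups[PureWindowsPath(path).name].append(path)
    return {name: found for name, found in groups.items() if len(found) > 1}
```

An empty result means every file in the list would receive a distinct row label under this assumption.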

Affymetrix CNCHP Files

The Affymetrix CNCHP import tool converts multiple CNCHP files into one aggregate spreadsheet containing the log2 ratio values in a format ready to be used for analysis. CNCHP files can be created for the Mapping 100k, 500k, and SNP 6.0 arrays. See Creating CNCHP Files Using Affymetrix Genotyping Console 2.0 for information on creating Affymetrix CNCHP files.


[Picture]

Figure 14: Affymetrix CNCHP File Import Window

From the Import CNCHP Files... dialog (see Figure 14), you can click Add Files to select CNCHP files to convert. This will open a file chooser where you can select one or more CNCHP files. The CNCHP files you selected will appear in the CNCHP file convert window. You can add all of the files in a directory by clicking Add Directory. To remove CNCHP files from the window, select the unwanted files and click Remove. You may continue adding CNCHP files by clicking the Add Files button again. Files cannot be added more than once, but files with the same name stored in different locations may be added to the same import.

You can also change the name of the dataset at this time.

NOTE:

  • Row labels in the output spreadsheet will be determined by the file names, so files with the same name stored in different locations will have the same row labels.

Affymetrix CYCHP Files

The Affymetrix CYCHP import tool converts multiple CYCHP files into aggregate spreadsheets containing one or more of the possible datasets contained within CYCHP files. The possible datasets are:

  • Log2Ratio: Creates a spreadsheet containing values that are the ratio of signal to median of Reference signal for every sample/marker pair in the array.
  • CN Segments: Creates a spreadsheet listing the segments for the CN dataset. The ‘Value’ column contains the Copy Number State. Also reported is the confidence, an indicator score for non-normal copy numbers.
  • LOH Segments: Creates a spreadsheet listing the segments for the LOH dataset. The value of the ‘Value’ column is 1 when a Loss of Heterozygosity is found, and 0 when not found. Also reported is the confidence, the ratio of the probability of the SCAR measurements under the LOH model to the sum of the probability under each of the LOH and non-LOH models.
  • Normal Diploid Segments: Creates a spreadsheet listing the segments for the Normal Diploid dataset. The value of the ‘Value’ column is 1 when CN State is 2 and LOH is 0. Otherwise, the value of this column is 0. Also reported is the confidence, the ratio of the probability of the SCAR measurements under the LOH model to the sum of the probability under each of the LOH and non-LOH models.
  • CN Neutral LOH Segments: Creates a spreadsheet listing the segments for the CN Neutral LOH dataset. The value of the ‘Value’ column is 1 when CN State is 2 and LOH is 1. Otherwise, the value of this column is 0. Also reported is the confidence, the ratio of the probability of the SCAR measurements under the LOH model to the sum of the probability under each of the LOH and non-LOH models.
  • Mosaicism Segments: Creates a spreadsheet listing the segments for the Mosaicism dataset. The ‘Value’ column contains the Copy Number State. Also reported are the confidence and mosaicism. Confidence is the proportion of markers that are above or below the thresholds required to make a CN change call for a running median segment size of 251. The value of the ‘Mosaicism’ column is 1 if more than one CN call was found in the segment.
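The indicator and confidence definitions in the list above can be restated as a few small Python functions. This is only a sketch of the stated definitions, not the code used by ChAS or SVS. The function names are hypothetical, and `log2_ratio` assumes, as the dataset name suggests, that the stored value is the base-2 logarithm of the signal divided by the median reference signal.

```python
import math
from statistics import median


def log2_ratio(signal, reference_signals):
    """Assumed form: log2 of the signal over the median reference signal."""
    return math.log2(signal / median(reference_signals))


def normal_diploid(cn_state, loh):
    """1 when CN State is 2 and LOH is 0, otherwise 0."""
    return 1 if cn_state == 2 and loh == 0 else 0


def cn_neutral_loh(cn_state, loh):
    """1 when CN State is 2 and LOH is 1, otherwise 0."""
    return 1 if cn_state == 2 and loh == 1 else 0


def loh_confidence(p_loh, p_non_loh):
    """Probability under the LOH model divided by the sum over both models."""
    return p_loh / (p_loh + p_non_loh)
```

For example, a segment with CN State 2 and LOH 1 is flagged by `cn_neutral_loh` but not by `normal_diploid`.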

[Picture]

Figure 15: Affymetrix CYCHP File Import Window

From the Import CYCHP Files... dialog (see Figure 15), you can click Add Files to select CYCHP files to convert. This will open a file chooser where you can select one or more CYCHP files. The CYCHP files you selected will appear in the CYCHP file convert window. You can add all of the files in a directory by clicking Add Directory. To remove CYCHP files from the window, select the unwanted files and click Remove. You may continue adding CYCHP files by clicking the Add Files button again. Files cannot be added more than once, but files with the same name stored in different locations may be added to the same import.

You can also change the name of the dataset at this time.

Select the output datasets to create, and indicate whether CN segment covariate spreadsheets should also be created. The two covariate spreadsheet options are defined below.

  • One column per marker: A new column is created for every marker present in the segments across all samples. Column headers are marker names.
  • First column of each segment: A new column is created every time there is a new segment value for any one sample over all the samples. This creates common segments for all samples, although for a particular sample there may be more columns than there are segments. In the case where a new column is introduced but the segment value has not changed, the segment value is repeated for all columns in a segment. Column headers are the marker names of the first marker in each segment.
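The “First column of each segment” layout can be sketched in Python as follows. This is an illustrative reading of the description above, not the SVS implementation. It assumes that a column begins wherever a segment starts in any sample, that segments are given as (first marker, last marker, value) tuples over an ordered marker list, and that a sample with no segment covering a column gets `None`. The function and argument names are hypothetical.

```python
def first_column_covariates(markers, segments_by_sample):
    """Build common segment columns across samples.

    markers: ordered list of marker names.
    segments_by_sample: {sample: [(first_marker, last_marker, value), ...]}
    Returns (header, table), where header lists the first marker of each
    common column and table maps each sample to its row of values.
    """
    index = {marker: i for i, marker in enumerate(markers)}

    # A column starts wherever any sample starts a segment.
    starts = sorted(
        {index[first] for segs in segments_by_sample.values() for first, _, _ in segs}
    )
    header = [markers[i] for i in starts]

    table = {}
    for sample, segs in segments_by_sample.items():
        row = []
        for pos in starts:
            value = None  # assumption: no covering segment leaves the cell empty
            for first, last, seg_value in segs:
                if index[first] <= pos <= index[last]:
                    value = seg_value  # repeated when the segment spans several columns
                    break
            row.append(value)
        table[sample] = row
    return header, table
```

With markers m1 to m6, sample A segmented as m1-m3 (value 2) and m4-m6 (value 3), and sample B segmented as m1-m2 (value 2) and m3-m6 (value 1), the common columns start at m1, m3, and m4. Sample A then repeats the value 2 in the m1 and m3 columns, because its segment did not change at m3.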