3.8 Affymetrix Files
Both SNP and CNV data from Affymetrix chips can be imported into an SVS project for analysis. SNP data can be
imported in either the CHP or CNT formats. CNV data can be imported in the CEL, CNT and CNCHP
formats.
Analysis results generated by the Affymetrix Chromosome Analysis Suite (ChAS) in the CYCHP format and CEL files generated by processing the Cytogenetics Whole-Genome Arrays can also be imported into an SVS project for analysis.
CNV Data
For the Affymetrix 500k, SNP 5.0, and SNP 6.0 arrays, the Copy Number Analysis Module (CNAM) supports reading CEL intensity files and calculating normalized log2 ratios for copy number segmentation and association analysis. For the Affymetrix 10k, 100k, and 500k arrays, you may use the Affymetrix CNAT Batch Analysis tool to create CNT files; or for the 100k, 500k, and SNP 6.0 arrays use the Genotyping Console to create CNCHP files. These files contain normalized log2 ratios and can be imported into a dataset for analysis in SVS with the CNAM module. See Extracting Affymetrix Copy Number Data for use in SVS for instructions on creating CNT or CNCHP files using Affymetrix tools. Affymetrix CEL, CNT, or CNCHP files can be imported directly into SVS versions 7.0 and higher without any additional steps.
Affymetrix CHP File
SVS can directly import CHP files from the Affymetrix 10k, 100k, 500k, SNP 5.0, and SNP 6.0 GeneChip® mapping arrays.
Affymetrix Files Installation
For mapping arrays prior to the SNP 5.0 and SNP 6.0 arrays, you must have the corresponding library file installed for
each type of mapping array you want to import. SNP 5.0 and 6.0 mapping arrays do not require library files. If
you have GCOS installed, it is likely the library files are already installed in either C:/GeneChip/Library or
C:/GeneChip/Affy_Data/Library (on Windows).
If you do not have GCOS installed, or need other Library files, they can be downloaded from Affymetrix through the
NetAffx service. See Genetic Marker Maps and Affymetrix Library Files for importing library files through
NetAffx.
SVS will, by default, look for mapping array library files in the
C:/Program Files/Golden Helix SVS/AffyLibraryFiles directory, or the last directory used for Library files.
There is an option for specifying the directory whenever a library file is needed for importing Affymetrix
files.
The library files available from the NetAffx service are for the final versions of these mapping arrays. If you were using an experimental early access array, you will need to get the appropriate library files from Affymetrix. All that is needed for SVS is the CDF file for the array.
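Before starting an import, it can help to confirm which of the directories mentioned above actually contain CDF files. The following is a minimal sketch, not part of SVS: the function name find_cdf_files is hypothetical, and the directory list simply repeats the paths named in this section, so adjust it for your installation.

```python
from pathlib import Path

# Directories named in this section; adjust for your own installation.
CANDIDATE_DIRS = [
    "C:/GeneChip/Library",
    "C:/GeneChip/Affy_Data/Library",
    "C:/Program Files/Golden Helix SVS/AffyLibraryFiles",
]


def find_cdf_files(directories):
    """Return {directory: sorted CDF file names} for each directory that exists."""
    found = {}
    for directory in directories:
        path = Path(directory)
        if not path.is_dir():
            continue  # skip locations that are not present on this machine
        found[directory] = sorted(
            p.name
            for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == ".cdf"
        )
    return found


if __name__ == "__main__":
    for directory, names in find_cdf_files(CANDIDATE_DIRS).items():
        print(directory, names if names else "(no CDF files)")
```

A directory that exists but lists no CDF files is a hint that the library files for your array still need to be obtained.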
Affymetrix Mapping Array Import
To import Affymetrix CHP files you can either select all of the files to import using Add Files from the Import CHP
Files... dialog, or add an entire directory by choosing Add Directory. If you wish to remove CHP files from the list in the
File box, select the files to remove and click Remove. Multiple selection is allowed by <Shift>-left-click to select a block of
files or by <Ctrl>-left-click to select individual files. See Figure 10.
NOTE:
- The “100k” and “500k” arrays are composed of two “50k” and two “250k” chips, respectively, with their corresponding library files. These need to be imported separately and then joined together from the spreadsheet File menu. See Joining or Merging Spreadsheets for instructions on how to join the datasets from the two chips.
The other options in this import window include specifying a dataset name, changing the library path, and filtering calls
based on confidence score (p-value). When importing CHP files from SNP 5.0 or 6.0 arrays the library file location can be
ignored, as it is not needed for the import process.
If you wish to use a different threshold for the confidence score, check the box and fill in the desired confidence score (a
number between 0 and 1). Changing the confidence score is only valid for certain, more recent file types such as 100k or 500k
CHP files. During the import process SVS will screen whether changing the confidence score is valid for your particular
files.
NOTE:
- The Affymetrix CHP files do not contain phenotypic information about individuals. This data must be imported
separately. When doing so, make sure that the label column for those individuals matches the CHP file identifier
in the spreadsheet. From either spreadsheet, you can join on column labels to get a combined spreadsheet. See
Joining or Merging Spreadsheets for more information on joining spreadsheets.
- You may wish to import marker map information for the mapping array dataset. Annotation data can be retrieved from the Affymetrix NetAffx service, with appropriate login privileges. See Genetic Marker Maps and Affymetrix Library Files for instructions on how to obtain annotation data from NetAffx.
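Because the join relies on matching labels, it may be worth comparing the two sets of labels before joining. The sketch below is an illustration only and does not use any SVS API. The function name compare_labels and the sample labels are hypothetical, and it assumes you have exported or typed the row labels of both spreadsheets as plain lists.

```python
def compare_labels(chp_labels, phenotype_labels):
    """Report which row labels appear in both lists or in only one of them."""
    chp = set(chp_labels)
    pheno = set(phenotype_labels)
    return {
        "matched": sorted(chp & pheno),
        "missing_phenotype": sorted(chp - pheno),  # genotyped, no phenotype row
        "missing_genotype": sorted(pheno - chp),   # phenotype row, no CHP file
    }


if __name__ == "__main__":
    # Hypothetical labels for illustration.
    report = compare_labels(["NA001", "NA002", "NA003"], ["NA002", "NA003", "NA004"])
    for key, labels in report.items():
        print(key, labels)
```

Labels that show up under missing_phenotype or missing_genotype would not find a partner in a join on matching labels.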
Affymetrix CEL Files
The Affymetrix CEL import tool reads CEL intensity files, normalizes the intensity values against the chosen or default
reference samples, and imports the normalized log2 ratios into SVS. The methodology for calculating and
normalizing log2 ratios from the CEL files is described in the Quantile Normalization of Affymetrix CEL Files
section.
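To make the idea of a log2 ratio against a reference set concrete, here is a deliberately simplified sketch. It is not the SVS procedure, which is described in the Quantile Normalization of Affymetrix CEL Files section. It assumes the intensities are already normalized and takes the per-probe median of the reference samples as the baseline. The function name log2_ratios is hypothetical.

```python
import math
from statistics import median


def log2_ratios(sample, references):
    """Compute log2(sample / median reference) for each probe.

    sample: list of positive intensities, one per probe.
    references: list of reference samples, each a list of the same length.
    """
    ratios = []
    for i, value in enumerate(sample):
        baseline = median(ref[i] for ref in references)
        ratios.append(math.log2(value / baseline))
    return ratios


if __name__ == "__main__":
    refs = [[100.0, 100.0, 100.0]] * 3
    print(log2_ratios([200.0, 100.0, 50.0], refs))
```

In this simplified picture, a value near 0 means the sample intensity equals the reference baseline, positive values suggest gain, and negative values suggest loss.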
From the Import Affymetrix CEL file dialog (see Figure 12), first select the CEL files you want to include in the dataset.
For Mapping 500k data, you must select files from both the NSP and STY arrays for each sample. To select CEL files, click
the Add Files button and use the file browser to select multiple CEL files. The CEL files you selected will
appear in the CEL import dialog window. You may add all of the CEL files in a directory by using the Add
Directory button. To remove CEL files from the window, select the unwanted samples and click Remove. You
may continue adding CEL files by clicking the Add Files or the Add Directory buttons again. Multiple
selection is allowed by <Shift>-left-click to select a block of files or by <Ctrl>-left-click to select several individual
files.
The next window provides the import options described below.
For the import of Mapping 500k CEL files, a matching spreadsheet containing the file names must be available in SVS.
This spreadsheet will tell the CEL import tool how to join the NSP and STY samples together to create one sample per
patient. The matching spreadsheet should have a row label column and at least two data columns. The row labels should be
the sample names. The first and second columns should be the NSP and STY file names. Other columns in
the dataset are optional but may contain the reference status for the sample. For Mapping 500k CEL file
import, check the 500k NSP/STY Matching box and select the matching spreadsheet by clicking on Select
Sheet.
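The matching spreadsheet can be prepared outside SVS as a text file and then imported. The sketch below shows one way to build it, assuming a hypothetical naming pattern of <sample>_NSP.CEL and <sample>_STY.CEL. Your file names may follow a different convention. The function names and column headers are also illustrative, and the sketch keeps the full file name in each column, which may need adjusting to what the import dialog expects.

```python
import csv
import io


def build_matching_rows(cel_file_names):
    """Pair NSP and STY CEL files by sample name.

    Assumes the hypothetical pattern <sample>_NSP.CEL / <sample>_STY.CEL.
    Names that do not fit the pattern are ignored.
    Returns (rows, unpaired): rows are [sample, nsp_file, sty_file],
    unpaired lists samples that have only one of the two files.
    """
    pairs = {}
    for name in cel_file_names:
        stem = name.rsplit(".", 1)[0]
        sample, _, enzyme = stem.rpartition("_")
        enzyme = enzyme.upper()
        if sample and enzyme in ("NSP", "STY"):
            pairs.setdefault(sample, {})[enzyme] = name
    rows, unpaired = [], []
    for sample in sorted(pairs):
        files = pairs[sample]
        if "NSP" in files and "STY" in files:
            rows.append([sample, files["NSP"], files["STY"]])
        else:
            unpaired.append(sample)
    return rows, unpaired


def to_csv(rows):
    """Render rows as CSV text: row label first, then NSP and STY file names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Sample", "NSP File", "STY File"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows, unpaired = build_matching_rows(["S1_NSP.CEL", "S1_STY.CEL", "S2_NSP.CEL"])
    print(to_csv(rows), end="")
    print("unpaired:", unpaired)
```

The first column serves as the row label column, and the next two columns hold the NSP and STY file names in the order described above.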
The default reference set includes all samples. Another option is to select a subset from a spreadsheet containing the
Reference Status for the samples. The row labels should match the sample names. For the SNP 5.0 and SNP 6.0 Array, the
row labels should be the file names of the CEL files with the CEL extension removed. The reference status column should
contain 0’s and 1’s where 0 denotes reference and 1 denotes non-reference status. All of the samples will be normalized
against the reference samples. When a spreadsheet is selected, the 0=Ref 1=Non-Ref Column drop down box will contain the
names of columns of binary data in the selected spreadsheet. Select the name of the column to be used as the reference
status.
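A reference status sheet for SNP 5.0 or SNP 6.0 data can be drafted the same way. The following sketch derives row labels by removing the CEL extension from each file name and writes 0 for reference and 1 for non-reference samples, as described above. The function name and the example file names are hypothetical.

```python
from pathlib import PurePath


def reference_status_rows(cel_file_names, reference_files):
    """Return (row_label, status) pairs; status 0 = reference, 1 = non-reference.

    Row labels are the CEL file names with the extension removed.
    """
    reference_labels = {PurePath(name).stem for name in reference_files}
    rows = []
    for name in cel_file_names:
        label = PurePath(name).stem
        rows.append((label, 0 if label in reference_labels else 1))
    return rows


if __name__ == "__main__":
    for label, status in reference_status_rows(
        ["ctrl_01.CEL", "case_01.CEL"], ["ctrl_01.CEL"]
    ):
        print(label, status)
```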
You also have the option to omit samples with the reference designation from the final output spreadsheet. To do this,
check the appropriately named box. If this option is selected, reference samples will be used in normalization of data and
calculation of LogR values, but will not be included in the output spreadsheet.
Another reference set option is to use HapMap precomputed populations. All 270 samples or an ethnic subset can be
used.
A Marker Map needs to be selected for use in the analysis. Probes that are not contained in the marker map will not be
imported. In other words, if the marker map does not contain copy number probes and the CEL files do, those probes will
not be in the resulting spreadsheet. The CEL files are scanned prior to this dialog and the appropriate marker map will be
detected and auto-downloaded. If the marker map has already been downloaded, navigate to it and select it by clicking on Select
Marker Map.
The Library Path where the CDF library files are located is also automatically detected. The directory
can be changed by clicking on View Library Folder. The library files should contain both SNP and CN
Probes.
You may optionally select a temporary directory where intermediate DSF files will be stored. If your project is
located on a shared network drive, for performance reasons you should specify a Temp Directory on a local
disk.
Output options include the A and B allele intensities before quantile normalization, the A and B allele intensities after quantile normalization
but before the log ratios are computed, and the LogR values with samples arranged column-wise or row-wise.
The name of the dataset can be specified at this time; the default name is “Affy CEL Dataset”.
NOTE:
- Affymetrix recommends using at least 25 samples as references in un-paired copy number analysis.
- The gender of the reference samples should be considered for copy number analysis of the X and/or Y chromosomes.
- The CEL conversion process will take several hours to complete.
- NSP files can be imported without the corresponding STY files (or vice versa). To do so, select only the NSP files, do not use a matching spreadsheet, and make sure that the row labels in the reference sheet match the CEL file names exactly.
Affymetrix CNT Files
The Affymetrix CNT import tool converts multiple CNT files into one aggregate spreadsheet that contains the log2 ratio
values in a format ready to be used for analysis. CNT files can be created for the Mapping 10k, 100k, and 500k arrays or for
any copy number data that can be converted into a text file. See Creating CNT Files using the Affymetrix
CNAT Batch Analysis Tool and Affymetrix CNT File Format for information on creating Affymetrix CNT
files.
From the Import CNT Files... dialog (see Figure 13), you can click Add Files to select CNT files to convert. This will
open a file chooser where you can select one or more CNT files. The CNT files you selected will appear in the CNT file
convert window. You can add all of the files in a directory by clicking Add Directory. To remove CNT files from the
window, select the unwanted files and click Remove. You may continue adding CNT files by clicking the Add Files button
again. Files cannot be added more than once, but files with the same name stored in different locations may be added to the
same import.
You can also change the name of the dataset at this time.
NOTE:
- Row labels in the output spreadsheet will be determined by the file names, so files with the same name stored in different locations will have the same row labels.
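Since identical file names lead to identical row labels, you may want to check a file list for name collisions before importing. This is a minimal sketch under the assumption that the label is derived from the file name alone. Whether SVS keeps or drops the extension is not stated here, so the sketch compares full file names. The function name duplicate_row_labels is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def duplicate_row_labels(paths):
    """Group paths by file name and return only the names used more than once."""
    groups = defaultdict(list)
    for path in paths:
        # Normalize Windows separators so the file name is extracted consistently.
        name = PurePosixPath(path.replace("\\", "/")).name
        groups[name].append(path)
    return {name: found for name, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    files = ["runA/s1.cnt", "runB/s1.cnt", "runA/s2.cnt"]
    print(duplicate_row_labels(files))
```

Renaming the colliding files before import is one way to keep the resulting row labels unique.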
Affymetrix CNCHP Files
The Affymetrix CNCHP import tool converts multiple CNCHP files into one aggregate spreadsheet containing the log2
ratio values in a format ready to be used for analysis. CNCHP files can be created for the Mapping 100k, 500k, and SNP 6.0
arrays. See Creating CNCHP Files Using Affymetrix Genotyping Console 2.0 for information on creating Affymetrix CNCHP
files.
From the Import CNCHP Files... dialog (see Figure 14), you can click Add Files to select CNCHP files to convert. This
will open a file chooser where you can select one or more CNCHP files. The CNCHP files you selected will appear in the
CNCHP file convert window. You can add all of the files in a directory by clicking Add Directory. To remove CNCHP files
from the window, select the unwanted files and click Remove. You may continue adding CNCHP files by clicking the Add
Files button again. Files cannot be added more than once, but files with the same name stored in different locations may be
added to the same import.
You can also change the name of the dataset at this time.
NOTE:
- Row labels in the output spreadsheet will be determined by the file names, so files with the same name stored in different locations will have the same row labels.
Affymetrix CYCHP Files
The Affymetrix CYCHP import tool converts multiple CYCHP files into aggregate spreadsheets containing one or more of the possible datasets contained within CYCHP files. The possible datasets are:
- Log2Ratio: Creates a spreadsheet containing values that are the ratio of signal to median of Reference signal for every sample/marker pair in the array.
- CN Segments: Creates a spreadsheet listing the segments for the CN dataset. The ‘Value’ column contains the Copy Number State. Also reported is the confidence, an indicator score for non-normal copy numbers.
- LOH Segments: Creates a spreadsheet listing the segments for the LOH dataset. The value of the ‘Value’ column is 1 when a Loss of Heterozygosity is found, and 0 when not found. Also reported is the confidence, the ratio of the probability of the SCAR measurements under the LOH model to the sum of the probability under each of the LOH and non-LOH models.
- Normal Diploid Segments: Creates a spreadsheet listing the segments for the Normal Diploid dataset. The value of the ‘Value’ column is 1 when CN State is 2 and LOH is 0. Otherwise, the value of this column is 0. Also reported is the confidence, the ratio of the probability of the SCAR measurements under the LOH model to the sum of the probability under each of the LOH and non-LOH models.
- CN Neutral LOH Segments: Creates a spreadsheet listing the segments for the CN Neutral LOH dataset. The value of the ‘Value’ column is 1 when CN State is 2 and LOH is 1. Otherwise, the value of this column is 0. Also reported is the confidence, the ratio of the probability of the SCAR measurements under the LOH model to the sum of the probability under each of the LOH and non-LOH models.
- Mosaicism Segments: Creates a spreadsheet listing the segments for the Mosaicism dataset. The ‘Value’ column contains the Copy Number State. Also reported are the confidence and mosaicism. Confidence is the proportion of markers that are above or below the thresholds required to make a CN change call for a running median segment size of 251. The value of the ‘Mosaicism’ column is 1 if more than one CN call was found in the segment.
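The rules for the ‘Value’ column in the Normal Diploid and CN Neutral LOH datasets, and the LOH confidence ratio, can be restated as a short sketch. The function names are hypothetical, and the code only mirrors the definitions listed above. It does not read CYCHP files.

```python
def normal_diploid(cn_state, loh):
    """1 when CN State is 2 and LOH is 0, otherwise 0."""
    return 1 if cn_state == 2 and loh == 0 else 0


def cn_neutral_loh(cn_state, loh):
    """1 when CN State is 2 and LOH is 1, otherwise 0."""
    return 1 if cn_state == 2 and loh == 1 else 0


def loh_confidence(p_loh, p_non_loh):
    """Probability under the LOH model divided by the sum under both models."""
    return p_loh / (p_loh + p_non_loh)


if __name__ == "__main__":
    for cn_state, loh in [(2, 0), (2, 1), (3, 1)]:
        print(cn_state, loh, normal_diploid(cn_state, loh), cn_neutral_loh(cn_state, loh))
    print(loh_confidence(0.75, 0.25))
```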
From the Import CYCHP Files... dialog (see Figure 15), you can click Add Files to select CYCHP files to convert. This
will open a file chooser where you can select one or more CYCHP files. The CYCHP files you selected will appear in the
CYCHP file convert window. You can add all of the files in a directory by clicking Add Directory. To remove CYCHP files
from the window, select the unwanted files and click Remove. You may continue adding CYCHP files by clicking the Add
Files button again. Files cannot be added more than once, but files with the same name stored in different locations may be
added to the same import.
You can also change the name of the dataset at this time.
Select the output datasets to create and indicate whether or not CN segment covariate spreadsheets should be
created. The covariate spreadsheets are defined below.
- One column per marker: A new column is created for every marker present in all segments for all samples. Column headers are marker names.
- First column of each segment: A new column is created every time there is a new segment value for any one sample over all the samples. This creates common segments for all samples, although for a particular sample there may be more columns than there are segments. In the case where a new column is introduced but the segment value has not changed, the segment value is repeated for all columns in a segment. Column headers are the marker names of the first marker in each segment.
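The relationship between the two covariate layouts can be illustrated with a small sketch that starts from the one-column-per-marker form and keeps only the columns where a segment value changes for at least one sample. This is an interpretation of the description above, not SVS code. The function name is hypothetical, and the sketch ignores details such as chromosome boundaries.

```python
def first_column_of_each_segment(markers, per_marker_values):
    """Reduce per-marker covariates to the first marker of each common segment.

    markers: ordered list of marker names.
    per_marker_values: {sample: list of segment values, one per marker}.
    Returns (headers, reduced) where headers are the kept marker names and
    reduced maps each sample to its values at those markers.
    """
    keep = [0] if markers else []
    for i in range(1, len(markers)):
        # Start a new column when any sample's value differs from the previous marker.
        if any(values[i] != values[i - 1] for values in per_marker_values.values()):
            keep.append(i)
    headers = [markers[i] for i in keep]
    reduced = {
        sample: [values[i] for i in keep]
        for sample, values in per_marker_values.items()
    }
    return headers, reduced


if __name__ == "__main__":
    markers = ["m1", "m2", "m3", "m4", "m5", "m6"]
    values = {"A": [2, 2, 3, 3, 3, 2], "B": [2, 2, 2, 2, 1, 1]}
    print(first_column_of_each_segment(markers, values))
```

In the example, sample A keeps the value 3 in both the m3 and m5 columns: the m5 column exists only because sample B changes there, so A's unchanged segment value is repeated.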