Step 1. Generate Log Ratios Versus Reference Samples

Before you can perform copy number analysis, you need a DSF file containing log2 ratios (hereafter LogRs), created by normalizing raw intensity data against a reference sample. CNAM directly supports creating LogR DSF files from Illumina and Affymetrix platforms, with additional functionality for data from other providers. This tutorial focuses on preparing a LogR DSF file from Affymetrix CEL files. To learn how to create DSF files from Affymetrix CNT or CNCHP files, Illumina data, or data from other providers, see the following sections in the manual:
›› Importing Affymetrix CNT or CNCHP files
›› Importing Illumina data
The workflow CNAM uses to generate normalized LogRs from Affymetrix 500K, SNP 5.0 and 6.0 CEL files is analogous to the methodology employed by Affymetrix's Genotyping Console 2. However, CNAM can perform quantile normalization without gender bias, scales to handle thousands of samples, and allows greater flexibility in choosing a reference set. To learn more about how CNAM processes Affymetrix CEL files, click here.
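To make the idea of a LogR concrete, the following minimal Python sketch computes log2 ratios for one sample against a reference set. It is an illustration only and not CNAM's implementation: the function name `log_ratios` and the toy intensities are hypothetical, the use of the per-marker median of the references is an assumption, and the quantile normalization step is omitted.

```python
import math
from statistics import median

def log_ratios(sample, references):
    """Return log2(sample / reference median) for each marker.

    sample:     list of intensities for one sample, one per marker
    references: list of reference samples, each a list aligned to `sample`
    Using the per-marker median as the reference value is an assumption
    made for this sketch.
    """
    ratios = []
    for i, intensity in enumerate(sample):
        ref_value = median(ref[i] for ref in references)
        ratios.append(math.log2(intensity / ref_value))
    return ratios

# Toy data: three markers, three reference samples.
references = [[100.0, 200.0, 400.0],
              [110.0, 190.0, 400.0],
              [90.0, 210.0, 400.0]]
sample = [200.0, 200.0, 200.0]
print(log_ratios(sample, references))  # [1.0, 0.0, -1.0]
```

A LogR near 0 means the sample intensity matches the reference at that marker, a positive value means it is higher than the reference, and a negative value means it is lower.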
Preparing Files Needed to Process Affymetrix CEL files
Before CNAM can process CEL files, the following are needed:
- Spreadsheet matching NSP and STY (for 500K data only)
- Spreadsheet with CEL file names and column indicating reference status
- Affymetrix marker maps
- Affymetrix library files
Spreadsheet matching NSP and STY (for 500K only)
To properly process both the NSP and STY CEL files, a spreadsheet matching the two needs to be imported into HelixTree. This spreadsheet will tell the CEL file import tool how to join the NSP and STY samples together to create one sample per patient in the DSF file.
IMPORTANT: The matching spreadsheet must have a row label column and at least two data columns. The row labels must be the sample names. The first data column must contain the NSP file names and the second the STY file names to be joined together. Any further columns are optional. If one of them contains the reference status of each sample, this same spreadsheet can also serve as the reference status spreadsheet (see below).

The easiest way to import this file into HelixTree is to create a CSV file with the appropriate columns and then select >File >Import Data >Import ASCII File from the Project Navigator.
IMPORTANT: Make sure to enter the Row Label Column Number representing your sample names.
The imported spreadsheet should resemble the image below.
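As a rough illustration of the layout described above, the following Python sketch writes a matching CSV file with the sample names in the first column, followed by the NSP file name, the STY file name and an optional reference status column. All sample and file names are hypothetical, and whether the file names should carry the ".CEL" extension is an assumption of this sketch rather than something stated here.

```python
import csv

# Hypothetical sample names and CEL file names, for illustration only.
# Last column: 0 = reference sample, 1 = non-reference sample.
samples = [
    ("Patient01", "Patient01_NSP.CEL", "Patient01_STY.CEL", 1),
    ("Patient02", "Patient02_NSP.CEL", "Patient02_STY.CEL", 1),
    ("ExternalRef01", "ExternalRef01_NSP.CEL", "ExternalRef01_STY.CEL", 0),
]

def write_matching_csv(path, rows):
    """Write sample name, NSP file, STY file and reference status columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Sample", "NSP File", "STY File", "Reference"])
        writer.writerows(rows)

write_matching_csv("nsp_sty_matching.csv", samples)
```

With the sample names in the first column of the file, the Row Label Column Number to enter during import would be 1.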
Spreadsheet with CEL file names and column indicating reference status
This spreadsheet needs sample names as row labels and one column holding the reference status. For the SNP 5.0 and SNP 6.0 arrays, the row labels should be the file names of the CEL files with the “.CEL” extension removed. In the reference status column, 0 denotes a sample to be used as a reference and 1 denotes a non-reference sample.
Note: It is up to the researcher to finalize a reference strategy. In CNAM you can use any external or internal samples as your reference set. Affymetrix recommends using at least 25 samples as references in unpaired copy number analysis. As discussed later, if you use an external reference set, these samples can be dropped from the resulting DSF file.
As with the NSP and STY matching spreadsheet above, the easiest way to import this file into HelixTree is to create a CSV file and then select >File >Import Data >Import ASCII File from the Project Navigator window. Make sure to enter the Row Label Column Number representing your sample names. The imported spreadsheet should resemble Figure 3.
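The reference status CSV for SNP 5.0 or 6.0 data can be generated from a list of CEL file names. The sketch below is one possible way to do it in Python: it strips the ".CEL" extension to form the row labels, marks references with 0 and non-references with 1, and prints a warning if fewer than 25 references are marked. The file names, the output file name and the helper `reference_rows` are all hypothetical.

```python
import csv
from pathlib import Path

def reference_rows(cel_names, reference_names):
    """Build (row label, status) pairs: 0 = reference, 1 = non-reference."""
    rows = []
    for name in cel_names:
        label = Path(name).stem  # file name with the ".CEL" extension removed
        status = 0 if label in reference_names else 1
        rows.append((label, status))
    return rows

# Hypothetical file names; in practice they could be collected with
# sorted(p.name for p in Path("cel_dir").glob("*.CEL")).
cel_names = ["Ref_001.CEL", "Ref_002.CEL", "Tumor_001.CEL"]
reference_names = {"Ref_001", "Ref_002"}

rows = reference_rows(cel_names, reference_names)
with open("reference_status.csv", "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["Sample", "Reference"])
    writer.writerows(rows)

n_refs = sum(1 for _, status in rows if status == 0)
if n_refs < 25:
    print(f"Only {n_refs} reference samples marked; "
          "at least 25 are recommended for unpaired analysis.")
```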
Affymetrix marker maps (annotation files)
You will need an Affymetrix marker map corresponding to the CEL files you wish to import. Probes not contained in the marker map will not be included in the resulting LogR DSF file. For example, if the marker map does not contain copy number probes, those probes will not exist in the DSF file for copy number analysis.
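The effect of an incomplete marker map can be pictured as a simple filter. The following Python sketch is a conceptual illustration only, not how HelixTree performs the import, and the probe identifiers and function name are hypothetical.

```python
def keep_mapped_probes(probe_ids, marker_map_ids):
    """Split probes into those present in the marker map and those absent."""
    mapped = set(marker_map_ids)
    kept = [p for p in probe_ids if p in mapped]
    dropped = [p for p in probe_ids if p not in mapped]
    return kept, dropped

# Hypothetical identifiers: two SNP probes and one copy number probe.
probes = ["SNP_0001", "SNP_0002", "CN_0001"]
snp_only_map = ["SNP_0001", "SNP_0002"]

kept, dropped = keep_mapped_probes(probes, snp_only_map)
print(kept)     # probes that would reach the DSF file
print(dropped)  # the copy number probe is lost with a SNP-only map
```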
The latest Affymetrix marker maps can be downloaded using the Affymetrix NetAffx service in HelixTree. To access this feature from the Project Navigator window, select >File >Import Data >Download Affymetrix Marker Map. You will be prompted for your Affymetrix NetAffx login information, which can be freely obtained by registering on Affymetrix’s website.
https://www.affymetrix.com/analysis/netaffx/index.affx
After entering your NetAffx login information, the Download Annotations window will appear listing various Affymetrix annotation files (Figure 4).
[Figure 4: Download Annotations window with both SNP 5.0 marker maps selected.]
IMPORTANT: Each of the 500K, SNP 5.0 and SNP 6.0 arrays has two annotation files (500K = NSP + STY; SNP 5.0 and 6.0 = SNP + CN probes). For each array, both annotation files must be downloaded at the same time so that HelixTree can merge them properly. To do this, highlight both annotation files (as seen in Figure 4), make sure the Import into project when downloaded box is checked, and click Download. If you downloaded the annotation files previously, they should appear as a single merged file in the lower section of the window. In that case, simply highlight that file and click Load Into Project.
Affymetrix library files
Similar to Affymetrix marker maps, Affymetrix library files can be downloaded using the NetAffx service in HelixTree. The library files should contain both SNPs and CN Probes when appropriate. To download library files, select the >Tools >Download Affymetrix Library File menu option from the Project Navigator window. After entering your login information, HelixTree will load a list of library files available through the NetAffx service.
Select one or more files from the upper window and click Download. The file(s) will automatically be downloaded to the ../HelixTree/AffyLibraryFiles directory.
Converting Affymetrix CEL files to LogR DSF File
From the Project Navigator window, select >CNAM >Import Affymetrix >Import CEL Files.
From the CEL file import dialog (Figure 5), first select the CEL files you want to include in the data set. For 500K data, you must select files from both the NSP and STY arrays for each sample. To select CEL files, click the Add CEL button, navigate to the appropriate folder and select the CEL files you want to process (you can hold down the Shift key to select multiple files at once). The CEL files you selected will appear in the CEL import dialog window (Figure 6). You may add all of the CEL files in a particular directory by using the Add Directory button.
This is especially helpful if, for example, you store your NSP and STY files in separate directories. To remove CEL files from the window, select the unwanted samples and click Remove Selected. You may continue adding CEL files by clicking the Add CEL or the Add Directory buttons again.
For 500K CEL file import, next check the 500K NSP/STY Matching check box and select the previously imported matching spreadsheet (Figure 7).

If you are importing 5.0 or 6.0 CEL files, leave this box unchecked.
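Because 500K import requires both the NSP and the STY file for every sample, it can be worth checking the selection before starting. The sketch below is an optional pre-check written outside HelixTree, not part of the import tool. It compares a matching table with the list of selected CEL files and reports any incomplete pairs. The function `missing_pairs` and all file names are hypothetical.

```python
def missing_pairs(matching, selected_files):
    """Return {sample: [missing file names]} for incomplete NSP/STY pairs.

    matching:       dict mapping sample name -> (NSP file name, STY file name)
    selected_files: iterable of CEL file names chosen for import
    """
    selected = set(selected_files)
    problems = {}
    for sample, pair in matching.items():
        missing = [name for name in pair if name not in selected]
        if missing:
            problems[sample] = missing
    return problems

# Hypothetical matching table and file selection.
matching = {
    "Patient01": ("Patient01_NSP.CEL", "Patient01_STY.CEL"),
    "Patient02": ("Patient02_NSP.CEL", "Patient02_STY.CEL"),
}
selected = ["Patient01_NSP.CEL", "Patient01_STY.CEL", "Patient02_NSP.CEL"]
print(missing_pairs(matching, selected))  # {'Patient02': ['Patient02_STY.CEL']}
```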
Next, select a spreadsheet containing the Reference Status for the samples (Figure 8). When a spreadsheet is selected, the 0=Ref 1=Non-Ref Column drop-down box is filled with the binary data columns in that spreadsheet. Select the column that indicates the reference status.

NOTE: The effect of gender of the reference samples should be considered for copy number analysis of the X and/or Y chromosomes.
Check the Don’t include reference sample in output Log R DSF box if you are using external reference samples (e.g. HapMap data) and do not want them included in the resulting DSF file.
Next, select the Marker Map previously imported to be used in the analysis and choose the Library Path where the CDF library files for the appropriate array can be found (Figure 9). These reside in the directory where you previously saved them.

NOTE: After using the CEL import tool for your given array (500K, 5.0, 6.0), an AffyLibraryFiles directory will be created in the HelixTree installation directory containing *.gcdf library files. These files can be used from that point on instead of the CDF files, so you don't have to download them every time.
You have the option to specify a Temp Directory (Figure 10) where intermediate DSF files will be stored. If your project is located on a shared network drive (not recommended), you should specify a Temp Directory on a local disk. Finally, select the Output LogR DSF file location and click OK to begin the import.

The conversion will take several minutes per CEL file to complete. The DSF file created is ready to be imported and analyzed using either the Copy Number Segmentation tool or the LogR Association Tests and PCA window.
From here you can proceed to Step 2: Identify markers to exclude, Step 3: Correct for batch effects/stratification, Step 4: Perform whole genome log ratio association tests, or Step 5: Run segmenting algorithm.