Welcome to the Marker Maps Tutorial!
Updated: February 18, 2021
Packages: All Packages of SVS
This tutorial will cover some of the common steps for creating, applying, and augmenting genetic marker maps in SVS.
SVS uses a genetic marker map to identify the genomic coordinates of SNPs and other genetic features in a spreadsheet. Marker maps are the basis for data visualization in GenomeBrowse, as they provide the relative order, size and scaling of each feature included in the spreadsheet. Because the size, order and naming of chromosomes can differ greatly between species, an accurate marker map is essential for correct visualization of data. Marker maps are also used to inform certain analysis functions (such as Haplotype Analysis, DESeq Expression Analysis or CNV segmentation) about the relative order and length of features on the genome. For the DESeq Analysis tool, the length of each gene (or isoform) region is essential in determining expression values; this length is determined from the start and stop positions listed in the spreadsheet's marker map. Additionally, marker maps can be used to gather annotation information (such as RS Identifier, Classifications or Functional Predictions) into one central location. This information will then be available in the output of any subsequent analysis on that spreadsheet.
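As a rough illustration of how a region length follows from the start and stop positions in a marker map, the sketch below assumes 1-based, inclusive coordinates (so a region from 1,000 to 1,499 spans 500 bases). The coordinate convention SVS uses internally is not stated in this tutorial, so treat the `+ 1` as an assumption; `region_length` is a hypothetical helper, not an SVS function.

```python
def region_length(start: int, stop: int) -> int:
    """Length of a region, ASSUMING 1-based inclusive start/stop positions."""
    if stop < start:
        raise ValueError(f"stop ({stop}) is before start ({start})")
    return stop - start + 1


# Hypothetical gene region covering positions 1,000 through 1,499.
print(region_length(1000, 1499))  # 500
```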
To follow along you will need to download and unzip the following file, which contains the starter project for this tutorial:
Spreadsheets included within the SVS Project:
- 500K Geno HapMap Data – Sheet 1 – this spreadsheet contains a set of SNPs from the HapMap Project. This is not the full dataset but a subset of filtered markers.
- Affy 500K na32 Filtered Markers – Sheet 1 – this spreadsheet contains information for a basic marker map with chromosome and position together with an additional RS ID field.
Marker maps may be used to store general annotations about your genetic markers, along with certain information which is required by SVS to be included in all marker maps.
A marker map must include the following three fields:
- Marker: This is a unique feature name, such as an RS ID or other identifier.
- Chromosome: In humans, this would be 1-22, X, Y, XY or MT.
- Position: Absolute position within the given chromosome, usually in units of base pairs.
There are three additional, optional fields that are reserved for specific uses by SVS:
- Stop: This is the last base of a multi-base region, such as the end of a gene or transcript. The stop field is used for heatmap visualizations in GenomeBrowse, for RNA-seq analysis, and for other functions in SVS.
- Reference: This is the reference allele at the given position for the genome assembly used in the current analysis. Many SVS functions either require the Reference field to be present in the marker map or can optionally make use of it.
- Alternate: This is the observed alternate allele. This field is used in addition to the Reference field by many SVS functions.
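One way to picture a single marker map entry is as a record with three required fields, three optional reserved fields and any number of extra annotation fields. The dataclass below is only a conceptual sketch, not the DSM file format; the class name, the extra-field dictionary and the marker names are hypothetical, and chromosome is held as a string for simplicity even though SVS also accepts integer chromosome columns.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MarkerRecord:
    # Required fields
    marker: str        # unique feature name, e.g. an RS ID
    chromosome: str    # e.g. "1"-"22", "X", "Y", "XY" or "MT" for human data
    position: int      # position within the chromosome, usually in base pairs
    # Optional reserved fields
    stop: Optional[int] = None       # last base of a multi-base region
    reference: Optional[str] = None  # reference allele for the assembly in use
    alternate: Optional[str] = None  # observed alternate allele
    # Additional annotation fields, e.g. {"dbSNP RS ID": ...} or {"Gene Name": ...}
    extra: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # A region cannot end before it starts.
        if self.stop is not None and self.stop < self.position:
            raise ValueError("stop must not be before position")


# Hypothetical entries: a single-base SNP and a multi-base gene region.
snp = MarkerRecord("SNP_1", "1", 12345, reference="A", alternate="G")
gene = MarkerRecord("GENE_A", "2", 1000, stop=1499, extra={"Gene Name": "GENE_A"})
```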
SVS marker maps are stored in a special format in files which have the extension DSM. Once a marker map is created, it will be available in the SVS marker map manager and can be applied to any spreadsheet containing features (in rows or columns) that have names matching the marker names in the map.
Marker maps are automatically created when importing data from certain formats, such as VCF files and PLINK PED/MAP files. Marker maps for common GWAS chips can be downloaded directly from Golden Helix by going to Tools > Manage Marker Maps and selecting Download from Golden Helix. Alternatively, marker maps can be created from text files or from SVS spreadsheets. Maps can also be manipulated to add extra annotation fields.
2. Create Marker Map from Spreadsheet
- Start by opening the Marker Map Tutorial project, then open Affy 500K na32 Filtered Markers – Sheet 1 and choose File > Create Marker Map from Spreadsheet.
- On the resulting dialog the default selections for the marker name, chromosome and position fields should be automatically detected. If not, then click Select Column for each required field to choose the correct data column.
NOTE: For this tool to work correctly, the marker name column must be Categorical (C), the position column must be Integer (I), and the chromosome column can be either Categorical (C) or Integer (I). If your data is recognized differently, you can go to Edit > Edit this Spreadsheet to make the necessary changes.
- You can also give the map an informative name at the bottom of the dialog; the default name is the name of the spreadsheet.
NOTE: Optionally, a Stop Position column can be included. If you are creating a map for RNA-Seq gene count data, you will want this field populated, because most analyses use the length of each region.
- The dialog should look like Figure 2-1. Click Next >.
- On the next dialog you can select any additional fields to be included in the new map. For this dataset, the only additional column in the spreadsheet is dbSNP RS ID, which is selected by default. The dialog should look like Figure 2-2. Click Create.
- Once the map is created you should receive a message that the new marker map has been stored in your local marker maps folder (Figure 2-3).
To see a list of your available marker maps, including the one you just created, go to the Project Navigator and select Tools > Manage Marker Maps. The new marker map Affy 500K na32 Filtered Markers will be listed with 63,803 markers.
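Conceptually, creating a map from a spreadsheet means reading one row per marker, checking the column types described in the NOTE for this step, and keying the result by marker name. The sketch below models the spreadsheet as a list of dictionaries; the column names, the function name and the duplicate-name check are illustrative assumptions rather than SVS's actual implementation.

```python
from typing import Dict, Iterable, Tuple, Union

Row = Dict[str, Union[str, int]]


def create_marker_map(
    rows: Iterable[Row],
    marker_col: str,
    chrom_col: str,
    pos_col: str,
    extra_cols: Tuple[str, ...] = (),
) -> Dict[str, dict]:
    """Build a {marker name: fields} mapping from spreadsheet rows (hypothetical helper)."""
    marker_map: Dict[str, dict] = {}
    for i, row in enumerate(rows):
        name, chrom, pos = row[marker_col], row[chrom_col], row[pos_col]
        # Marker names must be categorical (strings).
        if not isinstance(name, str):
            raise TypeError(f"row {i}: marker name must be categorical (str)")
        # Positions must be integers (bool is a subclass of int, so exclude it).
        if not isinstance(pos, int) or isinstance(pos, bool):
            raise TypeError(f"row {i}: position must be an integer")
        # Chromosomes may be categorical or integer.
        if not isinstance(chrom, (str, int)) or isinstance(chrom, bool):
            raise TypeError(f"row {i}: chromosome must be categorical or integer")
        # Marker names must be unique so each one can be looked up later.
        if name in marker_map:
            raise ValueError(f"row {i}: duplicate marker name {name!r}")
        entry = {"chromosome": str(chrom), "position": pos}
        entry.update({col: row[col] for col in extra_cols})
        marker_map[name] = entry
    return marker_map


# Hypothetical rows resembling a marker spreadsheet with one extra RS ID column.
rows = [
    {"Marker": "SNP_1", "Chromosome": "1", "Position": 1000, "dbSNP RS ID": "rsA"},
    {"Marker": "SNP_2", "Chromosome": 2, "Position": 2000, "dbSNP RS ID": "rsB"},
]
m = create_marker_map(rows, "Marker", "Chromosome", "Position", ("dbSNP RS ID",))
print(len(m))  # 2
```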
3. Apply Marker Map to Spreadsheet
Now that the new marker map matching the HapMap dataset has been created, we can apply it to the spreadsheet of genotypic data.
- Open the 500K Geno HapMap Data – Sheet 1 and navigate to File > Apply Genetic Marker Map.
- Select the marker map you just created. The dialog should look similar to Figure 3-1. Click OK.
NOTE: If your marker names are in the row labels of your spreadsheet, make sure to select Row Labels under the Marker Names Are option in the lower left corner of the dialog.
- A new window appears asking which fields in the marker map you would like visible by default. Leave the defaults as in Figure 3-2 and click OK.
A final information message will state how many markers were successfully mapped. For this spreadsheet, all 63,803 markers should be mapped.
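Applying a map relies on the spreadsheet's marker labels matching the marker names in the map. The sketch below illustrates that lookup and the "how many were mapped" count, assuming exact, case-sensitive name matching; the function and the labels are hypothetical.

```python
from typing import Dict, List, Tuple


def apply_marker_map(labels: List[str], marker_map: Dict[str, dict]) -> Tuple[List[dict], int]:
    """Look up each spreadsheet label by exact name.

    Returns the map fields for each label ({} when the label is unmapped)
    and the number of labels that were found in the map.
    """
    fields = [marker_map.get(label, {}) for label in labels]
    mapped = sum(1 for label in labels if label in marker_map)
    return fields, mapped


marker_map = {
    "SNP_1": {"chromosome": "1", "position": 1000},
    "SNP_2": {"chromosome": "2", "position": 2000},
}
column_labels = ["SNP_1", "SNP_2", "SNP_X"]  # hypothetical column headers
fields, mapped = apply_marker_map(column_labels, marker_map)
print(f"{mapped} of {len(column_labels)} markers mapped")  # 2 of 3 markers mapped
```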
Now the spreadsheet icon in the Project Navigator window has a green bar across the top, which indicates that a marker map is attached to the spreadsheet. Additionally, if you open the mapped spreadsheet, the Map button in the upper left corner above the row numbers should be green. Left-clicking the green Map button makes the map fields visible.
If you right-click on the green Map button, you can check or uncheck the map fields you would like to remain visible. This is especially useful if your spreadsheet is row-mapped and the marker map fields are quite long, because visible map fields take up room and limit the number of data columns that can be shown at one time.
4. Add Fields to Marker Map
Sometimes it is necessary and helpful to have additional information in a marker map aside from the basic chromosome and position information. In this section you will add gene name information to the marker map that was created in the previous steps.
- Go to Tools > Manage Marker Maps from the Project Navigator window.
- In the Manage Genetic Marker Maps window that appears select Utilities > Add Annotation Data to Marker Map. See Figure 4-1.
- The first dialog that will appear is shown in Figure 4-2.
- In this dialog click Select Marker Map to select the Affy 500K na32 Filtered Markers marker map.
- Leave the default entry (“* with @”) for Marker Map Name.
NOTE: The “* with @” entry in the name field will create a new name for the edited map using the original marker map name and the name of the new field being added. For this example, the new map will be called Affy 500K na32 Filtered Markers Marker Map with Gene Name. This can be changed to a more informative name depending on your preference.
- Click Select Track and choose RefSeq Genes 105.20201022 v2, NCBI, which should be available from your Local annotations source. See Figure 4-3. Click Select.
- The dialog should now look like Figure 4-4. Click Next >.
- On the next dialog select the field from the annotation track you would like to include in the marker map. The default options should be correct–see Figure 4-5. Click Next.
NOTE: The default selection of “*” for Marker Map Field Name will use the Annotation Track Field name in the marker map, so for this example Gene Name will be used. Once again, this can be changed to your preference.
- When the process is complete, a final information window will appear stating that the new marker map has been added to your local marker map folder (Figure 4-6).
- Now, to apply the edited map to your genotype spreadsheet, close the Manage Genetic Marker Maps window, open 500K Geno HapMap Data – Mapped Sheet 1 and go to File > Apply Genetic Marker Map.
- Select the marker map you just created and click OK, as previously shown in Figure 3-1. Leave the defaults for which Marker Map Fields are Visible by Default and click OK. This will drop the original map and apply the new one, creating a child spreadsheet with the new map applied.
Once the new map is successfully applied to the data, you can click the green Map button in the upper left corner of the spreadsheet (above the row numbers) to view the edited map (Figure 4-7).
NOTE: Since this data is from a microarray it will contain many markers outside of any gene regions. This means that those markers will have empty Gene Name fields in the marker map, as their chromosome and position did not overlap any gene positions from the RefSeq annotation source. The second marker in the spreadsheet shown in Figure 4-7 is an example of this.
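The Gene Name field is filled by checking whether each marker's chromosome and position fall inside an annotated gene interval, which is why markers outside genes end up empty. The sketch below shows that interval-overlap idea under stated assumptions: gene intervals are treated as inclusive, multiple overlapping genes are joined with commas (how SVS combines multiple hits is not described here), and the function, gene and marker names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int, str]  # (chromosome, start, stop, gene name)


def annotate_gene_names(
    marker_map: Dict[str, dict], genes: List[Gene], field_name: str = "Gene Name"
) -> Dict[str, dict]:
    """Return a copy of marker_map with a field listing the genes whose
    [start, stop] interval (inclusive, an assumption) contains each marker."""
    # Group genes by chromosome and sort each group by start position.
    by_chrom: Dict[str, List[Gene]] = defaultdict(list)
    for gene in genes:
        by_chrom[gene[0]].append(gene)
    starts: Dict[str, List[int]] = {}
    for chrom, chrom_genes in by_chrom.items():
        chrom_genes.sort(key=lambda g: g[1])
        starts[chrom] = [g[1] for g in chrom_genes]

    annotated: Dict[str, dict] = {}
    for name, entry in marker_map.items():
        chrom, pos = entry["chromosome"], entry["position"]
        candidates = by_chrom.get(chrom, [])
        # Only genes that start at or before the marker can contain it.
        n = bisect_right(starts.get(chrom, []), pos)
        hits = [g[3] for g in candidates[:n] if g[2] >= pos]
        # Markers that overlap no gene get an empty value.
        annotated[name] = {**entry, field_name: ",".join(hits)}
    return annotated


markers = {
    "SNP_1": {"chromosome": "1", "position": 1500},
    "SNP_2": {"chromosome": "1", "position": 9000},
}
genes = [("1", 1000, 2000, "GENE_A"), ("1", 1400, 1600, "GENE_B")]
result = annotate_gene_names(markers, genes)
print(result["SNP_1"]["Gene Name"])        # GENE_A,GENE_B
print(repr(result["SNP_2"]["Gene Name"]))  # ''
```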
5. Additional Marker Map Tools
Other Marker Map Functions in SVS
The following functions might also be helpful for working with marker maps. Each is documented in the SVS manual. Pressing the Help button on any dialog will take you to the corresponding section of the manual.
- File > Add Columns to Marker Map: A row-mapped spreadsheet is required to use this feature. This option allows users to add column data from a spreadsheet to the current marker map. A new map will be created and saved in the marker maps folder. This tool is good for adding annotation information to a marker map that was generated with a different tool in SVS, for example, Classification information provided by DNA-Seq > Variant Classification.
- File > Convert Genetic Marker Map to Spreadsheet: Converts the marker map applied to the current spreadsheet into a new separate spreadsheet of just the marker map data. This tool is good for editing an existing marker map, for example, renaming existing marker map fields.
- File > Export Genetic Marker Map: Exports a separate file of just the applied marker map data. This tool is good if you will need to apply the same map to a second dataset, especially if that dataset is located in another project.
- Edit > Sort by Marker Map Field: A row-mapped spreadsheet is required to use this feature. Choose a marker map field and sort direction (either ascending or descending) to sort the rows based on the marker map. This tool is another way of rearranging a spreadsheet, for example, if you want your quality values listed in ascending order to determine filter thresholds.
- Select > Select Variants by Filtering on Marker Map Field: This tool selects either marker-mapped rows or columns by filtering on a marker map field. The field can be either numeric or a string field. The selected variants can be either activated or inactivated in the original spreadsheet. For example, if you used Sort by Marker Map Field (above) to help determine your quality filter thresholds, this tool can be used to filter your data based on those thresholds.
- Edit > Recode > Rename Marker Mapped Labels: This feature requires that a marker map has been applied to the spreadsheet. It auto-detects the orientation of the applied marker map and replaces the row or column name headers with the values of a string field from the spreadsheet’s marker map. This tool is good for renaming your markers so that your samples can be appended to public data. See the Marker Label FAQ for an example of this workflow.
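To make the sort, filter and rename workflow concrete, the sketch below models a row-mapped spreadsheet as a list of row labels plus a dictionary of map fields. The function names, the "Quality" and "RS ID" fields and the threshold value are hypothetical, and keeping the original label when the rename field is empty is an assumption rather than documented SVS behavior.

```python
from typing import Dict, List


def sort_by_map_field(labels: List[str], marker_map: Dict[str, dict],
                      field: str, descending: bool = False) -> List[str]:
    """Order row labels by a numeric marker map field (ascending by default)."""
    return sorted(labels, key=lambda label: marker_map[label][field], reverse=descending)


def filter_by_map_field(labels: List[str], marker_map: Dict[str, dict],
                        field: str, threshold: float) -> Dict[str, bool]:
    """Mark rows active (True) when the field value is at or above the threshold."""
    return {label: marker_map[label][field] >= threshold for label in labels}


def rename_labels(labels: List[str], marker_map: Dict[str, dict], field: str) -> List[str]:
    """Replace each label with a string map field; keep the old label when the
    field is empty (this fallback is an assumption)."""
    return [marker_map[label].get(field) or label for label in labels]


qmap = {
    "v1": {"Quality": 35.0, "RS ID": "rsA"},
    "v2": {"Quality": 12.5, "RS ID": ""},
    "v3": {"Quality": 50.0, "RS ID": "rsC"},
}
rows = ["v1", "v2", "v3"]
print(sort_by_map_field(rows, qmap, "Quality"))        # ['v2', 'v1', 'v3']
print(filter_by_map_field(rows, qmap, "Quality", 20))  # {'v1': True, 'v2': False, 'v3': True}
print(rename_labels(rows, qmap, "RS ID"))              # ['rsA', 'v2', 'rsC']
```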
Marker Map Add-On Scripts
Several functions for manipulating marker maps are available as downloadable add-on functions for SVS. Each function comes with its own documentation. All functions are available for download through our online Scripts Repository.
A few of these functions are described below:
- Add Annotation Data to Marker Map from Spreadsheet: This function takes the marker map applied to the current spreadsheet and adds specified annotation data from overlapping interval(s) to each marker in the marker map. It then saves a copy of the new map with the additional information in the user’s Marker Map Folder and applies the new map to the current spreadsheet.
- Apply Additional Marker Map: This function applies an additional marker map to the currently mapped spreadsheet. The user can choose to apply the new map’s data to only unmapped columns or to all columns, preferring either the new or the old marker map information.
- Create Pseudo Marker Mapped Spreadsheet: This function takes a spreadsheet without a marker map and creates a new marker-mapped spreadsheet with a pseudo marker map, in which all markers are mapped to chromosome 1 with positions from 0 to n - 1, where n is the number of markers.
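The last two scripts follow simple rules that can be sketched in a few lines. Below, `pseudo_marker_map` places every marker on chromosome 1 at positions 0 through n - 1 in its current order, and `merge_marker_maps` models the "unmapped only" versus "all columns" choice. Both functions are hypothetical illustrations, not the add-on scripts themselves; in particular, choosing one map's whole entry per marker (rather than merging field by field) is an assumption.

```python
from typing import Dict, List


def pseudo_marker_map(labels: List[str]) -> Dict[str, dict]:
    """Map every marker to chromosome 1 at positions 0 .. n-1, in label order."""
    return {label: {"chromosome": "1", "position": i} for i, label in enumerate(labels)}


def merge_marker_maps(old: Dict[str, dict], new: Dict[str, dict], labels: List[str],
                      unmapped_only: bool = True, prefer_new: bool = True) -> Dict[str, dict]:
    """Combine an existing map with an additional one (whole-entry choice is an assumption)."""
    merged: Dict[str, dict] = {}
    for label in labels:
        o, n = old.get(label), new.get(label)
        if o is None and n is None:
            continue  # still unmapped
        if o is None:
            merged[label] = n  # only the new map knows this marker
        elif n is None or unmapped_only:
            merged[label] = o  # keep existing mapping
        else:
            merged[label] = n if prefer_new else o  # both maps know it: apply preference
    return merged


labels = ["a", "b", "c"]
old = {"a": {"chromosome": "1", "position": 10}}
new = {"a": {"chromosome": "1", "position": 11}, "b": {"chromosome": "2", "position": 5}}
print(merge_marker_maps(old, new, labels)["a"]["position"])                       # 10
print(merge_marker_maps(old, new, labels, unmapped_only=False)["a"]["position"])  # 11
print(pseudo_marker_map(labels)["c"])  # {'chromosome': '1', 'position': 2}
```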