What is Python?

Python is a clear and powerful object-oriented programming language, comparable to Perl, Ruby, Scheme, or Java. Integrating Python into SVS 7 provides full programmatic access to many of the software's features, which makes it possible to augment existing tools, create entirely new ones, automate workflows, integrate with other programs, and more.

Python Learning Resources

» SVS 7 Scripting Reference
» Python.org
» Beginners Guide to Python

Add-On Scripts Repository for SVS

Here you will find a collection of Python scripts submitted by Golden Helix developers and our customers. All scripts are provided at no additional cost, so feel free to download, use, and even enhance them!

The following scripts are for SVS 7.4+
For scripts compatible with older versions, please visit the Scripts Repository for SVS 7.0-7.3.

Share your scripts with the Golden Helix Community

If you have written any scripts and would like to share them with other SVS 7 users, we encourage you to email a *.txt or *.py file to community@goldenhelix.com with any accompanying documentation or special instructions. Once we test your script and check its validity, we'll post it on this page for others to download.

 

Date Modified Category Script Author Download

11/10/2011

Edit

Convert Binary and Integer Values to Genotypes
This script recodes binary and integer genotypes to the standard genotype format of A_A, A_B, and B_B. It prompts for the values that represent A_A, A_B, and B_B; all other numbers are encoded as missing. Thus, if there is multi-allelic data in the spreadsheet, all numbers other than those specified will be encoded as "?". More info »

Greta Linse Peterson
Golden Helix
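The recoding rule described above can be sketched in plain Python. This is only an illustration of the mapping, not the script itself: the function name `recode_genotypes` and its parameters are hypothetical, and it works on an ordinary list rather than an SVS spreadsheet.

```python
def recode_genotypes(values, aa_value, ab_value, bb_value, missing="?"):
    """Map numeric codes to A_A / A_B / B_B; any other value becomes missing."""
    mapping = {aa_value: "A_A", ab_value: "A_B", bb_value: "B_B"}
    return [mapping.get(value, missing) for value in values]


# Hypothetical column where 0, 1 and 2 encode the three genotypes.
column = [0, 1, 2, 3, None, 1]
print(recode_genotypes(column, aa_value=0, ab_value=1, bb_value=2))
```

Values such as 3 or None are not among the three specified codes, so they are reported as "?".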

11/10/2011

Analysis

Create Pseudo Marker Mapped Spreadsheet
From a non-marker mapped spreadsheet this script creates a new marker mapped spreadsheet with a pseudo marker map containing chromosome 1, positions 1 - #Rows. More info »

Greta Linse Peterson
Golden Helix
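As a rough illustration of what such a pseudo marker map contains, the sketch below builds one (chromosome, position) pair per row. The function name `pseudo_marker_map` and the tuple layout are assumptions made for this example; the actual script works on SVS spreadsheets.

```python
def pseudo_marker_map(n_rows, chromosome="1"):
    """Return one (chromosome, position) pair per row, positions 1..n_rows."""
    return [(chromosome, position) for position in range(1, n_rows + 1)]


# A spreadsheet with four rows would get positions 1 through 4 on chromosome 1.
for chrom, pos in pseudo_marker_map(4):
    print(chrom, pos)
```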

11/10/2011

CNV Analysis

Create Spreadsheet for Segmentation
Based on a column from a spreadsheet, this script creates a new spreadsheet with a pseudo marker map and generic column headers making it suitable for running CNAM optimal segmenting. More info »

Greta Linse Peterson
Golden Helix

11/10/2011

Analysis

Frequency Table
This script will calculate the frequency distribution of two columns in a spreadsheet. The script can be accessed through the scripts menu and will prompt the user to select two non-real columns. More info »

Autumn Laughbaum
Golden Helix
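A two-column frequency distribution of this kind amounts to a cross-tabulation. The following is a minimal sketch using only the standard library; the function name `frequency_table` and the return layout are hypothetical, and the two columns are assumed to be plain Python lists of categorical values.

```python
from collections import Counter


def frequency_table(col_a, col_b):
    """Cross-tabulate two equal-length categorical columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    counts = Counter(zip(col_a, col_b))
    row_labels = sorted(set(col_a))
    col_labels = sorted(set(col_b))
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


status = ["case", "control", "case", "case", "control"]
genotype = ["A_A", "A_B", "A_B", "A_A", "A_A"]
rows, cols, table = frequency_table(status, genotype)
for label, counts_row in zip(rows, table):
    print(label, dict(zip(cols, counts_row)))
```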

11/10/2011

Edit

Split Column on Specified Delimiter
This script prompts the user to select a column that needs to be split on a specified delimiter and for the delimiter to use. The delimiter can be more than one character. More info »

Greta Linse Peterson
Golden Helix
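The splitting behavior can be sketched as follows. The function name `split_column` is hypothetical, and padding short rows with empty strings is an assumption of this example rather than documented behavior of the script.

```python
def split_column(values, delimiter):
    """Split each string on a (possibly multi-character) delimiter.

    Rows that yield fewer pieces are padded with empty strings so that
    every row has the same number of output columns (an assumption).
    """
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    pieces = [value.split(delimiter) for value in values]
    width = max((len(p) for p in pieces), default=0)
    return [p + [""] * (width - len(p)) for p in pieces]


for row in split_column(["rs1::A::G", "rs2::C", "rs3"], "::"):
    print(row)
```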

11/10/2011

Analysis

Genetic Distance between Samples
This script is designed to calculate Cochran-Mantel-Haenszel statistics, given several different spreadsheets corresponding to data from several different strata. More info »

Greta Linse Peterson
Golden Helix

11/10/2011

Edit

MIP CN Transformation
This script creates five transposed spreadsheets, one for each column imported from the MIP Array copy number text file: Copy A, Copy B, CopyNumber, AlleleRatio, and AllelicDifference. More info »

Greta Linse Peterson
Golden Helix

11/10/2011

Filter

Select Subset of Data by XY Coordinates
This script takes an upper and lower bound for two numeric columns and creates a subset spreadsheet for the two columns. More info »

Greta Linse Peterson
Golden Helix
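The bounding-box selection can be illustrated with a short sketch. The name `subset_by_xy` is hypothetical, the data are assumed to be (x, y) pairs, and treating both bounds as inclusive is an assumption of this example.

```python
def subset_by_xy(points, x_bounds, y_bounds):
    """Keep the (x, y) pairs that fall inside both inclusive ranges."""
    x_low, x_high = x_bounds
    y_low, y_high = y_bounds
    return [
        (x, y)
        for x, y in points
        if x_low <= x <= x_high and y_low <= y <= y_high
    ]


points = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.2), (0.4, 1.5)]
print(subset_by_xy(points, x_bounds=(0.2, 1.0), y_bounds=(0.0, 1.0)))
```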

10/25/2011

Analysis

CMH Test over Several Strata
This script is designed to calculate Cochran-Mantel-Haenszel statistics, given several different spreadsheets corresponding to data from several different strata. More info »

Autumn Laughbaum
Golden Helix
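For readers unfamiliar with the Cochran-Mantel-Haenszel statistic, here is a minimal sketch for the common case of one 2x2 table per stratum. It is not the repository script: the function name `cmh_statistic`, the input layout, and the optional continuity correction are assumptions of this example.

```python
import math


def cmh_statistic(tables, continuity=True):
    """CMH chi-squared (1 df) for a list of 2x2 tables ((a, b), (c, d))."""
    diff_sum = 0.0  # sum over strata of observed minus expected count in cell a
    var_sum = 0.0   # sum over strata of the hypergeometric variance of cell a
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var_sum == 0:
        raise ValueError("no informative strata")
    correction = 0.5 if continuity else 0.0
    chi2 = max(abs(diff_sum) - correction, 0.0) ** 2 / var_sum
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Two hypothetical strata of case/control by exposure counts.
strata = [((10, 5), (5, 10)), ((8, 4), (4, 8))]
chi2, p = cmh_statistic(strata)
print(f"CMH chi-squared = {chi2:.3f}, p = {p:.4f}")
```

Pooling the strata through the summed differences and variances, rather than collapsing them into one table, is what lets the test adjust for the stratifying variable.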

10/04/2011

Analysis

Genotype Statistics Summary
This script takes a spreadsheet that contains a case/control dependent variable and SNPs and runs all of the genotype association tests as well as tests for a heterozygous advantage model (Dd vs DD, dd) and a homozygous comparison model (DD vs dd). It also calculates chi-squared scores and correlation/trend test scores and completes count tables. More info »

Greta Linse Peterson
Golden Helix

9/29/2011

Filter

Activate or Inactivate based on Marker Map Field
This function takes a map field from the current spreadsheet as input and activates based on existence in another spreadsheet's column or another spreadsheet's map, or both. More info »

Autumn Laughbaum
Golden Helix

7/28/2011

Analysis

Alternate Allele Frequency
This script calculates the percentage of alternate alleles over all samples for each variant. The resulting spreadsheet has columns containing the reference count, alternate allele, alternate allele frequency, reference allele count and alternate allele count. More info »

Mike Thiesen
Golden Helix
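The per-variant calculation can be sketched as below. This is an illustration only: the function name `alt_allele_frequency` is hypothetical, genotypes are assumed to be strings such as "A_G" with "?" marking a missing allele, and the frequency is returned as a fraction rather than a percentage.

```python
def alt_allele_frequency(genotypes, ref_allele, missing="?"):
    """Count reference and alternate alleles across samples for one variant."""
    ref_count = 0
    alt_count = 0
    for genotype in genotypes:
        for allele in genotype.split("_"):
            if allele == missing:
                continue  # missing calls do not contribute to either count
            if allele == ref_allele:
                ref_count += 1
            else:
                alt_count += 1
    total = ref_count + alt_count
    frequency = alt_count / total if total else float("nan")
    return ref_count, alt_count, frequency


samples = ["A_A", "A_G", "G_G", "?_?"]
print(alt_allele_frequency(samples, ref_allele="A"))
```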

6/23/2011

Analysis

Create Table for Significant Region
Creates a spreadsheet with significant regions from a spreadsheet of p-values. This script extracts p-values more extreme than a certain significance value (cutoff) and combines the remaining markers into segments. If two markers are on different chromosomes or more than a certain distance apart (split), a new region is created. More info »

Ingo Helbig
UK-SH Kiel
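The cutoff and split logic described above can be sketched roughly as follows. The function name `significant_regions`, the input layout of (chromosome, position, p-value) tuples sorted by chromosome and position, and the fields of each region are all assumptions made for this example.

```python
def significant_regions(markers, cutoff, split):
    """Group markers with p < cutoff into regions.

    A new region starts when the chromosome changes or when the gap to the
    previous significant marker exceeds `split` base pairs.
    """
    regions = []
    for chrom, pos, p_value in markers:
        if p_value >= cutoff:
            continue  # not significant, so it is dropped before segmenting
        last = regions[-1] if regions else None
        if last and last["chrom"] == chrom and pos - last["end"] <= split:
            last["end"] = pos
            last["min_p"] = min(last["min_p"], p_value)
            last["n_markers"] += 1
        else:
            regions.append(
                {"chrom": chrom, "start": pos, "end": pos,
                 "min_p": p_value, "n_markers": 1}
            )
    return regions


markers = [
    ("1", 100, 1e-6), ("1", 150, 0.5), ("1", 300, 1e-5),
    ("1", 5000, 1e-7), ("2", 5100, 1e-8),
]
for region in significant_regions(markers, cutoff=1e-4, split=1000):
    print(region)
```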

5/20/2011

Marker Maps

Add Gene Names to Marker Map
Adds the associated gene name(s) to each marker in a marker map. More info »

Sam Gardner
Golden Helix

5/13/2011

Regression

Extract Info from Regression Stats Viewer
This script scans the Regression Statistics Viewer output and prints out the p-value after correcting for any covariates. More info »

Greta Linse Peterson
Golden Helix

4/21/2011

Filter

Filter by Marker Map Field
This function takes a map field from the current spreadsheet as input, then activates or inactivates based on a given threshold or list, or both. More info »

Autumn Laughbaum
Golden Helix

4/21/2011

Analysis

KBAC with Permutation Testing
The Kernel-Based Adaptive Cluster (KBAC) method by Liu and Leal [Liu and Leal 2010] first catalogs the variant data within each of a number of regions into multi-marker genotypes. Since the variants are rare, only a relatively few different multi-marker genotypes will be found in any given region. More info »

James Grover
Golden Helix

4/21/2011

Marker Maps

Apply Additional Marker Map
This function applies an additional marker map to a currently mapped spreadsheet. The user can choose to apply the new map's data to only unmapped columns or to all columns, preferring either the new marker map or the old marker map information. More info »

Sam Gardner
Golden Helix

4/21/2011

Analysis

LD Pairwise Analysis Scripts
This script outputs results from LD analysis, both the EM and CHM methods and both R² and D' values. More info »

Greta Linse Peterson
Golden Helix

4/18/2011

Plotting

Highlight Values in XY Scatter Plot
This script plots an XY scatter plot with additional graph items to highlight values of interest. An independent column, dependent column and sample list is needed. More info »

Greta Linse Peterson
Golden Helix

4/7/2011

Analysis

Run Multiple Genotype Association Tests
This script runs genotypic association tests on multiple dependent phenotype columns. More info »

Christophe Lambert, Greta Linse Peterson
Golden Helix

3/14/2011

Import

Import PennCNV
This script imports PennCNV input signal intensity files, where each file contains data for a single sample. More info »

Sam Gardner
Golden Helix

3/11/2011

Edit

Append Several Spreadsheets
This function allows the user to append several spreadsheets in one dialog, saving the user from having to append each spreadsheet individually. More info »

Autumn Laughbaum
Golden Helix

3/11/2011

Edit

Join or Merge Several Spreadsheets
This function allows the user to merge several spreadsheets in one dialog, saving the user from having to merge each spreadsheet individually. More info »

Autumn Laughbaum
Golden Helix

3/3/2011

Analysis

ANOVA with Phenotype and SNPs
This function makes use of the scipy package, specifically the scipy.stats.f_oneway and scipy.stats.kruskal functions. This requires a numeric phenotype column and several genotype columns which provide the grouping structure in each test. More info »

Autumn Laughbaum
Golden Helix

3/3/2011

Analysis

ANOVA on Numeric Columns
This function makes use of the scipy package, specifically the scipy.stats.f_oneway and scipy.stats.kruskal functions. This requires a categorical dependent column that provides the grouping structure and several numeric columns. More info »

Autumn Laughbaum
Golden Helix

3/3/2011

Import

Import Affymetrix CN Segment Files
This script will import Affymetrix CN Segment files containing copy number segment data as outputted from Affymetrix. More info »

Sam Gardner
Golden Helix

3/3/2011

Filter

Filter by SIFT Synonymous Classification
This function scans a marker-mapped spreadsheet with several genotypic columns and investigates the corresponding SIFT marker map synonymous or non-synonymous classifications. This script requires the purchase of the Sequence Module to function. More info »

Gabe Rudy
Golden Helix

1/26/2011

Edit

Activate ATCG SNPs to flip strand or to exclude SNPs
These scripts can be used to identify SNPs that have ambiguous strand orientation by comparing a genotype dataset with a reference dataset, such as HapMap data. More info »

Joost W. Morsink and Sander W. van der Laan
University Medical Center Utrecht


1/26/2011

CNV Analysis

Affymetrix B Allele Frequency Calculation
Using Affymetrix CEL files as its source, this script combines quantile normalized SNP A and B probe intensities for each marker into a theta value, then calculates B-Allele Frequencies for each marker. More info »

Greta Linse Peterson
Golden Helix

1/26/2011

Import/Export

BEAGLE/BEAGLECALL Scripts Package
These scripts are for importing and exporting files from the BEAGLE and BEAGLECALL Genetic Analysis Software Packages. More info »

Various GHI Staff
Golden Helix

1/26/2011

Analysis

Calculate Expected P-value
This script takes a spreadsheet that contains a p-value column and calculates expected p-values for the specified column. Expected –log10 p-values can optionally be exported as well. More info »

Greta Linse Peterson
Golden Helix
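One common way to obtain expected p-values, for example for a Q-Q plot, is to assume the p-values are uniformly distributed under the null hypothesis and assign rank / (n + 1) to each observed value. The sketch below follows that convention; whether the script uses this exact formula is an assumption, and the function name `expected_p_values` is hypothetical.

```python
import math


def expected_p_values(p_values, neg_log10=False):
    """Return the expected p-value for each observed one, in input order.

    Uses rank / (n + 1), with rank 1 given to the smallest observed p-value.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    expected = [0.0] * n
    for rank, index in enumerate(order, start=1):
        expected[index] = rank / (n + 1)
    if neg_log10:
        expected = [-math.log10(value) for value in expected]
    return expected


observed = [0.5, 0.01, 0.2]
print(expected_p_values(observed))
print(expected_p_values(observed, neg_log10=True))
```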

1/26/2011

SNP Analysis

Chi-Squared Contingency Table
This script computes the Pearson’s Chi-Squared Statistic for a contingency table with m groups and n observations (m rows and n columns). For 2x2 tables the p-value, –log10 p-value, Bonferroni p-value and –log10 Bonferroni p-value are also computed. More info »

Greta Linse Peterson
Golden Helix
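The Pearson statistic and the 2x2 p-values mentioned above can be sketched with the standard library alone. This is an illustrative reimplementation, not the script: the function names are hypothetical, every row and column total is assumed to be non-zero, and the p-value helper is valid only for 1 degree of freedom (the 2x2 case).

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic for an m x n table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def p_values_2x2(statistic, n_tests=1):
    """P-value (1 df) and Bonferroni-adjusted p-value for n_tests tests."""
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    bonferroni = min(1.0, p_value * n_tests)
    return p_value, bonferroni


stat = chi_squared([[10, 20], [20, 10]])
p, p_bonf = p_values_2x2(stat, n_tests=10)
print(f"chi2 = {stat:.3f}, p = {p:.4f}, Bonferroni p = {p_bonf:.4f}")
print(f"-log10 p = {-math.log10(p):.3f}")
```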

1/26/2011

CNV Analysis

CNV PCA Search
Given a spreadsheet, this script prompts for a principal components spreadsheet, a lower and upper bound on the number of components, and a step size. It runs association tests for each number of components, performs a linear regression on the least significant 90% of the data, and reports the slope of the line and a goodness-of-fit statistic. This script can be used in conjunction with the CNV PCA Search Tutorial. More info »

Christophe Lambert
Golden Helix

1/26/2011

Analysis

Compute Odds Ratio CI
This script takes a logistic regression results spreadsheet and calculates 90, 95 or 99% confidence intervals for the Odds Ratio. More info »

Greta Linse Peterson
Golden Helix
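A confidence interval for an odds ratio is typically derived from the logistic regression coefficient and its standard error as exp(beta ± z · SE). The sketch below assumes those two quantities are available from the results spreadsheet, which the description does not state explicitly; the function name `odds_ratio_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(beta, std_err, level=0.95):
    """Return (odds ratio, lower bound, upper bound) for a Wald interval."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be between 0 and 1")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )


# Hypothetical coefficient and standard error for one marker.
for level in (0.90, 0.95, 0.99):
    odds_ratio, lower, upper = odds_ratio_ci(0.405, 0.15, level)
    print(f"{level:.0%} CI: OR = {odds_ratio:.3f} ({lower:.3f}, {upper:.3f})")
```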

1/26/2011

Analysis

Create Table for Significant Regions
This script creates a spreadsheet with significant regions from a spreadsheet of p-values (in the first column). It also extracts p-values more extreme than a certain significance value (cutoff) and combines the remaining markers into segments. If two markers are on different chromosomes or more than a certain distance apart (split), a new region is created. More info »

Ingo Helbig
UK-SH Kiel

1/26/2011

Export

Export MACH PED_DAT Files
This script exports MACH/Merlin PED and DAT formatted files. Run this script from a pedigree spreadsheet that can contain as many phenotypes as desired. The user will be provided with the option to create one file per chromosome if a marker map is applied to the pedigree spreadsheet. More info »

Greta Linse Peterson
Golden Helix

1/26/2011

CNV Analysis

Log Ratio Tails
This script calculates percentile values for the upper and lower tails of log ratios using two user-specified thresholds. Missing values are skipped. A log ratio call rate is returned with the results.  This script may also be used to identify percentiles for real-value data other than log ratios. More info »

Christophe Lambert, Bryce Christensen
Golden Helix

1/26/2011

Analysis

Nonparametric Association Tests (Binary Dependent)
This function makes use of the scipy package, specifically the scipy.stats.ranksums and scipy.stats.mannwhitneyu functions. With one binary dependent column, the user can perform nonparametric association tests on all numeric columns. More info »

Autumn Laughbaum
Golden Helix

1/26/2011

Analysis

Nonparametric Correlation
This function makes use of the scipy package, specifically the scipy.stats.spearmanr and scipy.stats.kendalltau functions. With one numeric dependent column, the user can perform nonparametric correlation tests on all numeric columns. More info »

Autumn Laughbaum
Golden Helix

1/26/2011

CNV Analysis

Row Averages with Histogram
This script will create a column subset from a numeric spreadsheet, then take the row averages and create a histogram of those averages. The subset is specified with a column chooser. This function is useful for investigating LogR spreadsheets for possible CNVs. More info »

Autumn Laughbaum
Golden Helix

1/26/2011

Quality Assurance

Sample Pair Mismatch
This script compares genotype calls from NSP and STY files and calculates the correlation between the nearest markers in the two sets. If there is a high correlation, the NSP and STY markers correspond to the same person; otherwise, there is a mismatch. More info »

Christophe Lambert, Greta Linse Peterson
Golden Helix

1/26/2011

Plotting

SNP Cluster Plots
This script creates scatter plots based on A and B allele intensities that can be split on SNP genotypes to create tri-colored cluster plots. The script will work for up to 100 SNPs at a time. More info »

Greta Linse Peterson
Golden Helix

© 2012 Golden Helix, Inc