What is Python?

Python is a clear and powerful object-oriented programming language, comparable to Perl, Ruby, Scheme, or Java. Integrating Python into SVS provides programmatic access to many of the software's features, enabling users to augment existing tools, create entirely new ones, automate workflows, integrate with other programs, and more.

Python Learning Resources

» SVS Scripting Reference
» Python.org
» Beginners Guide to Python

Add-On Scripts Repository for SVS

Here you will find a collection of Python scripts submitted by Golden Helix developers and our customers. All scripts are provided at no additional cost, so feel free to download, use, and even enhance them!

The following scripts are for SVS 8 and SVS 7.4+
For scripts compatible with older versions, please visit the Scripts Repository for SVS 7.0-7.3.

Share your scripts with the Golden Helix Community

If you have written any scripts and would like to share them with other SVS users, we encourage you to email a *.txt or *.py file to community@goldenhelix.com with any accompanying documentation or special instructions. Once we test your script and check its validity, we'll post it on this page for others to download.

Stay informed about new scripts by subscribing to the technical support bulletins feed »

 

 

Do you have a set of steps that you perform over and over again? Consider an Automated Workflow »

Date Modified Category Script Author Download

8/20/2014

Filter

Select Random Subset by Category
This script prompts the user for a fraction and then activates that fraction of samples, chosen at random, from each unique category of a genotypic, categorical or binary column. More info »

Greta Peterson
Golden Helix
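The per-category sampling described above can be sketched in plain Python as follows. This is not the script's actual code: the function name `random_subset_by_category`, the use of `None` for missing categories, and rounding each category's count with Python's `round()` are assumptions for illustration.

```python
import random
from collections import defaultdict


def random_subset_by_category(categories, fraction, seed=None):
    """Return sorted row indices to activate: `fraction` of each category's rows.

    categories: one category value per row; None marks a missing value and
    is never selected. The per-category count uses Python's round().
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    rows_by_category = defaultdict(list)
    for row, category in enumerate(categories):
        if category is not None:
            rows_by_category[category].append(row)
    selected = []
    for rows in rows_by_category.values():
        count = round(len(rows) * fraction)
        selected.extend(rng.sample(rows, count))  # sampling without replacement
    return sorted(selected)


# Example: half of each category, i.e. 2 "case" rows and 3 "control" rows
cats = ["case"] * 4 + ["control"] * 6
subset = random_subset_by_category(cats, 0.5, seed=1)
```

Passing a `seed` makes the selection reproducible, which helps when the same random subset has to be recreated later.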

4/23/2013

Filter

Subset by Chromosome
This script scans genetic marker mapped columns and creates a subset spreadsheet for each unique chromosome with active data in the spreadsheet. More info »

Autumn Laughbaum
Golden Helix

2/3/2014

Analysis

Consecutive Numeric Regression Analysis
This script will output the results from consecutive numeric regression tests run on one or more dependents. More info »

Alison Figueira
Golden Helix

5/13/2014

Analysis

Linear and Logistic Regression with Interactions
This script will output the results from either a Linear or Logistic Regression Analysis run with one dependent variable and multiple interacting and non-interacting covariates on all numeric columns. This script uses the numpy, scipy, and statsmodels packages. More info »

Alison Figueira
Golden Helix

12/27/2013

Import

Import Minimac Output
This script will import the Minimac info and dose files, which contain phased genotype dosages, as output by the Minimac software. More info »

Jesse Dupre
Golden Helix

6/12/2014

Export

Export Impute2 Genotype Probabilities
From a spreadsheet containing marker-mapped genotypic columns, this script saves the spreadsheet as a series of Impute2 chr*.gen files, with the corresponding chr*.sample files, to the specified directory. More info »

Greta Peterson
Golden Helix

10/25/2013

Filter

Report Samples with Unique Genotypes
This tool scans a genotype spreadsheet and identifies samples that have unique genotypes, that is, genotypes not found in any other sample at that locus. A report is created with binary columns representing the unique genotypes per sample per variant. More info »

Autumn Laughbaum
Golden Helix

11/20/2013

Filter

Subset by Category
This script creates a row subset spreadsheet for each unique entry in a user-selected categorical, binary or genotypic column with active data in the spreadsheet. More info »

Autumn Laughbaum
Golden Helix

8/26/2013

Filter

Subset by Chromosome
This script scans genetic marker mapped columns and creates a subset spreadsheet for each unique chromosome with active data in the spreadsheet. More info »

Autumn Laughbaum
Golden Helix

8/26/2013

Filter

Inactivate Duplicate Row Values
This script scans a selected column in a spreadsheet and inactivates rows based on user prompts, either by inactivating all copies of the duplicate values or by keeping the first occurrence and inactivating all subsequent duplicates. Row values must match exactly, including case, to be considered duplicates. More info »

Christophe Lambert
Golden Helix
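A minimal sketch of the two duplicate-handling options, assuming the column is available as a Python list. The name `duplicate_row_states` is hypothetical, and `True` marks a row that stays active. Comparison is exact, so "a" and "A" are different values.

```python
def duplicate_row_states(values, keep_first=True):
    """Return one boolean per row (True = active).

    keep_first=True keeps the first occurrence of a duplicated value active and
    inactivates later copies; keep_first=False inactivates every copy.
    Values must match exactly (case-sensitive) to count as duplicates.
    """
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    seen = set()
    states = []
    for value in values:
        if counts[value] == 1:
            states.append(True)               # unique value: always active
        elif keep_first:
            states.append(value not in seen)  # only the first copy survives
        else:
            states.append(False)              # every copy is inactivated
        seen.add(value)
    return states
```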

8/26/2013

Filter

Inactivate Duplicate Row Labels
This script scans a spreadsheet's row labels and inactivates rows based on user prompts, either by inactivating all copies of the duplicate row labels or by keeping the first occurrence and inactivating all subsequent duplicates. More info »

Christophe Lambert
Golden Helix

8/13/2013

Filter

Activate or Inactivate based on Genomic Position
This function activates or inactivates markers in the current spreadsheet based on their existence in another spreadsheet's marker map, in a marker map file, or both. Matching is based only on chromosome and position information from both sources, not on marker labels. More info »

Autumn Laughbaum
Golden Helix

8/1/2013

Analysis

Compute Odds Ratio CI
This script takes a logistic regression results spreadsheet and calculates 90, 95 or 99% confidence intervals for the Odds Ratio. More info »

Greta Linse Peterson
Golden Helix

5/21/2013

Edit

Add Annotation Data to Marker Map From Spreadsheet
This function takes the marker map applied to the current spreadsheet and adds specified annotation data from overlapping interval(s) to each marker in the marker map. More info »

Sam Gardner
Golden Helix

5/1/2013

Import

Import Unsorted VCF Files
This script will import 1000 Genomes .vcf file data into multiple spreadsheets and/or marker map fields. More info »

Sam Gardner
Golden Helix

2/26/2013

Filter

Activate Variants by Genotype Count Threshold
This tool scans genotypic columns and activates columns based on a user-specified count or percentage threshold of user-specified genotypes. For example, you could use this tool to activate all genotypic columns that contain at least 20% homozygous alternate variants. More info »

Autumn Laughbaum
Golden Helix

1/2/2013

Filter

Select Rows from String of Values
For integer, binary, categorical or genotypic columns, this script activates rows whose values appear in a comma-separated string entered in the prompt dialog. If no values match, all rows are inactivated. An option allows changing the state of active rows only. More info »

Greta Peterson
Golden Helix

12/31/2012

Column Tools

Copy Values into User Notes
This script copies all unique values from integer, binary, categorical or genotypic columns into the User Notes for the selected spreadsheet as a comma-separated list for pasting elsewhere. More info »

Greta Peterson
Golden Helix

12/27/2012

Filter

Activate or Inactivate based on Marker Map Field
This function takes a map field from the current spreadsheet as input and activates based on existence in another spreadsheet's column or another spreadsheet's map, or both. More info »

Autumn Laughbaum
Golden Helix

4/18/2013

Filter

Find de Novo Candidate Variants
This tool uses pedigree information to identify candidate functional polymorphisms, defined as variants where the offspring in a trio has a genotype classified as a Mendelian error. By default, only heterozygous errors are considered candidates. Optionally, homozygous non-reference errors can also be considered; this option requires a reference allele field in the marker map. Another option allows the user to restrict computation to affected offspring. More info »

Autumn Laughbaum
Golden Helix

10/9/2012

Filter

Activate Variants by Sample Genotypes
This tool examines variant data and inactivates genotypic columns that do not follow the specified genotypic patterns for the selected samples. The spreadsheet must contain mapped genotypic columns and the marker map must contain a reference allele field. More info »

Autumn Laughbaum
Golden Helix

11/3/2013

Analysis

Calculate Alt Read Ratio between Two Spreadsheets
This tool calculates the alternate read ratio given an alternate read depth spreadsheet and a reference read depth spreadsheet. The resulting spreadsheet contains the per-cell ratio (Alt Depth)/(Alt Depth + Ref Depth) and can be used for filtering with Set Genotypes to No-Call. More info »

Autumn Laughbaum
Golden Helix
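The per-cell ratio above can be sketched as follows, assuming each depth spreadsheet is represented as a list of rows with `None` for missing cells. Returning `None` when either depth is missing or the total depth is zero is an assumption, not documented behavior of the tool; the function name is hypothetical.

```python
def alt_read_ratio(alt_depths, ref_depths):
    """Per-cell Alt / (Alt + Ref) for two equally shaped depth matrices."""
    if len(alt_depths) != len(ref_depths):
        raise ValueError("spreadsheets must have the same number of rows")
    result = []
    for alt_row, ref_row in zip(alt_depths, ref_depths):
        if len(alt_row) != len(ref_row):
            raise ValueError("spreadsheets must have the same number of columns")
        row = []
        for alt, ref in zip(alt_row, ref_row):
            if alt is None or ref is None or alt + ref == 0:
                row.append(None)  # no usable reads for this cell
            else:
                row.append(alt / (alt + ref))
        result.append(row)
    return result
```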

8/8/2012

Tools

Build Variant Spreadsheet
This tool builds a variant spreadsheet based on a probe track and region definition selected by the user. By default, the entire track is included in the output spreadsheet. More info »

Autumn Laughbaum
Golden Helix

7/31/2012

Filter

Subset Informative Genotypes by Category
This tool scans genotypic columns to find informative genotypes defined by having at least one non-missing, non-reference allele. Informative genotype column sets are found for each unique category in a user-defined categorical column. More info »

Autumn Laughbaum
Golden Helix

7/31/2012

Edit

Build Sample Collated Spreadsheet
This tool transposes and collates several spreadsheets together. The collated spreadsheet contains a row for each intersecting column in the original spreadsheets and several columns for each original row over all spreadsheets. More info »

Autumn Laughbaum
Golden Helix

7/25/2012

Edit

Create Vector from Matrix Spreadsheet
This script transforms a spreadsheet into a tall-skinny formatted vector. The user will choose which column type to transform and it will result in a child spreadsheet node. If the user selects genotypic columns, the allele delimiter can optionally be removed or replaced. More info »

Greta Peterson
Golden Helix

12/27/2013

Import

Import Concatenated Genotype String File
This import script is designed to import genotypic data that is stored in a concatenated string format. The user will specify the variant name column, sample name column, and data column(s) as well as the genotype encoding. More info »

Greta Peterson
Golden Helix

7/10/2012

Filter

Filter by SIFT Synonymous Classification
This filter inactivates mapped markers that are either predicted as synonymous or are predicted as nonsynonymous, depending on the inactivation option selected. More info »

Gabe Rudy
Golden Helix

7/10/2012

Filter

Filter by PolyPhen2 Score
This filter inactivates mapped markers that are either predicted as tolerated or have low confidence (do not pass filters) or are predicted as damaging (pass filters), depending on the inactivation option selected. More info »

Gabe Rudy
Golden Helix

6/22/2012

Analysis

Absolute Risk Reduction
This script calculates the absolute risk reduction for each genotype given binary disease status and treatment status columns. The script requires a spreadsheet that contains at least two binary columns and several genotypic columns. More info »

Autumn Laughbaum
Golden Helix
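One common definition of absolute risk reduction is the disease risk among untreated samples minus the risk among treated samples. The sketch below applies that definition within each genotype. Whether the script uses exactly this definition is an assumption, and the function name and 0/1 coding of the binary columns are hypothetical.

```python
from collections import defaultdict


def absolute_risk_reduction(genotypes, disease, treated):
    """ARR per genotype: risk(untreated) - risk(treated).

    genotypes: genotype call per sample; disease, treated: 0/1 per sample.
    Samples with any missing value (None) are skipped.
    Returns {genotype: arr}, with None when a treatment group is empty.
    """
    # counts[genotype][treatment] = [cases, total]
    counts = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for g, d, t in zip(genotypes, disease, treated):
        if g is None or d is None or t is None:
            continue
        cell = counts[g][t]
        cell[0] += d
        cell[1] += 1
    result = {}
    for g, groups in counts.items():
        (cases_u, n_u), (cases_t, n_t) = groups[0], groups[1]
        if n_u == 0 or n_t == 0:
            result[g] = None
        else:
            result[g] = cases_u / n_u - cases_t / n_t
    return result
```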

6/21/2012

Edit

Convert Dosages to Genotypes
This script converts allelic dosage values to genotypes based on user-specified thresholds. The dosage data may be in Single- or Double-Dosage format and may have samples in the row labels or column headers. If the samples are in the column headers, the spreadsheet may contain map information and allele translation values. More info »

Autumn Laughbaum
Golden Helix

5/17/2013

Edit

Quantile Transformation
This script categorizes a numeric column into N user-specified quantiles. The cutoff points are calculated over all non-missing values and column values are compared against these cutoffs with <=. More info »

Autumn Laughbaum
Golden Helix
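A sketch of the categorization, assuming the cutoffs come from Python's `statistics.quantiles` with its default method; the script's own quantile interpolation may differ, and the function name is hypothetical. `bisect_left` finds the first cutoff that a value is `<=` to, matching the comparison described above.

```python
import bisect
import statistics


def quantile_categories(values, n):
    """Assign each value to a quantile category 1..n; None stays None."""
    present = [v for v in values if v is not None]
    # n - 1 cut points computed over the non-missing values only
    cutoffs = statistics.quantiles(present, n=n)
    # Index of the first cutoff c with v <= c; values above every cutoff land in n
    return [None if v is None else bisect.bisect_left(cutoffs, v) + 1 for v in values]
```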

6/20/2012

Analysis

Calculate Pseudo Lambda
This script calculates a pseudo-lambda value on a column containing p-values. The formula used to calculate the pseudo lambda value is as follows: More info »

Autumn Laughbaum
Golden Helix

6/15/2012

Analysis

Correct P-Values for Multiple Tests
This script takes a column of p-values and outputs several multiple testing corrections, including Bonferroni, FDR (Storey 2002), BH FDR (Benjamini-Hochberg 1995) and BY FDR (Benjamini-Yekutieli 2001). More info »

Greta Peterson
Golden Helix
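The Bonferroni, BH and BY adjustments can be sketched as below (Storey's method is omitted). This is an illustrative implementation of the standard step-up procedure, not the script's code; the function names are hypothetical. BY is BH with every value scaled by the harmonic sum 1 + 1/2 + ... + 1/m.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def bh_fdr(pvalues, dependent=False):
    """Benjamini-Hochberg adjusted p-values; dependent=True gives Benjamini-Yekutieli."""
    m = len(pvalues)
    scale = sum(1.0 / i for i in range(1, m + 1)) if dependent else 1.0
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        value = min(1.0, pvalues[i] * m * scale / rank)
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted
```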

5/31/2012

Edit

Move Columns to Location
This script moves all columns of a user-specified type to the user-specified location (Beginning or End). This will allow the user to group all columns of the same type together. More info »

Autumn Laughbaum
Golden Helix

5/3/2012

Analysis

Chi-Squared Test with Continuity Correction
This script performs a Chi-squared test on a spreadsheet with a binary dependent and genotypic data. The output will contain results for the traditional test as well as results with the Yates continuity correction applied. The following genetic models are available: Basic Allelic, Dominant, or Recessive. More info »

Autumn Laughbaum
Golden Helix
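For a 2x2 table (for example, allele counts by case/control status under the Basic Allelic model), the uncorrected and Yates-corrected statistics can be sketched as follows. Flooring |O - E| - 0.5 at zero is an assumed implementation detail, and the function name is hypothetical. The 1-degree-of-freedom p-value uses the identity P(X > s) = erfc(sqrt(s / 2)).

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-squared test for the table [[a, b], [c, d]] (1 degree of freedom).

    With yates=True, |O - E| is reduced by 0.5 (floored at 0) before squaring.
    Returns (statistic, p_value).
    """
    table = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column of the table needs a nonzero total")
    n = a + b + c + d
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)
            stat += diff * diff / expected
    # Survival function of chi-squared with 1 df
    return stat, math.erfc(math.sqrt(stat / 2))
```

The correction always shrinks the statistic, so the Yates p-value is the more conservative of the two.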

11/10/2011

Filter

Inactivate Duplicate Column Headers
This script scans a spreadsheet's column headers and inactivates all additional occurrences of a column header (only the first occurrence remains active). More info »

Greta Peterson
Golden Helix

2/28/2012

Import

Import Merlin PED DAT
This import script imports PED/DAT files created in MERLIN. File delimiters may be comma, whitespace or tab and allele delimiters may be whitespace or /. The data may include several phenotype, covariate and genotype columns. More info »

Autumn Laughbaum
Golden Helix

2/10/2012

Filter

MAF Filtering on Recoded Spreadsheet
This script calculates minor allele frequency (MAF) on recoded data created by Recode Genotypes with X Chromosome Adjustment. More info »

Autumn Laughbaum
Golden Helix

2/10/2012

Recode

Recode Genotypes with X Chromosome Adjustment
This script recodes genotypes based on an additive model with major/minor allele classification. Markers within the selected chromosomes are adjusted for male samples. More info »

Autumn Laughbaum
Golden Helix

2/7/2012

Import

Import Tall Skinny Format
This import script is designed to import genotypic data that is stored in a tall skinny format. The user will specify the variant name column, sample name column, and data column(s). More info »

Autumn Laughbaum
Golden Helix

2/3/2012

Filter

Filter Columns by Regular Expression
This script takes a regular expression and activates all columns which contain an expression match in the column header. More info »

Autumn Laughbaum
Golden Helix
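Assuming the script uses search semantics (a match anywhere in the header, not a full-string match), the filter takes only a few lines with Python's `re` module; `columns_matching` is a hypothetical name.

```python
import re


def columns_matching(headers, pattern):
    """Return the indices of columns whose header contains a match for `pattern`."""
    regex = re.compile(pattern)
    return [i for i, header in enumerate(headers) if regex.search(header)]


# Example: columns whose header ends in "_A"
active = columns_matching(["rs123_A", "rs456_B", "age", "rs789_A"], r"_A$")
```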

2/2/2012

Edit

Rename Genotypes
This script scans the genotypic columns to find all existing genotypes, and then prompts for replacements. The resulting spreadsheet has the same dimensions with the appropriate genotype substitutions. More info »

Greta Linse Peterson
Golden Helix

1/30/2012

Edit

Create Column From Row Labels
This script allows the user to add the row labels as a column in the spreadsheet. More info »

Autumn Laughbaum
Golden Helix

1/25/2012

Analysis

Average Markers by Gene
This script calculates an average value for each row over each region as defined by a gene annotation track or a string marker map field. This script requires a marker mapped spreadsheet with several quantitative columns. More info »

Autumn Laughbaum
Golden Helix

11/10/2011

Edit

Convert Binary and Integer Values to Genotypes
This script recodes binary and integer values to the standard genotype format of A_A, A_B, and B_B. It prompts for the values that represent A_A, A_B, and B_B; all other numbers are encoded as missing. Thus, if there is multi-allelic data in the spreadsheet, all numbers other than those specified will be encoded as "?". More info »

Greta Linse Peterson
Golden Helix
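The recoding rule amounts to a lookup table. In this sketch the function name is hypothetical, and writing missing output as the single string "?" follows the description above rather than the script's source.

```python
def values_to_genotypes(values, aa, ab, bb):
    """Map the codes chosen for A_A, A_B and B_B; any other value (or None) becomes "?"."""
    mapping = {aa: "A_A", ab: "A_B", bb: "B_B"}
    return [mapping.get(v, "?") for v in values]
```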

11/10/2011

Analysis

Create Pseudo Marker Mapped Spreadsheet
From a non-marker-mapped spreadsheet, this script creates a new marker-mapped spreadsheet with a pseudo marker map containing chromosome 1 and positions 1 through the number of rows. More info »

Greta Linse Peterson
Golden Helix

11/10/2011

CNV Analysis

Create Spreadsheet for Segmentation
Based on a column from a spreadsheet, this script creates a new spreadsheet with a pseudo marker map and generic column headers making it suitable for running CNAM optimal segmenting. More info »

Greta Linse Peterson
Golden Helix

11/10/2011

Analysis

Frequency Table
This script will calculate the frequency distribution of two columns in a spreadsheet. The script can be accessed through the scripts menu and will prompt the user to select two non-real columns. More info »

Autumn Laughbaum
Golden Helix

11/10/2011

Edit

Split Column on Specified Delimiter
This script prompts the user to select a column that needs to be split on a specified delimiter and for the delimiter to use. The delimiter can be more than one character. More info »

Greta Linse Peterson
Golden Helix

11/10/2011

Analysis

Genetic Distance between Samples
This script is designed to calculate Cochran-Mantel-Haenszel statistics, given several different spreadsheets corresponding to data from several different strata. More info »

Greta Linse Peterson
Golden Helix

11/10/2011

Edit

MIP CN Transformation
This script creates 5 transposed spreadsheets, one for each column imported from the MIP Array copy number text file: Copy A, Copy B, CopyNumber, AlleleRatio, and AllelicDifference. More info »

Greta Linse Peterson
Golden Helix

11/10/2011

Filter

Select Subset of Data by XY Coordinates
This script takes an upper and lower bound for two numeric columns and creates a subset spreadsheet for the two columns. More info »

Greta Linse Peterson
Golden Helix

10/25/2011

Analysis

CMH Test over Several Strata
This script is designed to calculate Cochran-Mantel-Haenszel statistics, given several different spreadsheets corresponding to data from several different strata. More info »

Autumn Laughbaum
Golden Helix
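For reference, the CMH statistic pools 2x2 tables across strata as (sum of (a_k - E[a_k]))^2 / (sum of Var(a_k)), and the Mantel-Haenszel common odds ratio is sum(a_k d_k / n_k) / sum(b_k c_k / n_k). The sketch below implements these textbook formulas. Representing each stratum as an (a, b, c, d) tuple, the optional `correction` flag and the function name are assumptions, not the script's interface.

```python
import math


def cmh_test(strata, correction=False):
    """Cochran-Mantel-Haenszel test over 2x2 tables, one (a, b, c, d) tuple per stratum.

    Returns (statistic, p_value, common_odds_ratio).
    """
    sum_diff = sum_var = or_num = or_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small contributes no variance
        expected_a = (a + b) * (a + c) / n
        sum_diff += a - expected_a
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    if sum_var == 0:
        raise ValueError("the strata carry no information for the test")
    deviation = abs(sum_diff)
    if correction:
        deviation = max(0.0, deviation - 0.5)
    stat = deviation ** 2 / sum_var
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-squared, 1 degree of freedom
    common_or = or_num / or_den if or_den else math.inf
    return stat, p_value, common_or
```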

10/4/2011

Analysis

Genotype Statistics Summary
This script takes a spreadsheet that contains a case/control dependent variable and SNPs, and runs all of the genotype association tests as well as tests for a heterozygous advantage model (Dd vs DD, dd) and a homozygous comparison model (DD vs dd). It also calculates chi-squared scores and correlation/trend test scores, and produces complete count tables. More info »

Greta Linse Peterson
Golden Helix

7/28/2011

Analysis

Alternate Allele Frequency
This script calculates the percentage of alternate alleles over all samples for each variant. The resulting spreadsheet has columns containing the reference allele, alternate allele, alternate allele frequency, reference allele count and alternate allele count. More info »

Mike Thiesen
Golden Helix
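A sketch of the per-variant counts, assuming genotype calls in the A_G style, "?" marking a missing allele, and a known reference allele taken from the marker map. Any non-reference allele counts as alternate. The function name is hypothetical; multiply the frequency by 100 to get a percentage.

```python
def alt_allele_summary(genotypes, ref_allele, delimiter="_"):
    """Allele counts and alternate allele frequency for one variant.

    genotypes: calls such as "A_G"; calls that are None or contain "?" are skipped.
    Returns (ref_count, alt_count, alt_frequency), with frequency None if nothing was called.
    """
    ref_count = alt_count = 0
    for call in genotypes:
        if call is None:
            continue
        alleles = call.split(delimiter)
        if "?" in alleles:
            continue  # missing call
        for allele in alleles:
            if allele == ref_allele:
                ref_count += 1
            else:
                alt_count += 1
    total = ref_count + alt_count
    return ref_count, alt_count, (alt_count / total if total else None)
```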

6/23/2011

Analysis

Create Table for Significant Region
Creates a spreadsheet with significant regions from a spreadsheet of p-values. This script extracts p-values more extreme than a certain significance value (cutoff) and combines the remaining markers into segments. If two markers are on different chromosomes or more than a certain distance apart (split), a new region is created. More info »

Ingo Helbig
UK-SH Kiel
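The region-building rule above can be sketched as follows, assuming markers arrive sorted by chromosome and position, that "more extreme" means a p-value at or below the cutoff, and that the split distance is measured from the previous significant marker. The function name and output fields are hypothetical.

```python
def significant_regions(markers, cutoff, split):
    """Group significant markers into regions.

    markers: iterable of (chromosome, position, p_value), sorted by chromosome
    and position. A new region starts when the chromosome changes or the gap
    to the previous kept marker exceeds `split`.
    """
    regions = []
    current = None
    for chrom, pos, p in markers:
        if p is None or p > cutoff:
            continue  # not significant
        if current is None or chrom != current["chromosome"] or pos - current["stop"] > split:
            current = {"chromosome": chrom, "start": pos, "stop": pos, "markers": 0, "min_p": p}
            regions.append(current)
        current["stop"] = pos
        current["markers"] += 1
        current["min_p"] = min(current["min_p"], p)
    return regions
```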

6/23/2014

Regression

Extract Info from Regression Stats Viewer
This script scans the Regression Statistics Viewer output and prints out the p-value after correcting for any covariates. More info »

Greta Linse Peterson
Golden Helix

4/21/2011

Filter

Filter by Marker Map Field
This function takes a map field from the current spreadsheet as input, then activates or inactivates based on a given threshold or list, or both. More info »

Autumn Laughbaum
Golden Helix

4/21/2011

Analysis

KBAC with Permutation Testing
The Kernel-Based Adaptive Cluster (KBAC) method [Liu and Leal 2010] first catalogs the variant data within each of a number of regions into multi-marker genotypes. Since the variants are rare, only relatively few distinct multi-marker genotypes will be found in any given region. More info »

James Grover
Golden Helix

4/21/2011

Marker Maps

Apply Additional Marker Map
This function will apply an additional marker map to a currently mapped spreadsheet. The user can choose to apply the new map's data to only unmapped columns or to all columns, preferring either the new or the old marker map information. More info »

Sam Gardner
Golden Helix

4/21/2011

Analysis

LD Pairwise Analysis Scripts
This script outputs results from LD analysis using both the EM and CHM methods, reporting both R² and D' values. More info »

Greta Linse Peterson
Golden Helix

4/18/2011

Plotting

Highlight Values in XY Scatter Plot
This script plots an XY scatter plot with additional graph items to highlight values of interest. An independent column, a dependent column and a sample list are needed. More info »

Greta Linse Peterson
Golden Helix

4/7/2011

Analysis

Run Multiple Genotype Association Tests
This script runs genotypic association tests on multiple dependent phenotype columns. More info »

Christophe Lambert, Greta Linse Peterson
Golden Helix

3/14/2011

Import

Import PennCNV
This script imports PennCNV input signal intensity files, where each file contains data for a single sample. More info »

Sam Gardner
Golden Helix

3/11/2011

Edit

Append Several Spreadsheets
This function allows the user to append several spreadsheets in one dialog, saving the user from having to append each spreadsheet individually. More info »

Autumn Laughbaum
Golden Helix

3/11/2011

Edit

Join or Merge Several Spreadsheets
This function allows the user to merge several spreadsheets in one dialog, saving the user from having to merge each spreadsheet individually. More info »

Autumn Laughbaum
Golden Helix

3/3/2011

Analysis

ANOVA with Phenotype and SNPs
This function makes use of the scipy package, specifically the scipy.stats.f_oneway and scipy.stats.kruskal functions. This requires a numeric phenotype column and several genotype columns which provide the grouping structure in each test. More info »

Autumn Laughbaum
Golden Helix
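The script itself relies on `scipy.stats.f_oneway`. For illustration only, the F statistic that function reports can be computed in plain Python as below (without the p-value, which needs the F distribution). The function name and the use of `None` for missing data are assumptions.

```python
import math


def one_way_anova_f(phenotype, genotypes):
    """One-way ANOVA F statistic of a numeric phenotype grouped by genotype.

    Samples with a missing phenotype or genotype (None) are dropped.
    """
    groups = {}
    for y, g in zip(phenotype, genotypes):
        if y is None or g is None:
            continue
        groups.setdefault(g, []).append(y)
    values = [y for ys in groups.values() for y in ys]
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(values) / n
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    ss_between = sum(len(ys) * (means[g] - grand_mean) ** 2 for g, ys in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    if ss_within == 0:
        return math.inf if ss_between > 0 else math.nan  # no within-group spread
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```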

3/3/2011

Analysis

ANOVA on Numeric Columns
This function makes use of the scipy package, specifically the scipy.stats.f_oneway and scipy.stats.kruskal functions. This requires a categorical dependent column that provides the grouping structure and several numeric columns. More info »

Autumn Laughbaum
Golden Helix

3/3/2011

Import

Import Affymetrix CN Segment Files
This script will import Affymetrix CN Segment files containing copy number segment data as output by Affymetrix software. More info »

Sam Gardner
Golden Helix

3/3/2011

Filter

Filter by SIFT Synonymous Classification
This function scans a marker-mapped spreadsheet with several genotypic columns and investigates the corresponding SIFT synonymous or non-synonymous classifications in the marker map. This script requires the purchase of the Sequence Module to function. More info »

Gabe Rudy
Golden Helix

1/26/2011

Edit

Activate ATCG SNPs to flip strand or to exclude SNPs
These scripts can be used to identify SNPs with ambiguous strand orientation (A/T and C/G SNPs) by comparing a genotype dataset with a reference dataset, such as HapMap data. More info »

Joost W. Morsink and Sander W. van der Laan
University Medical Center Utrecht
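An A/T or C/G SNP has alleles that are complements of each other, so its calls look the same on either strand; that is why such SNPs need a reference dataset to resolve orientation. A sketch of detecting these SNPs and flipping a call to the opposite strand, assuming A_G-style genotype strings (function names are hypothetical):

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """True for A/T and C/G SNPs, whose two alleles are complements of each other."""
    return COMPLEMENT.get(allele1.upper()) == allele2.upper()


def flip_strand(genotype, delimiter="_"):
    """Complement each allele of a call such as "A_G" -> "T_C"; unknown symbols are kept."""
    return delimiter.join(COMPLEMENT.get(a, a) for a in genotype.split(delimiter))
```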


1/26/2011

CNV Analysis

Affymetrix B Allele Frequency Calculation
Using Affymetrix CEL files as its source, this script combines quantile normalized SNP A and B probe intensities for each marker into a theta value, then calculates B-Allele Frequencies for each marker. More info »

Greta Linse Peterson
Golden Helix

12/27/2013

Import/Export

BEAGLE/BEAGLECALL Scripts Package
These scripts are for importing and exporting files from the BEAGLE and BEAGLECALL Genetic Analysis Software Packages. More info »

Various GHI Staff
Golden Helix

1/26/2011

Analysis

Calculate Expected P-value
This script takes a spreadsheet that contains a p-value column and calculates expected p-values for the specified column. Expected –log10 p-values can optionally be exported as well. More info »

Greta Linse Peterson
Golden Helix
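Expected p-values (for example, for a QQ plot) are usually the order statistics of a uniform distribution. The sketch below assumes the k / (n + 1) convention for the k-th smallest of n non-missing p-values; the script may use another convention, such as (k - 0.5) / n, and the function name is hypothetical.

```python
import math


def expected_pvalues(pvalues):
    """Expected p-values under the uniform null, plus their -log10 values.

    Missing entries (None) stay None in both returned lists.
    """
    present = sorted((p, i) for i, p in enumerate(pvalues) if p is not None)
    n = len(present)
    expected = [None] * len(pvalues)
    for rank, (_, i) in enumerate(present, start=1):
        expected[i] = rank / (n + 1)
    neg_log10 = [None if e is None else -math.log10(e) for e in expected]
    return expected, neg_log10
```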

1/26/2011

SNP Analysis

Chi-Squared Contingency Table
This script computes Pearson's chi-squared statistic for a contingency table with m groups (rows) and n categories (columns). For 2x2 tables, the p-value, –log10 p-value, Bonferroni p-value and –log10 Bonferroni p-value are also computed. More info »

Greta Linse Peterson
Golden Helix
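The Pearson statistic sums (O - E)^2 / E over all cells, with E = row total × column total / N and (m - 1)(n - 1) degrees of freedom. A plain-Python sketch follows, including the 2x2 extras. The function names are hypothetical, and `n_tests` (the number of tests used in the Bonferroni correction) is an assumed parameter.

```python
import math


def pearson_chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an m x n table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a nonzero total")
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def two_by_two_pvalues(stat, n_tests):
    """p, -log10 p, Bonferroni p and -log10 Bonferroni p for a 1-df statistic."""
    p = math.erfc(math.sqrt(stat / 2))  # chi-squared survival function, 1 df
    bonf = min(1.0, p * n_tests)

    def neg_log10(x):
        return -math.log10(x) if x > 0 else math.inf

    return p, neg_log10(p), bonf, neg_log10(bonf)
```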

1/26/2011

CNV Analysis

CNV PCA Search
Given a spreadsheet, this script prompts for a principal components spreadsheet, a lower and upper bound on the number of components, and a step size. It runs association tests for each component setting, fits a linear regression to the least significant 90% of the data, and reports the slope of the line and a goodness-of-fit statistic. This script can be used in conjunction with the CNV PCA Search Tutorial. More info »

Christophe Lambert
Golden Helix

1/26/2011

Analysis

Create Table for Significant Regions
This script creates a spreadsheet with significant regions from a spreadsheet of p-values (in the first column). It also extracts p-values more extreme than a certain significance value (cutoff) and combines the remaining markers into segments. If two markers are on different chromosomes or more than a certain distance apart (split), a new region is created. More info »

Ingo Helbig
UK-SH Kiel

12/27/2013

Export

Export MACH PED_DAT Files
This script exports MACH/Merlin PED and DAT formatted files. Run this script from a pedigree spreadsheet that can contain as many phenotypes as desired. The user will be provided with the option to create one file per chromosome if a marker map is applied to the pedigree spreadsheet. More info »

Greta Linse Peterson
Golden Helix

1/26/2011

CNV Analysis

Log Ratio Tails
This script calculates percentile values for the upper and lower tails of log ratios using two user-specified thresholds. Missing values are skipped. A log ratio call rate is returned with the results. This script may also be used to identify percentiles for real-value data other than log ratios. More info »

Christophe Lambert, Bryce Christensen
Golden Helix

8/13/2013

Analysis

Nonparametric Association Tests (Binary Dependent)
This function makes use of the scipy package, specifically the scipy.stats.ranksums and scipy.stats.mannwhitneyu functions. With one binary dependent column, the user can perform nonparametric association tests on all numeric columns. More info »

Autumn Laughbaum
Golden Helix

8/13/2013

Analysis

Nonparametric Correlation
This function makes use of the scipy package, specifically the scipy.stats.spearmanr and scipy.stats.kendalltau functions. With one numeric dependent column, the user can perform nonparametric correlation tests on all numeric columns. More info »

Autumn Laughbaum
Golden Helix

1/26/2011

CNV Analysis

Row Averages with Histogram
This script will create a column subset from a numeric spreadsheet, then take the row averages and create a histogram of those averages. The subset is specified with a column chooser. This function is useful for investigating LogR spreadsheets for possible CNVs. More info »

Autumn Laughbaum
Golden Helix

1/26/2011

Quality Assurance

Sample Pair Mismatch
This script compares genotype calls from NSP and STY files and calculates the correlation between the nearest markers in the two sets. If the correlation is high, the NSP and STY data correspond to the same person; otherwise there is a mismatch. More info »

Christophe Lambert, Greta Linse Peterson
Golden Helix

1/26/2011

Plotting

SNP Cluster Plots
This script creates scatter plots based on A and B allele intensities that can be split on SNP genotypes to create tri-colored cluster plots. The script will work for up to 100 SNPs at a time. More info »

Greta Linse Peterson
Golden Helix

© 2014 Golden Helix, Inc