6.6 Scripting Reference

Project Related Commands

Creating a New Project

Syntax

ghi.newProject(project name, project path)

Example

ghi.newProject("Discovery", "/projects")

To create a new project that can later be viewed in GUI mode, use the above command. Once a project is created, all new SVS objects will be added to the project, just as they would be by the same operations in GUI mode. Note that the project path must be an existing folder on the file system; a new folder named after the project will be created inside it.
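Because the project path must already exist, a script may want to check it before calling ghi.newProject. The following is a minimal sketch, not part of the SVS API: create_project is a hypothetical helper, and it receives the ghi module as an argument so that it can be defined and tried outside SVS.

```python
import os


def create_project(ghi_module, name, parent_dir):
    """Create a project only if parent_dir is an existing folder.

    ghi_module is assumed to be the `ghi` module available inside SVS.
    Returns the folder in which the project is expected to be created.
    """
    if not os.path.isdir(parent_dir):
        raise ValueError("Project path is not an existing folder: %s" % parent_dir)
    ghi_module.newProject(name, parent_dir)
    # Per the note above, the project name becomes a new folder in parent_dir.
    return os.path.join(parent_dir, name)
```

Inside SVS, this could be called as create_project(ghi, "Discovery", "/projects").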

Open an Existing Project

Syntax

ghi.openProject(project path and name of project)

Example

ghi.openProject("/projects/Discovery/Discovery.ghp")

To open a project previously created either in GUI mode or in script mode, use the openProject command.

Saving a Project

Syntax

ghi.saveProject()

Example

ghi.saveProject()

This command forces the project to save its current state. The program auto-saves after important project modifications; this command ensures that the most recent changes are saved as well.

Saving a Copy of a Project

Syntax

ghi.saveProjectAs(project copy directory, project copy name, switch to copy indicator, preserve password indicator)

Example

ghi.saveProjectAs("/projects", "NewDiscovery", 0)

Saves a copy of the current project. There are three required parameters and one optional parameter for this command. The first parameter is the path of the directory where the project copy is to be saved. The second parameter is the name of the project copy. The third parameter indicates whether the open project should remain the current project or whether the program should switch to the project copy.

The options for the third parameter are:

  • 0: Keep the current project open.
  • 1: Switch to the copy of the project.

The optional parameter specifies if password protection should be kept for the project copy, if the original project had a password associated with it. If password protection is kept, then the password for the project copy will be the same as for the original project. This option is specified by the following:

  • 0: Do not save password protection in the copy of the project.
  • 1 (default): Keep the password protection for the copy of the project.

Closing a Project

Syntax

ghi.closeProject()

Example

ghi.closeProject()

This command will close the current project without saving the state of the project. In order to save the project state before closing, first use the saveProject command and then the closeProject command.
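The save-then-close sequence described above can be wrapped in a small helper. This is only a sketch: save_and_close is a hypothetical name, and the ghi module is passed in as an argument rather than imported.

```python
def save_and_close(ghi_module):
    """Save the current project state, then close the project.

    ghi_module is assumed to be the `ghi` module available inside SVS.
    """
    ghi_module.saveProject()   # persist the latest changes first
    ghi_module.closeProject()  # closeProject itself does not save
```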

Exiting the Program

Syntax

ghi.exit()

Example

ghi.exit()

To close the SVS program, use this command. The program exits without saving the state of the current project.

General GHI Commands

Allowing Viewers to Display

Syntax

ghi.enableNewViewers(viewer setting)

Example

ghi.enableNewViewers(ghi.const.AllGuiElements)

Showing viewers, such as spreadsheets, while a script runs can be a hindrance or simply not desired. The above command either suppresses or allows the display of GUI viewers while scripts execute. Viewers can be turned on and off at any time while a script is running. This command only affects scripts that are run from the Scripts menu of a viewer. If viewers are turned off in a script, they are turned on again when the script completes, so new scripts always start with viewers turned on.

There are three possible viewer settings:

  • ghi.const.NoGuiElements: hide all new viewers
  • ghi.const.AllGuiElements: show all new viewers
  • ghi.const.ProgressOnly: only show progress dialogs; hide all other new viewers
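One way to use these settings is to hide new viewers only around a noisy part of a script and then show them again. The sketch below assumes the ghi module is passed in as an argument; run_without_viewers and task are hypothetical names, not SVS functions.

```python
def run_without_viewers(ghi_module, task):
    """Run task() with new viewers hidden, then allow new viewers again.

    ghi_module is assumed to be the `ghi` module available inside SVS;
    task is any callable that takes no arguments.
    """
    ghi_module.enableNewViewers(ghi_module.const.NoGuiElements)
    try:
        return task()
    finally:
        # Restore the setting explicitly so that later commands in the
        # same script open their viewers as usual, even if task() failed.
        ghi_module.enableNewViewers(ghi_module.const.AllGuiElements)
```

Passing ghi.const.ProgressOnly in place of ghi.const.NoGuiElements would keep progress dialogs visible during the task.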

Display a GUI Message

Syntax

ghi.message(message string)

Example

ghi.message("My important message to display to the viewer")

Sometimes it may be necessary to pop up a GUI based message to report status, output or other information. This command will take the text parameter and display it in a standard message dialog until the OK button is clicked.

Display a GUI Error Message

Syntax

ghi.error()

Example

ghi.error()

An example in context:

try:
    a = 1 / 0
except Exception:
    ghi.error()

This displays a divide-by-zero error.


If a script uses try/except syntax, this command can be used in the except clause and any exception message will be displayed in a GUI error dialog.

Creating a Unique Temporary File Name

Syntax

ghi.tmpFileName(extension: string)

Example

ghi.tmpFileName('ext')

If a project is open, this command creates a unique temporary file name with the given extension in the project temp folder. If no project is open, it creates a unique temporary file name with the given extension in the system temp folder.

Getting a Specific Navigator Node

Syntax

object variable = ghi.getObject(navigator node ID)

Example

myObject = ghi.getObject(8)

When a navigator node ID is known, the object representing that navigator node can be retrieved using this command. The above command takes an integer for the node ID, and returns a single object.

Getting the Current Navigator Node

Syntax

object variable = ghi.getCurrentObject()

Example

myObject = ghi.getCurrentObject()

Another way to get access to navigator nodes is to ask for the currently highlighted node. If no node is highlighted an error will be displayed. Otherwise, an object representing the current node will be returned. For any menu based script, the current object will be the one the script was invoked from. Most spreadsheet scripts invoke this command to get a handle to the current active spreadsheet.

Getting Program or Project Paths

Syntax

path = ghi.getProperty(property)

Example

myPath = ghi.getProperty(ghi.const.ProjectPath)

This command returns paths to specific program or project locations.

The possible values for the single argument are:

  • ghi.const.AppPath: path of the application directory
  • ghi.const.ProjectPath: path of the current project directory; an empty string is returned if no project is open
  • ghi.const.MarkerMapPath: path of the genetic marker map directory
  • ghi.const.UserScriptsPath: path of the user scripts directory
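Since ghi.const.ProjectPath yields an empty string when no project is open, a script can turn that case into an explicit check. A minimal sketch, assuming the ghi module is passed in as an argument (current_project_dir is a hypothetical helper):

```python
def current_project_dir(ghi_module):
    """Return the project directory, or None when no project is open.

    ghi_module is assumed to be the `ghi` module available inside SVS.
    """
    path = ghi_module.getProperty(ghi_module.const.ProjectPath)
    # An empty string means that no project is open.
    return path if path else None
```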

Choosing a File

Syntax

file path list = ghi.chooseFile(file extension mask, dialog title, [Optional Parameter])

Example

myFilePaths = ghi.chooseFile("*.txt", "Choose A File Please", ghi.const.ChooseSingleFile)

This method will display a dialog window for browsing and selecting a file or multiple files. If a file or multiple files are selected then a tuple with the complete path to all files is returned. If the dialog is canceled then an empty tuple is returned. There are two required parameters. The first parameter defines a file extension mask. For example, if the file extension mask is “*.txt” then the dialog will only display files with the .txt extension. The second parameter is the title to be displayed in the dialog’s title bar.

The third argument is optional.

  • Optional third parameter: Specifies the number of files to choose, or indicates a file should be created.
    • ghi.const.ChooseSingleFile (default): Only a single existing file may be selected.
    • ghi.const.ChooseMultipleFiles: Allow multiple selection of existing files.
    • ghi.const.ChooseSaveAs: Dialog is a “Save As” dialog. The file need not exist.
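Because the result is a tuple that is empty when the dialog is canceled, scripts usually need to handle both cases. The sketch below shows one way to do this; choose_one_file is a hypothetical helper, the ghi module is passed in as an argument, and it is assumed that a single selection is also returned inside a tuple.

```python
def choose_one_file(ghi_module, mask, title):
    """Ask for a single existing file; return its path, or None if canceled.

    ghi_module is assumed to be the `ghi` module available inside SVS.
    """
    paths = ghi_module.chooseFile(mask, title, ghi_module.const.ChooseSingleFile)
    if not paths:
        # Empty tuple: the dialog was canceled.
        return None
    # Assumption: a single selection still arrives wrapped in a tuple.
    return paths[0]
```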

Choosing a Directory

Syntax

directory path = ghi.chooseDirectory(dialog caption text, start directory)

Example

myDirPath = ghi.chooseDirectory("Select a directory to save the file", "c:/ProjectFiles/")

This command presents a dialog allowing directory selection.

The return value is the path of the selected directory.

The parameters for this command are as follows:

  • caption: the string to be used as the selection dialog caption
  • startDir: (optional) the directory that the selection dialog will be opened to initially.

Opening a Web URL

Syntax

ghi.openUrl(url path)

Example

ghi.openUrl("http://www.goldenhelix.com")

Opens the specified URL in the default web browser.

Commands for Importing Data

Importing a Text File

Syntax

new spreadsheet object = ghi.importText(path and file name of text file, name for dataset, [Optional Parameters])

Example

mySS = ghi.importText("/data/myTextFile.csv", "My Data", rowLabelColumn = 1, delimiter = ",", missingEncoding = "?", readGenetic = 1, alleleDelimiter = "_", skipNumLines = 0)

This command may be used to import a text based file which uses any one-character delimiter. A spreadsheet generated from the data file is returned and may be assigned to a variable.

The optional parameters for this command are specified by keyword arguments. The keyword arguments and their defaults are detailed below. Note: Not all keyword arguments need to be used.

  • rowLabelColumn:
    • if not specified, default is to generate generic row labels
    • 1, or any number greater than 0 and less than the number of columns in the data file
  • delimiter:
    • ",": for comma-delimited files (default setting)
    • " ": for space-delimited files
    • "\t": for tab-delimited files
    • or any other one-character delimiter
  • missingEncoding:
    • "?": question mark (default setting)
    • or any other one-character string signifying a missing value
  • readGenetic:
    • 0: read all data of the form "A_B" as categorical data
    • 1: read all data of the form "A_B" as genetic data
  • alleleDelimiter:
    • "_": underscore (default setting)
    • "/": slash
    • or any other one-character string corresponding to the allele delimiter
  • skipNumLines: skips the specified number of lines before reading the file.
    • 0: (default setting)
    • any other number between 1 and the total number of rows in the dataset

The same optional parameters apply to the following functions, which are detailed below:

  • ghi.importTextPedigree()
  • ghi.importTextPhenotype()

Importing Various Third-Party File Formats

Syntax

new spreadsheet object = ghi.importData(path and file name of file, [Optional Parameters])

Example

mySS = ghi.importData("/data/myFile.xls", columnHeaderMode = ghi.const.HeaderAutoDetect, alleleDelimiter = "_", rowLabelColumn = 1,
workSheet = 1)

Corresponding to the Import Third Party Files dialog, this command allows for importing various file formats into the project.

The optional parameters for this command are specified by keyword arguments. The keyword arguments and their defaults are detailed below. Note: Not all keyword arguments need to be used.

  • columnHeaderMode:
    • ghi.const.HeaderAutoDetect: Autodetect if the first row contains column names (default setting)
    • ghi.const.HeaderFirstRow: Use the first row as column names
    • ghi.const.HeaderGenerate: Generate generic column names and use the first row as data
  • alleleDelimiter:
    • "_": underscore (default setting)
    • "/": slash
    • any other one character string corresponding to the allele delimiter
  • rowLabelColumn:
    • if not specified, default is to generate generic row labels
    • 1 or any number greater than 0 and less than the number of columns in the data file
  • workSheet:
    • 1 (default if applicable)
    • any other number between 1 and the total number of worksheets if this option is applicable to the file type.

Importing PED Files

Syntax

new spreadsheet object = ghi.importPED(dataset name, path and file name of PED file, path and file name of MAP file, [Optional Parameters])

Example

mySS = ghi.importPED("My PED Dataset", "/data/myPEDFile.ped", "/data/myPEDFile.map", missingPhenotype = -9, missingGenotype = "?")

PED files can be imported using the ghi.importPED() scripting command. This command takes three string parameters for the dataset name, the PED and MAP file paths, as well as two optional arguments to specify missing values for phenotypes and genotypes.

The optional parameters for this command are specified by keyword arguments. These arguments and their defaults are detailed below. Note: Not all keyword arguments need to be used.

  • missingPhenotype:
    • -9 (default setting)
    • or any other integer value signifying a missing value
  • missingGenotype:
    • “0” – zero string (default setting)
    • or any other one character string signifying a missing value

Importing TPED Files

Syntax

new spreadsheet object = ghi.importTPED(dataset name, path and file name of TPED file, path and file name of TFAM file, [Optional Parameters])

Example

mySS = ghi.importTPED("My TPED Dataset", "/data/myTPEDFile.tped", "/data/myTPEDFile.tfam", missingPhenotype = -9, missingGenotype = "0")

TPED files can be imported using the ghi.importTPED() scripting command. This command takes three string parameters for the dataset name, the TPED and TFAM file paths, as well as two optional arguments to specify missing values for phenotypes and genotypes.

The optional parameters for this command are specified by keyword arguments. These arguments and their defaults are detailed below. Note: Not all keyword arguments need to be used.

  • missingPhenotype:
    • -9 (default setting)
    • or any other integer value signifying a missing value
  • missingGenotype:
    • “0” – zero string (default setting)
    • or any other one character string signifying a missing value

Importing BED Files

Syntax

new spreadsheet object = ghi.importBED(dataset name, path and file name of BED file, path and file name of FAM file, path and file name of BIM file)

Example

mySS = ghi.importBED("My BED Dataset", "/data/myBinaryPedFile.bed", "/data/myBinaryPedFile.fam",
"/data/myBinaryPedFile.bim")

Use the above command to import BED (binary PED) files. This command takes 4 arguments: a dataset name, the BED file path, the FAM file path, and the BIM file path.

Importing a DSF File

Syntax

new spreadsheet object = ghi.importDSF(path and file name of DSF file)

Example

mySS = ghi.importDSF("/data/myDSFfile.dsf")

This command can be used to import a Dataset Storage Format (DSF) dataset. The resulting spreadsheet is returned and may be assigned to a variable.

Importing a Legacy GHD File

Syntax

new spreadsheet object = ghi.importGHD(path and file name of GHD file)

Example

mySS = ghi.importGHD("/data/myGHDfile.ghd")

This command can be used to import a legacy GHD format dataset. The resulting spreadsheet is returned and may be assigned to a variable.

Importing Affymetrix CHP Files

Syntax

new spreadsheet object = ghi.importCHP(list of paths and file names of CHP files, dataset name, [Optional Parameters])

Example

mySS = ghi.importCHP(["/data/Sample1.chp", "/data/Sample2.chp", "/data/Sample3.chp"], "MyCHPdata",
libraryPath = "c:/AffyLibraryFiles", confidenceScore = 0.5)

Using the ghi.importCHP() command, data can be imported from CHP files. This command has all of the functionality of the GUI import dialog.

This command requires the following parameters in this order:

  • Input file list: This list should contain the paths and names for all CHP files that are to be imported. This must be a Python List.
  • Dataset Name: A name for the dataset created on import of the CHP files.

The optional parameters for this command are specified by keyword arguments. The keyword arguments and their defaults are detailed below.

  • libraryPath:
    • Allows the path to a directory containing Affymetrix library files to be specified.
  • confidenceScore:
    • Specify a value to use as a confidence score upper limit threshold if other than the default.

Importing Affymetrix CEL Files

Syntax

new spreadsheet object = ghi.importCEL(list of paths and file names of CEL files, reference sheet node ID, reference status column number, marker map name from marker maps folder, [Optional Parameters])

Example

mySS = ghi.importCEL(["/data/sample1.cel", "/data/sample2.cel", "/data/sample3.cel"], 45, 3,
"/MyMarkerMap.dsm", affyLibDir = "/Golden Helix SVS/AffyLibraryFiles",
tempDir = "c:/Temp", dropReferences = 0)

Using the ghi.importCEL() command, a dataset and spreadsheet containing the normalized copy number intensity values contained in the input CEL files will be created. The resulting dataset and corresponding spreadsheet will be available for immediate use in SVS.

This command requires the following parameters in this order:

  • Input file list: This list should contain the paths and names for all CEL files that are to be imported. This must be a Python List.
  • Reference Status Sheet Node ID: An integer representing the node ID of the spreadsheet object that will contain the reference column.
  • Reference Status Column: An integer that represents the column number of the actual reference status column within the reference status spreadsheet.
  • Marker Map Name: The path and file name of the marker map DSM file to be used.

The optional parameters for this command are specified by keyword arguments. The keyword arguments and their defaults are detailed below. Note: Not all keyword arguments need to be used.

  • affyLibDir:
    • Allows the path to a directory containing Affymetrix library files to be specified.
  • tempDir:
    • Allows a directory to use for temporary file creation to be specified.
  • mappingSheetId:
    • An integer representing the node ID for the NSP/STY mapping spreadsheet (For Affy 500k CEL files).
  • dropReferences:
    • 0 (default): If references should be kept in the output spreadsheet.
    • 1: If references should be dropped in the output spreadsheet.
  • importIntermediate:
    • 0 (default): Do not import intermediate files.
    • 1: Import intermediate files.
  • datasetName:
    • Allows the name of the new dataset to be specified.
    • "Affy CEL Dataset" (default)

Note:

  1. If the affyLibDir keyword argument is omitted, the appropriate CDF files must be in the AffyLibraryFiles directory of your SVS directory.
  2. The temporary directory used by the import process must be on a local disk. If your project is on a network drive, be sure to specify tempDir as a directory on your local disk.
  3. If you are using Affymetrix 500k data, and are including both NSP and STY CEL files, you must choose an NSP/STY mapping spreadsheet in order for the import to be successful.

See the section on Affymetrix Files for more information on the import process.

Importing CNT Files

Syntax

new spreadsheet object = ghi.importCNT(list of paths and file names of CNT files, dataset name)

Example

mySS = ghi.importCNT(["/data/sample1.cnt", "/data/sample2.cnt", "/data/sample3.cnt"], "My CNT Dataset")

Using the ghi.importCNT() command, data can be imported from CNT files. This command has all of the functionality of the GUI import dialog.

This command requires the following parameters in this order:

  • Input file list: This list should contain the paths and names for all CNT files that are to be imported. This must be a Python List.
  • Dataset Name: A name for the dataset created on import of the CNT files.
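The list-based import commands (importCHP, importCEL, importCNT, importCNCHP) all expect a Python list of file paths. Rather than typing each path, the list can be built from a directory. This is a sketch using only the Python standard library; files_with_extension is a hypothetical helper and is not part of the ghi module.

```python
import glob
import os


def files_with_extension(directory, extension):
    """Return a sorted Python list of the files in directory ending in extension."""
    pattern = os.path.join(directory, "*" + extension)
    # glob.glob already returns a list; sorting makes the order predictable.
    return sorted(glob.glob(pattern))
```

Inside SVS, this could be used as ghi.importCNT(files_with_extension("/data", ".cnt"), "My CNT Dataset").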

Importing Affymetrix CNCHP Files

Syntax

new spreadsheet object = ghi.importCNCHP(list of paths and file names of CNCHP files, dataset name)

Example

mySS = ghi.importCNCHP(["/data/sample1.cnchp", "/data/sample2.cnchp", "/data/sample3.cnchp"], "My CNCHP Dataset")

Using the ghi.importCNCHP() command, data can be imported from CNCHP files. This command has all of the functionality of the GUI import dialog.

This command requires the following parameters in this order:

  • Input file list: This list should contain the paths and names for all CNCHP files to be imported. This must be a Python List.
  • Dataset Name: A name for the dataset created on import of the CNCHP files.

Importing an FBAT Pedigree File

Syntax

new spreadsheet object = ghi.importFbatPedigree(path and file name of PED file, dataset name, [Optional Parameters])

Example

mySS = ghi.importFbatPedigree("/data/myFbatPedigreeFile.ped", "My Pedigree Dataset", missingPhenotype = "?", missingGenotype = "?")

To import an FBAT/PBAT Pedigree file, use the above command; the resulting spreadsheet is returned and may be assigned to a variable. There are two required parameters and two optional parameters. The first required parameter is the path and file name of the PED file, and the second required parameter is the name for the dataset after import into the project navigator.

The optional parameters for this command are specified by keyword arguments. These arguments and their defaults are detailed below. Note: Not all keyword arguments need to be used.

  • missingPhenotype:
    • "?": question mark (default setting)
    • or any other one-character string signifying a missing value
  • missingGenotype:
    • "?": question mark (default setting)
    • or any other one-character string signifying a missing value

Importing an FBAT Phenotype File

Syntax

new spreadsheet object = ghi.importFbatPhenotype(path and file name of PHE file, dataset name, character for missing data)

Example

mySS = ghi.importFbatPhenotype("/data/myFbatPhenotypeFile.phe", "My Phenotype Dataset", "?")

To import an FBAT/PBAT Phenotype file, use the above command; the resulting spreadsheet is returned and may be assigned to a variable. There are two required parameters and one optional parameter. The first required parameter is the path and file name of the PHE file, and the second required parameter is the name for the dataset after import into the project navigator. The optional parameter specifies how missing values are encoded for missing phenotype data. If this optional parameter is not specified, the default missing character is “?” (question mark).

Importing a Text Pedigree File

Syntax

new spreadsheet object = ghi.importTextPedigree(path and file name of text pedigree, dataset name, [Optional Parameters])

Example

myPedSS = ghi.importTextPedigree("/data/pedigree_v1.txt", "Pedigree Version1", delimiter = " ", missingEncoding = "?", readGenetic = 1, alleleDelimiter = "/", skipNumLines = 1)

This command allows a text pedigree file (in either a CSV or a TXT file format) to be imported. There are two required parameters and several optional parameters. The first required parameter is the path and file name of the text file to be used as a pedigree file; this file must be in the correct pedigree format. See the section Importing PBAT Family-Based Data for instructions on how to correctly format a text Pedigree file. The second required parameter is the dataset name.

See Importing a Text File, under Commands for Importing Data, for detailed descriptions of the optional parameters for this command.

Importing a Text Phenotype File

Syntax

new spreadsheet object = ghi.importTextPhenotype(path and file name of text phenotype, dataset name, [Optional Parameters])

Example

myPheSS = ghi.importTextPhenotype("/data/phenotypeFull.csv", "Full Phenotype", delimiter = ",",
missingEncoding = ".")

This command allows a text phenotype file (in either a CSV or a TXT file format) to be imported. There are two required parameters and several optional parameters. The first required parameter is the path and file name of the text file to be used as a phenotype file; this file must be in the correct phenotype format. See the section Importing PBAT Family-Based Data for instructions on how to correctly format a text Phenotype file. The second required parameter is the dataset name.

See Importing a Text File, under Commands for Importing Data, for detailed descriptions of the optional parameters for this command.

Converting an Affymetrix Annotation Text File into a Genetic Marker Map DSM File

Syntax

ghi.convertAffyAnnotations(list of file paths to the annotation files)

Example

mmResult = ghi.convertAffyAnnotations(["/markerMaps/Mapping250K_Nsp.na24.annot.csv",
"/markerMaps/Mapping250K_Sty.na24.annot.csv"])

This command converts Affymetrix Annotation files from the text file format available from direct download from the Affymetrix website to a DSM marker map file. This is useful for converting older annotation versions that are not available from the Golden Helix, Inc. Affymetrix Annotation Download utility in SVS.

True is returned if the marker map is successfully converted.

The required parameter is as follows:

  • A list of the file paths of the text files to convert. If multiple files are specified, they are converted and merged.

Converting a Text or MAP File into a Genetic Marker Map DSM File

Syntax

ghi.convertMarkerMap(path and file name of marker map, Marker Map name, SNP ID column number, Chromosome column number, Position column number, [Optional Parameters])

Example

myMM = ghi.convertMarkerMap("/data/myTextMarkerMap.csv", "MarkerMap_v1", 1, 3, 2, delimiter = ",", missingEncoding = "?", skipNumLines = 0, optColumns = {"RS ID":4, "cytoband":5, "gene":6, "Allele A/B":7})

This command allows a marker map stored in a text file or a MAP file to be converted to the DSM (Dataset Storage Marker map) format. All marker maps in SVS versions 7.0 or greater need to be converted to this format to be used as a marker map. There are five required parameters for this command, and two optional parameter dictionaries, which provide the ability to import additional marker map columns as well as change the text file import parameters.

The required parameters, in order, are as follows:

  • Path and file name of text file or MAP file to convert. This file will have one of the following file extensions: .txt, .csv, or .map.
  • Marker map name to easily identify the marker map from the list in the Marker Maps folder.
  • SNP or Probe ID integer column number in the text file that is being converted.
  • Chromosome integer column number in the text file that is being converted.
  • Position integer column number in the text file that is being converted.

There are three text file import parameters that can be specified. Their keyword arguments and defaults are detailed below. Note: Not all keyword arguments need to be used.

  • delimiter:
    • ",": for comma-delimited files (default setting)
    • " ": for space-delimited files
    • "\t": for tab-delimited files
    • or any other one-character delimiter
  • missingEncoding:
    • "?": question mark (default setting)
    • or any other one character string signifying a missing value
  • skipNumLines:
    • 0: (default setting)
    • any other number between 1 and the total number of rows in the dataset

To specify additional marker map columns to import, a Python dictionary must be used with the keyword argument optColumns. A dictionary is written with curly braces, and each field name is assigned its column number with a colon (:). Any number of additional columns or fields can be imported using the dictionary, with the fields separated by commas. For example, to include the fields RS ID, cytoband, gene, and Allele A/B in the marker map, specify them as follows. See the example above for the full syntax.

optColumns = {"RS ID":4, "cytoband":5, "gene":6, "Allele A/B":7}
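When the marker map text file has a header row, the optColumns dictionary can be derived from that row instead of being typed by hand. The sketch below is an illustration only: opt_columns_from_header is a hypothetical helper, and it assumes that column numbers start at 1, as the example above suggests.

```python
def opt_columns_from_header(header_line, delimiter, wanted):
    """Map each wanted column name to its position in header_line."""
    names = [name.strip() for name in header_line.split(delimiter)]
    columns = {}
    for name in wanted:
        if name not in names:
            raise KeyError("Column not found in header: %s" % name)
        # Assumption: SVS column numbers are 1-based.
        columns[name] = names.index(name) + 1
    return columns
```

The returned dictionary could then be passed to ghi.convertMarkerMap as the optColumns keyword argument.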

Importing a Genetic Marker Map DSM File as a Spreadsheet

Syntax

new spreadsheet object = ghi.importMapAsSpreadsheet(marker map DSM file name)

Example

myMMspreadsheet = ghi.importMapAsSpreadsheet("/Golden Helix SVS/MarkerMaps/myMarkerMap.dsm")

Imports a DSM file into the project as a spreadsheet object.

Commands Common to All Objects

Append Annotations for the Object

Syntax

node object.appendAnnotations(new annotations text)

Example

myNodeObject.appendAnnotations("some text that describes an aspect of this object")

This command will append a string to the end of the current contents of the annotations window for this node object.

Append Node Log for the Object

Syntax

node object.appendLog(new node log text)

Example

myNodeObject.appendLog("some text that describes an aspect of this object")

This command will append a string to the end of the current contents of the node change log window for this node object.

Display Object Class

Syntax

object class = node object.className()

Example

myObjectClass = myNodeObject.className()

If needed, the object classification can be obtained with this command. This command returns a string displaying the object’s class.

Delete the Object

Syntax

node object.deleteObject()

Example

myNodeObject.deleteObject()

Deletes the node object and any child nodes. Unlike the GUI, this command does not prompt the user to confirm the delete. Use this command with caution!

Get Object Annotations

Syntax

node object.getAnnotations()

Example

myNodeObject.getAnnotations()

This command will return a string with the current contents of the annotation window for the current node object.

Get Help on the Object

Syntax

node object.help()

Example

myNode.help()

Returns a list of all the possible commands associated with the node object.

Hide the Object

Syntax

node object.hide()

Example

myNode.hide()

Closes or hides the particular node object.

Resume Object Logging

Syntax

node object.resumeLogging()

Example

myNode.resumeLogging()

This command resumes output to the Node Change Log. It usually follows the stopLogging command.

Set the Name of the Object

Syntax

node object.setName(node display name string)

Example

myNode.setName("New Node Name Modified")

Renames the navigator node in the project navigator window.

Show the Object

Syntax

node object.show()

Example

myNode.show()

Shows the specified node object.

Stop Object Logging

Syntax

node object.stopLogging()

Example

myNode.stopLogging()

The output in the Node Change Log can be suppressed by using this command. Logging will resume after using the resumeLogging command.
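Because stopLogging and resumeLogging are meant to be used as a pair, they fit naturally into a Python context manager that always resumes logging. This is a sketch, not an SVS feature: logging_paused is a hypothetical helper, and node is assumed to be any object that corresponds to a navigator node.

```python
from contextlib import contextmanager


@contextmanager
def logging_paused(node):
    """Suppress Node Change Log output for the duration of a with-block."""
    node.stopLogging()
    try:
        yield node
    finally:
        # Always resume logging, even if the block raised an exception.
        node.resumeLogging()
```

A script could then write "with logging_paused(myNode):" followed by the commands whose log output should be suppressed.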

Get the Navigator Node ID

Syntax

navigator node id = node object.getID()

Example

myNodeID = myNode.getID()

If needed, the navigator node ID from an object can be obtained with this command. The command returns an integer representing the node’s ID.

Get the Navigator Node Name

Syntax

navigator node name = node object.getName()

Example

myNodeName = myNode.getName()

The name of a navigator node can be obtained using this command with any Python object that corresponds to a navigator node.

Get the Node Options

Get the Parent of the Specified Node

Syntax

new node object = node object.getParent()

Example

newObject = myNode.getParent()

To get an object representing the parent of the specified node, use the above command. An object representing the parent node will be returned.

Commands For User Input

Prompting the User for a Marker Map

Syntax

marker map name = ghi.chooseMarkerMap()

Example

markermapname = ghi.chooseMarkerMap()

This method will display the marker map selection window and allow the user to choose a marker map. The file name of the marker map will be returned.

Prompting the User for a Spreadsheet

Syntax

new spreadsheet = ghi.promptSpreadsheet([Optional Parameters])

Example

newSpreadsheet = ghi.promptSpreadsheet()

This command allows the user to choose a spreadsheet from the Project Navigator Window through the aid of a spreadsheet chooser dialog.

  • The first optional parameter represents the text that will be displayed in the spreadsheet chooser dialog.
  • The second optional parameter allows the specification of a list of requirements which a spreadsheet must meet for it to be selectable. The following is a list of possible requirements:
    • ghi.const.IsPedigree
    • ghi.const.ContainsGenetic
    • ghi.const.ContainsMappedGenetic
    • ghi.const.ContainsMappedReal
    • ghi.const.IsPrecomputedPca
    • ghi.const.ContainsBinary
    • ghi.const.ContainsInteger

Prompting the User for a Binary Value

Syntax

binary result = ghi.question(instructions for the user)

Example

result = ghi.question("Do you want to continue:")

Displays a dialog box with two buttons, 'Yes' and 'No'. The corresponding binary response, True or False, is stored in the variable result.

There is one required parameter, the instructions for the user. These instructions can be as simple or as complex as needed. The binary value is returned and can be stored in a variable.

Prompting the User for an Integer

Syntax

integer result = ghi.promptInteger(instructions for the user, [Optional Parameters])

Example

result = ghi.promptInteger("Choose a number between 0 and 10:", min=0, max=10)

Displays a dialog box with a drop down list of all integers between 0 and 10. A user can either scroll to the appropriate integer, or type the integer in the list box. The integer is stored in the variable result.

This command allows the user to specify an integer. There is one required parameter, the instructions for the user. These instructions can be as simple or as complex as needed. The integer is returned and can be stored in a variable.

There are two optional parameters, a minimum and a maximum bound for the integer. These optional parameters can be specified by a keyword argument and are detailed below.

  • min: If this parameter is specified, the dialog box will display a list box with this minimum value. The user will not be able to choose a value smaller than this minimum value.
  • max: If this parameter is specified, the dialog box will display a list box with this maximum value. The user will not be able to choose a value larger than this maximum value.
  • If neither keyword argument is used, the dialog box will display a text box where the user can enter any integer value.
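
The two prompts above can be combined in a short script. The following is a minimal sketch rather than part of the SVS distribution: the function name ask_for_count is hypothetical, and the ghi module is passed in as a parameter (ghi_module) only so that the logic can be tried outside SVS with a stand-in object. Inside SVS the real ghi module would be passed.

```python
def ask_for_count(ghi_module, low=0, high=10):
    """Ask for confirmation, then for an integer between low and high.

    ghi_module is expected to provide question() and promptInteger()
    as described above. Returns None when the user answers 'No'.
    """
    # question() returns True for 'Yes' and False for 'No'.
    if not ghi_module.question("Do you want to continue:"):
        return None
    prompt = "Choose a number between %d and %d:" % (low, high)
    # min and max are the optional keyword arguments described above.
    return ghi_module.promptInteger(prompt, min=low, max=high)

# Inside SVS (not runnable elsewhere):
#   count = ask_for_count(ghi, 0, 10)
```

Passing the module as an argument is only a convenience for testing; a script run inside SVS could call ghi.question and ghi.promptInteger directly.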

Prompting the User for Input

Syntax

new dictionary = ghi.promptDialog(list of dictionaries)

Example

myUserInput = ghi.promptDialog([{"name":"string", "label":"Enter string:", "type":"string", "tooltip":"Any string will do", "default":"Hello World!"},
{"name":"myInt", "label":"Enter Integer:", "type":"integer", "min":0, "max":100, "default":50},
{"name":"value", "label":"Enter double", "type":"double", "min":-1, "default":100.0001},
{"name":"method", "label":"Select method", "type":"combobox", "list":["method 1", "method 2", "method 3"]}],
title="User Input", width=400)

The example above displays a dialog box with four input fields, three text boxes and a drop down list. The first text box is for a string, the second for an integer, and the third for a real valued number. The drop down list allows the user to specify a method.

The following optional keywords can be specified with the promptDialog command after the list of input fields.

  • scrollableLayout=<boolean> If True, the layout for the input dialog will be placed in a scrollable frame. This is useful if you have too many inputs to fit on a normal height screen.
  • width=<int> and height=<int> Specify minimum sizes for the input dialog. When not using a scrollable layout, this will not shrink the dialog below the size required to display all the widgets, but it can be used to make the dialog wider or taller than it otherwise would be.
  • title=<string> Provide a window title for the input dialog.
  • okText=<string> Provide an alternative text for the OK button on the input dialog. For example, you may want the button to say Run or Next>> if you intend to have further dialogs of user input.

The input fields of the promptDialog() function are specified in a list of dictionaries, each with some required and optional attributes. Every item requires a type field. Items with a type that produces an output require a name field, which will be used as the key in the results dictionary to store the output for that item. For example, to access the integer value you would use the expression myUserInput["myInt"], since the name attribute is defined as “myInt”.

There are a number of data entry methods available. Most items require a label attribute for providing a user prompt. Items with an explicit label left of the input widget allow setting the “checkable” attribute to True to make that label into a checkbox. If not checked, the input widget is disabled and the item returns None for its value. The checkbox is checked by default, but this can be changed with the “checked” attribute.

Each entry may have an optional tooltip attribute, which is a message that appears when the user hovers the mouse over the field. Labels are listed at the left of the data field. You can make an input optional by setting its “required” attribute to False. Most input types also allow specifying a default value. The “type” attribute defines which type of data entry field is to be constructed.

The available types and their specific behavior and attributes are as follows:

  • integer: prompts user for an integer. Does error checking to ensure that a valid integer has been entered. Required attribute: “label”: the input prompt. Aliases: int.
    Optional attributes:
    • “default”:<integer> specifies a default value shown in the dialog window.
    • “min”:<integer> specifies that integer must be greater than or equal to the specified minimum value.
    • “max”:<integer> specifies that integer must be less than or equal to the specified maximum value.
    • “checkable”:<boolean> allows the input to be required if it is checked, otherwise not required.
      • “checked”:<boolean> default status if “checkable”:True.
  • double: prompts user for a double precision (64-bit) number. Does error checking to see that a valid double has been entered. Required attribute: “label”: the input prompt. Aliases: float, real.
    Optional attributes:
    • “default”:<double> specifies a default value shown in the dialog window.
    • “min”:<double> specifies that user-entered double must be greater than or equal to the specified minimum value.
    • “max”:<double> specifies that user-entered double must be less than or equal to the specified maximum value.
    • “checkable”:<boolean> allows the input to be required if it is checked, otherwise not required.
      • “checked”:<boolean> default status if “checkable”:True.
  • string: prompts user for a string. If a required input, checks that the input string is not blank. Required attribute: “label” the input prompt.
    Optional attributes:
    • “default”:<string> specifies a default value shown in the dialog window.
    • “checkable”:<boolean> allows the input to be required if it is checked, otherwise not required.
      • “checked”:<boolean> default status if “checkable”:True.
  • checkbox: provides a checkbox with the text provided by the label as the prompt. A checked box corresponds to a value of True and an unchecked box to False. Required attribute: “label”: the input prompt. Aliases: check.
    Optional attributes:
    • “default”:<boolean> the checked state.
  • combobox: takes an additional non-optional attribute, “list”, which contains a list of strings to form a list of choices for the user to choose from. For example, the dictionary entry “list”:[“item 1”, “item 2”] specifies a combobox with two possible values to choose from, “item 1” and “item 2”. The first item is specified by default. Returns the selected string from the list. Required attribute: “label” the input prompt, Aliases: combo.
    Optional attributes:
    • “default”:<string> the string value of the item to be selected.
    • “checkable”:<boolean> allows the input to be required if it is checked, otherwise not required.
      • “checked”:<boolean> default status if “checkable”:True.
  • radio: Similar to combobox but presents each option as a radio button. Requires “list”, which contains a list of strings to form a list of choices for the user to choose from. For example, the dictionary entry “list”:[“item 1”, “item 2”] specifies two radio buttons to choose from, “item 1” and “item 2”. The first item is selected by default. Returns the selected string from the list. Required attribute: “label” the input prompt, Aliases: combo.
    Optional attributes:
    • “default”:<string> the string value of the item to be selected.
    • “orientation”:<int> Either ghi.const.OrientationVertical or ghi.const.OrientationHorizontal to specify the layout of the buttons (vertical as the default).
    • “checkable”:<boolean> allows the input to be required if it is checked, otherwise not required.
      • “checked”:<boolean> default status if “checkable”:True.
  • dir: Prompts the user for a directory. A Browse button will pop up the operating system’s directory chooser. Required attribute: “label” the input prompt, Aliases: directory.
    Optional attributes:
    • “default”:<string> the full path to an existing directory.
  • fileopen: Prompts the user for an existing file. A Browse button will pop up the operating system’s file open chooser. Required attribute: “label” the input prompt.
    Optional attributes:
    • “default”:<string> the full path to an existing file.
    • “filter”:<string> The filter string for the open file dialog. Multiple filters can be separated by a double semicolon. For example “Python File (*.py);;All Files (*)” would provide two filters in the dialog, defaulting to showing only files ending in .py.
  • filesave: Prompts the user for a new file. A Browse button will pop up the operating system’s file save as chooser. Required attribute: “label” the input prompt.
    Optional attributes:
    • “default”:<string> the full path to a file.
    • “filter”:<string> The filter string for the save file dialog. Multiple filters can be separated by a double semicolon. For example “Python File (*.py);;All Files (*)” would provide two filters in the dialog, defaulting to showing only files ending in .py.
  • files: Prompts the user for multiple existing files. A Browse button will pop up the operating system’s file open chooser. Returns the list of selected files. Required attribute: “label” the input prompt.
    Optional attributes:
    • “filter”:<string> The filter string for the open file dialog. Multiple filters can be separated by a double semicolon. For example “Python File (*.py);;All Files (*)” would provide two filters in the dialog, defaulting to showing only files ending in .py.
  • markermap: Prompts the user for a marker map from the marker maps folder. Required attribute: “label” the input prompt.
  • spreadsheet: Prompts the user for a spreadsheet from the open project (requires a project be open). Returns the selected spreadsheet’s ID. Required attribute: “label” the input prompt, Alias: ss.
    Optional attributes:
    • “default”:<int> The ID of a spreadsheet.
    • “mapped”:<boolean> If True, only allow for mapped spreadsheets to be selected.
  • spreadsheets: Prompts the user for multiple spreadsheets from the open project (requires a project be open). Returns the list of selected spreadsheets’ IDs. Required attribute: “label” the input prompt.
    Optional attributes:
    • “mapped”:<boolean> If True, only allow for mapped spreadsheets to be selected.
  • column: Prompts the user for a column from a specified spreadsheet. A select button will pop up a column chooser dialog. Returns the selected column’s index. Required attributes: “spreadsheet”:<int|string> ID of the spreadsheet or the name of the previous item of type “spreadsheet” to select the spreadsheet from which columns are chosen. “label” the input prompt, Alias: col.
    Optional attributes:
    • “default”:<int> a valid column index.
    • “types”:<int list> Changes the column type filter in the column chooser dialog to allow selection only of columns with the given types. Items should be one of ghi.const.Type*. For example “types”:[ghi.const.TypeReal, ghi.const.TypeInteger] would only allow the selection of real and integer columns.
    • “filter”:<int> A filter on the columns used from the chosen spreadsheet. This uses the same logically ORed filter options as spreadsheet.dataModel(). For example, to only allow Active data use “filter”:ghi.const.FilterActive.
    • “allowLabel”:<boolean> indicates whether or not the row labels are an appropriate input and thus can be selected by the user.
  • columns: Prompts the user for a list of columns from a specified spreadsheet. A select button will pop up a column chooser dialog. Returns the selected columns’ indexes. Required attributes: “spreadsheet”:<int|string> ID of the spreadsheet or the name of the previous item of type “spreadsheet” to select the spreadsheet from which columns are chosen. “label” the input prompt, Alias: cols.
    Optional attributes:
    • “types”:<int list> Changes the column type filter in the column chooser dialog to allow selection only of columns with the given types. Items should be one of ghi.const.Type*. For example “types”:[ghi.const.TypeReal, ghi.const.TypeInteger] would only allow the selection of real and integer columns.
    • “filter”:<int> A filter on the columns used from the chosen spreadsheet. This uses the same logically ORed filter options as spreadsheet.dataModel(). For example, to only allow Active data use “filter”:ghi.const.FilterActive.
    • “allowLabel”:<boolean> indicates whether or not the row labels are an appropriate input and thus can be selected by the user.
  • markermapfield: Prompts the user for a marker map field from the marker map of a provided spreadsheet. A select button will pop up a field chooser dialog. Returns the selected field’s index. Required attributes: “spreadsheet”:<int|string> ID of the spreadsheet or the name of the previous item of type “spreadsheet” to select the spreadsheet from which the map fields are chosen. “label” the input prompt, Alias: mapfield.
    Optional attributes:
    • “default”:<int> a valid map field index.
    • “types”:<int list> Changes the field type filter in the field chooser dialog to allow selection only of map fields with the given types. Items should be one of ghi.const.Type*. For example “types”:[ghi.const.TypeInteger] would only allow the selection of integer fields.
    • “allowLabel”:<boolean> indicates whether or not the row labels are an appropriate input and thus can be selected by the user.
  • coordsys: Prompts the user to select a coordinate system (species and build combination) from the system provided list built from available genome maps. The default coordinate system will be the global or project default. Returns the coordinate system string for use when building Genome Browser tracks.
    Optional attributes:
    • “default”:<string> coordinate system string.
  • track: Prompts the user for a Genome Browser track from the project, user data folder or system folder. Returns the selected track’s URL. Required attribute: “label” the input string, Alias: annotation.
    Optional attributes:
    • “default”:<string> a track URL.
    • “source”:<one of “local”,”network” or “all”> Selects the source of the Genome Browser track. Local indicates tracks that have been entirely copied to a local IDF file. Network indicates tracks that are streamed on demand from data.goldenhelix.com. All provides both source options in the selection dialog. Defaults to “local”.
    • “trackType”:<string> Filters the track selector lists to only contain tracks of the given type. Example types are “Interval”, “Cytoband”, “Gene”, “Probe”, “Allele Sequence”, and “Intensity”. Defaults to no track type filter.
  • idf: Prompts the user to select a new or existing IDF file. Defaulting to the user app data folder, tracks can be appended to existing IDF files or placed in new IDF files. This prompt may be desirable for scripts that need to write track information to an IDF file. Required attribute: “label” the input string.
    Optional attributes:
    • “default”:<string> full path to an IDF file.
  • prompt: displays a label only. This type requires no user input and produces no output. It does not require a name attribute.
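
The attribute rules in the list above can be checked before a dialog is shown. The following is a minimal sketch in plain Python and not part of the SVS scripting interface: the function name check_items and the two lookup tables are hypothetical, and the checks cover only the input types listed above (every item needs a type, output-producing items need a name, and some types name an additional required attribute).

```python
# Types that only display text and produce no output (from the list above).
NO_OUTPUT_TYPES = {"prompt"}

# Types, with their aliases, that name an additional required attribute above.
EXTRA_REQUIRED = {
    "combobox": "list", "combo": "list", "radio": "list",
    "column": "spreadsheet", "col": "spreadsheet",
    "columns": "spreadsheet", "cols": "spreadsheet",
    "markermapfield": "spreadsheet", "mapfield": "spreadsheet",
}

def check_items(items):
    """Return a list of problem descriptions for a promptDialog item list."""
    problems = []
    for index, item in enumerate(items):
        kind = item.get("type")
        if kind is None:
            problems.append("item %d: missing 'type'" % index)
            continue
        # Items that produce output need a name to key the results dictionary.
        if kind not in NO_OUTPUT_TYPES and "name" not in item:
            problems.append("item %d (%s): missing 'name'" % (index, kind))
        extra = EXTRA_REQUIRED.get(kind)
        if extra is not None and extra not in item:
            problems.append("item %d (%s): missing '%s'" % (index, kind, extra))
    return problems

fields = [
    {"name": "myInt", "label": "Enter Integer:", "type": "integer"},
    {"label": "Select method", "type": "combobox"},  # no name, no list
]
for problem in check_items(fields):
    print(problem)
```

A check like this only catches structural mistakes early; promptDialog itself remains the authority on what it accepts.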

There are some types that do not provide input and do not contribute to the returned output list, but allow for the dialogs to have a richer layout and organization. The name, default and required attributes are not used for these types, although label usually is.

These types are containers for other types and have a few attributes in common.

First, they require an items attribute, which is a list of dictionary items like the one provided to promptDialog. These are the items that will be inside the container. Containers can contain container items, so multiple levels of items can be constructed for maximum flexibility in displaying items to the user.

Second, containers have an optional layout option that is one of “vertical”, “horizontal” or “columns”, where the default is “vertical”. These are described below.

  • vertical: All items are laid out vertically, one on top of the other.
  • horizontal: All items are laid out horizontally, one after the other.
  • columns: Items are laid out in vertical columns. This requires that the items attribute is a list of lists of items instead of just a list of items. The outer list defines the columns. For example, “items”:[ [item1, item2], [item3, item4, item5], [item6] ] would produce a three column layout with items 1 and 2 in the first column, 3, 4 and 5 in the second, and 6 in the third.

The container types are described below:

  • group: places a number of following items into a group box with label as the label. Similar to items with an explicit label widget, this container takes the “checkable” and “checked” attributes. When unchecked, all items in the group are disabled and return None for their values.
  • tab: places a number of following items into a tab with label as the label. Multiple tabs will share a tab widget if an additional tab item immediately follows the items belonging to the previous tab item.
  • widget: A blank container widget. Does not require a label. Useful for simply specifying a different layout for a couple items.
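
As an illustration of containers and the columns layout, the sketch below builds a nested item list in plain Python and walks it to collect the names that would become keys of the results dictionary. It makes two assumptions that the text above does not spell out: that the layout option is given under the key "layout", and that results of nested items are keyed by name in the same way as top-level items. The helper collect_names is hypothetical and not part of the ghi module.

```python
def collect_names(items):
    """Return the 'name' of every item, descending into container 'items'.

    Handles both a flat list of items and the list-of-lists form used
    by the "columns" layout.
    """
    names = []
    for entry in items:
        if isinstance(entry, list):       # one column of a "columns" layout
            names.extend(collect_names(entry))
            continue
        if "name" in entry:
            names.append(entry["name"])
        if "items" in entry:              # a container item
            names.extend(collect_names(entry["items"]))
    return names

# A widget container with two columns: two integers on the left and a
# combobox on the right. The "layout" key is an assumption (see above).
dialog_items = [
    {"type": "widget", "layout": "columns", "items": [
        [{"name": "low", "label": "Minimum:", "type": "integer"},
         {"name": "high", "label": "Maximum:", "type": "integer"}],
        [{"name": "method", "label": "Select method", "type": "combobox",
          "list": ["method 1", "method 2"]}],
    ]},
]

print(collect_names(dialog_items))  # ['low', 'high', 'method']
```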

If the user cancels, an empty list is returned. It is wise to check whether the result is empty before trying to access its elements. If there is an error in syntax, a Python exception is raised with a description of the error. If the user hits OK and there is any error in the input, the dialog tells the user what the problem is, and it can be remedied from within the dialog. If there are no errors, a dictionary of the user inputs is returned.
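
The cancel check described above can be wrapped in a small helper. This is a minimal sketch in plain Python; the function name read_inputs is hypothetical, and it relies only on the behavior stated above (an empty result on cancel, a dictionary keyed by item name otherwise).

```python
def read_inputs(result, names):
    """Return the values for names from a promptDialog result.

    Returns None when the result is empty, which is how a cancelled
    dialog is reported. Individual values may themselves be None, for
    example for an unchecked "checkable" item.
    """
    if not result:
        return None
    return [result.get(name) for name in names]

# Stand-in results in the shapes described above.
print(read_inputs([], ["myInt"]))                # None
print(read_inputs({"myInt": 50}, ["myInt"]))     # [50]

# Inside SVS (not runnable elsewhere):
#   values = read_inputs(ghi.promptDialog(fields), ["myInt"])
```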

Commands for Downloading Golden Helix, Inc. Hosted Data Files

The Python interface can be used to query data files that are available for download from the Golden Helix, Inc. data server.

Fetch a List of Categories or File Names

Syntax

python list = ghi.dataFetchListing(category)

Example

categoryList = ghi.dataFetchListing()

OR

Example

fileList = ghi.dataFetchListing('MarkerMaps')

This command fetches listings from our data repository of optional resources. The category parameter is optional.

When category is not set, the list of valid available categories is returned. If a category is provided, a list is returned where each entry is a list of information for an available file in that category.

Fetch a URL and MIME Type for a Specified File

Syntax

python list = ghi.dataFetchUrls(category, query file name)

Example

urlList = ghi.dataFetchUrls('MarkerMaps', 'Affy')

OR

Example

urlList = ghi.dataFetchUrls('RefData', 'Affy*.dsf')

This command returns a list of pairs of URL and MIME type for a file on our data repository server with the given category and queryFileName. queryFileName can contain a wild card prefix or postfix to potentially return a list of matching files.
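
When a wild card query returns several matches, the pairs can be filtered by MIME type. The sketch below is plain Python and assumes that each pair is ordered as URL first and MIME type second, as the description above suggests. The helper name urls_with_mime and the sample entries are made up for illustration; real entries come from ghi.dataFetchUrls inside SVS.

```python
def urls_with_mime(pairs, mime_type):
    """Return the URLs from (URL, MIME type) pairs whose MIME type matches."""
    return [url for url, mime in pairs if mime == mime_type]

# Made-up data in the assumed shape; not real repository entries.
sample = [
    ["http://example.invalid/a.dsf", "application/octet-stream"],
    ["http://example.invalid/b.txt", "text/plain"],
]

print(urls_with_mime(sample, "text/plain"))
# ['http://example.invalid/b.txt']
```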

Auto Download Data from the Data Repository

Syntax

file path string = ghi.dataAutoDownload(category, file name, prompt override, manual path)

Example

filePath = ghi.dataAutoDownload('RefData', 'AffySNP6_YRI_SNP_ABRefs.dsf', 1, 'e:/Tmp')

This command looks up a file on our data repository server (see ghi.dataFetchUrls) and then downloads it to the application data folder if it is not already downloaded. In either case the full path of the local copy of the file is returned.

The parameters are as follows:

  • Category: The repository category
  • File Name: The repository file name
  • Prompt Override: (optional) If set, prompts whether to overwrite a file that already exists, rather than skipping the download of existing files.
  • Manual Path: (optional) Specify a manual destination path for downloads.

Commands for Importing Data as Genome Browser Annotation Files

The Python interface can be used to create new annotation tracks from text files and/or spreadsheets. The resulting tracks can be visualized in any Genomic mode Plot Viewer. Care should be taken to match the species and build written to each annotation track with the source of its data.

Creating an Annotation Track from 2Bit Data

Syntax

ghi.genomebrowser.importTrackFrom2Bit(input file, output file, title, [Optional Parameters])

Example

ghi.genomebrowser.importTrackFrom2Bit('mySequence.2bit', 'MySequence.idf', 'My Sequence Track', coordSysId = 'GRCh_37,Chromosome,Homo sapiens')

This command creates an Allele Sequence type annotation track from the input sequence provided in the binary 2bit format.

This command takes three required parameters:

  • input file: the name of the 2bit input file
  • output file: the name of the IDF file to write to. If the specified file exists, the new annotation track will be added to the file.
  • title: the name of the new annotation track.

This command also supports some optional parameters:

  • coordSysId: sets the coordinate system association used by the output annotation track. The coordinate system id links an annotation track to a species and build. Although any string can be used, for best results, the proper format should be applied. The proper format is ’authority,type,species’ where:
    • authority: is the entity responsible for the build often including a build version identifier
    • type: is the coordinate system type, usually ’Chromosome’
    • species: is the species name
    Many coordinate system ids of this format can be found here: http://www.dasregistry.org/das1/coordinatesystem
    The coordinate system id defaults to the current project or global default, usually ’NCBI_36,Chromosome,Homo sapiens’.
  • sequenceBlockSize: specifies the maximum size of the sequence blocks written to the output annotation track. Except in extremely rare circumstances this parameter should never be adjusted.
    • 16384 (default): 2^14 base pairs. Sequence data will be broken into blocks of this size to balance performance and file size.
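
The ’authority,type,species’ format can be assembled and checked with a couple of small helpers. This is a minimal sketch in plain Python; the function names are hypothetical and not part of the ghi module. The helpers are deliberately stricter than the command itself, which accepts any string.

```python
def make_coord_sys_id(authority, species, cs_type="Chromosome"):
    """Join the three parts into 'authority,type,species'."""
    for part in (authority, cs_type, species):
        if "," in part:
            raise ValueError("parts must not contain commas: %r" % part)
    return ",".join([authority, cs_type, species])

def split_coord_sys_id(coord_sys_id):
    """Split 'authority,type,species' into a 3-tuple."""
    parts = coord_sys_id.split(",")
    if len(parts) != 3:
        raise ValueError("expected 'authority,type,species': %r" % coord_sys_id)
    return tuple(parts)

print(make_coord_sys_id("GRCh_37", "Homo sapiens"))
# GRCh_37,Chromosome,Homo sapiens
print(split_coord_sys_id("NCBI_36,Chromosome,Homo sapiens"))
# ('NCBI_36', 'Chromosome', 'Homo sapiens')
```

The resulting string is what would be passed as the coordSysId keyword argument.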

Creating an Annotation Track from Fasta Data

Syntax

ghi.genomebrowser.importTrackFromFasta(input file, output file, title, [Optional Parameters])

Example

ghi.genomebrowser.importTrackFromFasta('mySequence.fa', 'MySequence.idf', 'My Sequence Track', coordSysId = 'GRCh_37,Chromosome,Homo sapiens')

This command creates an Allele Sequence type annotation track from the input sequence provided in the fasta text format.

This command takes three required parameters:

  • input file: the name of the fasta input file
  • output file: the name of the IDF file to write to. If the specified file exists, the new annotation track will be added to the file.
  • title: the name of the new annotation track.

This command also supports some optional parameters:

  • coordSysId: sets the coordinate system association used by the output annotation track. The coordinate system id links an annotation track to a species and build. Although any string can be used, for best results, the proper format should be applied. The proper format is ’authority,type,species’ where:
    • authority: is the entity responsible for the build often including a build version identifier
    • type: is the coordinate system type, usually ’Chromosome’
    • species: is the species name
    Many coordinate system ids of this format can be found here: http://www.dasregistry.org/das1/coordinatesystem
    The coordinate system id defaults to the current project or global default, usually ’NCBI_36,Chromosome,Homo sapiens’.
  • sequenceBlockSize: specifies the maximum size of the sequence blocks written to the output annotation track. Except in extremely rare circumstances this parameter should never be adjusted.
    • 16384 (default): 2^14 base pairs. Sequence data will be broken into blocks of this size to balance performance and file size.

Creating an Annotation Track from Wiggle Data

Syntax

ghi.genomebrowser.importTrackFromWiggle(input file, output file, title, [Optional Parameters])

Example

ghi.genomebrowser.importTrackFromWiggle('myValues.wig', 'Values.idf', 'My Intensity Track',
coordSysId = 'GRCh_37,Chromosome,Homo sapiens')

This command creates an Intensity type annotation track from the real values provided in wiggle format.

This command takes three required parameters:

  • input file: the name of the wiggle input file
  • output file: the name of the IDF file to write to. If the specified file exists, the new annotation track will be added to the file.
  • title: the name of the new annotation track.

This command also supports an optional parameter:

  • coordSysId: sets the coordinate system association used by the output annotation track. The coordinate system id links an annotation track to a species and build. Although any string can be used, for best results, the proper format should be applied. The proper format is ’authority,type,species’ where:
    • authority: is the entity responsible for the build often including a build version identifier
    • type: is the coordinate system type, usually ’Chromosome’
    • species: is the species name
    Many coordinate system ids of this format can be found here: http://www.dasregistry.org/das1/coordinatesystem
    The coordinate system id defaults to the current project or global default, usually ’NCBI_36,Chromosome,Homo sapiens’.

Creating an Annotation Track from Delimited Text Data

Syntax

ghi.genomebrowser.importTrackFromDelimitedText(input file, output file, title, [Optional Parameters])

Example

ghi.genomebrowser.importTrackFromDelimitedText('myData.csv', 'MyData.idf', 'My Track',
coordSysId = 'GRCh_37,Chromosome,Homo sapiens', schemaType = 'Interval', halfOpen = False, headerLine = 0, delimiter = ',', subDelimiter = ';')

This command creates an annotation track from the input file provided in any column delimited text format. The input file should include data columns specifying: Chromosome, Start, and Stop or Chromosome and Position.
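
For illustration, a delimited text file with the Chromosome, Start, and Stop columns mentioned above could be produced with Python's standard csv module. This is only a sketch: the function name write_intervals is hypothetical, the interval values are made up, and writing the column names as the first line is one possible layout rather than a requirement stated here.

```python
import csv
import io

def write_intervals(handle, rows, delimiter=","):
    """Write a header line and (chromosome, start, stop) rows as delimited text."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["Chromosome", "Start", "Stop"])
    for chromosome, start, stop in rows:
        writer.writerow([chromosome, start, stop])

# Write to an in-memory buffer here; a script would normally open a file
# such as 'myData.csv' and pass the file object instead.
buffer = io.StringIO()
write_intervals(buffer, [("1", 100, 200), ("X", 5000, 5100)])
print(buffer.getvalue())
```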

This command takes three required parameters:

  • input file: the name of the delimited text input file
  • output file: the name of the IDF file to write to. If the specified file exists, the new annotation track will be added to the file.
  • title: the name of the new annotation track.

This command also supports some optional parameters:

  • coordSysId: sets the coordinate system association used by the output annotation track. The coordinate system id links an annotation track to a species and build. Although any string can be used, for best results, the proper format should be applied. The proper format is ’authority,type,species’ where:
    • authority: is the entity responsible for the build often including a build version identifier
    • type: is the coordinate system type, usually ’Chromosome’
    • species: is the species name
    Many coordinate system ids of this format can be found here: http://www.dasregistry.org/das1/coordinatesystem
    The coordinate system id defaults to the current project or global default, usually ’NCBI_36,Chromosome,Homo sapiens’.
  • schemaType: the schema type defines the way the output annotation track can be used. It affects the way data is imported and is used during auto-configuration when inputs are set. Possible values are:
    • ’Allele Sequence’ a DNA sequence track
    • ’Cytoband’ a cytoband track with styling based on stain
    • ’Gene’ a gene track including exons and codon alignments
    • ’Intensity’ a track composed of real valued intervals
    • ’Interval’ (default) a versatile general purpose track
    • ’Probe’ a SNP probe marker track
  • maxScanFields: specifies the maximum number of input columns to make accessible for field mappings. A limit can improve performance in scanning and importing from inputs with very high column counts.
    • 0 (default) no limit
    • N limit available number of columns to N
  • maxAutoFields: specifies the maximum number of automatic field mappings to create based on input columns. A limit can improve performance in scanning and importing from inputs with very high column counts.
    • 0 (default) no limit
    • N limit automatic field mappings to N
  • titleCaseFieldNames: specifies whether the names of automatic field mappings are converted to title case.
    • 0 set field names to column names
    • 1 (default) capitalize the first letter of each word
  • guessFieldTypes: specifies whether the data types of input fields are guessed during auto-configuration while setting inputs.
    • 0 assume all input columns are ’String’ type
    • 1 (default) guess the type of each input column
  • reuseFieldsForConstraints: specifies whether existing field mappings are adjusted to meet field constraints or that creation of new field mappings is always enforced.
    • 0 create a new field mapping for every constraint
    • 1 (default) attempt to use existing mappings to satisfy constraints
  • ignoreConstraints: specifies whether schema type derived field mapping constraints are respected. If constraints are ignored, the output annotation track is much more likely to be unusable.
    • 0 (default) respect constraints and throw errors when attempting an import which is improperly configured
    • 1 ignore constraints, silently complying with any field mappings that have been overlooked
  • halfOpen: specifies whether input interval data is provided using half-open coordinates or not. Half-open coordinates, sometimes referred to as zero-based, precisely define the outside edges of an interval. The smallest possible value is zero and the largest is equal to the size of the containing segment. A zero width interval may be specified using half-open coordinates. The alternative, indexed coordinates, are often referred to as one-based. Indexed coordinates specify base pair positions within a segment starting from one. The largest possible value is equal to the size of the containing segment. The smallest interval which can be specified using indexed coordinates is of width one.
    • 0 input data is provided as indexed intervals or one-based positions
    • 1 (default) input data is provided as half-open intervals or zero-based positions
  • headerLine: specifies the one-based line number in the input file which defines its input column names. All preceding lines will be ignored. This does not apply to spreadsheets as their column names are provided directly.
    • -1 (default) specifies no header, or that the header is located using a different method
    • N the file header is on line N
  • header: specifies a pattern used to match the line in the input file which defines its input column names. The first line matched by the pattern will be designated as the header. All text matched by the pattern will be removed from the header. Any remaining text is used to define the input column names.
    • ’’ does not match. Specifies no header, or that the header is located using a different method
    • ’^#’ (default) a Perl compatible regular expression that matches lines that begin with ’#’
  • comment: specifies a pattern used to match comment lines in the input file. Comment lines are not data and are ignored.
    • ’’ (default) does not match. The input file contains no comment lines
  • delimiter: specifies a pattern used to separate column values on each line.
    • ’\t’ (default) the tab character delimits field values
  • subDelimiter: specifies a pattern used to separate list values within a field value.
    • ’,’ (default) the comma character delimits list values
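
The two coordinate conventions described under halfOpen differ only in how the start of an interval is counted. The sketch below shows the usual conversion between them in plain Python. The function names are hypothetical, and the code illustrates the convention only; it does not show how the import command applies the flag internally.

```python
def half_open_to_indexed(start, stop):
    """Convert a zero-based half-open interval [start, stop) to a
    one-based inclusive interval (first, last)."""
    if start < 0 or stop < start:
        raise ValueError("invalid half-open interval")
    if stop == start:
        # Indexed coordinates cannot express a zero width interval.
        raise ValueError("zero width intervals have no indexed form")
    return start + 1, stop

def indexed_to_half_open(first, last):
    """Convert a one-based inclusive interval to zero-based half-open."""
    if first < 1 or last < first:
        raise ValueError("invalid indexed interval")
    return first - 1, last

# The first ten base pairs of a segment in both conventions.
print(half_open_to_indexed(0, 10))   # (1, 10)
print(indexed_to_half_open(1, 10))   # (0, 10)
```

In both forms the stop value is the same; only the start shifts by one, which is why a width one interval is the smallest that indexed coordinates can describe.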

Creating an Annotation Track from Spreadsheet Data

Syntax

ghi.genomebrowser.importTrackFromSpreadsheet(node id, output file, title, [Optional Parameters])

Example

ghi.genomebrowser.importTrackFromSpreadsheet(4, 'MySS.idf', 'My Spreadsheet Track',
coordSysId = 'GRCh_37,Chromosome,Homo sapiens', schemaType = 'Interval', halfOpen = False, headerLine = 0, subDelimiter = '')

This command creates an annotation track from the input spreadsheet. The spreadsheet should include a row marker map with position data or columns specifying: Chromosome, Start, and Stop or Chromosome and Position.

This command takes three required parameters:

  • node id: the project node id of the input spreadsheet.
  • output file: the name of the IDF file to write to. If the specified file exists, the new annotation track will be added to the file.
  • title: the name of the new annotation track.

This command also supports some optional parameters:

  • coordSysId: sets the coordinate system association used by the output annotation track. The coordinate system id links an annotation track to a species and build. Although any string can be used, for best results, the proper format should be applied. The proper format is ’authority,type,species’ where:
    • authority: is the entity responsible for the build often including a build version identifier
    • type: is the coordinate system type, usually ’Chromosome’
    • species: is the species name

    Many coordinate system ids of this format can be found here:
    http://www.dasregistry.org/das1/coordinatesystem The coordinate system id defaults to the current project or global default, usually ’NCBI_36,Chromosome,Homo sapiens’

  • schemaType: the schema type defines the way the output annotation track can be used. It affects the way data is imported and is used during auto-configuration when inputs are set. Possible values are:
    • ’Allele Sequence’ a DNA sequence track
    • ’Cytoband’ a cytoband track with styling based on stain
    • ’Gene’ a gene track including exons and codon alignments
    • ’Intensity’ a track composed of real valued intervals
    • ’Interval’ (default) a versatile general purpose track
    • ’Probe’ a SNP probe marker track
  • maxScanFields: specifies the maximum number of input columns to make accessible for field mappings. A limit can improve performance in scanning and importing from inputs with very high column counts.
    • 0 (default) no limit
    • N limit available number of columns to N
  • maxAutoFields: specifies the maximum number of automatic field mappings to create based on input columns. A limit can improve performance in scanning and importing from inputs with very high column counts.
    • 0 (default) no limit
    • N limit automatic field mappings to N
  • titleCaseFieldNames: specifies whether the names of automatic field mappings are converted to title case.
    • 0 set field names to column names
    • 1 (default) capitalize the first letter of each word
  • reuseFieldsForConstraints: specifies whether existing field mappings are adjusted to meet field constraints or that creation of new field mappings is always enforced.
    • 0 create a new field mapping for every constraint
    • 1 (default) attempt to use existing mappings to satisfy constraints
  • ignoreConstraints: specifies whether schema type derived field mapping constraints are respected. If constraints are ignored, the output annotation track is much more likely to be unusable.
    • 0 (default) respect constraints and throw errors when attempting an import which is improperly configured
    • 1 ignore constraints, silently complying with any field mappings that have been overlooked
  • halfOpen: specifies whether input interval data is provided using half-open coordinates or not. Half-open coordinates, sometimes referred to as zero-based, precisely define the outside edges of an interval. The smallest possible value is zero and the largest is equal to the size of the containing segment. A zero width interval may be specified using half-open coordinates. The alternative, indexed coordinates, are often referred to as one-based. Indexed coordinates specify base pair positions within a segment starting from one. The largest possible value is equal to the size of the containing segment. The smallest interval which can be specified using indexed coordinates is of width one.
    • 0 input data is provided as indexed intervals or one-based positions
    • 1 (default) input data is provided as half-open intervals or zero-based positions
  • subDelimiter: specifies a pattern used to separate list values within a field value.
    • ’,’ (default) the comma character delimits list values
  • skipInactiveRows: specifies whether inactive rows are included in the import operation.
    • 0 all rows are imported
    • 1 (default) inactive rows are not imported
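
The difference between the two coordinate conventions described under halfOpen can be sketched in plain Python. This is only an illustration, assuming the usual convention that an indexed (one-based) interval includes both its start and stop positions. The function names are hypothetical and are not part of the ghi module.

```python
def indexed_to_half_open(start, stop):
    """Convert a one-based, inclusive interval to half-open (zero-based) coordinates."""
    if start < 1 or stop < start:
        raise ValueError('indexed intervals need 1 <= start <= stop')
    # Only the start edge moves; the stop value is numerically unchanged.
    return start - 1, stop

def half_open_to_indexed(start, stop):
    """Convert a half-open interval to one-based, inclusive coordinates."""
    if start < 0 or stop <= start:
        # A zero width half-open interval has no indexed equivalent.
        raise ValueError('need 0 <= start < stop to express as an indexed interval')
    return start + 1, stop

def width(start, stop, half_open=True):
    """Number of base pairs covered by the interval under either convention."""
    return stop - start if half_open else stop - start + 1

# The first ten base pairs of a segment, written both ways.
assert indexed_to_half_open(1, 10) == (0, 10)
assert width(0, 10) == width(1, 10, half_open=False) == 10
```

Under this convention a half-open interval such as (5, 5) has width zero, while the smallest indexed interval, such as (5, 5) read as one-based, has width one, which matches the description above.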

Obtaining a 2Bit Annotation Track Importer Object

Syntax

importer object = ghi.genomebrowser.create2BitImporter()

Example

my2BitImporter = ghi.genomebrowser.create2BitImporter()

This command creates and returns a new 2Bit file importer object which can be used to configure and finally initiate one or more annotation track import operations from 2Bit formatted file input(s).

The 2Bit format encodes DNA sequence in compact form using a binary representation that uses only two bits to encode each base pair of sequence data.

The import operations made possible by the importer returned from this command are more customizable than the one performed by the importTrackFrom2bit(...) command. The resulting annotation track(s) will be of type Allele Sequence.

Obtaining a Fasta Annotation Track Importer Object

Syntax

importer object = ghi.genomebrowser.createFastaImporter()

Example

myFastaImporter = ghi.genomebrowser.createFastaImporter()

This command creates and returns a new Fasta file importer object which can be used to configure and finally initiate one or more annotation track import operations from Fasta formatted file input(s).

The Fasta format encodes sequence data in text form using a single character to represent each base pair.

The import operations made possible by the importer returned from this command are more customizable than the one performed by the importTrackFromFasta(...) command. The resulting annotation track(s) will be of type Allele Sequence.

Obtaining a Wiggle Annotation Track Importer Object

Syntax

importer object = ghi.genomebrowser.createWiggleImporter()

Example

myWiggleImporter = ghi.genomebrowser.createWiggleImporter()

This command creates and returns a new Wiggle file importer object which can be used to configure and finally initiate one or more annotation track import operations from Wiggle formatted file input(s).

The Wiggle format provides multiple methods of specifying real values that correspond to intervals in a segment or chromosome.

The import operations made possible by the importer returned from this command are more customizable than the one performed by the importTrackFromWiggle(...) command. The resulting annotation track(s) will be of type Intensity.

Obtaining a Delimited Text Annotation Track Importer Object

Syntax

importer object = ghi.genomebrowser.createDelimitedTextImporter()

Example

myTextImporter = ghi.genomebrowser.createDelimitedTextImporter()

This command creates and returns a new Delimited Text importer object which can be used to configure and finally initiate one or more annotation track import operations from a wide variety of delimited text format file input(s) and/or spreadsheets and their attached marker map data.

The Delimited Text format supports any text file that can represent a grid of data when each row is split into columns using a delimiting pattern. The input files or spreadsheets should include columns specifying: Chromosome, Start, Stop or Chromosome and Position.

The import operations made possible by the importer returned from this command are more customizable than those performed by the importTrackFromDelimitedText(...) or importTrackFromSpreadsheet(...) commands. The resulting annotation track(s) may be of any available type.

Determining the Input Format for an Annotation Track Importer Object

Syntax

format = importer object.format()

Example

format = myImporter.format()

This command returns a string indicating the input format handled by the importer.

Possible values are:

  • ’Delimited Text’ Imports rows of text with multiple values arranged in columns
  • ’Wiggle’ Imports Wiggle format text files
  • ’2Bit’ Imports the 2Bit binary DNA sequence files
  • ’Fasta’ Imports Fasta format text files

Determining the Schema Type of an Annotation Track Importer Object

Syntax

schema type = importer object.schemaType()

Example

type = myImporter.schemaType()

This command returns a string indicating the current schema type of the importer. The schema type affects the way data is imported and can be used to automatically configure some import settings.

Possible values are:

  • ’Allele Sequence’ a DNA sequence track
  • ’Cytoband’ a cytoband track with styling based on stain value
  • ’Gene’ a gene track including exon intervals and codon alignments
  • ’Intensity’ a track composed of real valued intervals
  • ’Interval’ a track defined in part or in whole by the data
  • ’Probe’ a SNP probe marker track

The schema type for some importer formats can be changed using the setOptions(...) command.
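
The relationship between importer format and resulting schema type, as stated in the importer sections of this reference, can be summarized in a small lookup table. This is a sketch only: the table and the helper possible_schema_types are hypothetical names, and the entries simply restate what the text says (2Bit and Fasta produce Allele Sequence tracks, Wiggle produces Intensity tracks, Delimited Text may produce any available type).

```python
ALL_SCHEMA_TYPES = ['Allele Sequence', 'Cytoband', 'Gene', 'Intensity', 'Interval', 'Probe']

# Keys are the strings documented as return values of format().
SCHEMA_TYPES_BY_FORMAT = {
    '2Bit': ['Allele Sequence'],
    'Fasta': ['Allele Sequence'],
    'Wiggle': ['Intensity'],
    'Delimited Text': list(ALL_SCHEMA_TYPES),
}

def possible_schema_types(format_name):
    """Return the schema types an importer of the given format can produce."""
    try:
        return SCHEMA_TYPES_BY_FORMAT[format_name]
    except KeyError:
        raise ValueError('unknown importer format: %r' % (format_name,))

def schema_type_is_adjustable(format_name):
    """True when more than one schema type is possible for the format."""
    return len(possible_schema_types(format_name)) > 1
```

A script could use such a table to decide whether passing schemaType to setOptions(...) is meaningful for a given importer before doing so.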

Setting Import Options on an Annotation Track Importer Object

Syntax

importer object.setOptions([Optional Parameters])

Example

myImporter.setOptions(coordSysId = 'GRCh_37,Chromosome,Homo sapiens', schemaType = 'Interval', halfOpen = True, headerLine = 0, comment = '^//', delimiter = ',', subDelimiter = '/')

This command allows for the adjustment of various optional parameters which affect import operations. Note that some options apply only to particular importer formats.

The parameters are as follows:

  • coordSysId: (All formats) sets the coordinate system association used by the output annotation track. The coordinate system id links an annotation track to a species and build. Although any string can be used, for best results, the proper format should be applied. The proper format is ’authority,type,species’ where:
    • authority: is the entity responsible for the build often including a build version identifier
    • type: is the coordinate system type, usually ’Chromosome’
    • species: is the species name

    Many coordinate system ids of this format can be found here:
    http://www.dasregistry.org/das1/coordinatesystem The coordinate system id defaults to the current project or global default, usually ’NCBI_36,Chromosome,Homo sapiens’

  • schemaType: (Delimited Text format) the schema type defines the way the output annotation track can be used. It affects the way data is imported and is used during auto-configuration when inputs are set. Possible values are:
    • ’Allele Sequence’ a DNA sequence track
    • ’Cytoband’ a cytoband track with styling based on stain
    • ’Gene’ a gene track including exons and codon alignments
    • ’Intensity’ a track composed of real valued intervals
    • ’Interval’ (default) a versatile general purpose track
    • ’Probe’ a SNP probe marker track
  • styleAlwaysMaps: (Delimited Text format) specifies whether features can be imported without style mappings. Applies only when style rules are defined.
    • 0 (default) features may be created without styles
    • 1 (default for ’Interval’ schema type) features will always be mapped to a style in the style list. If the style list is empty, a default style will be created.
  • maxScanFields: (Delimited Text format) specifies the maximum number of input columns to make accessible for field mappings. A limit can improve performance in scanning and importing from inputs with very high column counts.
    • 0 (default) no limit
    • N limit available number of columns to N
  • maxAutoFields: (Delimited Text format) specifies the maximum number of automatic field mappings to create based on input columns. A limit can improve performance in scanning and importing from inputs with very high column counts.
    • 0 (default) no limit
    • N limit automatic field mappings to N
  • titleCaseFieldNames: (Delimited Text format) specifies whether the names of automatic field mappings are converted to title case.
    • 0 set field names to column names
    • 1 (default) capitalize the first letter of each word
  • guessFieldTypes: (Delimited Text format) specifies whether the data types of input fields are guessed during auto-configuration while setting inputs. This does not apply to spreadsheets as their column types are strictly defined.
    • 0 assume all input columns are ’String’ type
    • 1 (default) guess the type of each input column
  • reuseFieldsForConstraints: (Delimited Text format) specifies whether existing field mappings are adjusted to meet field constraints or that creation of new field mappings is always enforced.
    • 0 create a new field mapping for every constraint
    • 1 (default) attempt to use existing mappings to satisfy constraints
  • ignoreConstraints: (Delimited Text format) specifies whether schema type derived field mapping constraints are respected. If constraints are ignored, the output annotation track is much more likely to be unusable.
    • 0 (default) respect constraints and throw errors when attempting an import which is improperly configured
    • 1 ignore constraints, silently complying with any field mappings that have been overlooked
  • halfOpen: (Delimited Text format) specifies whether input interval data is provided using half-open coordinates or not. Half-open coordinates, sometimes referred to as zero-based, precisely define the outside edges of an interval. The smallest possible value is zero and the largest is equal to the size of the containing segment. A zero width interval may be specified using half-open coordinates. The alternative, indexed coordinates, are often referred to as one-based. Indexed coordinates specify base pair positions within a segment starting from one. The largest possible value is equal to the size of the containing segment. The smallest interval which can be specified using indexed coordinates is of width one.
    • 0 input data is provided as indexed intervals or one-based positions
    • 1 (default) input data is provided as half-open intervals or zero-based positions
  • headerLine: (Delimited Text format) specifies the one-based line number in the input file which defines its input column names. All preceding lines will be ignored. This does not apply to spreadsheets as their column names are provided directly.
    • -1 (default) specifies no header, or that the header is located using a different method
    • N the file header is on line N
  • header: (Delimited Text format) specifies a pattern used to match the line in the input file which defines its input column names. This does not apply to spreadsheets as their column names are provided directly. The first line matched by the pattern will be designated as the header. All text matched by the pattern will be removed from the header. Any remaining text is used to define the input column names.
    • '' does not match. Specifies no header, or that the header is located using a different method
    • '^#' (default) a Perl compatible regular expression that matches lines that begin with '#'
  • comment: (Delimited Text format) specifies a pattern used to match comment lines in the input file. This does not apply to spreadsheets. Comment lines are not data and are ignored.
    • '' (default) does not match. The input file contains no comment lines
  • delimiter: (Delimited Text format) specifies a pattern used to separate column values on each line. This does not apply to spreadsheets as column values are provided directly.
    • '\t' (default) the tab character delimits field values
  • subDelimiter: (Delimited Text format) specifies a pattern used to separate list values within a field value.
    • ’,’ (default) the comma character delimits list values
  • skipInactiveRows: (Delimited Text format) applies only to spreadsheet inputs. Specifies whether inactive rows are included in the import operation.
    • 0 all rows are imported
    • 1 (default) inactive rows are not imported
  • sequenceBlockSize: (2Bit and Fasta formats) specifies the maximum size of the sequence blocks written to the output annotation track. Except in extremely rare circumstances this parameter should never be adjusted.
    • 16384 (default) 2^14 base pairs. Sequence data will be broken into blocks of this size to balance performance and file size
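
Because coordSysId accepts any string but works best in the 'authority,type,species' form, a script may want to check the format before passing it on. The following is a minimal sketch of such a check. The helper parse_coord_sys_id and the CoordSys tuple are hypothetical and do not exist in the ghi module.

```python
from collections import namedtuple

CoordSys = namedtuple('CoordSys', ['authority', 'type', 'species'])

def parse_coord_sys_id(coord_sys_id):
    """Split an 'authority,type,species' id into its parts.

    Returns None when the string does not follow the recommended format,
    since any string is accepted but may give poorer results.
    """
    parts = [p.strip() for p in coord_sys_id.split(',')]
    if len(parts) != 3 or not all(parts):
        return None
    return CoordSys(*parts)

cs = parse_coord_sys_id('GRCh_37,Chromosome,Homo sapiens')
# cs.authority is 'GRCh_37', cs.type is 'Chromosome', cs.species is 'Homo sapiens'
```

Returning None rather than raising an error mirrors the fact that a non-conforming id is permitted, only discouraged.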

Setting a Spreadsheet as Input to an Annotation Track Importer Object

Syntax

importer object.setInputSpreadsheet(spreadsheet id, [include marker map, auto-configure])

Example

myTextImporter.setInputSpreadsheet(4)

This command sets the input spreadsheet for the importer by project node id.

The parameters for this command are as follows:

  • spreadsheet id: the project node id for the input spreadsheet
  • include marker map: (optional) specifies whether the contents of a row marker map applied to the input spreadsheet are made available for import
    • 0 do not include marker map data
    • 1 (default) include marker map data
  • auto-configure: (optional) specifies whether field mappings and styles are automatically assigned based on column names and types
    • 0 require manual setup of field mappings and styles
    • 1 (default) attempt to automatically map expected fields and create default styles based on schema type. Some expected fields may remain unmapped if no likely candidates are discovered. These will have to be filled in manually

Setting Multiple Spreadsheets as Inputs to an Annotation Track Importer Object

Syntax

importer object.setInputSpreadsheets(spreadsheet id list, [include marker maps, auto-configure])

Example

myTextImporter.setInputSpreadsheets([4, 6, 10])

This command sets a list of input spreadsheets for the importer by project node id. Import will work best if all input spreadsheets share the same set of data columns.

The parameters for this command are as follows:

  • spreadsheet id list: list of spreadsheet project node ids as integers
  • include marker maps: (optional) specifies whether the contents of row marker maps applied to the input spreadsheets are made available for import
    • 0 do not include marker map data
    • 1 (default) include marker map data
  • autoConfigure: (optional) specifies whether field mappings and styles are automatically assigned based on column names and types
    • 0 require manual setup of field mappings and styles
    • 1 (default) attempt to automatically map expected fields and create default styles based on schema type. Some expected fields may remain unmapped if no likely candidates are discovered. These will have to be filled in manually

Obtaining the List of Input Spreadsheets Assigned to an Annotation Track Importer Object

Syntax

spreadsheet id list = importer object.inputSpreadsheets()

Example

list = myTextImporter.inputSpreadsheets()

This command returns the importer’s current list of input spreadsheet project node ids.

Setting a File as Input to an Annotation Track Importer Object

Syntax

importer object.setInputFile(file name, [auto-configure])

Example

myTextImporter.setInputFile(’c:/data/myData.csv’)

This command sets the input file for the importer by file name.

The parameters for this command are as follows:

  • file name: the file name of the input file to import
  • auto-configure: (optional) specifies whether field mappings and styles are automatically assigned based on detected file column names and types
    • 0 require manual setup of field mappings and styles
    • 1 (default) attempt to automatically map expected fields and create default styles based on schema type. Some expected fields may remain unmapped if no likely candidates are discovered. These will have to be filled in manually

Setting Multiple Files as Inputs to an Annotation Track Importer Object

Syntax

importer object.setInputFiles(file name list, [auto-configure])

Example

myTextImporter.setInputFiles([’c:/data/myData1.csv’, ’c:/data/myData2.csv’])

This command sets a list of input files for the importer by file name. Import will work best if all input files share the same format and data columns.

The parameters for this command are as follows:

  • fileList: list of input file names to import
  • autoConfigure: (optional) specifies whether field mappings and styles are automatically assigned based on detected file column names and types
    • 0 require manual setup of field mappings and styles
    • 1 (default) attempt to generate a list of field mappings based on the input files and schema type. Some constraint fields may remain unmapped if no likely candidates are discovered. These will have to be filled in manually

Obtaining the List of Input Files Assigned to an Annotation Track Importer Object

Syntax

input file list = importer object.inputFiles()

Example

list = myTextImporter.inputFiles()

This command returns the importer’s current list of input file names.

Obtaining the Field Constraint List of an Annotation Track Importer Object

Syntax

constraint list = importer object.constraintList()

Example

list = myTextImporter.constraintList()

This command returns a list of dict items representing the importer’s field constraint list. Each item represents a field constraint. See setConstraintList(...) for information on the format of each item in the list.

Setting an Annotation Track Importer Object’s Field Constraints to Defaults

Syntax

importer object.loadAutoConstraints([schema type])

Example

myImporter.loadAutoConstraints()

This command sets the importer’s constraint list to the defaults for a given schema type.

The parameters for this command are as follows:

  • schemaType: (optional) specifies the schema type for which the default constraint list will be loaded
    • ’Allele Sequence’
    • ’Cytoband’
    • ’Gene’
    • ’Intensity’
    • ’Interval’
    • ’Probe’

    When this parameter is not specified, the current schema type determines which default constraints are loaded.

Mapping an Annotation Track Importer Object’s Fields to Constraints

Syntax

importer object.applyConstraints()

Example

myTextImporter.applyConstraints()

This command can be called to attempt a mapping of all the field constraints in the importer’s constraint list to fields in its field list. Some field constraints may remain unmapped if no likely candidates are discovered. Each will result in a new field being defined without a mapping. These mappings will have to be filled in manually before the import operation can succeed.
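
The matching behavior described above can be approximated with a short standalone sketch. It is not the importer's algorithm: it assumes that each constraint's patterns are tried in order against the remaining input columns, that a column can satisfy only one constraint, and that matching is case sensitive. The function name apply_constraints_sketch is hypothetical, and Python's re module stands in for Perl compatible regular expressions.

```python
import re

def apply_constraints_sketch(constraints, columns):
    """Return {constraint name: matched column, or None when left unmapped}."""
    available = list(columns)
    mapping = {}
    for constraint in constraints:
        label = constraint.get('name') or constraint.get('defaultName')
        matched = None
        # Patterns are tried in list order; the first column hit wins.
        for pattern in constraint.get('match', []):
            matched = next((c for c in available if re.search(pattern, c)), None)
            if matched is not None:
                break
        if matched is not None:
            available.remove(matched)  # assumption: one column per constraint
        # None marks a field that still needs a manual mapping.
        mapping[label] = matched
    return mapping

constraints = [
    {'match': ['^chr'], 'name': 'Chromosome'},
    {'match': ['^start$', '^pos'], 'name': 'Start'},
    {'match': ['^stop$', '^end$'], 'name': 'Stop'},
]
result = apply_constraints_sketch(constraints, ['chrom', 'position', 'value'])
```

In this example 'Chromosome' and 'Start' find candidate columns, while 'Stop' is left unmapped and would have to be filled in manually before an import could succeed.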

Defining the Field Constraint List of an Annotation Track Importer Object

Syntax

importer object.setConstraintList(constraint list)

Example

myTextImporter.setConstraintList([{’match’:[’data’], ’name’:’Value’, ’type’:’Real’}])

This command allows the importer’s field constraint list to be set manually. It may be useful to obtain the current field constraint list using the constraintList() command, make a few modifications, and then re-supply it to the importer using this command.

Field constraints are only utilized by some import formats.

The parameters for this command are as follows:

  • constraintList: a list of dict items. Each item defines a field constraint. The order of the items in the list determines the order in which the constraints are resolved in application or testing for satisfaction. Each item may contain the following keys:
    • match: list - specifies a list of patterns to attempt to match against source columns while applying constraints. Each item in the list is a string which represents a Perl compatible regular expression. The order of the patterns in the list defines the order in which a match to a source column is attempted.
    • defaultName: string - specifies the default name of the constrained field. This will be the name of the constraint generated field if no matching input column is found. Otherwise the field name will be unchanged. When defaultName is specified, name should not be.
    • name: string - specifies the name of the constrained field in the output. Any matching input column’s field name found will be changed to match this value. When name is specified, defaultName should not be.
    • type: string - specifies the schema field type that the constrained field should match. If the type of the matched input column can be coerced into this type, it will still match, and be assigned this field type.
      • ’Boolean’ tristate including True, False, and Unknown
      • ’Byte’ any single character
      • ’Integer’ an integer value
      • ’Real’ a floating point value
      • ’Double Real’ a double precision floating point value
      • ’String’ (default) a string
      • ’Boolean List’ a list of ’Boolean’ values
      • ’Byte List’ a list of ’Byte’ values
      • ’Integer List’ a list of ’Integer’ values
      • ’Real List’ a list of ’Real’ values
      • ’Double Real List’ a list of ’Double Real’ values
      • ’String List’ a list of ’String’ values
    • compress: bool - specifies whether the constrained field’s data should be compressed. Possible values are:
      • 0 (default) do not compress output data
      • 1 compress output data for this field using gzip compression. Not all field types support compression. Compression is inefficient unless the input data is a very long string or list
    • transformList: list - specifies a list of find and replace operations to perform on the constrained field’s values before writing them to the output. Each item in the list is a dict which defines a transform operation. The order of the items in the list defines the order in which the transforms are applied to the field data. Each item may contain the following keys:
      • find: string - specifies a pattern to find in the input data.
      • replace: string - specifies the string used to replace the text matching the find string if it is found.
    • required: bool - specifies whether this constraint must be met in order for import to succeed. Possible values are:
      • 0 (default) this constraint need not necessarily be met
      • 1 this constraint must be met
    • inSchema: bool - specifies whether the constrained field will be included in the output schema. Possible values are:
      • 0 do not include this field in the output schema. Defining a field mapping which is not part of the output schema may be useful if its value will be used during the import operation, but will not be desired in the resulting annotation track. The ’Chromosome’, ’Start’ and ’Stop’ fields are common examples, since the interval data they provide is required to define features, but would be redundant to include in the schema
      • 1 (default) include this field in the output schema
    • fixedIndex: int - specifies the one-based schema index the constrained field must occupy in order to satisfy this constraint. Possible values are:
      • 0 (default) any index will work
      • N the specified index is the only valid index
    • reorder: bool - specifies whether the constrained field is pulled toward the start of the schema when a match is found. Possible values are:
      • 0 a field matching this constraint is not relocated and will remain at its original location in the field list. Note that its absolute index may still be adjusted due to other constraints reordering fields
      • 1 (default) a field matching this constraint is moved to the next available position at the beginning of the field list
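
When building a constraint list by hand, it can help to check each dict against the keys and types listed above before passing it to setConstraintList(...). The following is a minimal sketch of such a check. The helper check_constraint is hypothetical, and the importer itself may accept or reject items differently.

```python
ALLOWED_KEYS = {
    'match', 'defaultName', 'name', 'type', 'compress',
    'transformList', 'required', 'inSchema', 'fixedIndex', 'reorder',
}
BASE_TYPES = ['Boolean', 'Byte', 'Integer', 'Real', 'Double Real', 'String']
FIELD_TYPES = set(BASE_TYPES) | {t + ' List' for t in BASE_TYPES}

def check_constraint(constraint):
    """Return a list of problems found in one constraint dict (empty when none)."""
    problems = []
    unknown = set(constraint) - ALLOWED_KEYS
    if unknown:
        problems.append('unknown keys: ' + ', '.join(sorted(unknown)))
    # The reference states that name and defaultName should not be combined.
    if 'name' in constraint and 'defaultName' in constraint:
        problems.append('name and defaultName should not both be specified')
    # 'String' is the documented default type.
    if constraint.get('type', 'String') not in FIELD_TYPES:
        problems.append('unknown type: %r' % (constraint['type'],))
    if not isinstance(constraint.get('match', []), list):
        problems.append('match should be a list of pattern strings')
    return problems

ok = check_constraint({'match': ['data'], 'name': 'Value', 'type': 'Real'})
bad = check_constraint({'name': 'A', 'defaultName': 'B', 'type': 'Float'})
```

For the two calls shown, the first constraint passes with no problems and the second reports both the name/defaultName conflict and the unrecognized type.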

Obtaining the Field List of an Annotation Track Importer Object

Syntax

field list = importer object.fieldList()

Example

list = myTextImporter.fieldList()

This command returns a list of dict items representing the importer’s field list. Each item represents a field mapping. See setFieldList(...) for information on the format of each item in the list.

Populating an Annotation Track Importer Object’s Fields List Automatically from Input Data

Syntax

importer object.loadAutoFields()

Example

myTextImporter.loadAutoFields()

This command can be called to scan all input spreadsheets or files to determine the available columns and their types. Mappings for all available columns will replace those in the importer’s current field list.

Changing All Field Names in an Annotation Track Importer Object’s Field List at Once

Syntax

importer object.transformFieldNames(transform)

Example

myTextImporter.transformFieldNames({’find’:’[_-]’, ’replace’:’ ’})

This command allows the names of fields in the importer’s field list to be modified by applying the same find/replace operation to all of them.

The parameters for this command are as follows:

  • transform: the find/replace operation to apply to the field names. This dict may contain the following keys:
    • find: string - specifies a Perl compatible regular expression to match against each field name
    • replace: string - specifies the string used to replace the text matching the find pattern if it is found.
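
The effect of a find/replace transform on a set of field names can be previewed outside the importer with Python's re module. This is a sketch under assumptions: Python regular expressions are used in place of Perl compatible ones, every match in a name is replaced, and the replacement is treated as a literal string. Whether the importer supports back-references in replace is not stated here. The function name preview_field_names is hypothetical.

```python
import re

def preview_field_names(names, transform):
    """Apply one {'find': ..., 'replace': ...} transform to each field name."""
    pattern = re.compile(transform['find'])
    replacement = transform['replace']
    # A function replacement keeps the replace string literal (no back-references).
    return [pattern.sub(lambda m: replacement, name) for name in names]

renamed = preview_field_names(['log_r', 'b-allele_freq'], {'find': '[_-]', 'replace': ' '})
```

With the transform from the example above, underscores and hyphens in each name are replaced by spaces.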

Defining the Field List of an Annotation Track Importer Object

Syntax

importer object.setFieldList(field list)

Example

myTextImporter.setFieldList([{’name’:’Log R’, ’mapping’:’%logr%’, ’type’:’Real’}])

This command allows the importer’s field list to be set manually. It may be useful to obtain the current field list using the fieldList() command, make a few modifications, and then re-supply it to the importer using this command.

Field mappings are only utilized by some import formats.

The parameters for this command are as follows:

  • fieldList: a list of dict items. Each item defines a field mapping. The order of the items in the list determines the order of the data written to the output of the import operation. Each item may contain the following keys:
    • name: string - specifies the name of the field in the output. A field name is required.
    • mapping: string - specifies a template for the field data as written to the output. The template is a string literal which includes variables for replacement. The variables must be of the form ’%col{idx}%’ where ’col’ is the name of an input column and ’idx’ may optionally be used to disambiguate input columns with the same name. ’idx’ is a zero-based integer indicating the position of the column within any input spreadsheet or file. For spreadsheets with included row marker map data, the first column of the marker map is designated as having index 0. The row label index is equal to the number of columns in the row marker map and the index of the first data column is equal to the number of columns in the row marker map plus one. For example, a typical mapping might be ’%chr%’. In this case, the string ’%chr%’ will be replaced with the data in the input column named ’chr’ for each row imported. A more complicated example would be ’%chr%:%pos%’. In this case the output data for rows in chromosome 1 might include ’1:10000’, ’1:10005’ and so on.
    • type: string - specifies the type to convert the input data into. Possible values are:
      • ’Boolean’ tristate including True, False, and Unknown
      • ’Byte’ any single character
      • ’Integer’ an integer value
      • ’Real’ a floating point value
      • ’Double Real’ a double precision floating point value
      • ’String’ (default) a string
      • ’Boolean List’ a list of ’Boolean’ values
      • ’Byte List’ a list of ’Byte’ values
      • ’Integer List’ a list of ’Integer’ values
      • ’Real List’ a list of ’Real’ values
      • ’Double Real List’ a list of ’Double Real’ values
      • ’String List’ a list of ’String’ values
    • compress: bool - specifies whether to compress the data. Possible values are:
      • 0 (default) do not compress output data
      • 1 compress output data for this field using gzip compression. Not all field types support compression. Compression is inefficient unless the input data is a very long string or list
    • inSchema: bool - specifies whether this field is included in the output schema. Possible values are:
      • 0 do not include this field in the output schema. Defining a field mapping which is not part of the output schema may be useful if its value will be used during the import operation, but will not be desired in the resulting annotation track. The ’Chromosome’, ’Start’ and ’Stop’ fields are common examples, since the interval data they provide is required to define features, but would be redundant to include in the schema
      • 1 (default) include this field in the output schema
    • transformList: list - specifies a list of find and replace operations to perform on the field values before writing them to the output. Each item in the list is a dict which defines a transform operation. The order of the items in the list defines the order in which the transforms are applied to the field data. Each item may contain the following keys:
      • find: string - specifies a pattern to find in the input data.
      • replace: string - specifies the string used to replace the text matching the find string if it is found.
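The mapping template and transformList behavior described above can be illustrated outside SVS with plain Python. This is only a sketch of the documented semantics, not the importer's implementation: the helper names expand_mapping and apply_transforms are hypothetical, the optional {idx} suffix is ignored, and the find pattern is assumed to be a literal substring.

```python
import re

def expand_mapping(template, row):
    """Replace each %col% variable with the value of that column in row."""
    def lookup(match):
        # match.group(1) is the column name between the percent signs
        return str(row[match.group(1)])
    return re.sub(r'%([^%{}]+)%', lookup, template)

def apply_transforms(value, transform_list):
    """Apply find/replace items in list order (assumed literal matching)."""
    for transform in transform_list:
        value = value.replace(transform['find'], transform['replace'])
    return value

# One input row, keyed by input column name (illustrative values)
row = {'chr': 'chr1', 'pos': 10000}
mapped = expand_mapping('%chr%:%pos%', row)
cleaned = apply_transforms(mapped, [{'find': 'chr', 'replace': ''}])
```

Here mapped holds the template output for the row and cleaned holds the value after the single transform strips the 'chr' prefix.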

Resetting the Style Map State of an Annotation Track Importer Object

Syntax

importer object.resetStyleMapState()

Example

myImporter.resetStyleMapState()

This command clears any data-generated style mappings accumulated during import and resets the next style pointer to the first item in the style wheel.

This is only necessary if the same importer is going to be used for more import operations and needs to be reset between each import.

Setting an Annotation Track Importer Object’s Style Map to Defaults

Syntax

importer object.loadAutoStyles(schema type)

Example

myImporter.loadAutoStyles()

This command sets the style list to default values based on a schema type.

The parameters for this command are as follows:

  • schemaType: (optional) specifies the schema type for which the default style list values will be loaded
    • ’Allele Sequence’
    • ’Cytoband’
    • ’Gene’
    • ’Intensity’
    • ’Interval’
    • ’Probe’

    When this parameter is not specified, the current schema type determines which default styles are loaded.

Obtaining Auto Categorization state from an Annotation Track Importer Object

Syntax

importer object.autoCategorize()

Example

myImporter.autoCategorize()

Returns the current state of the auto categorization feature. See the setAutoCategorize(...) command for information on the function of auto categorization.

The return values are as follows:

  • 0 (default) style mapping generation based on field data auto categorization is disabled
  • 1 style mapping generation based on field data auto categorization is enabled

Enabling Auto Styling Based on Field Data for an Annotation Track Importer Object

Syntax

importer object.setAutoCategorize(enabled)

Example

myImporter.setAutoCategorize(True)

This command sets whether style mappings are created based on field data.

The parameters for this command are as follows:

  • value: (optional)
    • 0 (default) disable style mapping generation based on field data
    • 1 enable style mapping generation based on field data. The data in the style field for each row will be interpreted as a category. Each unique category will be assigned a style from the style wheel and imported features will be styled according to their categories. This style mapping feature works best if the style field contains at least two unique values and many rows share exactly the same value for the field. If style rules are defined, auto categorization is attempted only after all rules fail to map a style

Obtaining the Style Field of an Annotation Track Importer Object

Syntax

field index = importer object.styleField()

Example

index = myTextImporter.styleField()

This command returns the current field list index specifying the field used to map features to styles for this importer.

Designating the Style Field for an Annotation Track Importer Object

Syntax

importer object.setStyleField(field index)

OR

importer object.setStyleField(field name)

Example

myTextImporter.setStyleField(4)

OR

myTextImporter.setStyleField(’Category’)

Sets the field used to map features to styles for this importer.

The parameters for this command are as follows:

  • field index: specifies the index of a field in the importer field list whose data is used to map imported features to styles
  • field name: specifies the name of a field in the importer field list whose data is used to map imported features to styles

Obtaining the Style Rule List from an Annotation Track Importer Object

Syntax

style rule list = importer object.styleRuleList()

Example

list = myTextImporter.styleRuleList()

This command returns a list of dict items representing the importer’s style rule list. Each item represents a style rule. See setStyleRuleList(...) for information on the format of each item in the list.

Defining the Style Rule List for an Annotation Track Importer Object

Syntax

importer object.setStyleRuleList(style rule list)

Example

myTextImporter.setStyleRuleList([{’find’:’gpos50’, ’style’:’gpos50’}])

This command allows the importer’s style rule list to be set. It may be useful to obtain the current style rule list (which is always empty by default) using the styleRuleList() command, make a few modifications, and then re-supply it to the importer using this command.

Style rules are only utilized by some import formats.

The parameters for this command are as follows:

  • style rule list: a list of dict items. Each item defines a style rule. The order of the items in the list determines the order in which they are used to attempt a style mapping to each imported feature. Each item must contain the following keys:
    • find: string - specifies the pattern to match in the style field
    • style: int or string or dict - specifies a style in the importer’s style list by index or name. Alternately a new style definition may be provided as a dict. The specified style is mapped to imported features whenever the find pattern matches the data in the style field, unless another rule matches first
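The rule ordering described above (rules are tried in list order and the first match wins) can be sketched in plain Python. The function map_style is a hypothetical helper, the find pattern is assumed to be a literal substring, and the style name 'gpos100' is only an illustrative value.

```python
def map_style(style_field_value, style_rule_list):
    """Return the style of the first rule whose find pattern matches, else None."""
    for rule in style_rule_list:
        # Assumption: the pattern is treated as a literal substring
        if rule['find'] in style_field_value:
            return rule['style']
    return None

rules = [
    {'find': 'gpos50', 'style': 'gpos50'},
    {'find': 'gpos', 'style': 'gpos100'},
]
first_match = map_style('gpos50', rules)   # both rules match, the first one is used
later_match = map_style('gpos75', rules)   # only the second rule matches
no_match = map_style('gneg', rules)        # no rule matches
```

Placing the more specific rule first matters here: swapping the two rules would map every 'gpos...' value to the broader rule's style.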

Defining the Style List for an Annotation Track Importer Object

Syntax

importer object.setStyleList(style list)

Example

myImporter.setStyleList([{’name’:’gpos50’, ’shape’:’box’, ’bgcolor’:’#868686’}])

This command allows the importer’s style list to be set. It may be useful to obtain the current style list using the styleList() command, make a few modifications, and then re-supply it to the importer using this command.

All styles defined in the style list are written to the output annotation track whether a feature is mapped to them or not. Depending on importer configuration, additional styles may be added to this list during import.

The parameters for this command are as follows:

  • style list: a list of dict items. Each item defines a style. The order of the items in the list determines the order in which they are written to the output annotation track. Each item may contain the following keys:
    • name: string - specifies the name of the style
    • bgcolor: string - specifies the style’s background color using html hexadecimal color format in the form: ’#rrggbb’. The first byte after the hash ’rr’ specifies the intensity of the red channel. The second byte ’gg’ specifies the intensity of the green channel. The third byte ’bb’ specifies the intensity of the blue channel. For example ’#00FF00’ indicates bright green.
    • mincolor: string - specifies the style’s color for minimum values using html hexadecimal color format in the form ’#rrggbb’
    • maxcolor: string - specifies the style’s color for maximum values using html hexadecimal color format in the form ’#rrggbb’
    • shape: string - specifies the style’s shape. Possible values include:
      • ’box’ - a rectangle
      • ’triangle’ - a triangle
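As a small illustration of the ’#rrggbb’ format used by the color keys, the following sketch builds a color string from three channel intensities and places it in a style dict. The helper html_color is hypothetical and not part of the scripting interface.

```python
def html_color(red, green, blue):
    """Format three 0-255 channel intensities as an html '#rrggbb' string."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError('channel values must be in the range 0-255')
    return '#%02X%02X%02X' % (red, green, blue)

# A style dict using the keys documented above
style = {'name': 'gpos50', 'shape': 'box', 'bgcolor': html_color(134, 134, 134)}
bright_green = html_color(0, 255, 0)
```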

Obtaining the Style List from an Annotation Track Importer Object

Syntax

style list = importer object.styleList()

Example

list = myImporter.styleList()

This command returns a list of dict items representing the importer’s style list. Each item represents a style. See setStyleList(...) for information on the format of each item in the list.

Defining the Style Wheel for an Annotation Track Importer Object

Syntax

importer object.setStyleWheel(style list)

Example

myImporter.setStyleWheel([{’name’:’Style A’, ’bgcolor’:’#FF0000’}, {’name’:’Style B’, ’bgcolor’:’#0000FF’}])

This command allows the importer’s style wheel to be set. It may be useful to obtain the current style wheel using the styleWheel() command, make a few modifications, and then re-supply it to the importer using this command.

The style wheel is used by the auto categorization feature. If the style wheel is empty, a builtin set of GHI plot colors is used as the style wheel.

The parameters for this command are as follows:

  • style list: a list of dict items. Each item defines a style. The order of the items in the list determines the order in which they are assigned to new categories. See setStyleList(...) for information on the format of each item in the list.
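The way a style wheel feeds auto categorization can be pictured with the sketch below: each new category takes the next style in wheel order. The helper assign_styles is hypothetical, the category values are made up, and wrapping around to the start of the wheel when it is exhausted is an assumption rather than documented behavior.

```python
def assign_styles(category_values, style_wheel):
    """Map each unique category to a style name, taking styles in wheel order."""
    assigned = {}
    for value in category_values:
        if value not in assigned:
            # Assumption: the wheel wraps around once every style has been used
            wheel_position = len(assigned) % len(style_wheel)
            assigned[value] = style_wheel[wheel_position]['name']
    return assigned

wheel = [{'name': 'Style A', 'bgcolor': '#FF0000'},
         {'name': 'Style B', 'bgcolor': '#0000FF'}]
mapping = assign_styles(['exon', 'intron', 'exon', 'utr'], wheel)
```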

Obtaining the Style Wheel from an Annotation Track Importer Object

Syntax

style wheel = importer object.styleWheel()

Example

list = myImporter.styleWheel()

This command returns a list of dict items representing the importer’s style wheel. Each item represents a style. See setStyleList(...) for information on the format of each item in the list.

Obtaining the Output File Name from an Annotation Track Importer Object

Syntax

file name = importer object.outputFile()

Example

file = myImporter.outputFile()

This command returns the current file name setting which specifies where this importer’s next output annotation track will be created.

Defining the Output File Name for an Annotation Track Importer Object

Syntax

importer object.setOutputFile(file name)

Example

myImporter.setOutputFile(’mytrack.idf’)

Sets the file name of the output annotation file to create. If the specified file already exists, the new annotation track will be added to the file. This file name should end in ’.idf’ by convention and for convenience of use within SVS. If no path is provided, the file will be created inside the current open project.

The parameters for this command are as follows:

  • file: the output file name

Obtaining the Output Track Name from an Annotation Track Importer Object

Syntax

title = importer object.trackTitle()

Example

title = myImporter.trackTitle()

This command returns the specified title which will be written to the output annotation track.

Defining the Output Track Name for an Annotation Track Importer Object

Syntax

importer object.setTrackTitle(title)

Example

myImporter.setTrackTitle(’My Track’)

This command sets the title of the output annotation track.

The parameters for this command are as follows:

  • title: the name of the output track

Obtaining the Coordinate System Identifier from an Annotation Track Importer Object

Syntax

coordinate system id = importer object.coordSysId()

Example

coords = myImporter.coordSysId()

This command returns the coordinate system id which will be associated to the output annotation track. See setCoordSysId(...) for information about the coordinate system id format.

Setting the Coordinate System Identifier for an Annotation Track Importer Object

Syntax

importer object.setCoordSysId(coordinate system id)

Example

myImporter.setCoordSysId(’GRCh_37,Chromosome,Homo sapiens’)

Sets the coordinate system association used by the output annotation track. The coordinate system id is used by SVS to associate each annotation track with its appropriate species and build. Although any string can be used, for best results, the proper format should be applied. The proper format is ’authority,type,species’ where:

  • authority: is the entity responsible for the specified build often including a build version identifier, such as: ’NCBI_36’ in the case of hg18.
  • type: is the coordinate system type, usually ’Chromosome’
  • species: is the species name, such as: ’Homo sapiens’ in the case of hg18 or other human builds.

Many coordinate system ids of this format can be found in this xml list:
http://www.dasregistry.org/das1/coordinatesystem

The parameters for this command are as follows:

  • coordinate system id: the coordinate system id to associate with the output track. If unset, the importer will use the current project or global default, usually ’NCBI_36,Chromosome,Homo sapiens’
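The ’authority,type,species’ format can be assembled and taken apart with ordinary string operations. The two helpers below are hypothetical conveniences for scripts, not SVS commands.

```python
def make_coord_sys_id(authority, coord_type, species):
    """Join the three parts into an 'authority,type,species' id string."""
    return ','.join([authority, coord_type, species])

def split_coord_sys_id(coord_sys_id):
    """Split an id string into its three named parts."""
    parts = coord_sys_id.split(',')
    if len(parts) != 3:
        raise ValueError('expected the form authority,type,species')
    return {'authority': parts[0], 'type': parts[1], 'species': parts[2]}

coords = make_coord_sys_id('GRCh_37', 'Chromosome', 'Homo sapiens')
parts = split_coord_sys_id('NCBI_36,Chromosome,Homo sapiens')
```

The resulting string could then be passed to setCoordSysId(...) or to the three-argument form of write(...).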

Obtaining the UUID Setting from an Annotation Track Importer Object

Syntax

uuid = importer object.uuid()

Example

uuid = myImporter.uuid()

This command returns the universally unique identifier (UUID) which will be assigned to the output annotation track. The returned string may be empty which indicates that a new UUID will be generated for each track imported. See setUuid(...) for information on UUIDs.

Specifying a UUID Setting for an Annotation Track Importer Object’s Output

Syntax

importer object.setUuid(uuid)

Example

myImporter.setUuid(’My Non-Standard Uuid’)

This command allows the universally unique identifier (UUID) assigned to the output annotation track to be manually overridden.

The parameters for this command are as follows:

  • uuid: a universally unique identifier to assign to tracks imported by this importer. Care should be taken not to assign identical UUIDs to more than one annotation track. If unset, the importer will automatically generate a UUID for each imported annotation track.
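If a script needs to choose the identifier itself, one way to avoid accidental duplicates is to generate a fresh value with Python's standard uuid module. This is a sketch; new_track_uuid is a hypothetical helper, and whether SVS expects this exact textual form is an assumption.

```python
import uuid

def new_track_uuid():
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())

track_uuid = new_track_uuid()
```

The string could then be supplied to setUuid(...) before importing a track.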

Importing an Annotation Track Using an Annotation Track Importer Object

Syntax

importer object.write()

OR

importer object.write(output file, title, coordinate system id)

Example

myImporter.write()

OR

myImporter.write(’mytrack.idf’, ’My Track’, ’GRCh_37,Chromosome,Homo sapiens’)

Initiates the import process, writing all imported data to the specified output annotation track. This command should be called after all desired importer configuration has been completed.

The parameters for this command are as follows:

  • output file: sets the output file name
  • title: sets the output annotation track title
  • coordinate system id: (optional) sets the coordinate system id associated with the output annotation track. See setCoordSysId(...) for information about the coordinate system id format

Obtaining the Most Recent Status Message from an Annotation Track Importer Object

Syntax

status message = importer object.statusMessage()

Example

status = myImporter.statusMessage()

This command returns the status message from the last import operation. If the string indicates successful completion, the import finished successfully.

Obtaining the Most Recent Error Message from an Annotation Track Importer Object

Syntax

error message = importer object.errorMessage()

Example

error = myImporter.errorMessage()

This command returns the error message from the last import operation. If the string is empty, no error occurred.
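A script can combine write(), statusMessage() and errorMessage() into a single checked import step. The sketch below runs outside SVS by using StubImporter, a stand-in class invented here that only mimics those three calls; run_import is likewise a hypothetical helper, and the message strings are placeholders.

```python
class StubImporter(object):
    """Stand-in for an importer object; only mimics the three calls used below."""
    def __init__(self, error_text=''):
        self._error_text = error_text

    def write(self):
        pass  # a real importer would perform the import here

    def statusMessage(self):
        return 'stub status'

    def errorMessage(self):
        return self._error_text

def run_import(importer):
    """Run the import and return (succeeded, message)."""
    importer.write()
    error = importer.errorMessage()
    if error:
        # A non-empty error message means the import failed
        return (False, error)
    return (True, importer.statusMessage())

ok_result = run_import(StubImporter())
failed_result = run_import(StubImporter('stub error'))
```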

Checking Whether the Most Recent Import of an Annotation Track Importer Object was Canceled

Syntax

canceled = importer object.canceledByUser()

Example

canceled = myImporter.canceledByUser()

This command returns whether the importer’s last import operation was canceled by the user.

  • 0 the import was not canceled by the user
  • 1 the import was canceled by the user

Commands for Accessing Genome Browser Annotation Files

The Python interface can be used to query annotation tracks from either local files or network files to retrieve information such as genes or sequences. The following commands are used to obtain a genome browser object and query the files.

Get a List of Local Annotation Sources

Syntax

python list of lists = ghi.genomebrowser.localSourceList()

Example

myList = ghi.genomebrowser.localSourceList()

This command returns a list of the tracks found in the project annotation directory. Each item of the list is itself a list corresponding to one track. Information included in each list for a track is:

  • track name
  • a short description including a short track name and source
  • coordinate system including build, data type and species
  • track type such as ‘cyto’, ‘gene’, etc.
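Because each track is returned as a plain list, the result can be filtered with ordinary Python. The sketch below assumes the four items appear in the order listed above; tracks_of_type is a hypothetical helper and the sample entries are placeholders, not real output.

```python
def tracks_of_type(source_list, track_type):
    """Return the names of all tracks whose track type matches."""
    # Assumed item order: name, description, coordinate system, track type
    return [entry[0] for entry in source_list if entry[3] == track_type]

# Placeholder data in the assumed shape of the returned list of lists
sample_sources = [
    ['Track A', 'short description A', 'NCBI_36,Chromosome,Homo sapiens', 'cyto'],
    ['Track B', 'short description B', 'NCBI_36,Chromosome,Homo sapiens', 'gene'],
]
gene_tracks = tracks_of_type(sample_sources, 'gene')
```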

Get a List of Annotation Sources from a URL

Syntax

python list of lists = ghi.genomebrowser.sourceListForUrl(name of IDF file or DAS server URL)

Example

myList = ghi.genomebrowser.sourceListForUrl(‘cytobands_ucsc(NCBI_36).idf’)

OR

Example

myList = ghi.genomebrowser.sourceListForUrl(‘http://data.goldenhelix.com/das’)

This command returns a list of annotation tracks or URLs from the specified source. Each item in the returned list of lists corresponds to one track. Information included for each track is:

  • track name
  • a short description including a short track name and source
  • coordinate system including build, data type and species
  • track type such as ‘generic’, ‘probe’, etc.

Get Item Keys for the Specified Source

Syntax

python list = ghi.genomebrowser.itemKeysForSource(annotation source)

Example

myList = ghi.genomebrowser.itemKeysForSource(‘cytobands_ucsc(NCBI_36).idf:1’)

OR

Example

myList = ghi.genomebrowser.itemKeysForSource(‘http://data.goldenhelix.com/das/Cytobands-UCSC_GRCh_37_Homo_sapiens’)

This command returns a list of data field names (keys) available from the annotation source.

The one parameter for this command is:

  • url: the name of an IDF file in the project annotations directory such as ‘cytobands_ucsc(NCBI_36).idf:1’ or a DAS source URL, such as ‘http://data.goldenhelix.com/das/Cytobands-UCSC_GRCh_37_Homo_sapiens’.

Get Information for an Item in a Specified Range

Syntax

python list = ghi.genomebrowser.featureItemInRange(url, item key, chromosome,start position,stop position, show progress)

Example

myList = ghi.genomebrowser.featureItemInRange(‘refSeq_genes_ucsc(NCBI_36).idf:1’, ‘Name’,‘1’, 64402613, 64402614, 1)

This command returns a list of values for the specified field and genomic range (chromosome, start position and end position).

This command requires the following parameters in this order:

  • URL: The name of an IDF file in the project annotations directory such as ‘cytobands_ucsc(NCBI_36).idf:1’ or a DAS source URL, such as ‘http://data.goldenhelix.com/das/Cytobands-UCSC_GRCh_37_Homo_sapiens’.
  • Item key: A data field name available from the selected track.
  • Chromosome: Chromosome number for genomic region for the query.
  • Start position: Start position number for genomic region
  • End position: End position number for genomic region

The last parameter is optional:

  • Show progress: (Optional) Controls the presence of the status bar.
    • 0: Do not show a status bar
    • 1 (default): Show a status bar

NOTE:

  • Positions are specified in half-open zero based coordinates, i.e. for position 1, specify start = 0 and end = 1.
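The coordinate convention in the note can be captured in a small conversion helper. This is a sketch; to_half_open is a hypothetical function that turns a one-based, inclusive position or range into the half-open, zero-based start and end values expected by the query commands.

```python
def to_half_open(first_base, last_base=None):
    """Convert a 1-based inclusive position or range to (start, end) half-open 0-based."""
    if last_base is None:
        last_base = first_base  # a single position
    if first_base < 1 or last_base < first_base:
        raise ValueError('expected 1 <= first_base <= last_base')
    return (first_base - 1, last_base)

single = to_half_open(1)
queried = to_half_open(64402614)
span = to_half_open(101, 200)
```

The second call reproduces the start and end values used in the example above for the single base at position 64402614.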

Get Information for an Item in a List of Ranges

Syntax

python list = ghi.genomebrowser.featureItemInRanges(url, item key, ranges, show progress)

Example

myList = ghi.genomebrowser.featureItemInRanges(‘refSeq_genes_ucsc(NCBI_36).idf:1’, ‘Name’, myRangesList, 1)

This command returns a list of values for the specified field and list of genomic ranges (chromosome, start position and end position).

This command requires the following parameters in this order:

  • URL: The name of an IDF file in the project annotations directory such as ‘cytobands_ucsc(NCBI_36).idf:1’ or a DAS source URL, such as ‘http://data.goldenhelix.com/das/Cytobands-UCSC_GRCh_37_Homo_sapiens’.
  • Item key: A data field name available from the selected track.
  • Ranges: A list of 3-tuples: [Chr, Start, Stop] in that order representing a list of genomic ranges for the query. The ranges list should not contain overlapping intervals.

The last parameter is optional:

  • Show progress: (Optional) Controls the presence of the status bar.
    • 0: Do not show a status bar
    • 1 (default): Show a status bar

NOTE:

  • Positions are specified in half-open zero based coordinates, i.e. for position 1, specify start = 0 and end = 1.
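Since the ranges list should not contain overlapping intervals, a script may want to merge overlaps before querying. The sketch below is one way to do that for half-open ranges; merge_ranges is a hypothetical helper, and sorting the ranges (chromosomes compare as strings) is a choice made here, not a documented requirement.

```python
def merge_ranges(ranges):
    """Merge overlapping [chr, start, stop] half-open ranges on the same chromosome."""
    merged = []
    for chrom, start, stop in sorted(ranges):
        previous = merged[-1] if merged else None
        # Half-open ranges overlap only if the new start is below the previous stop
        if previous and previous[0] == chrom and start < previous[2]:
            previous[2] = max(previous[2], stop)
        else:
            merged.append([chrom, start, stop])
    return merged

myRangesList = merge_ranges([['1', 100, 200], ['1', 150, 300],
                             ['2', 0, 50], ['1', 300, 400]])
```

Ranges that merely touch, such as ['1', 100, 300] and ['1', 300, 400], are kept separate because half-open intervals sharing an endpoint do not overlap.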

Get Information for a List of Items in a Specified Range

Syntax

python list = ghi.genomebrowser.featureListInRange(url, item keys, chromosome,start position,stop position, show progress)

Example

myList = ghi.genomebrowser.featureListInRange(‘refSeq_genes_ucsc(NCBI_36).idf:1’, [‘Name’,‘ExonCount’],‘1’, 64402613, 64402614, 1)

This command returns a list of values for the specified list of fields and genomic range (chromosome, start position and end position).

This command requires the following parameters in this order:

  • URL: The name of an IDF file in the project annotations directory such as ‘cytobands_ucsc(NCBI_36).idf:1’ or a DAS source URL, such as ‘http://data.goldenhelix.com/das/Cytobands-UCSC_GRCh_37_Homo_sapiens’.
  • Item keys: A list of data field names available from the selected track.
  • Chromosome: Chromosome number for genomic region for the query.
  • Start position: Start position number for genomic region
  • End position: End position number for genomic region

The last parameter is optional:

  • Show progress: (Optional) Controls the presence of the status bar.
    • 0: Do not show a status bar
    • 1 (default): Show a status bar

NOTE:

  • Positions are specified in half-open zero based coordinates, i.e. for position 1, specify start = 0 and end = 1.

Get Information for a List of Items in a List of Ranges

Syntax

python list = ghi.genomebrowser.featureListInRange(url, item keys, ranges, show progress)

Example

myList = ghi.genomebrowser.featureListInRange(’refSeq_genes_ucsc(NCBI_36).idf:1’, [‘Name’,‘ExonCount’],myRangesList, 1)

This command returns a list of values for the specified list of fields and list of genomic ranges (chromosome, start position and end position).

This command requires the following parameters in this order:

  • URL: The name of an IDF file in the project annotations directory such as ‘cytobands_ucsc(NCBI_36).idf:1’ or a DAS source URL, such as ‘http://data.goldenhelix.com/das/Cytobands-UCSC_GRCh_37_Homo_sapiens’.
  • Item keys: A list of data field names available from the selected track.
  • Ranges: A list of 3-tuples: [Chr, Start, Stop] in that order representing a list of genomic ranges for the query. The ranges list should not contain overlapping intervals.

The last parameter is optional:

  • Show progress: (Optional) Controls the presence of the status bar.
    • 0: Do not show a status bar
    • 1 (default): Show a status bar

NOTE:

  • Positions are specified in half-open zero based coordinates, i.e. for position 1, specify start = 0 and end = 1.

Get Information for All Items in a Specified Range

Syntax

python list = ghi.genomebrowser.featureDictInRange(url, chromosome,start position,stop position, show progress)

Example

myList = ghi.genomebrowser.featureDictInRange(’cytobands_ucsc(NCBI_36).idf:1’,‘1’, 64402613, 64402614, 1)

This command returns a list of dictionaries where the keys of the dictionaries are the fields for the specified annotation track for the specified genomic range (chromosome, start position and end position).

Example output:
[{‘Stain’:‘gpos50’, ‘Cytoband’:‘p26.1’},{‘Stain’:‘gpos50’,‘Cytoband’:‘p26.1’}]

This command requires the following parameters in this order:

  • URL: The name of an IDF file in the project annotations directory such as ‘cytobands_ucsc(NCBI_36).idf:1’ or a DAS source URL, such as ‘http://data.goldenhelix.com/das/Cytobands-UCSC_GRCh_37_Homo_sapiens’.
  • Chromosome: Chromosome number for genomic region for the query.
  • Start position: Start position number for genomic region
  • End position: End position number for genomic region

The last parameter is optional:

  • Show progress: (Optional) Controls the presence of the status bar.
    • 0: Do not show a status bar
    • 1 (default): Show a status bar

NOTE:

  • Positions are specified in half-open zero based coordinates, i.e. for position 1, specify start = 0 and end = 1.

Get Information for All Items in a List of Ranges

Syntax

python list = ghi.genomebrowser.featureDictInRange(url, ranges, show progress)

Example

myList = ghi.genomebrowser.featureDictInRange(’cytobands_ucsc(NCBI_36).idf:1’, myRangesList, 1)

This command returns a list of dictionaries where the keys of the dictionaries are the fields for the specified annotation track for the specified list of genomic ranges (chromosome, start position and end position).

Example output:
[{‘Stain’:‘gpos50’, ‘Cytoband’:‘p26.1’},{‘Stain’:‘gpos50’,‘Cytoband’:‘p26.1’}]

This command requires the following parameters in this order:

  • URL: The name of an IDF file in the project annotations directory such as ‘cytobands_ucsc(NCBI_36).idf:1’ or a DAS source URL, such as ‘http://data.goldenhelix.com/das/Cytobands-UCSC_GRCh_37_Homo_sapiens’.
  • Ranges: A list of 3-tuples: [Chr, Start, Stop] in that order representing a list of genomic ranges for the query. The ranges list should not contain overlapping intervals.

The last parameter is optional:

  • Show progress: (Optional) Controls the presence of the status bar.
    • 0: Do not show a status bar
    • 1 (default): Show a status bar

NOTE:

  • Positions are specified in half-open zero based coordinates, i.e. for position 1, specify start = 0 and end = 1.

Reading an Annotation Track

Initialize a Track Reader

Syntax

new track reader = ghi.genomebrowser.openForRead(url)

Example

myReader = ghi.genomebrowser.openForRead(‘cytobands_ucsc(NCBI_36).idf:1’)

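This command returns a reader object useful for reading features from a track. The url parameter describes the location of the track, which may be either the name of an IDF file in the project annotations directory, such as ‘cytobands_ucsc(NCBI_36).idf:1’, or a DAS source URL, such as ‘http://data.goldenhelix.com/das/Cytobands-UCSC_GRCh_37_Homo_sapiens’.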

Get the Genome Map Coordinate System Identifier for an Annotation Track

Syntax

new variable = track reader.coordSysId()

Example

idString = myReader.coordSysId()

Output

idString == u’NCBI_36,Chromosome,Homo sapiens’

This command returns an identifier string that describes the genome for the track. The identifier format is based on the standard set by http://www.dasregistry.org, where a genome is defined as a trio of strings: authority plus version, type, and organism. Examples of an authority plus version are ‘NCBI_36’ and ‘GRCh_37’. Example types are ‘Chromosome’, ‘Contig’, and ‘Scaffold’. Example organisms are ‘Homo sapiens’ and ‘Canis familiaris’. Many coordinate systems currently in use can be found at http://www.dasregistry.org/listCoords.jsp.

Get the Genomic Space Covered by an Annotation Track

Syntax

region list = track reader.coverageSpace()

Example

regionList = myReader.coverageSpace()

Output

regionList == [[u’1’, 1, 247249719], [u’2’, 1, 242951149], [u’3’, 1, 199501827], [u’4’, 1, 191273063], [u’5’, 1, 180857866], [u’6’, 1, 170899992], [u’7’, 1, 158821424], [u’8’, 1, 146274826], [u’9’, 1, 140273252], [u’X’, 1, 154913754], [u’Y’, 1, 57772954], [u’10’, 1, 135374737], [u’11’, 1, 134452384], [u’12’, 1, 132349534], [u’13’, 1, 114142980], [u’14’, 1, 106368585], [u’15’, 1, 100338915], [u’16’, 1, 88827254], [u’17’, 1, 78774742], [u’18’, 1, 76117153], [u’19’, 1, 63811651], [u’20’, 1, 62435964], [u’21’, 1, 46944323], [u’22’, 1, 49691432]]

This command returns a list of genomic regions, each of which is a list with chromosome, start, and stop values. The list contains exactly one region per chromosome and the regions cover all the data found within the track.

Direct the Genome Browser to Cache a Network Annotation Track

Syntax

track reader.doFullDownload()

Example

myReader.doFullDownload()

This command instructs the reader to pre-read the entire track, thus storing any remote data to the local disk cache. This is useful to ensure that data is read from the track in a consistent manner. For tracks from local files this function performs no action.

Get a Dictionary of Field Name to Feature Index

Syntax

new dictionary = track reader.fieldIndexMap()

Example

myMap = myReader.fieldIndexMap()

The field index map is a dictionary of field names to the index within a feature. Each call to next() returns a feature as a python list containing all the fields contained in the track or some subset the reader may be configured for. See Reading an Annotation Track for more details.

Get an Ordered List of Field Names

Syntax

python list = track reader.fieldList()

Example

tracklist = myReader.fieldList()

This command returns a list of the field names in the order they are defined in the track’s schema.

See if a Track Has More Data

Syntax

truth value = track reader.hasNext()

Example

while myReader.hasNext(): print myReader.next()

This command returns true if a subsequent call to next() will return a valid feature.

Read the Next Feature From a Track Reader

Syntax

feature = track reader.next()

Example

feature = myReader.next()

Returns the next feature in the track, sorted by the genomic position of the feature. Returns a null if there is no more data in the track. Each feature is returned in the form of a Python list containing all or some of the fields defined in the track schema (see Reading an Annotation Track). The content and field ordering of a feature is in corresponding order to the track schema, but can be modified by the field list parameter to read() (see Reading an Annotation Track) or to readFields() (see Reading an Annotation Track).
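The hasNext(), next() and fieldIndexMap() calls fit together as a simple read loop that picks one field out of every feature. So that the sketch runs outside SVS, it uses StubReader, a stand-in class invented here that mimics only those three calls; collect_field is a hypothetical helper and the feature values are placeholders.

```python
class StubReader(object):
    """Stand-in for a track reader; mimics hasNext(), next() and fieldIndexMap()."""
    def __init__(self, field_names, features):
        self._fields = list(field_names)
        self._features = list(features)
        self._position = 0

    def fieldIndexMap(self):
        return dict((name, i) for i, name in enumerate(self._fields))

    def hasNext(self):
        return self._position < len(self._features)

    def next(self):
        if not self.hasNext():
            return None  # no more data in the track
        feature = self._features[self._position]
        self._position += 1
        return feature

def collect_field(reader, field_name):
    """Read every remaining feature and return the values of one field."""
    index = reader.fieldIndexMap()[field_name]
    values = []
    while reader.hasNext():
        values.append(reader.next()[index])
    return values

myReader = StubReader(['Cytoband', 'Stain'],
                      [['p26.1', 'gpos50'], ['p25.3', 'gneg']])
stains = collect_field(myReader, 'Stain')
```

Looking the index up once through fieldIndexMap() keeps the loop independent of the field order configured for the reader.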

Verify a Track is Completely Downloaded

Syntax

truth value = track reader.isDownloaded()

Example

ok = myReader.isDownloaded()

This query returns true if all the data related to this track has been cached to the local disk. For tracks that are from local files, this always returns true.

Find the Index of a Field by Name

Syntax

index = track reader.indexOf(field name)

Example

idx = myReader.indexOf(‘Name’)

This query returns the index of the named field within a feature returned by next().

Get a Block of Features Together

Syntax

feature list = track reader.nextFeatureSet()

Example

featureList = myReader.nextFeatureSet()

This query returns a block of features together. A useful attribute of this query is that any two features that overlap will be grouped together. It should not be assumed, however, that all features in a set will be mutually overlapping, or necessarily overlap any other features.

Estimate How Much of an Annotation Track is Read

Syntax

value = track reader.percentDone()

Example

value = myReader.percentDone()

This query returns a value between 0 and 100 representing an estimate of the amount of genomic space covered by all of the features returned by next() or nextFeatureSet(). This estimate may be inaccurate, as data may not be uniformly distributed throughout the genomic space of the track.

Initialize an Annotation Track Reader

Syntax

track reader.read()

OR

track reader.read(chromosome, start position, stop position, field list)

OR

track reader.read(region list, field list)

Example

myReader.read()

OR

myReader.read(’1’, 10000, 20000, [‘Name’, ‘Value’])

OR

myReader.read([[’1’,10000,20000],[’1’,30000,40000]], [‘Name’, ‘Value’])

This command initializes the track reader to read either the entire track (when calling read() with no parameters) or to read features that overlap with the specified region(s).

The region list parameter is a list of regions, where each region contains at least a chromosome identifier, but can also contain start and stop positions, i.e. [[’1’],[’2’, 300, 400]] will read all of chromosome ’1’, followed by any features in chromosome ’2’ that overlap the interval (300,400).

The optional field list parameter is an ordered list of names of fields specifying the structure of features returned by next(). To read a whole track while specifying the feature structure use readFields() (see Reading an Annotation Track).

Select Fields to Read From an Annotation Track

Syntax

track reader.readFields(field name list)

Example

myReader.readFields([‘Name’,‘Value’])

This command initializes the reader to read the whole track (equivalent to calling read()) and arranges the fields in each feature returned by next() in the order given by the field name list.

Examine the Schema Description of an Annotation Track

Syntax

schema string = track reader.schema()

Example

text = myReader.schema()

Output

text == u'Name=s\tValue=f'

A schema is a tab delimited set of field descriptions, each of which specifies the name and data type of a field. A descriptor such as 'Value=f' specifies a single field called 'Value' that is a single precision real. The available data types are:

  • ? - Boolean
  • b - Byte
  • s - String (null terminated)
  • i - Integer (4 bytes)
  • f - Single precision real (4 bytes)
  • f8 - Double precision real (8 bytes)

An example schema describing two fields, one called 'Name' holding a string and the other called 'Value' holding a single precision real number, would be the string 'Name=s\tValue=f'. A field can also be described as a list type by placing the '@' symbol before the data type, e.g. 'Values=@f'. Field compression can be enabled by prefixing the type with 'z', e.g. 'Values=z@f'. Note that for fields with less than several kilobytes of data, compression may actually increase the output file size.
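To make the schema notation concrete, the following is a minimal sketch of how a schema string could be split into its parts. It is not the parser used by the application; the function name parse_schema and the dictionary layout are hypothetical, and the order of the 'z' and '@' prefixes is taken from the 'Values=z@f' example.

```python
BASE_TYPES = {
    '?': 'Boolean',
    'b': 'Byte',
    's': 'String',
    'i': 'Integer',
    'f': 'Single precision real',
    'f8': 'Double precision real',
}


def parse_schema(schema):
    """Split a tab delimited schema string into a list of field descriptions."""
    fields = []
    for descriptor in schema.split('\t'):
        name, _, type_code = descriptor.partition('=')
        # A leading 'z' marks a compressed field.
        compressed = type_code.startswith('z')
        if compressed:
            type_code = type_code[1:]
        # A leading '@' marks a list type.
        is_list = type_code.startswith('@')
        if is_list:
            type_code = type_code[1:]
        if type_code not in BASE_TYPES:
            raise ValueError('Unknown data type in descriptor: %r' % descriptor)
        fields.append({'name': name, 'type': type_code,
                       'list': is_list, 'compressed': compressed})
    return fields


for field in parse_schema('Name=s\tValues=z@f'):
    print(field)
```

Each field description records the field name, the base type code, and whether the field is a list or compressed.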

Several track types are defined, each of which expects a particular data format. These types exist exclusively for the sake of visualization. The formats are:

  • Allele Sequence - 'Data=s' or 'Data=zs'
  • Cytoband - 'Cytoband=s\tStain=s'
  • Gene - 'Name=s\tStrand=s\tCodon Start=i\tCodon Stop=i\tExon Starts=@i\tExon Stops=@i'
  • Interval - Anything, but the first value is used for the name
  • Intensity - 'Value=f' or 'Value=f8' or 'Value=i'
  • Probe - 'Name=s\tObserved=s'; the Observed field contents should be of the form 'A/C'

Set the filter for a Track Reader

Syntax

new variable = track reader.setFilter(filter text)

Example

ok = myReader.setFilter('Value < 0.5')

This command replaces any existing filters with the filter provided. Returns true if the filter was parsed correctly.

A filter is a clause of the format '[field name] [comparison] [value]', e.g. 'Name contains BRC' or 'Value < 0.5'. Valid comparisons are: 'contains', 'equals' (=,==,eq), 'less than' (<,lt), 'greater than' (>,gt), 'lt eq' (<=,lteq), 'gt eq' (>=,gteq).

Clauses can be combined with one another via 'OR' or 'AND', e.g. 'Value > 0.4 AND Value < 0.6' or 'Name contains BRC OR Value < 0.5'.

As a shortcut, values can be comma separated to test multiple values, e.g. 'Name contains BRC,LOC' and 'Name contains BRC OR Name contains LOC' are equivalent clauses.

Omitting the field name will automatically apply the filter to all fields with applicable types in the track. Given a track with a field 'Name', the filter 'contains Y' will match all records in chromosome 'Y' as well as any records where the Name field contains the string 'Y'.
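The comma shortcut can be pictured as a simple text rewrite. The sketch below is only an illustration and not part of the scripting interface: the function name expand_value_list is hypothetical, and it assumes a single-word field name and a single-token comparison.

```python
def expand_value_list(clause):
    """Rewrite 'Field comparison a,b' as 'Field comparison a OR Field comparison b'."""
    # Assumption: field name and comparison are each a single token.
    field, comparison, values = clause.split(None, 2)
    parts = ['%s %s %s' % (field, comparison, value.strip())
             for value in values.split(',')]
    return ' OR '.join(parts)


print(expand_value_list('Name contains BRC,LOC'))
```

This prints 'Name contains BRC OR Name contains LOC', the equivalent long form given above.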

Set the filters for a Track Reader

Syntax

new variable = track reader.setFilters(filter list)

Example

ok = myReader.setFilters([‘Name CONTAINS BRC’, ‘Value < 0.5’])

This command replaces any existing filters with the filters provided. Returns true if all the filters were parsed correctly. See Reading an Annotation Track for more details.

Add a filter to Track Reader

Syntax

new variable = track reader.addFilter(filter text)

Example

ok = myReader.addFilter(‘Value < 0.5’)

This command adds a filter to the set of existing filters. Returns true if the filter was parsed correctly. See Reading an Annotation Track for more details on track filters.

Retrieve the Title of an Annotation Track

Syntax

new variable = track reader.title()

Example

text = myReader.title()

This query returns the title of the track.

Retrieve the Type of an Annotation Track

Syntax

new variable = track reader.type()

Example

text = myReader.type()

This query returns the type specifier used by the visualization system to determine how to draw the track's data. See Reading an Annotation Track for details about the schemas of the various types.

Retrieve the Universally Unique Identifier for an Annotation Track

Syntax

new variable = track reader.uuid()

Example

text = myReader.uuid()

Output

text == u'{b754a7f7-74ad-44ce-9591-5429ff127f86}'

This query returns the Universally Unique Identifier (UUID) for the track. This field is useful within the application to identify multiple instances of the same track.

Stream Features to an Annotation Track Writer

Syntax

track reader.write(track writer)

Example

myReader.write(myWriter)

This command is a convenience function that is equivalent to:

while myReader.hasNext(): myWriter.writeFeature(myReader.next())

Writing to an Annotation Track

Create A New Annotation Track

Syntax

new writer object = ghi.genomebrowser.createTrack(file name, title, type, coordSysId, schemaDesc, uuid)

Example

myWriter = ghi.genomebrowser.createTrack('conservationTrack.idf', 'Mammalian Conservation', 'Intensity', 'NCBI_36,Chromosome,Homo sapiens', 'Conservation=f', '{b754a7f7-74ad-44ce-9591-5429ff127f86}')

This command creates a new annotation track in a specific IDF file, creating the file if it does not exist.

The file name may be either a relative or an absolute file name. A relative file name will be understood to be in the annotations folder of the active project. Calling this function with a relative name when a project is not open will result in an error. Absolute filenames (such as ’C:/temp/testFile.idf’) can be used as well. For convenience, certain file locations can be described using macros:

  • ‘%PROJECTPATH%’ - References the annotation folder in the active project directory.
  • ‘%DATAPATH%’ - References the annotation folder in the user data directory.
  • ‘%SYSTEMPATH%’ - References the genome maps folder in the application’s installation directory.

For example, to reference a file in the system directory: '%SYSTEMPATH%/cytobands_ucsc(NCBI_36).idf'.
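The effect of the file location macros can be sketched as a prefix substitution. This is an illustration only: the directory paths below are invented placeholders (the real locations depend on the installation and the active project), and expand_track_path is a hypothetical helper, not a scripting command.

```python
# Hypothetical directory locations, used purely for illustration.
MACRO_DIRECTORIES = {
    '%PROJECTPATH%': '/projects/Discovery/annotations',
    '%DATAPATH%': '/home/user/svs_data/annotations',
    '%SYSTEMPATH%': '/opt/svs/genome_maps',
}


def expand_track_path(file_name, macros=MACRO_DIRECTORIES):
    """Replace a leading location macro with its directory, if one is present."""
    for macro, directory in macros.items():
        if file_name.startswith(macro):
            return directory + file_name[len(macro):]
    # No macro: the name is returned unchanged.
    return file_name


print(expand_track_path('%SYSTEMPATH%/cytobands_ucsc(NCBI_36).idf'))
```

With the placeholder directories above, this prints '/opt/svs/genome_maps/cytobands_ucsc(NCBI_36).idf'.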

The title of the track is the name to be displayed when tracks are listed.

The schema parameter is a tab delimited set of field descriptions which specify the name and data type of the field. See Reading an Annotation Track for more detail.

The optional coordSysId parameter defines the coordinate system identifier. See Reading an Annotation Track. By default the coordSysId is ‘NCBI_36,Chromosome,Homo sapiens’.

The optional uuid parameter enables overriding the creation of a new Universally Unique Identifier for the new track. By assigning this parameter, two tracks can be identified as being equivalent. Most use of this parameter is internal to the application, and it can be safely ignored.

Opening An Existing Track For Writing

Syntax

track writer object = ghi.genomebrowser.openForWrite(file name)

Example

myWriter = ghi.genomebrowser.openForWrite(‘outputFile.idf’)

This command will return a writer for an existing track, allowing new features to be added to the track.

The file name can be either a relative or absolute file name. See Writing to an Annotation Track for more on track file names.

Write a Feature to an Annotation Track

Syntax

track writer.writeFeature(chromosome, start, stop, value, style id)

OR

track writer.writeFeature(chromosome, start, stop, value list, style id)


Example

myWriter.writeFeature(‘1’, 1923450, 1923460, 0.34, 1)

OR

myWriter.writeFeature(‘1’, 1923450, 1923460, [‘p5.1’, ‘gneg’], 1)


This command writes a feature to the annotation track.

The value parameter can be a non-list type for tracks with schemas containing only a single value. For more complex schemas, data must be provided as a list that is ordered according to the schema description.

The style id parameter references a style record in the style table of the track. See Writing to an Annotation Track for more detail.
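When a schema has several fields, one way to build a correctly ordered value list is to derive the order from the schema string itself. The sketch below assumes the values are held in a dictionary keyed by field name; values_in_schema_order is a hypothetical helper and not part of the track writer.

```python
def values_in_schema_order(schema, values_by_name):
    """Return the values as a list ordered like the fields in the schema."""
    field_names = [descriptor.split('=', 1)[0]
                   for descriptor in schema.split('\t')]
    return [values_by_name[name] for name in field_names]


cytoband_schema = 'Cytoband=s\tStain=s'
value_list = values_in_schema_order(
    cytoband_schema, {'Stain': 'gneg', 'Cytoband': 'p5.1'})
print(value_list)
```

The resulting list, ['p5.1', 'gneg'], is in the order expected for the value list argument of writeFeature().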

Write a Style Record to an Annotation Track

Syntax

track writer.writeStyle(style id, style key, style value)

Example

myWriter.writeStyle(1, 'shape', 'box')

This command assigns style data to a style id.

Flush Track Data to Disk

Syntax

track writer.flush()

Example

myWriter.flush()

This command blocks until all data has been written to disk.

Using Progress or Status Dialogs

Initializing a Progress Dialog

Syntax

progress dialog object = ghi.progressDialog(dialog caption, total number of progress increments, allow cancel)

Example

myProgress = ghi.progressDialog(“Please Wait”, 100, 1)

This method will create a progress dialog which can be used to display the progress of a certain task and to signal the cancellation of a process. There are two required arguments for this method, and one optional argument. The first argument specifies the text to be displayed on the progress dialog. The second argument defines the number of progress increments for the progress dialog.

The third, optional, argument has two possible settings:

  • 0: do not allow the process to be canceled
  • 1 (default): allow the process to be canceled

Initializing a Status Dialog

Syntax

status dialog object = ghi.statusDialog(status caption, allow cancel)

Example

myStatus = ghi.statusDialog(“Calculating...”, 1)

This method will create a status dialog which can be used to indicate a process is underway and to signal the cancellation of a process. There are no required arguments for this method, and two optional arguments. The first optional argument specifies the text to be displayed on the status dialog. The second optional argument indicates whether a Cancel button is included on the dialog to allow the process to be canceled. Its possible settings are:

  • 0 (default): do not allow the process to be canceled
  • 1: allow the process to be canceled

Setting the Progress Dialog’s Progress

Syntax

progress dialog object.setProgress(progress count)

Example

count += 1
myProgress.setProgress(count)

This command takes an integer value between 0 and the number of progress increments specified in the creation of the progress dialog object. This advances the progress shown in the dialog.

Check to See if the Progress Dialog is Hidden

Syntax

new variable = progress dialog object.isHidden()

Example

myVariable = myProgress.isHidden()

Checks to see if the progress or status dialog is hidden. Returns a 1 if the dialog is hidden and a 0 if it is not hidden.

Check to See if the Progress Dialog was Canceled

Syntax

new variable = progress dialog object.wasCanceled()

Example

myVariable = myProgress.wasCanceled()

Checks to see if the progress or status dialog was canceled. Returns a 1 if the dialog was canceled and a 0 if the dialog is still active.
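A typical pattern is to update the progress inside a loop and to stop early when the dialog reports a cancellation. The sketch below runs outside the application, so it uses a hypothetical stand-in class (StubProgressDialog) that mimics only the setProgress(), wasCanceled() and finish() methods; in a script, the object returned by ghi.progressDialog() would take its place.

```python
class StubProgressDialog(object):
    """Hypothetical stand-in for the object returned by ghi.progressDialog()."""

    def __init__(self, caption, total_steps, allow_cancel=1):
        self.caption = caption
        self.total_steps = total_steps
        self.allow_cancel = allow_cancel
        self.count = 0
        self.finished = False

    def setProgress(self, count):
        self.count = count

    def wasCanceled(self):
        # A real dialog would report whether the user pressed Cancel.
        return 0

    def finish(self):
        self.finished = True


def process_items(items, dialog):
    """Process items one by one, updating the dialog and honoring cancellation."""
    processed = []
    count = 0
    try:
        for item in items:
            if dialog.wasCanceled():
                break
            processed.append(item * 2)  # placeholder for the real work
            count += 1
            dialog.setProgress(count)
    finally:
        # Dispose of the dialog even if the work raises an error.
        dialog.finish()
    return processed


items = [1, 2, 3, 4]
myProgress = StubProgressDialog("Please Wait", len(items), 1)
print(process_items(items, myProgress))
```

Placing finish() in a finally clause is a design choice of this sketch; it ensures the dialog is closed whether the loop completes, is canceled, or fails.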

Set the Progress Mode

Syntax

progress dialog object.setProgressMode(number of steps, new progress label, show cancel indicator, keep step history indicator)

Example

myProgress.setProgressMode(100, “Progress Step 2”, 1, 0)

Updates the progress dialog to keep the dialog active for longer than was originally set during the creation of the progress dialog object. This can also be used to change a current status dialog to a progress dialog. There is one required parameter and three optional parameters. The required parameter is the number of steps to use for incrementing the updated progress dialog. The three optional parameters are detailed below.

  • Optional second parameter: Specify a new label string for the progress dialog
  • Optional third parameter: Allow the process to be canceled
    • 0: do not allow cancel
    • 1 (default): do allow cancel
  • Optional fourth parameter: Indicates whether the cumulative step history from the creation of the progress dialog object should be used for the incrementing of the progress
    • 0: do not keep the step history
    • 1: do keep the step history

Set the Status Mode

Syntax

status dialog object.setStatusMode(new status label, show cancel indicator, keep step history indicator)

Example

myStatus.setStatusMode(“Still Working...”, 1, 1)

Renews the status dialog to keep the dialog active with a new message and options, or converts a progress dialog to a status dialog. There are no required parameters and three optional parameters. The three optional parameters are detailed below.

  • Optional first parameter: Specify a new label string for the status dialog
  • Optional second parameter: Allow the process to be canceled
    • 0 (default): do not allow cancel
    • 1: do allow cancel
  • Optional third parameter: Indicates whether the cumulative step history from the creation of the status dialog object should be kept
    • 0: do not keep the step history
    • 1: do keep the step history

Set up a Double Progress Bar Dialog

Syntax

progress dialog object.setDoubleProgressMode(auto increment indicator, total number of steps, new dialog label, show cancel indicator, keep step history indicator)

Example

myProgress.setDoubleProgressMode(1, 1000, “New Progress Step 2”, 1, 1)

Updates the progress dialog to add an additional progress bar for a sub-operation. The parameters for this command are detailed below.

  • First parameter: Indicate whether the progress bar should auto increment.
    • 0: do not auto increment
    • 1: auto increment the primary progress bar when the secondary progress reaches 100%.
  • Second parameter: Set the number of steps for the secondary progress bar.
  • Third parameter: Specify a new label string for the secondary progress dialog.
  • Fourth parameter: Allow the process to be canceled.
    • 0: do not allow cancel
    • 1 (default): do allow cancel
  • Fifth parameter: Indicates whether the cumulative step history from the creation of the progress dialog object should be used for the incrementing of the progress
    • 0: do not keep the step history
    • 1: do keep the step history

Show the Progress Dialog

Syntax

progress dialog object.show()

Example

myProgress.show()

Show the progress dialog, or bring the dialog to the front.

Hide the Progress Dialog

Syntax

progress dialog object.hide()

Example

myProgress.hide()

Hide the progress dialog.

Close and Reset the Progress Dialog

Syntax

progress dialog object.finish()

Example

myProgress.finish()

It is good practice to make sure a progress bar is disposed of when the task is complete. After this method is called, the progress bar will no longer show itself and calling methods on the script object will have no effect.

Reset the Progress Dialog

Syntax

progress dialog object.reset(new progress dialog message, close old dialog indicator)

Example

myProgress.reset(“Progress is still running”, 0)

For this command both parameters are optional. The first parameter is a new progress dialog message. The second parameter indicates if the old dialog should be closed.

  • 0: do not close the old dialog
  • 1: do close the old dialog

Set the Dialog Message

Syntax

progress dialog object.setMessage(new dialog message)

Example

myStatus.setMessage(“This is my new message”)

This command allows the message in the progress or status dialog to be updated without having to reset the counter. This is useful in multi-step processes to let the user know what step the script is on.

Set the Minimum Duration of the Progress Dialog

Syntax

progress dialog object.setMinimumDuration(integer value representing time in milliseconds)

Example

myProgress.setMinimumDuration(10000)

Sets the amount of time the progress dialog should wait before displaying. This is useful when you are unsure if an operation will take long enough to warrant a progress dialog.

Set the Mode as a Single or Multi Step Progress

Syntax

progress dialog object.setMode(integer representing the mode)

Example

myProgress.setMode(1)

This command indicates whether the progress dialog is used for a single- or multi-step process. The possible values for this parameter are:

  • 0: Single-step process
  • 1: Multi-step process

Set the Secondary Total Number of Steps

Syntax

progress dialog object.setSecondaryTotalSteps(total number of secondary steps)

Example

myProgress.setSecondaryTotalSteps(1000)

Sets the total number of secondary steps for use in the secondary progress bar.

Set the Secondary Progress Message

Syntax

progress dialog object.setSecondaryMessage(secondary progress message)

Example

myProgress.setSecondaryMessage(“Step x of y”)

Sets the message for the secondary progress bar.

Update the Secondary Progress

Syntax

progress dialog object.setSecondaryProgress(secondary progress count)

Example

count2 += 1
myProgress.setSecondaryProgress(count2)

This command sets the progress of the secondary progress bar in the progress dialog. The secondary progress count is a number between 0 and the total number of steps for the secondary progress. The progress bar is updated to show that this count, out of the total number of steps, has been completed.

Show a Cancel Button on the Progress Dialog

Syntax

progress dialog object.showCancel(show cancel button indicator)

Example

myProgress.showCancel(0)

Either shows or removes a cancel button from the progress dialog.

  • 0: hide cancel button on dialog
  • 1: show cancel button on dialog

Show the Value of the Progress Counter

Syntax

new variable = progress dialog object.value()

Example

myCount = myProgress.value()

Returns the last count used to update the progress dialog.

Building a Dataset

As you manipulate data in scripting there may be times when you would like to add a new dataset and its corresponding spreadsheet to a project. The following set of commands allows you to construct a dataset from Python lists and add the dataset to a project.

Initializing a Dataset Builder

Syntax

dataset builder object = ghi.dataSetBuilder(dataset name, number of rows)

Example

myBuilderObject = ghi.dataSetBuilder(“My Dataset”, 10)

This command returns an object for use in building new datasets. The first parameter is the display name for the dataset when it is added to the Project Navigator Window. The next parameter is the number of rows that will be in the new dataset.

Add Row Labels

Syntax

dataset builder object.addRowLabels(column header, list of row label strings)

Example

myBuilderObject.addRowLabels("My Samples", ["Sample1", "Sample2", "Sample3", ..., "Sample10"])

This command must be used to add row labels to the dataset builder object. There are two parameters for this command: the first is the column header and the second is a list of strings that are the row labels.

Add a Binary Column

Syntax

dataset builder object.addBoolColumn(column header, list of binary values)

Example

myBuilderObject.addBoolColumn(“Case/Control”, [1,1,1,1,1,0,0,0,0,0])

This command adds a column of binary values to the new dataset. Note that the values should be either 0s or 1s. The length of the Python list of binary values must be equal to the number of rows specified when the dataset builder was initialized.
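Because the list must contain only 0 and 1 and must match the number of rows, it can be useful to check a list before handing it to the builder. The following is a minimal sketch of such a check; validate_bool_column is a hypothetical helper and not part of the dataset builder.

```python
def validate_bool_column(values, num_rows):
    """Return a list of problems found in a binary column (empty if none)."""
    problems = []
    if len(values) != num_rows:
        problems.append('expected %d values, got %d' % (num_rows, len(values)))
    bad_values = [value for value in values if value not in (0, 1)]
    if bad_values:
        problems.append('non-binary values: %r' % bad_values)
    return problems


print(validate_bool_column([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], 10))
print(validate_bool_column([1, 0, 2], 4))
```

The first call prints an empty list; the second reports both the wrong length and the value 2.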

Add an Integer Column

Syntax

dataset builder object.addIntColumn(column header, list of integer values)

Example

myBuilderObject.addIntColumn(“Investigator”, [1,2,3,4,5,1,2,3,4,5])

This command adds a column of integer values to the new dataset. The length of the Python list of integer values must be equal to the number of rows specified when the dataset builder was initialized.

Add a Real Column

Syntax

dataset builder object.addRealColumn(column header, list of real values, double precision indicator)

Example

myBuilderObject.addRealColumn("Response", [0.78945, 2.334, -4.56732, ..., -7.4560], 0)

This command adds a column of real values to the new dataset. The length of the Python list of real values must be equal to the number of rows specified when the dataset builder was initialized.

The real values can either be single- or double-precision floating point values. This is indicated by the optional third parameter.

  • 0: real values are single-precision floating point values
  • 1 (default): real values are double-precision floating point values

Add a Categorical Column

Syntax

dataset builder object.addCategoricalColumn(column header, list of strings)

Example

myBuilderObject.addCategoricalColumn(“Site”, [“A”,“B”,“C”,“A”,“B”,“C”,“A”,“B”,“C”,“C”])

This command adds a column of categorical values to the new dataset. Note that the values should be strings. The length of the Python list of string values must be equal to the number of rows specified when the dataset builder was initialized.

Add a Genotypic Column

Syntax

dataset builder object.addGenotypicColumn(column header, list of strings)

Example

myBuilderObject.addGenotypicColumn("SNP1", ["A_A", "A_B", "B_B", "?_?", ..., "A_A"])

This command adds a column of genotypic values. Genotypic values are indicated with the allele separator “_”. The length of the Python list of genotypic values must be equal to the number of rows specified when the dataset builder was initialized.
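If genotypes are held as pairs of alleles, they can be joined with the allele separator before being added as a column. The sketch below is one possible way to do this; genotype_strings is a hypothetical helper, and writing a missing allele as '?' is an assumption based on the "?_?" value in the example above.

```python
def genotype_strings(allele_pairs, separator="_", missing="?"):
    """Join allele pairs into genotype strings such as 'A_B'."""
    result = []
    for first, second in allele_pairs:
        # Assumption: a missing allele (None) is written as '?'.
        first = missing if first is None else first
        second = missing if second is None else second
        result.append(first + separator + second)
    return result


print(genotype_strings([("A", "A"), ("A", "B"), (None, None)]))
```

This prints ['A_A', 'A_B', '?_?'].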

Creating the Dataset

Syntax

dataset builder object.finish([optional parent node id])

Example

myBuilderObject.finish()

OR

myBuilderObject.finish(4)

After all of the columns have been added to the dataset builder object, the dataset is ready to be added to the current project.

This command takes one optional argument specifying the id of the parent node. This command will add the dataset as a child of the project root unless an optional parent node is selected.

Check the Validity of the Dataset Builder

Syntax

dataset builder object.isValid()

Example

myBuilderObject.isValid()

This command checks to see if the dataset builder object is still valid. Either a 1 or a 0 is returned. If a 1 is returned then the dataset builder object is still valid and more columns can be added. If a 0 is returned then the dataset builder object is no longer valid and no more columns can be added without first creating another dataset builder object.

Building a Marker Map

As data is manipulated in scripting, there may be times when it is necessary to create a new marker map. The following set of commands allows a marker map to be constructed from Python lists.

Initializing a Marker Map Builder

Syntax

marker map builder object = ghi.markerMapBuilder(marker map name, list of markers, list of chromosomes, list of chromosome positions)

Example

myMarkerMapBuilder = ghi.markerMapBuilder("My Marker Map", ["SNP_1", "SNP_2", ..., "SNP_100"], ["1", "1", ..., "X"], [1, 3, ..., 40065])

This command returns an object for use in building a new marker map. There are four required parameters for this command and they are detailed below.

  • First parameter: marker map name – must be a string
  • Second parameter: list of marker names, SNPs, ProbeIDs, etc. – this must be a list of strings
  • Third parameter: list of chromosomes – this must be a list of strings
  • Fourth parameter: list of chromosomal positions – this must be a list of integers
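The type requirements listed above can be checked before the builder is created. The following sketch is a hypothetical helper (check_marker_map_inputs), not part of the scripting interface. It additionally assumes that the three lists must have the same length, one entry per marker, which this reference implies but does not state.

```python
def check_marker_map_inputs(name, markers, chromosomes, positions):
    """Return a list of problems found in the marker map builder arguments."""
    problems = []
    if not isinstance(name, str):
        problems.append('marker map name must be a string')
    if not all(isinstance(marker, str) for marker in markers):
        problems.append('marker names must be strings')
    if not all(isinstance(chrom, str) for chrom in chromosomes):
        problems.append('chromosomes must be strings')
    if not all(isinstance(position, int) for position in positions):
        problems.append('positions must be integers')
    # Assumption: one chromosome and one position per marker.
    if not (len(markers) == len(chromosomes) == len(positions)):
        problems.append('marker, chromosome and position lists differ in length')
    return problems


print(check_marker_map_inputs("My Marker Map",
                              ["SNP_1", "SNP_2"], ["1", "X"], [1, 40065]))
print(check_marker_map_inputs("My Marker Map",
                              ["SNP_1", "SNP_2"], [1, "X"], [1, 40065]))
```

The first call prints an empty list; the second reports that the chromosomes must be strings.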

Add an Integer Field

Syntax

marker map builder object.addIntField(integer field name, list of integers)

Example

myMarkerMapBuilder.addIntField("My Integer Field", [1, 5, 7, 11, 65, ..., 190])

This command adds an optional integer field to the marker map.

Add a String Field

Syntax

marker map builder object.addStringField(string field name, list of strings)

Example

myMarkerMapBuilder.addStringField("My String Field", ['string1', 'string2', ..., 'string***'])

This command adds an optional string field to the marker map.

Add a Field of Real Values

Syntax

marker map builder object.addRealField(real valued field name, list of real values)

Example

myMarkerMapBuilder.addRealField("My Real Field", [-2.0001, 1e-8, ..., 45.9999991])

This command adds an optional field of real values to the marker map.

Finish Building the Marker Map

Syntax

marker map builder object.finish()

Example

myMarkerMapBuilder.finish()

After all the columns needed have been added to the marker map builder object, the marker map DSM file is ready to be created and added to the marker maps folder. After finishing the marker map it can be applied to a dataset.

Check the Validity of the Marker Map Builder

Syntax

my variable = marker map builder object.isValid()

Example

myVariable = myMarkerMapBuilder.isValid()

This command returns a 1 (or true) if the marker map builder object is still valid and more columns can be added. A 0 (or false) is returned if the marker map builder was finished and the DSM file has been created; in this case no more columns can be added.

Commands for Spreadsheet Objects

Once a scripting spreadsheet has been created either by importing data or by building a dataset, the following commands can be used to manipulate the spreadsheet.

Specifying a Data Model

Syntax

new dataModel object = spreadsheet.dataModel([Optional Parameters])

Example

myDM = mySS.dataModel(ghi.const.FilterBinary | ghi.const.FilterInt)

This command creates a data model based on the filtering flags provided. A PyDataModel is a view of the spreadsheet that can look at just active data and data of specific types. This is useful for analysis where a method has constraints on what data it can handle.

The flags are a logical OR combination, written with the "|" operator, of the following constants:

  • ghi.const.FilterActiveOnly: Active data, not dependent
  • ghi.const.FilterDependent: Dependent data
  • ghi.const.FilterActive (default): Active or dependent data, equivalent to ghi.const.FilterActiveOnly | ghi.const.FilterDependent
  • ghi.const.FilterMapped: Mapped columns only
  • ghi.const.FilterBinary: Binary columns
  • ghi.const.FilterGenotypic: Genotypic columns
  • ghi.const.FilterCategorical: Categorical columns
  • ghi.const.FilterInt: Integer columns
  • ghi.const.FilterReal: Real columns
  • ghi.const.FilterQuantitative: Integer or Real columns, equivalent to ghi.const.FilterInt | ghi.const.FilterReal
  • ghi.const.UnsortedRows: Unsorted rows, equivalent to clicking unsort on a spreadsheet

NOTE: The ghi.const.FilterMapped flag is applied as a logical AND rather than a logical OR. For example, (ghi.const.FilterMapped | ghi.const.FilterReal) would specify only Real columns that are mapped.
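The way the type flags combine, and the special role of the mapped flag, can be sketched with Python's own flag type. The bit values and names below are hypothetical (the actual values of the ghi.const constants are not given in this reference), the sketch covers only the type and mapped flags, and it assumes that all column types are accepted when no type flag is given.

```python
from enum import IntFlag


class Filter(IntFlag):
    """Hypothetical bit values standing in for the ghi.const filter constants."""
    ACTIVE_ONLY = 1
    DEPENDENT = 2
    ACTIVE = ACTIVE_ONLY | DEPENDENT
    MAPPED = 4
    BINARY = 8
    GENOTYPIC = 16
    CATEGORICAL = 32
    INT = 64
    REAL = 128
    QUANTITATIVE = INT | REAL


TYPE_FLAGS = (Filter.BINARY | Filter.GENOTYPIC | Filter.CATEGORICAL
              | Filter.INT | Filter.REAL)


def column_selected(flags, column_type, mapped):
    """Decide whether a column of the given type passes the filter flags."""
    requested_types = flags & TYPE_FLAGS
    # Assumption: when no type flag is given, every column type is accepted.
    type_ok = (not requested_types) or bool(requested_types & column_type)
    # The mapped flag narrows the selection (AND) instead of widening it (OR).
    mapped_ok = bool(mapped) or not (flags & Filter.MAPPED)
    return type_ok and mapped_ok


flags = Filter.MAPPED | Filter.REAL
print(column_selected(flags, Filter.REAL, mapped=True))
print(column_selected(flags, Filter.REAL, mapped=False))
print(column_selected(flags, Filter.INT, mapped=True))
```

The three calls print True, False and False: only Real columns that are also mapped pass the combined filter.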

Commands specific to dataModel objects are detailed in Commands for DataModel Objects.

Activate Rows or Columns for a Set of Chromosomes

Syntax

spreadsheet object.activateByChromosome(list of chromosomes)

Example

mySS.activateByChromosome([“1”, “2”, “X”])

This function activates marker mapped rows/columns which have a chromosome specified in the list. Marker mapped rows/columns which have a chromosome that is not contained in the specified list will be inactivated.

The one required parameter for this command is a list of chromosomes which should be used to activate rows/columns. The chromosomes should be included as strings by name, e.g., “1”, “2”, “3”, ... “X”, “Y”, etc.

Create a Subset Spreadsheet From Active Data

Syntax

new spreadsheet object = spreadsheet object.activeSubset()

Example

myNewSS = mySS.activeSubset()

Creates a new spreadsheet object with all of the active data of the current spreadsheet object. If all of the data is active, then an error message indicating that some data needs to be deactivated in order to create an active subset spreadsheet will be displayed.

Append Two Spreadsheets

Syntax

appended spreadsheet object = spreadsheet object.appendSpreadsheet(node id or python spreadsheet object, dataset name, [Optional Parameters])

Example

myAppendedSpreadsheet = mySS.appendSpreadsheet(45, “mySS Appended with SS Node 45”, dropColumns = 0, addToProjectRoot = 0)

OR

myAppendedSpreadsheet = mySS1.appendSpreadsheet(mySS2, "mySS1 Appended with mySS2", dropColumns = 0, addToProjectRoot = 1)


This command appends another spreadsheet to the current spreadsheet object. The required parameters are the node ID of the spreadsheet to append (or another Python spreadsheet object) and the new dataset name.

Two optional parameters can be specified, and their keyword arguments and defaults are detailed below. Note: Not all keyword arguments need to be used.

  • dropColumns:
    • 0: All columns are kept, cells missing data are filled with missing values
    • 1 (default): Columns that do not match are dropped.
  • addToProjectRoot:
    • 0 (default): New spreadsheet is created as a child of the current spreadsheet object.
    • 1: New spreadsheet is created as a child of the project root.
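The effect of the dropColumns option can be pictured with plain Python data. The sketch below is not the application's implementation: tables are modeled as dictionaries mapping a column header to a list of values, columns are assumed to match by header, and None stands in for a missing value. The function name append_tables is hypothetical.

```python
def append_tables(first, second, drop_columns=1):
    """Append the rows of second below the rows of first.

    Each table is a non-empty dict mapping a column header to a list of values.
    """
    rows_first = len(next(iter(first.values())))
    rows_second = len(next(iter(second.values())))
    if drop_columns:
        # Keep only the columns present in both tables.
        headers = [header for header in first if header in second]
    else:
        # Keep every column; gaps are filled with missing values below.
        headers = list(first) + [h for h in second if h not in first]
    result = {}
    for header in headers:
        top = first.get(header, [None] * rows_first)
        bottom = second.get(header, [None] * rows_second)
        result[header] = top + bottom
    return result


first = {'Age': [30, 41], 'Site': ['A', 'B']}
second = {'Age': [25], 'Dose': [0.5]}
print(append_tables(first, second, drop_columns=1))
print(append_tables(first, second, drop_columns=0))
```

With drop_columns set to 1 only the shared 'Age' column survives; with 0 all three columns are kept and the cells without data hold None.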

Apply a Marker Map to a Spreadsheet

Syntax

marker mapped spreadsheet object = spreadsheet object.applyMarkerMap(file name of DSM marker map file,[Optional Parameters])

Example

myMarkerMappedSS = mySS.applyMarkerMap(“myMarkerMap.dsm”, columnOriented = 1, dropDuplicates = 0)

To apply a marker map to a spreadsheet, you must have first imported a marker map to the Marker Maps folder, or converted a text marker map or MAP file to a DSM marker map dataset. Apply the marker map to a spreadsheet using the above command. The marker map file name is the one required parameter; it does not need the path if the marker map is saved to the Marker Maps folder in the “SVS Data Directory”. This method will verify compatibility and return the new mapped spreadsheet. The mapped spreadsheet will also be added to the project as a child of the spreadsheet that is being mapped.

Two optional parameters can be specified, their keyword arguments and defaults are detailed below. Note: Not all keyword arguments need to be used.

  • columnOriented:
    • 0: Marker names are row labels
    • 1 (default): Marker names are column name headers
  • dropDuplicates:
    • 0 (default): Apply the marker map to all instances of the same marker name.
    • 1: Only apply the marker map to the first instance of the marker name. Delete all other columns or rows for the same marker from the new spreadsheet.

Select a Spreadsheet Cell

Syntax

new variable = spreadsheet object.cell(row number, column number)

Example

myVariable = mySS.cell(1,4)

This function returns the data from the spreadsheet cell found at the intersection of the specified row and column. Row 0 is the row containing column headers, and column 0 is the column containing the row labels (either generic or informative labels). An invalid row or column index throws an exception.

Select a Spreadsheet Column - 1 Based Index

Syntax

new Python list = spreadsheet object.col(column number, row state)

Example

myList = mySS.col(3, ghi.const.StateActive)

This function returns the spreadsheet column values for the selected column, but does not return the column header. An invalid column index throws an exception. Column number 0 corresponds to the row labels, column number 1 corresponds to the first column of data.

The possible row state values are as follows:

  • -1 (default): return all rows
  • ghi.const.StateInactive: return only inactive rows
  • ghi.const.StateActive: return only active rows

Select a Spreadsheet Column - 0 Based Index

Syntax

new Python list = spreadsheet object.zcol(column number, row state)

Example

myList = mySS.zcol(3, ghi.const.StateActive)

This function returns the spreadsheet column values for the selected column, but does not return the column header. An invalid column index throws an exception. Column number 0 corresponds to the first column of data.

The possible row state values are as follows:

  • -1 (default): return all rows
  • ghi.const.StateInactive: return only inactive rows
  • ghi.const.StateActive: return only active rows
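The difference between the 1-based col() and the 0-based zcol() can be shown with a small stand-alone model. The two functions below are hypothetical stand-ins operating on plain lists, not the spreadsheet methods themselves, and raising IndexError is an assumption, since this reference does not name the exception type.

```python
def col(row_labels, data_columns, column_number):
    """1-based lookup: 0 returns the row labels, 1 the first data column."""
    if column_number == 0:
        return list(row_labels)
    if not 1 <= column_number <= len(data_columns):
        raise IndexError('invalid column index: %d' % column_number)
    return list(data_columns[column_number - 1])


def zcol(data_columns, column_number):
    """0-based lookup: 0 returns the first data column."""
    if not 0 <= column_number < len(data_columns):
        raise IndexError('invalid column index: %d' % column_number)
    return list(data_columns[column_number])


row_labels = ['Sample1', 'Sample2']
data_columns = [[1, 0], [0.5, 0.7], ['A_A', 'A_B']]
print(col(row_labels, data_columns, 3))
print(zcol(data_columns, 2))
```

Both calls print the same column, ['A_A', 'A_B'], because col(n) and zcol(n - 1) address the same data column.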

Obtain Column Headers

Syntax

new Python list = spreadsheet object.colHeaders(column state)

Example

myList = mySS.colHeaders(ghi.const.StateActive)

This command returns a list of the column headers for the specified column state.

The possible column state values are as follows:

  • -1 (default): return all column headers
  • ghi.const.StateInactive: return only inactive column headers
  • ghi.const.StateActive: return only independent/active column headers
  • ghi.const.StateDependent: return only dependent column headers

Obtain Column Indexes

Syntax

new Python list = spreadsheet object.colIndexes(column type, column state)

Example

myList = mySS.colIndexes(ghi.const.TypeGenotypic, ghi.const.StateActive)

Return a list of 1-based column indexes for a given column type and state.

The possible column type values are as follows:

  • -1 (default): all column types
  • ghi.const.TypeBinary: only binary columns
  • ghi.const.TypeInteger: only integer columns
  • ghi.const.TypeReal: only real columns
  • ghi.const.TypeCategorical: only categorical columns
  • ghi.const.TypeGenotypic: only genotypic columns

The possible column state values are as follows:

  • -1 (default): all column states
  • ghi.const.StateInactive: only inactive columns
  • ghi.const.StateActive: only independent/active columns
  • ghi.const.StateDependent: only dependent columns

Create a Column Subset Spreadsheet

Syntax

new spreadsheet object = spreadsheet object.colSubset()

Example

colSubsetSS = mySS.colSubset()

Creates a new spreadsheet from the active columns of the original spreadsheet.

Create a Top Level Spreadsheet

Syntax

new spreadsheet object = spreadsheet object.createTopLevelSpreadsheet(dataset name, active or all data indicator)

Example

myTopLevelSS = mySS.createTopLevelSpreadsheet(“My Top Level SS”, 0)

Takes a spreadsheet that is a child of a top level spreadsheet and creates a spreadsheet (and a new DSF file) that is a child of the project root. This spreadsheet is not dependent on other spreadsheets, and any dependencies are removed. The top level spreadsheet can be an exact copy of the original child spreadsheet, or just a subset based on active data.

The first parameter, the dataset name, is required.

The second parameter indicates if the top level spreadsheet should be created using only the active data or all data from the original spreadsheet. There are two possible settings:

  • 0: All data is used in the creation of the top level spreadsheet.
  • 1 (default): Only active data is used in the creation of the top level spreadsheet.

Drop a Marker Map from a Spreadsheet

Syntax

new spreadsheet object = spreadsheet object.dropMarkerMap()

Example

myNewSS = mySS.dropMarkerMap()

This command creates a new spreadsheet where the current marker map is removed and returns a reference to that spreadsheet.

Export Spreadsheet as a Comma Delimited Text File

Syntax

spreadsheet object.exportCSV(path and file name for CSV file, save active or all data indicator)

Example

mySS.exportCSV(“/data/results.csv”, 0)

Exports the spreadsheet to a comma-delimited text file.

The first parameter is required and is the path and file name for the file where the spreadsheet will be saved.

The second parameter is optional, and indicates whether only active data or all data will be written to the specified file. There are two possible settings:

  • 0: All data is written to the specified file location.
  • 1 (default): Only active data is written to the specified file location.

Export Spreadsheet as a DSF File

Syntax

spreadsheet object.exportDSF(path and file name for DSF file, dataset name, save active or all data indicator)

Example

mySS.exportDSF(“/data/results.dsf”, “My Results Dataset”, 1)

Exports the spreadsheet to a DSF file.

The first two parameters are required. The first parameter is the path and file name for the file where the spreadsheet will be saved. The second parameter is the name of the dataset. This name will be used on re-importing the DSF file into a project.

The third parameter is optional, and indicates whether only active data or all data will be written to the specified file. There are two possible settings:

  • 0: All data is written to the specified file location.
  • 1 (default): Only active data is written to the specified file location.

Export Spreadsheet as a GHD File

Syntax

spreadsheet object.exportGHD(path and file name for GHD file, save active or all data indicator)

Example

mySS.exportGHD(“/data/results.ghd”, 0)

Exports the spreadsheet to a GHD file.

The first parameter is required and is the path and file name for the file where the spreadsheet will be saved.

The second parameter is optional, and indicates whether only active data or all data will be written to the specified file. There are two possible settings:

  • 0: All data is written to the specified file location.
  • 1 (default): Only active data is written to the specified file location.

Export the Marker Map Associated with a Spreadsheet

Syntax

spreadsheet object.exportMarkerMap(path and file name for DSM file, [Optional Parameters])

Example

mySS.exportMarkerMap(“/data/myresultsmap.dsm”, saveDatasetName = “My Results Marker Map”, activeDataOnly = 1)

Exports the marker map applied to the current spreadsheet object to a DSM file.

Only the first parameter is required. It is the path and file name for the file where the marker map will be saved.

There are two optional parameters specified by keyword arguments. These keyword arguments and their defaults are detailed below. Note: Not all keyword arguments need to be used.

  • saveDatasetName: Specifies the name that will be used for displaying the marker map in the marker maps list.
  • activeDataOnly:
    • 0: All marker map data is written to the specified file location.
    • 1 (default): Only marker map data corresponding to active columns (or rows) is written to the specified file location.

Export Spreadsheet to a File

Syntax

spreadsheet object.exportToFile(path and file name with text or third party extension, [Optional Parameters])

Example

mySS.exportToFile(“/data/myresults.xls”, exportMarkerMap = 0, activeDataOnly = 1, saveRowLabels = 1, fieldDelimiter = “,”, alleleDelimiter = “_”, missingValues = “?”, missingAllele = “?”)

This command exports a spreadsheet to a text file or a third party file. There is one required parameter and seven optional parameters.

The first parameter is required, and is the path and file name specifying the location to save the spreadsheet.

The seven optional parameters are specified by keyword arguments. The keyword arguments and defaults are listed below. Note: Not all keyword arguments need to be used.

  • exportMarkerMap: This option is only valid when the marker names are row labels.
    • 0: No marker map data is exported with the spreadsheet data.
    • 1 (default): Marker map data is exported as additional columns between the row labels and the spreadsheet data.
  • activeDataOnly: Specifies whether all data or only active data is exported.
    • 0: All data is exported.
    • 1 (default): Only active data is exported.
  • saveRowLabels: Specifies whether the row labels are exported.
    • 0: Row labels are not exported.
    • 1 (default): Row labels are exported.
  • fieldDelimiter: This option is ignored for file types not requiring a field delimiter to be specified.
    • “,”: for comma-delimited files
    • “ ”: for space-delimited files
    • “\t”: for tab-delimited files
    • or any other one character field delimiter
  • alleleDelimiter: This option is ignored for spreadsheets without genetic data.
    • “_”: for under-score allele delimited files
    • “/”: for forward-slash allele delimited files
  • missingValues: Specifies the string representing missing data.
    • “?”
    • or any other character representative of missing data
  • missingAllele: Specifies the string representing missing alleles.
    • “?”
    • or any other character representative of missing alleles
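
Since the optional settings are plain keyword arguments, they can be collected in a dictionary and passed with Python's ** syntax. The sketch below builds such a dictionary for a tab-delimited export using only the keyword names listed above. The helper name, the file path, and the RecordingSpreadsheet class are hypothetical; the class is a stand-in that only records the call so that the sketch runs outside SVS.

```python
def tab_delimited_options(active_only=True, save_row_labels=True):
    """Build keyword arguments for a tab-delimited exportToFile call."""
    return {
        "exportMarkerMap": 0,
        "activeDataOnly": 1 if active_only else 0,
        "saveRowLabels": 1 if save_row_labels else 0,
        "fieldDelimiter": "\t",
        "missingValues": "?",
        "missingAllele": "?",
    }


class RecordingSpreadsheet(object):
    """Hypothetical stand-in for a spreadsheet object; it only records calls."""

    def __init__(self):
        self.calls = []

    def exportToFile(self, path, **options):
        self.calls.append((path, options))


# With a real spreadsheet object, the same call shape would apply.
demo_export_ss = RecordingSpreadsheet()
demo_export_ss.exportToFile("/data/myresults.txt",
                            **tab_delimited_options(active_only=False))
```

Keeping the options in one place makes it easier to reuse the same settings for several exports.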

Export Transposed Spreadsheet to a Text File

Syntax

spreadsheet object.exportTransposeToFile(path and file name of CSV or TXT file, [Optional Parameters])

Example

mySS.exportTransposeToFile(“/data/myExportTransposedFile.csv”, exportMarkerMap = 0, activeDataOnly = 1, saveRowLabels = 1, fieldDelimiter = “,”, alleleDelimiter = “_”, missingValues = “?”, missingAllele = “?”, columnLabel = “Columns”)

This command transposes a spreadsheet on export to a text file. That is, the columns become the rows in the exported dataset. There is one required parameter and eight optional parameters.

The first parameter is required, and is the path and file name specifying the location to save the spreadsheet.

The eight optional parameters are specified by keyword arguments. The keyword arguments and defaults are listed below. Note: Not all keyword arguments need to be used.

  • exportMarkerMap: This option is only valid when the marker names are column name headers.
    • 0: No marker map data is exported with the spreadsheet data.
    • 1 (default): Marker map data is exported as additional columns between the row labels and the spreadsheet data.
  • activeDataOnly: Specifies whether all data or only active data is exported.
    • 0: All data is exported.
    • 1 (default): Only active data is exported.
  • saveRowLabels: Specifies whether the row labels are exported.
    • 0: Row labels are not exported.
    • 1 (default): Row labels are exported.
  • fieldDelimiter: This option is ignored for file types that do not require a field delimiter to be specified.
    • “,”: for comma-delimited files
    • “ ”: for space-delimited files
    • “\t”: for tab-delimited files
    • or any other one character field delimiter
  • alleleDelimiter: This option is ignored for spreadsheets that do not have genetic data.
    • “_”: for under-score allele delimited files
    • “/”: for forward-slash allele delimited files
  • missingValues: Specifies the string representing missing data.
    • “?”
    • or any other character representative of missing data
  • missingAllele: Specifies the string representing missing alleles.
    • “?”
    • or any other character representative of missing alleles
  • columnLabel: Specifies the label for the column names.
    • a string parameter for row label header (column names are now row labels)

Find a Column in a Spreadsheet

Syntax

column number = spreadsheet object.findCol(column name)

Example

myColNum = mySS.findCol(“nameOfCol56”)

This command searches the spreadsheet for the column with the specified column name header. It returns the index of that column, or -1 if no such column is found.

Find Several Columns in a Spreadsheet

Syntax

list of column numbers = spreadsheet object.findCols(list of column names)

Example

myColNums = mySS.findCols([“nameOfCol56”, “nameOfCol58”, …, “nameOfCol200”])

This command searches for the columns in the spreadsheet whose column name headers are specified in the list of column names.

The column index of each of the columns found is returned in a list. If no column name headers match any of the strings provided, an empty list will be returned.
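
When it matters which names were not found, one option is to call findCol once per name and check for -1. The sketch below does that. The helper name and the FakeSpreadsheet class are hypothetical; the class is a stand-in so that the sketch runs outside SVS, and its 1-based indexes are an assumption.

```python
def locate_columns(ss, names):
    """Return ({name: column index}, [names not found]) using findCol."""
    found = {}
    missing = []
    for name in names:
        index = ss.findCol(name)
        if index == -1:
            # findCol reports a missing column with -1.
            missing.append(name)
        else:
            found[name] = index
    return found, missing


class FakeSpreadsheet(object):
    """Hypothetical stand-in for a spreadsheet object."""

    def __init__(self, headers):
        self._headers = list(headers)

    def findCol(self, name):
        # Assumption: column indexes are 1-based.
        if name in self._headers:
            return self._headers.index(name) + 1
        return -1


demo_find_ss = FakeSpreadsheet(["age", "height", "weight"])
found_columns, missing_columns = locate_columns(demo_find_ss,
                                                ["weight", "bmi", "age"])
```

This trades one call for several, which is likely acceptable for short lists of names.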

Find a Row in a Spreadsheet

Syntax

row number = spreadsheet object.findRow(row name)

Example

myRowNum = mySS.findRow(“sample24”)

This command searches for a row in the spreadsheet whose row label is specified. It returns the index of that row, or -1 if no such row is found. The spreadsheet must have row labels, otherwise this routine will return an error.

Get the State of a Column

Syntax

new variable = spreadsheet object.getColState(column number)

Example

myVariable = mySS.getColState(4)

This command returns the column state as one of the following values.

  • ghi.const.StateInactive: Inactive
  • ghi.const.StateActive: Independent/Active
  • ghi.const.StateDependent: Dependent

Get a Spreadsheet Column Type

Syntax

new variable = spreadsheet object.getColType(column number)

Example

myVariable = mySS.getColType(3)

This function returns the column type as one of the following values.

  • ghi.const.TypeBinary: Binary
  • ghi.const.TypeInteger: Integer
  • ghi.const.TypeReal: Real
  • ghi.const.TypeCategorical: Categorical
  • ghi.const.TypeGenotypic: Genotypic

Get the Marker Map File Name

Syntax

file name = marker mapped spreadsheet object.markerMapFileName()

Example

myMapName = myMarkerMappedSS.markerMapFileName()

This command returns the name of the project map file used to marker map this spreadsheet. This is useful for calling applyMarkerMap() on another spreadsheet with this string as the first parameter.

Get the Marker Map Chromosome Information

Syntax

chromosome list = marker mapped spreadsheet object.getMarkerMapChromosome()

Example

myChromosomes = myMarkerMappedSS.getMarkerMapChromosome()

This command returns a Python list of the chromosome information from the marker map applied to the spreadsheet.

Get the Marker Map Position Information

Syntax

position list = marker mapped spreadsheet object.getMarkerMapPosition()

Example

myPositions = myMarkerMappedSS.getMarkerMapPosition()

This command returns a Python list of the chromosomal position information from the marker map applied to the spreadsheet.

Get Marker Map Information By Field

Syntax

list of field’s data = marker mapped spreadsheet object.getMarkerMapField(field number)

Example

myDataList = mySS.getMarkerMapField(3)

This command returns a Python list of the data for the given marker map field number from the applied marker map.

Get Marker Map Information By Cell

Syntax

object containing cell’s data = marker mapped spreadsheet object.getMarkerMapFieldCell(field number, column number)

Example

myVariable = mySS.getMarkerMapFieldCell(3, 6)

This command returns the data from the applied marker map for the given field at the given column number in a Python object of a type corresponding to the cell’s data type.

NOTE:

  • Accessing data for many cells using this method can be time consuming. If many cells need to be accessed, it is faster to get the field as a whole, and access data for all columns in that field before moving on to the next field.
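
The field-at-a-time approach recommended in the note can be sketched as follows: each field is fetched once with getMarkerMapField, and the resulting lists are then read in plain Python. The helper name and the FakeMappedSpreadsheet class are hypothetical; the class is a stand-in so that the sketch runs outside SVS, and its field numbers and values are made up for illustration.

```python
def read_marker_map_fields(ss, field_numbers):
    """Fetch each requested field once; return {field number: list of values}."""
    return dict((number, ss.getMarkerMapField(number))
                for number in field_numbers)


class FakeMappedSpreadsheet(object):
    """Hypothetical stand-in for a marker mapped spreadsheet object."""

    def __init__(self, fields):
        self._fields = fields
        self.field_calls = 0  # counts how often a whole field was fetched

    def getMarkerMapField(self, field_number):
        self.field_calls += 1
        return list(self._fields[field_number])


demo_mapped_ss = FakeMappedSpreadsheet({1: ["1", "1", "2"],
                                        2: [1050, 2200, 310]})
demo_fields = read_marker_map_fields(demo_mapped_ss, [1, 2])

# Walk the markers by position in the lists, with no per-cell calls.
demo_pairs = list(zip(demo_fields[1], demo_fields[2]))
```

Here two calls replace what would otherwise be six per-cell lookups.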

Get Marker Map Field Names

Syntax

list of field names = marker mapped spreadsheet object.getMarkerMapFieldNames()

Example

myFieldNameList = mySS.getMarkerMapFieldNames()

This command returns a list containing the names of the fields contained in the applied marker map in a Python List. The names are listed in the order in which they appear in the marker map.

Get Marker Map Field Types

Syntax

list of field types = marker mapped spreadsheet object.getMarkerMapFieldTypes()

Example

myFieldTypeList = mySS.getMarkerMapFieldTypes()

This command returns a Python list containing the type of each field contained in the applied marker map. The types are listed in the order in which they appear in the marker map.

The returned types may take on one of the following values:

  • 1: Integer
  • 2: Real
  • 3: Categorical
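
Because the field names and field types are both listed in marker map order, the two lists can be paired position by position. The sketch below shows one way to do this with the type codes given above. The helper name and the sample field names are hypothetical.

```python
# Type codes as documented for getMarkerMapFieldTypes().
FIELD_TYPE_NAMES = {1: "Integer", 2: "Real", 3: "Categorical"}


def describe_marker_map_fields(field_names, field_types):
    """Pair each field name with a readable name for its type code."""
    if len(field_names) != len(field_types):
        raise ValueError("expected one type code per field name")
    return [(name, FIELD_TYPE_NAMES.get(code, "Unknown"))
            for name, code in zip(field_names, field_types)]


# In SVS the two lists would come from getMarkerMapFieldNames() and
# getMarkerMapFieldTypes(); the values here are made up for illustration.
described_fields = describe_marker_map_fields(
    ["Chromosome", "Position", "Gene"], [3, 1, 3])
```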

Get the Marker Map Offset in a Marker Mapped Spreadsheet

Syntax

offset = marker mapped spreadsheet object.getMarkerMapOffset()

Example

myOffset = myMarkerMappedSS.getMarkerMapOffset()

This command returns the offset of the mapped markers in the spreadsheet. If the first column (or row, in a row-mapped spreadsheet) is mapped, 0 is returned. For example, if there are 5 unmapped columns (or rows, in a row-mapped spreadsheet) before the first mapped one, 5 will be returned.

Get the Marker Map Orientation of a Spreadsheet

Syntax

orientation = spreadsheet object.getMarkerMapOrientation()

Example

myOrientation = mySS.getMarkerMapOrientation()

This command returns the orientation of the marker map as one of the following values.

  • 0: There is no marker map applied
  • 1: The map is oriented along the columns of the spreadsheet
  • 2: The map is oriented along the rows of the spreadsheet

Get the State of a Row

Syntax

new variable = spreadsheet object.getRowState(row number)

Example

myVariable = mySS.getRowState(3)

This command returns the state of a row. There are two states:

  • ghi.const.StateInactive: Inactive
  • ghi.const.StateActive: Active

Determine if a Spreadsheet Has a Marker Map Applied

Syntax

new variable = spreadsheet object.hasMarkerMap()

Example

myVariable = mySS.hasMarkerMap()

This command indicates if a marker map is applied to the spreadsheet object. There are two states: False indicates no marker map is applied, True indicates that a marker map is applied to the spreadsheet.

Determining Whether a Spreadsheet Contains Pedigree Columns

Syntax

pedigree status = spreadsheetObject.hasPedFields()

Example

isPed = mySS.hasPedFields()

This method returns a boolean value indicating whether a spreadsheet contains pedigree columns.

Invert the Active Column State

Syntax

new spreadsheet object = spreadsheet object.invertColState()

Example

myColInvertedSS = mySS.invertColState()

This command causes the state of all columns to be inverted. That is, formerly active or dependent columns are made inactive, and columns formerly inactive are made active.

Invert the Active Row State

Syntax

new spreadsheet object = spreadsheet object.invertRowState()

Example

myRowInvertedSS = mySS.invertRowState()

This command causes the state of all rows to be inverted. That is, rows formerly active are made inactive, and rows formerly inactive are made active.

Join Two Spreadsheets By Row Labels

Syntax

new spreadsheet object = spreadsheet object.joinByRowLabels(node ID of the spreadsheet to join, [Optional Parameters])

Example

joinedSS = mySS.joinByRowLabels(45, newDatasetName = “Joined SS1 and SS2”, merge = 0, childOfCurrent = 0, dupResolution = ghi.const.DupFillLeft)

Spreadsheets can be joined as long as some rows match in each spreadsheet. This is useful for adding columns to a spreadsheet. The only required parameter is the node ID of the second spreadsheet to join to the current spreadsheet object. Four additional optional parameters can be changed from their defaults. A new spreadsheet object will be returned representing the joined spreadsheets. The joined spreadsheet will be added to the navigator window as a child of the spreadsheet object used to issue the join command, unless otherwise specified.

The four optional parameters are specified by keyword arguments. The keywords and defaults are listed below. Note: Not all keyword arguments need to be used.

  • newDatasetName: Specifies the name of the new dataset.
    • “Joined Spreadsheet” default new dataset name
    • or any other dataset name provided as a string
  • merge: Specifies if non-matching rows should be kept or dropped.
    • 0 (default): Drop non-matching rows.
    • 1: Keep non-matching rows, fill appropriate columns with missing values.
  • dupResolution: Specifies the behavior when there are duplicate columns.
    • ghi.const.DupKeepBoth (default): Keep duplicate columns as separate columns in the new spreadsheet.
    • ghi.const.DupFillLeft: Use the columns of the left spreadsheet to fill in missing values of the right spreadsheet.
    • ghi.const.DupFillRight: Use the columns of the right spreadsheet to fill in missing values of the left spreadsheet.
  • childOfCurrent: Specifies if the new spreadsheet should be a child of the project root or a child of the current spreadsheet object.
    • 0: Create the spreadsheet as a child of the project root.
    • 1 (default): Create the spreadsheet as a child of the current spreadsheet object.

Get the Number of Columns in the Spreadsheet

Syntax

new variable = spreadsheet object.numCols()

Example

numberOfColumns = mySS.numCols()

This command returns the number of columns (not including the row label column) in the spreadsheet object.

Get the Number of Columns of a Certain State

Syntax

new variable = spreadsheet object.numColsState(numerical column state id)

Example

numberActive = mySS.numColsState(ghi.const.StateActive)

This command returns the number of columns of the specified state. The possible options for the column state id are:

  • ghi.const.StateInactive: Returns number of inactive columns
  • ghi.const.StateActive: Returns number of active columns
  • ghi.const.StateDependent: Returns number of columns set as dependent variables

Get the Number of Rows in the Spreadsheet

Syntax

new variable = spreadsheet object.numRows()

Example

numberOfRows = mySS.numRows()

This command returns the number of rows in the spreadsheet, not including the column header row.

Get the Number of Rows of a Certain State

Syntax

new variable = spreadsheet object.numRowsState(numerical row state id)

Example

numberActive = mySS.numRowsState(ghi.const.StateInactive)

This command returns the number of rows of the specified state. The possible options for the row state id are:

  • ghi.const.StateInactive: Returns number of inactive rows
  • ghi.const.StateActive: Returns number of active rows

Randomly Shuffle Rows

Syntax

spreadsheet object.permuteRows()

Example

mySS.permuteRows()

This command randomly permutes the rows in the spreadsheet by modifying the sort order at random. Subsequent calls to this command will give new permutations, based on the current random number seed.

Plot a Column or Columns in Numeric Value Plot(s)

Syntax

spreadsheet object.plotColumns(column number)

OR

spreadsheet object.plotColumns(list of column numbers, [Optional Parameter])


Example

mySS.plotColumns(1)

OR

mySS.plotColumns([1,2,5], oneItemPerGraph=1)


This command takes a column index or list of indices for a numeric column and plots numeric value plots. The X-axis corresponds to row labels and the Y-axis corresponds to the values in the numeric column.

The one required parameter is either a column index or a Python list of column indices indicating which values to plot.

There is one optional parameter for when more than one column is plotted. This parameter is specified by a keyword argument as detailed below.

  • oneItemPerGraph:
    • 0: All items are plotted in one graph.
    • 1 (default): All items are plotted in separate graphs.

Create XY Scatter Plot(s) by Setting Dependent State in the Spreadsheet

Syntax

spreadsheet object.plotDependents(column number,[Optional Parameter])

Example

mySS.plotDependents(1, oneItemPerGraph = 1)

This command takes a column index for a numeric column and plots XY scatter plots. The independent variable (X-axis) corresponds to the column specified and the dependent variable(s) are all columns whose states are set as dependent in the current spreadsheet object.

The one required parameter is a column index specifying the independent variable or X-axis.

There is one optional parameter for when more than one column is plotted. This parameter is specified by a keyword argument as detailed below.

  • oneItemPerGraph:
    • 0: All items are plotted in one graph.
    • 1 (default): All items are plotted in separate graphs.

Create XY Scatter Plot(s) by Specifying both Independent and Dependent Columns

Syntax

spreadsheet object.plotXY(independent column number,list of dependent columns,[Optional Parameter])

Example

mySS.plotXY(1,[2,3,4,5], oneItemPerGraph = 1)

This command takes a column index for the independent column and plots XY scatter plots for each specified dependent column. The independent variable (X-axis) corresponds to the column specified and the dependent variable(s) are all columns specified in the dependent column list.

The required parameters are the independent column index and the list of dependent column indexes.

There is one optional parameter for when more than one column is plotted. This parameter is specified by a keyword argument as detailed below.

  • oneItemPerGraph:
    • 0: All items are plotted in one graph.
    • 1 (default): All items are plotted in separate graphs.

Plot a Column or Columns in Histograms

Syntax

spreadsheet object.plotHistograms(column number)

OR

spreadsheet object.plotHistograms(list of column numbers, [Optional Parameter])


Example

mySS.plotHistograms(1)

OR

mySS.plotHistograms([1,2,5], oneItemPerGraph=1)


This command takes a column index or list of indices for a numeric column and plots histograms. The X-axis corresponds to the values in the numeric column(s) and the Y-axis corresponds to frequency counts for each histogram bin.

The one required parameter is a column index or a Python list of column indices indicating which values to plot in a histogram.

There is one optional parameter for when more than one column is plotted. This parameter is specified by a keyword argument as detailed below.

  • oneItemPerGraph:
    • 0: All items are plotted in one graph.
    • 1 (default): All items are plotted in separate graphs.

Plot a Heat Map of a Spreadsheet with Numeric Values

Syntax

spreadsheet object.plotHeatMap()

Example

mySS.plotHeatMap()

This command plots a Heat Map for the active numeric values in the spreadsheet. There are no parameters to specify. If the spreadsheet is marker mapped then the heat map will be plotted with the markers along the X-axis regardless of the orientation of the markers in the spreadsheet. If there is no marker map applied to the spreadsheet then the columns will be plotted along the X-Axis on a uniform scale.

Plot an LD Graph of a Genetic Spreadsheet

Syntax

spreadsheet object.plotLD()

Example

mySS.plotLD()

This command plots LD for the active genotype columns in the spreadsheet. There are no parameters to specify. If the spreadsheet is marker mapped, the columns will be plotted based on genetic distance. Otherwise, the columns will be plotted based on a uniform scale.

Select a Spreadsheet Row - 1 Based Index

Syntax

list of row elements = spreadsheet object.row(row number, column state)

Example

myRowData = mySS.row(3, ghi.const.StateActive)

This command returns a list of elements in a row given by the specified row number, but does not return the row label. Row number 0 is the column header row; row number 1 is the first row of data. Note that row access is generally slower than column access. An error is displayed if an invalid row number is specified.

The possible column state values are as follows:

  • -1 (default): return data from all columns
  • ghi.const.StateInactive: return data from only inactive columns
  • ghi.const.StateActive: return data from only independent/active columns
  • ghi.const.StateDependent: return data from only dependent columns

Select a Spreadsheet Row - 0 Based Index

Syntax

list of row elements = spreadsheet object.zrow(row number, column state)

Example

myRowData = mySS.zrow(3, ghi.const.StateActive)

This command returns a list of elements in a row given by the specified row number, but does not return the row label. Row number 0 is the first row of data. Note that row access is generally slower than column access. An error is displayed if an invalid row number is specified.

The possible column state values are as follows:

  • -1 (default): return data from all columns
  • ghi.const.StateInactive: return data from only inactive columns
  • ghi.const.StateActive: return data from only independent/active columns
  • ghi.const.StateDependent: return data from only dependent columns

Obtain Row Indexes

Syntax

new Python list = spreadsheet object.rowIndexes(row state)

Example

myList = mySS.rowIndexes(ghi.const.StateActive)

Return a list of 1-based row indexes for a given row state.

The possible row state values are as follows:

  • -1 (default): return all row indexes
  • ghi.const.StateInactive: return only inactive row indexes
  • ghi.const.StateActive: return only active row indexes

Obtain Row Labels

Syntax

new Python list = spreadsheet object.rowLabels(row state)

Example

myList = mySS.rowLabels(ghi.const.StateActive)

This command returns a list of the row labels for the specified row state.

The possible row state values are as follows:

  • -1 (default): return all row labels
  • ghi.const.StateInactive: return only inactive row labels
  • ghi.const.StateActive: return only active row labels

Create a Row Subset Spreadsheet

Syntax

new spreadsheet object = spreadsheet object.rowSubset()

Example

rowSubsetSS = mySS.rowSubset()

Creates a new spreadsheet from the active rows of the current spreadsheet object.

Select Rows Based on the Binary Values of a Specified Column

Syntax

spreadsheet object.selectRowsByColumnBoolean(column number,boolean value)

Example

mySS.selectRowsByColumnBoolean(3,0)

This command takes the specified column number that corresponds to a binary column and either activates all of the rows that have a 0 in this column, or all of the rows that have a 1 in this column. The second parameter specifies which rows to activate. The default value is 1, so if the second parameter is not specified, all rows that have a 1 in the indicated column will be activated. In the example above, rows that have a 0 in column 3 are activated.

Select Rows Based on a Value of a Specified Column

Syntax

spreadsheet object.selectRowsByColumnValue(column number, threshold value, less than indicator)

Example

mySS.selectRowsByColumnValue(5,140.2,1)

This command takes the specified column number, which must correspond to a real or integer column, and activates either all rows whose values are less than the threshold or all rows whose values are greater than the threshold. The first parameter is required and indicates the column to use for determining if a row is to be active or inactive. The second parameter is also required and is the threshold to use in setting row state. NOTE: This threshold must be a double value. For example, if a threshold of 140 is desired, it must be entered as 140.0.

The third parameter is optional and indicates whether rows corresponding to column values less than or greater than the threshold should be activated. The direction of the activation is indicated below:

  • 0: Greater than
  • 1 (default): Less than
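
One way to guard against passing an integer threshold is to convert the value with float() before the call. The sketch below wraps both directions of the comparison. The helper names and the RecordingSelectSpreadsheet class are hypothetical; the class is a stand-in that only records the arguments so that the sketch runs outside SVS.

```python
def select_rows_below(ss, column_number, threshold):
    """Activate rows whose value in the column is less than the threshold."""
    ss.selectRowsByColumnValue(column_number, float(threshold), 1)


def select_rows_above(ss, column_number, threshold):
    """Activate rows whose value in the column is greater than the threshold."""
    ss.selectRowsByColumnValue(column_number, float(threshold), 0)


class RecordingSelectSpreadsheet(object):
    """Hypothetical stand-in for a spreadsheet object; it only records calls."""

    def __init__(self):
        self.calls = []

    def selectRowsByColumnValue(self, column_number, threshold, less_than=1):
        self.calls.append((column_number, threshold, less_than))


demo_select_ss = RecordingSelectSpreadsheet()
select_rows_below(demo_select_ss, 5, 140)  # 140 is passed on as 140.0
select_rows_above(demo_select_ss, 5, 140)
```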

Set the Column State

Syntax

spreadsheet object.setColState(column number or a list of column numbers, column state indicator)

Example

mySS.setColState(3,ghi.const.StateInactive)

OR

mySS.setColState([3,5,9,12],ghi.const.StateInactive)

This command sets the specified column or list of columns to the specified state. The first parameter is the column number or a Python list of column numbers. The second parameter is the column state. Other column states remain unchanged. There are three possible states:

  • ghi.const.StateInactive: Inactive
  • ghi.const.StateActive: Active/Independent
  • ghi.const.StateDependent: Dependent

Set the Row State

Syntax

spreadsheet object.setRowState(row number, row state indicator)

OR

spreadsheet object.setRowState(first row number, last row number, row state indicator)


OR

spreadsheet object.setRowState(list of row numbers, row state indicator)


Example

mySS.setRowState(3, ghi.const.StateInactive)

OR

mySS.setRowState(3, 15, ghi.const.StateInactive)


OR

mySS.setRowState([4,5,7,15], ghi.const.StateInactive)


This command sets the specified row, range of rows, or rows specified in the list to the specified state. In the first and third cases, there are two required parameters; in the second case, there are three required parameters.

The first case sets the state of one particular row. For this case, the first parameter is the row number. The second parameter is the row state. Other row states remain unchanged. There are two possible states detailed below.

The second case sets the state of a range of rows. For this case, the first parameter is the first row number of the range, the second parameter is the last row number of the range of rows. The third parameter is the row state. Other row states remain unchanged. There are two possible states detailed below.

The third case sets the state of those rows specified in a list. For this case, the first parameter is the list of row numbers. The second parameter is the row state. Other row states remain unchanged. There are two possible states detailed below:

  • ghi.const.StateInactive: Inactive
  • ghi.const.StateActive: Active/Independent

Set the State of Rows Randomly

Syntax

spreadsheet object.setRowStateRandom(number of rows to select randomly, row state indicator)

OR

spreadsheet object.setRowStateRandom(percent of rows to select randomly, row state indicator)


Example

mySS.setRowStateRandom(100, ghi.const.StateInactive)

OR

mySS.setRowStateRandom(0.5,ghi.const.StateActive)


This command sets a randomly selected number or percent of rows to the specified state. In each case, there are two required parameters.

The first case will set a number of randomly selected rows to the specified state. Other rows are set to the opposite state. There are two possible states detailed below.

The second case will set a randomly selected fraction of the total number of rows to the specified state. The percent parameter must be a number between 0 and 1. This is useful for selecting a certain percentage of the data regardless of the number of rows. Other rows are set to the opposite state. There are two possible states detailed below:

  • ghi.const.StateInactive: Inactive
  • ghi.const.StateActive: Active/Independent
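
The outcome described above can be modelled in plain Python. This is only a sketch, not SVS code: the function name, the string states, and the `seed` argument are hypothetical, an integer amount is assumed to mean a row count and a float a fraction, and rounding the fraction to a whole number of rows is an assumption.

```python
import random


def random_row_states(num_rows, amount, state, seed=None):
    """Model the documented outcome: chosen rows get `state`, the rest the opposite.

    Hypothetical stand-in. An int `amount` is treated as a row count and a
    float as a fraction between 0 and 1 (rounded to whole rows, an assumption).
    """
    if isinstance(amount, float):
        if not 0.0 <= amount <= 1.0:
            raise ValueError("fraction must be between 0 and 1")
        count = round(amount * num_rows)
    else:
        count = amount
    if not 0 <= count <= num_rows:
        raise ValueError("cannot select more rows than exist")
    opposite = "inactive" if state == "active" else "active"
    rng = random.Random(seed)
    # Rows are numbered from 1 in this sketch.
    chosen = set(rng.sample(range(1, num_rows + 1), count))
    return {row: (state if row in chosen else opposite)
            for row in range(1, num_rows + 1)}


states = random_row_states(10, 0.5, "active", seed=1)
print(sum(1 for s in states.values() if s == "active"))  # 5 of the 10 rows
```

Note that, unlike setting states by row number, every row ends up in one of the two states after the call.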

Sort a Column in Ascending Order

Syntax

spreadsheet object.sortByColAscending(column number)

Example

mySS.sortByColAscending(3)

This command sorts the spreadsheet by arranging the specified column in ascending order.

Sort a Column in Descending Order

Syntax

spreadsheet object.sortByColDescending(column number)

Example

mySS.sortByColDescending(3)

This command sorts the spreadsheet by arranging the specified column in descending order.

Sort a Spreadsheet By Custom Order

Syntax

spreadsheet object.sortByCustomOrder(Python list of all row numbers in custom order)

Example

mySS.sortByCustomOrder([1,3,5,7,9,10,8,6,4,2])

This command sorts the rows in the specified order. The one required parameter is a Python list of the row numbers in the desired sort order. This list must be the same length as the number of rows in the spreadsheet. In the example, the spreadsheet contains ten rows (not including the column headers). The rows are sorted exactly as specified in the list.
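
The order list is an ordinary Python list, so it can be generated rather than typed by hand. The sketch below rebuilds the list used in the example and checks it before use. Both helper names are hypothetical, and the check that every row number appears exactly once is an assumption that goes slightly beyond the stated length requirement.

```python
def odds_then_evens_descending(num_rows):
    """Build a custom row order: odd rows ascending, then even rows descending."""
    odds = list(range(1, num_rows + 1, 2))
    evens = [n for n in range(num_rows, 0, -1) if n % 2 == 0]
    return odds + evens


def is_valid_custom_order(order, num_rows):
    """Check that `order` lists every row number 1..num_rows exactly once."""
    return sorted(order) == list(range(1, num_rows + 1))


order = odds_then_evens_descending(10)
print(order)                             # [1, 3, 5, 7, 9, 10, 8, 6, 4, 2]
print(is_valid_custom_order(order, 10))  # True
```

A list validated this way could then be passed to sortByCustomOrder.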

Get the Column Number Used in Sorting

Syntax

new variable = spreadsheet object.sortColIdx()

Example

sortCol = mySS.sortColIdx()

Returns the column number last used for sorting.

The following numbers could also be returned:

  • 0: sorted by row labels
  • -1: not sorted
  • -2: sorted by custom order

Get the Sort Direction Used in Sorting

Syntax

new variable = spreadsheet object.sortDirection()

Example

mySortDirection = mySS.sortDirection()

Returns the sort direction last used. This command should be used in conjunction with the sortColIdx() command.

There are two possible outcomes:

  • 0: Last sort was in ascending order
  • 1: Last sort was in descending order
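
Because the two return codes are meant to be read together, a small helper can turn them into a readable description. This is a plain-Python sketch; `describe_sort` is a hypothetical name, and only the codes listed above are interpreted.

```python
def describe_sort(sort_col_idx, sort_direction):
    """Turn the two documented return codes into a readable description."""
    if sort_col_idx == -1:
        return "not sorted"
    if sort_col_idx == -2:
        return "sorted by custom order"
    direction = "ascending" if sort_direction == 0 else "descending"
    target = "row labels" if sort_col_idx == 0 else "column %d" % sort_col_idx
    return "sorted by %s, %s" % (target, direction)


# In a script the arguments would come from mySS.sortColIdx() and mySS.sortDirection().
print(describe_sort(3, 1))   # sorted by column 3, descending
print(describe_sort(-1, 0))  # not sorted
```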

Transpose the Spreadsheet

Syntax

new spreadsheet object = spreadsheet object.transpose(new dataset name, type of columns to transpose, [Optional Parameters])

Example

myTransposedSS = mySS.transpose("SS Transposed", ghi.const.DataTypeGenotypic, labelHeader = "SNPs", activeDataOnly = 0, childOfCurrent = 0)

This command transposes all columns of the same column type in the spreadsheet. There are two required parameters for this command and four optional parameters. The two required parameters are, in order, the new dataset name, and the type of the columns to transpose. The possible types of columns are as follows:

  • ghi.const.DataTypeBinary: Binary Columns
  • ghi.const.DataTypeInteger: Integer Columns
  • ghi.const.DataTypeFloat: Columns of Single-Precision (4 bytes) floating point numbers
  • ghi.const.DataTypeDouble: Columns of Double-Precision (8 bytes) floating point numbers
  • ghi.const.DataTypeCategorical: Categorical/Nominal Columns
  • ghi.const.DataTypeGenotypic: Genotypic Columns

The optional parameters are specified by keyword arguments. The keywords and defaults are listed below. Note: Not all keyword arguments need to be used.

  • labelHeader: Specify a header for the column names. This parameter takes string arguments.
  • activeDataOnly: Indicates if all data or only active data is to be transposed.
    • 0 (default): All data is to be transposed
    • 1: Only active data is to be transposed
  • childOfCurrent: Indicates if the transposed spreadsheet is to be created as a child of the current spreadsheet or of the project root.
    • 0: Created as a child of the project root
    • 1 (default): Created as a child of the current spreadsheet
  • memoryLimit: Indicates the maximum amount of memory to be used while transposing.
    • (default) current program limit for transpose memory usage
    • any integer value such that 64 <= value <= program maximum cache size

Unsort the Spreadsheet

Syntax

spreadsheet object.unsort()

Example

mySS.unsort()

Returns the spreadsheet to the original row order.

Commands for DataModel Objects

Select a Data Model Row - 1 Based Index

Syntax

list of row elements = dataModel object.row(row number)

Example

myRowData = myDM.row(3)

This command returns a list of elements in a row given by the specified row number, but does not return the row label. Row number 0 is the column header row, row number 1 is the first row of data. Note that row access is generally slower than column access. An error is displayed if an invalid row number is specified.

Select a Data Model Row - 0 Based Index

Syntax

list of row elements = dataModel object.zrow(row number)

Example

myRowData = myDM.zrow(3)

This command returns a list of elements in a row given by the specified row number, but does not return the row label. Row number 0 is the first row of data. Note that row access is generally slower than column access. An error is displayed if an invalid row number is specified.
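
The relationship between the 1-based and 0-based row commands can be illustrated with a toy class. `ToyDataModel` is a hypothetical stand-in written for this sketch, not the SVS dataModel object; it only mimics the numbering described above and raises `IndexError` where the real commands would report an error.

```python
class ToyDataModel:
    """Hypothetical stand-in that mimics the documented row()/zrow() numbering."""

    def __init__(self, col_headers, data_rows):
        self._col_headers = list(col_headers)
        self._data_rows = [list(r) for r in data_rows]

    def row(self, n):
        # 1-based access: 0 is the column header row, 1 is the first data row.
        if n == 0:
            return list(self._col_headers)
        if not 1 <= n <= len(self._data_rows):
            raise IndexError("invalid row number: %r" % n)
        return list(self._data_rows[n - 1])

    def zrow(self, n):
        # 0-based access: 0 is the first data row.
        if not 0 <= n < len(self._data_rows):
            raise IndexError("invalid row number: %r" % n)
        return list(self._data_rows[n])


dm = ToyDataModel(["Age", "Weight"], [[31, 70.5], [45, 82.0], [28, 64.2]])
print(dm.row(0))                # ['Age', 'Weight']
print(dm.row(1) == dm.zrow(0))  # True
print(dm.row(3) == dm.zrow(2))  # True
```

In other words, zrow(n) and row(n + 1) address the same data row.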

Obtain Row Indexes

Syntax

new Python list = dataModel object.rowIndexes()

Example

myList = myDM.rowIndexes()

Returns a list giving, for each row in the current datamodel, the index of that row in the original spreadsheet.

Obtain Row Labels

Syntax

new Python list = dataModel object.rowLabels()

Example

myList = myDM.rowLabels()

This command returns a list of the row labels.

Create a Row Subset Data Model

Syntax

new datamodel object = dataModel object.subsetRows(int list)

Example

rowSubsetDM = myDM.subsetRows([1,2,3,4,5])

Creates a new data model from the first five rows of the current data model object.

Select a Data Model Column - 1 Based Index

Syntax

list of column elements = dataModel object.col(column number)

Example

myColumnData = myDM.col(3)

This command returns a list of elements in a column given by the specified column number, but does not return the column header. Column number 0 is the row label column, column number 1 is the first column of data.

Select a Data Model Column - 0 Based Index

Syntax

list of column elements = dataModel object.zcol(column number)

Example

myColumnData = myDM.zcol(3)

This command returns a list of elements in a column given by the specified column number, but does not return the column header. Column number 0 is the first column of data.

Obtain a Column Index

Syntax

new variable = dataModel object.colIndex(column number)

Example

myIndex = myDM.colIndex(3)

Returns the index of a given column in the original spreadsheet associated with the current datamodel. Because a datamodel can contain a subset of the columns from the original spreadsheet, this function allows mapping back to the original spreadsheet indexes for changing their state and other actions.

Obtain Column Indexes

Syntax

new Python list = dataModel object.colIndexes()

Example

myList = myDM.colIndexes()

Returns a list giving, for each column in the current datamodel, the index of that column in the original spreadsheet.
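
One way to use these indexes is to pair them with the column headers, so that a column can be looked up in the original spreadsheet by name. The sketch below is plain Python with made-up sample values; it assumes that the lists from colHeaders() and colIndexes() are in the same column order, which is not stated explicitly here.

```python
def original_index_by_header(headers, original_indexes):
    """Pair each datamodel column header with its index in the original spreadsheet."""
    if len(headers) != len(original_indexes):
        raise ValueError("expected one original index per column header")
    return dict(zip(headers, original_indexes))


# Hypothetical values standing in for myDM.colHeaders() and myDM.colIndexes().
headers = ["Age", "rs123", "rs456"]
original_indexes = [2, 7, 9]

lookup = original_index_by_header(headers, original_indexes)
print(lookup["rs456"])  # 9
```

If two columns share a header, the dictionary keeps only the last one, so unique headers are assumed as well.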

Obtain Column Headers

Syntax

new Python list = dataModel object.colHeaders()

Example

myList = myDM.colHeaders()

This command returns a list of the column headers.

Create a Column Subset Data Model

Syntax

new datamodel object = dataModel object.subsetColumns(int list)

Example

colSubsetDM = myDM.subsetColumns([1,2,3,4,5])

Creates a new data model from the first five columns of the current data model object.

Select a Data Model Cell

Syntax

new variable = dataModel object.cell(row number, column number)

Example

myVariable = myDM.cell(1,4)

This function returns the data from the spreadsheet cell found at the intersection of the specified row and column. Row 0 is the row containing column headers, and column 0 is the column containing the row labels (either generic or informative labels). An invalid row or column index throws an exception.

Create a Subset based on the Current Data Model

Syntax

new subset = dataModel object.subsetSpreadsheet()

Creates and returns a subset spreadsheet based on the columns and rows in the current datamodel. The subset spreadsheet will be a child of the spreadsheet this datamodel was based on.

Find a Column in a Data Model

Syntax

column number = dataModel object.findCol(column name)

Example

myColNum = myDM.findCol("nameOfCol56")

This command searches for a column in the spreadsheet whose column name header is specified. It returns the index of that column or -1 if no such column is found.

Find Several Columns in a Data Model

Syntax

list of column numbers = dataModel object.findCols(list of column names)

Example

myColNums = myDM.findCols(["nameOfCol56", "nameOfCol58", …, "nameOfCol200"])

This command searches for the columns in the spreadsheet whose column name headers are specified in the list of column names.

The column index of each of the columns found is returned in a list. If no column name headers match any of the strings provided, an empty list will be returned.
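
Since findCols returns only the columns it finds, it can be useful to check names one at a time and see which are missing. The sketch below relies only on the documented behaviour of findCol (an index, or -1 when no column matches). `fake_find_col` is a hypothetical stand-in for `myDM.findCol`, and the index values it returns are arbitrary.

```python
def split_found_and_missing(names, find_col):
    """Look up each name with a findCol-style callable that returns -1 on failure."""
    found, missing = {}, []
    for name in names:
        index = find_col(name)
        if index == -1:
            missing.append(name)
        else:
            found[name] = index
    return found, missing


def fake_find_col(name):
    """Stand-in for myDM.findCol; the index base used here is an assumption."""
    headers = ["nameOfCol56", "nameOfCol58"]
    return headers.index(name) + 1 if name in headers else -1


found, missing = split_found_and_missing(["nameOfCol56", "noSuchCol"], fake_find_col)
print(found)    # {'nameOfCol56': 1}
print(missing)  # ['noSuchCol']
```

In a script, `myDM.findCol` would be passed in place of the stand-in.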

Find a Row in a Data Model

Syntax

row number = dataModel object.findRow(row name)

Example

myRowNum = myDM.findRow("sample24")

This command searches for a row in the spreadsheet whose row label is specified. It returns the index of that row, or -1 if no such row is found. The spreadsheet must have row labels, otherwise this routine will return an error.

Get a Spreadsheet Column Type from a Data Model

Syntax

new variable = dataModel object.colType(column number)

Example

myVariable = myDM.colType(3)

This function returns the column type as one of the following values.

  • ghi.const.TypeBinary: Binary
  • ghi.const.TypeInteger: Integer
  • ghi.const.TypeReal: Real
  • ghi.const.TypeCategorical: Categorical
  • ghi.const.TypeGenotypic: Genotypic

Get the Number of Columns in the Data Model

Syntax

new variable = dataModel object.numCols()

Example

numberOfColumns = myDM.numCols()

This command returns the number of columns (not including the row label column) in the spreadsheet object.

Get the Number of Rows in the Data Model

Syntax

new variable = dataModel object.numRows()

Example

numberOfRows = myDM.numRows()

This command returns the number of rows in the spreadsheet, not including the column header row.

Determine if a Data Model Has a Marker Map Applied

Syntax

new variable = dataModel object.hasMarkerMap()

Example

myVariable = myDM.hasMarkerMap()

This command indicates if a marker map is applied to the spreadsheet object. There are two states: False indicates no marker map is applied, True indicates that a marker map is applied to the spreadsheet.

Get the Marker Map File Name

Syntax

file name = marker mapped dataModel object.markerMapFileName()

Example

myMapName = myMarkerMappedDM.markerMapFileName()

This command returns the name of the project map file used to marker map this spreadsheet. The returned string can be passed as the first parameter to applyMarkerMap() on another spreadsheet.

Get the Marker Map Chromosome Information

Syntax

chromosome list = marker mapped dataModel object.markerMapChromosomes()

Example

myChromosomes = myMarkerMappedDM.markerMapChromosomes()

This command returns a Python list containing the chromosome information for the data model from the applied marker map.

Get an Ordered List of Chromosomes

Syntax

chromosome list = marker mapped dataModel object.orderedChrList()

Example

myChrList = myMarkerMappedDM.orderedChrList()

This command returns an ordered list of chromosomes that are present in the marker map.

Get the Marker Map Position Information

Syntax

position list = marker mapped dataModel object.markerMapPosition()

Example

myPositions = myMarkerMappedDM.markerMapPosition()

This command returns a Python list containing the chromosomal position information for the data model from the applied marker map.

Get Marker Map Information By Field

Syntax

list of field’s data = marker mapped dataModel object.markerMapField(field number)

Example

myDataList = myMarkerMappedDM.markerMapField(3)

This command returns a Python list of the data for the given marker map field number from the applied marker map.

Get Marker Map Information By Cell

Syntax

object containing cell’s data = marker mapped dataModel object.markerMapFieldCell(field number, column number)

Example

myVariable = myMarkerMappedDM.markerMapFieldCell(3, 6)

This command returns the data from the applied marker map for the given field at the given column number in a Python object of a type corresponding to the cell’s data type.

NOTE:

  • Accessing data for many cells using this method can be time consuming. If many cells need to be accessed, it is faster to get the field as a whole, and access data for all columns in that field before moving on to the next field.

Get Marker Map Field Names

Syntax

list of field names = marker mapped dataModel object.markerMapFieldNames()

Example

myFieldNameList = myMarkerMappedDM.markerMapFieldNames()

This command returns a Python list containing the names of the fields in the applied marker map. The names are listed in the order in which they appear in the marker map.

Get Marker Map Field Types

Syntax

list of field types = marker mapped dataModel object.markerMapFieldTypes()

Example

myFieldTypeList = myMarkerMappedDM.markerMapFieldTypes()

This command returns a Python list containing the type of each field in the applied marker map. The types are listed in the order in which they appear in the marker map.

The returned types may take on one of the following values:

  • 1: Integer
  • 2: Real
  • 3: Categorical
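
Because field names and field types are both listed in marker map order, the two lists can be combined into a readable summary. The sketch below is plain Python; the sample field names are invented for illustration, and codes other than the three listed above are reported as unknown rather than guessed.

```python
FIELD_TYPE_NAMES = {1: "Integer", 2: "Real", 3: "Categorical"}


def describe_fields(field_names, field_types):
    """Pair each marker map field name with the name of its documented type code."""
    return [(name, FIELD_TYPE_NAMES.get(code, "Unknown (%r)" % code))
            for name, code in zip(field_names, field_types)]


# Hypothetical values standing in for markerMapFieldNames() and markerMapFieldTypes().
names = ["Chromosome", "Position", "Score"]
types = [3, 1, 2]
for name, type_name in describe_fields(names, types):
    print(name, type_name)
```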

Get the Marker Map Offset in a Marker Mapped Spreadsheet

Syntax

offset = marker mapped dataModel object.markerMapOffset()

Example

myOffset = myMarkerMappedDM.markerMapOffset()

This command returns the offset of the mapped markers in the spreadsheet. If the first column (or row, in a row-mapped spreadsheet) is mapped, 0 is returned. For example, if there are 5 unmapped columns (or rows, in a row-mapped spreadsheet) before the first mapped one, 5 will be returned.

Get the Marker Map Orientation of a Spreadsheet

Syntax

orientation = dataModel object.getMarkerMapOrientation()

Example

myOrientation = myDM.getMarkerMapOrientation()

This command returns the orientation of the marker map as one of the following values.

  • 0: There is no marker map applied
  • 1: The map is oriented along the columns of the spreadsheet
  • 2: The map is oriented along the rows of the spreadsheet

Analysis with Spreadsheet Objects

Once a scripting spreadsheet has been created either by importing data or by building a dataset, the following commands can be used to analyze the spreadsheet’s data.

CNAM Segmentation

Syntax

list of objects = spreadsheet object.cnamSegmentation([Optional Parameters])

Example

myResultList = mySS.cnamSegmentation(algorithm = ghi.const.SegmentMultivariate, useMovingWindow = 0, maxSegmentsPer10k = 20, memSize = 512)

This command performs CNAM optimal segmenting analysis on the current spreadsheet object and returns a list of objects containing the results of the segmentation procedure.

The objects and indexes of the list returned from this command are as follows:

  • 0: Segmentation Covariates Every Column spreadsheet
  • 1: Segmentation Covariates First Column spreadsheet
  • 2: Segment List spreadsheet
  • 3: Segmentation Run Log viewer

All parameters are specified by keyword arguments and are optional. They are as follows:

  • algorithm: Indicates whether a univariate or multivariate algorithm should be used. The possible values are:
    • ghi.const.SegmentUnivariate (default): univariate algorithm
    • ghi.const.SegmentMultivariate: multivariate algorithm
    • specifyMemoryLimit: If algorithm = ghi.const.SegmentMultivariate, indicates that CNAM should use a user-specified memory limit.
    • memoryLimit: If specifyMemoryLimit = True, memory limit in MB.
  • useMovingWindow: Indicates whether a moving window should be used as a speed optimization. The possible values are:
    • 0 (default): no moving window will be used
    • 1: a moving window should be used
    • windowSize: If applicable, indicates the size of the moving window to be used. The possible values are:
      • 20000 (default)
      • any positive integer value
  • maxSegmentsPer10k: Indicates the maximum number of expected segments in each window. The possible values are:
    • 10 (default)
    • any positive integer value
  • minMarkers: Indicates the minimum number of markers needed to be considered a segment. The possible values are:
    • 1 (default)
    • any positive integer value
  • maxPairwisePVal: Indicates the significance level for comparing adjacent segments. The possible values are:
    • 0.005 (default)
    • any floating point value between 0.0 and 1.0
    • NOTE: The significance level can be much higher if running multivariate segmentation.
    • NOTE: If the significance level is set to 1, permutation testing is not done.
  • numThreads: Indicates the number of threads to use when computing optimal segments. The possible values are:
    • 2 (default)
    • any integer such that 1 <= numThreads <= 64
  • outputEveryColumn: Indicates whether to create output for each marker. The possible values are:
    • 0: Do not create output for each marker
    • 1 (default): create output for each marker
    • NOTE: At least one Segmentation Covariates spreadsheet (Every Column or First Column) must be output.
  • outputFirstColumn: Indicates whether to create output for the first column of each segment. The possible values are:
    • 0: Do not create output for the first column of each segment
    • 1 (default): create output for the first column of each segment
    • NOTE: At least one Segmentation Covariates spreadsheet (Every Column or First Column) must be output.
  • memSize: Indicates the amount of memory (in MB) to use during segmentation. The possible values are:
    • 256 (default)
    • any integer such that 128 <= memSize <= 32768
  • wiggleFile: Indicates that UCSC wiggle files (one for each row/sample) should be written to the given location.
  • removeOutliers: Indicates whether or not to remove univariate outliers. The possible values are:
    • 0: do not remove outliers
    • 1 (default): remove outliers
  • fullLogging: Indicates whether or not to produce detailed log output.
    • 0 (default): Output basic logging.
    • 1: Output detailed logging.
  • useHardwareAcceleration: Indicates whether to use OpenCL accelerated segmentation.
    • 0 (default): disable OpenCL acceleration
    • 1: enable OpenCL acceleration
    • openCLDevice: If useHardwareAcceleration = True, the one-based index of the device to use.
      • 1 (default): Use the first available device.
      • an integer specifying the index of the device to use, such that 1 <= device <= number of devices.
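
With this many keyword arguments, it can help to check the values in a script before starting a long segmentation run. The following is a minimal sketch in plain Python. `check_cnam_options` is a hypothetical helper, it covers only a few of the parameters above, and it reads the numThreads and memSize limits as inclusive ranges (1 to 64 and 128 to 32768), which is an interpretation of the listed values.

```python
def check_cnam_options(**options):
    """Return a list of problems found in cnamSegmentation-style keyword options."""
    problems = []
    threads = options.get("numThreads", 2)            # documented default: 2
    if not 1 <= threads <= 64:
        problems.append("numThreads must be between 1 and 64")
    mem = options.get("memSize", 256)                 # documented default: 256 MB
    if not 128 <= mem <= 32768:
        problems.append("memSize must be between 128 and 32768 MB")
    p_val = options.get("maxPairwisePVal", 0.005)     # documented default: 0.005
    if not 0.0 <= p_val <= 1.0:
        problems.append("maxPairwisePVal must be between 0.0 and 1.0")
    every = options.get("outputEveryColumn", 1)
    first = options.get("outputFirstColumn", 1)
    if not (every or first):
        problems.append("at least one Segmentation Covariates output must be enabled")
    return problems


options = {"numThreads": 4, "memSize": 512, "outputEveryColumn": 0}
print(check_cnam_options(**options))  # []
```

If the returned list is empty, the same dictionary could be passed on as `mySS.cnamSegmentation(**options)`.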

CNV Association Tests

Syntax

list of objects = spreadsheet object.cnvAssociationTests([Optional Parameters])

Example

myResultList = mySS.cnvAssociationTests(tTest = 1, corrTrend = 1, bonferroni = 0)

This command performs CNV association tests using the current spreadsheet and creates a list of spreadsheets containing the test results. The list of spreadsheets will be of consistent size, and if a particular spreadsheet is not created during the tests, then the returned list will contain 0 at the corresponding index.

The following is an outline of the returned list of spreadsheets by list index:

  • 0: Association test output
  • 1: PCA-corrected input
  • 2: Principal component spreadsheet
  • 3: Principal component eigenvalues
  • 4: PCA outlier spreadsheet
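
Since the returned list has a fixed layout and holds 0 where a spreadsheet was not created, it can be convenient to convert it into a dictionary keyed by name. The sketch below is plain Python; the key names and the sample list are invented for illustration, with strings standing in for spreadsheet objects.

```python
CNV_RESULT_NAMES = [
    "association_output",
    "pca_corrected_input",
    "principal_components",
    "eigenvalues",
    "pca_outliers",
]


def label_results(result_list, names=CNV_RESULT_NAMES):
    """Map result names to the created spreadsheets, skipping the 0 placeholders."""
    return {name: item for name, item in zip(names, result_list) if item != 0}


# Strings stand in for spreadsheet objects; 0 marks outputs that were not created.
sample_results = ["<association sheet>", 0, 0, 0, 0]
print(label_results(sample_results))  # {'association_output': '<association sheet>'}
```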

All parameters are specified by keyword arguments and are optional. They are as follows:

  • tTest: Indicates whether or not to perform a T-Test. The possible values are:
    • 0 (default): do not perform a T-Test
    • 1: perform a T-Test
  • corrTrend: Indicates whether or not to perform the Correlation/Trend Test. The possible values are:
    • 0 (default): do not perform Correlation/Trend Test
    • 1: perform Correlation/Trend Test
  • regression: Indicates whether or not to perform regression. The possible values are:
    • 0 (default): do not perform regression
    • 1: perform regression
  • bonferroni: Indicates whether or not to use Bonferroni adjustment. The possible values are:
    • 0: do not use Bonferroni adjustment
    • 1 (default): use Bonferroni adjustment
  • fdr: Indicates whether or not to use False Discovery Rate. The possible values are:
    • 0: do not use False Discovery Rate
    • 1 (default): use False Discovery Rate
  • singleValuePermutations: Indicates whether or not to perform single value permutation tests. The possible values are:
    • 0 (default): do not perform single value permutation tests
    • 1: perform single value permutation tests
  • fullScanPermutations: Indicates whether or not to perform full-scan permutation tests. The possible values are:
    • 0 (default): do not perform full scan permutation tests
    • 1: perform full scan permutation tests
  • numPermutations: (if applicable) indicates the number of permutations to be used for the selected permutation tests. The possible values are:
    • 0 (default)
    • any positive integer value >= 3
  • outputPPQQ: Indicates whether to output data for P-P/Q-Q plots. The possible values are:
    • 0 (default): do not output P-P/Q-Q data
    • 1: output P-P/Q-Q data
  • dataCentering: Indicates whether to perform data centering by marker, sample, or both. The possible values are:
    • ghi.const.PcaUncentered (default): do not perform data centering
    • ghi.const.PcaCenteredByMarker: perform data centering by marker
    • ghi.const.PcaCenteredBySample: perform data centering by sample
    • ghi.const.PcaCenteredByMarkerAndSample: perform data centering by marker and by sample
  • usePca: Indicates whether or not to use PCA. The possible values are:
    • 0 (default): do not use PCA
    • 1: use PCA
  • usePcaForDependent: If applicable, indicates whether to use a PCA corrected dependent. The possible values are:
    • 0 (default): do not use a PCA corrected dependent
    • 1: use a PCA corrected dependent

The following PCA parameters are available if usePca = 1 was specified:

  • pcaPrecomputedSheet: Indicates the use of pre-computed components. The possible values are:
    • the node ID (spreadsheet number) of the pre-computed principal components spreadsheet
  • pcaMaxTopComponents: Indicates the maximum number of PCA components. The possible values are:
    • 10 (default)
    • any positive integer value
  • pcaOutputCorrected: (if applicable) indicates whether or not to output a spreadsheet containing PCA corrected values. The possible values are:
    • 0 (default): do not output corrected input data
    • 1: output corrected input data
  • pcaOutputPcSheet: (if applicable) indicates whether or not to output a principal components spreadsheet. The possible values are:
    • 0: do not output a principal components spreadsheet
    • 1 (default): output a principal components spreadsheet
  • pcaOutputEigenSheet: (if applicable) indicates whether or not to output a spreadsheet of eigenvalues. The possible values are:
    • 0: do not output an eigenvalue spreadsheet
    • 1 (default): output an eigenvalue spreadsheet
  • pcaRecompute: (if applicable) whether or not to remove outliers and recompute principal components. The possible values are:
    • 0 (default): do not recompute principal components
    • 1: recompute principal components
  • pcaRecompStdDev: (if applicable) indicates the number of standard deviations with which to identify outliers. The possible values are:
    • 6.0 (default)
    • any floating point value
  • pcaRecompCount: (if applicable) indicates the number of times to recompute principal components. The possible values are:
    • 0 (default)
    • any positive integer value
  • pcaRecompComponents: (if applicable) indicates the number of components used in identifying outliers. The possible values are:
    • 5 (default)
    • any positive integer value

Find LD Between Two Markers

Syntax

LD value = spreadsheet object.computeLD(first column number, second column number, method, output statistic type, [Optional Parameters])

Example

myLDValue = mySS.computeLD(2, 5, ghi.const.ImputeEM, ghi.const.StatRSquared, imputeMissings = 1, maxIters = 40, convTolerance = .001)

Syntax

LD value = dataModel object.computeLD(first column number, second column number, method, output statistic type, [Optional Parameters])

Example

myLDValue = myDM.computeLD(2, 5, ghi.const.ImputeEM, ghi.const.StatRSquared, imputeMissings = 1, maxIters = 40, convTolerance = .001)

This function returns the LD value between the two specified marker columns. This function can be used on both spreadsheet and dataModel objects.

The first four parameters are required. The first two are the column numbers of the markers. The third is the method parameter, which may have two possible settings:

  • ghi.const.ImputeCHM: CHM algorithm
  • ghi.const.ImputeEM: EM algorithm

The fourth required parameter is the type of output statistic requested, which may have two possible settings:

  • ghi.const.StatRSquared: R-squared
  • ghi.const.StatDPrime: D-prime

The three optional parameters are specified by keyword arguments. The keyword arguments and defaults are listed below. Note: These keyword arguments are only applicable when using the EM algorithm. Note: Not all keyword arguments need to be used.

  • imputeMissings: Indicates whether or not to impute missing values.
    • 0 (default): do not impute missing values
    • 1: impute missing values
  • maxIters: Indicates the maximum number of EM iterations performed regardless of reaching convergence
    • (default) 50
    • any integer >= 1
  • convTolerance: Indicates the desired tolerance that the EM algorithm will use to determine convergence
    • (default) 0.0001
    • any floating point value > 0
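
For orientation, the two output statistics can be written out using their standard textbook definitions for a pair of bi-allelic markers. The sketch below is plain Python and is not the SVS implementation: it takes haplotype and allele frequencies as given, whereas the command estimates them with the CHM or EM algorithm, so its results are not guaranteed to match the command numerically. The function name and arguments are hypothetical.

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return (r_squared, d_prime) for two bi-allelic markers.

    p_ab is the frequency of the haplotype carrying allele A at the first
    marker and allele B at the second; p_a and p_b are the allele frequencies.
    """
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    r_squared = d * d / denom
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max                  # reported as an absolute value
    return r_squared, d_prime


print(ld_statistics(0.5, 0.5, 0.5))   # complete LD: (1.0, 1.0)
print(ld_statistics(0.25, 0.5, 0.5))  # no LD: (0.0, 0.0)
```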

Compute Genotype Allele Counts

Syntax

genotype allele counts = spreadsheet object.genotypeAlleleCounts(column number)

Syntax

genotype allele counts = dataModel object.genotypeAlleleCounts(column number)

This function returns a list of pairs, each containing an allele name and the count of that allele in a genotypic column. The missing allele is always last in the list, regardless of its count. This function works on both spreadsheet and dataModel objects.

For example, if a genotypic column had the values:

  • ['A_A', 'A_B', 'A_A', 'B_B', '?_?']

Calling this function on that column would return: [['A', 5], ['B', 3], ['?', 2]]

For bi-allelic columns, the major allele will always be first and the minor allele second in the list.
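
The counting rule can be reproduced in plain Python to show how the example result arises. This is a sketch, not the SVS implementation: the function name is hypothetical, genotypes are assumed to be strings with alleles separated by an underscore, and ordering all non-missing alleles by descending count is an assumption that generalizes the major-then-minor rule stated for bi-allelic columns.

```python
from collections import Counter


def genotype_allele_counts(genotypes, missing="?"):
    """Count alleles in 'A_B'-style genotype strings, listing the missing allele last."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype.split("_"))
    # Most frequent allele first; sorted() is stable, so ties keep first-seen order.
    named = sorted(((allele, n) for allele, n in counts.items() if allele != missing),
                   key=lambda pair: -pair[1])
    result = [[allele, n] for allele, n in named]
    if missing in counts:
        result.append([missing, counts[missing]])
    return result


print(genotype_allele_counts(['A_A', 'A_B', 'A_A', 'B_B', '?_?']))
# [['A', 5], ['B', 3], ['?', 2]]
```

Each genotype contributes two alleles, which is why five genotypes yield ten counted alleles in total.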

Filter Data by Genotype

Syntax

marker filter status spreadsheet = spreadsheet object.filterGenotypes([Optional Parameters])

OR

spreadsheet object.filterGenotypes([Optional Parameters])

Example

myStatusSpreadsheet = mySS.filterGenotypes(filterBasis = ghi.const.ControlsOnly, callRate = [ghi.const.LessOrEqual, .75], maf = [ghi.const.LessThan, .07])

This command performs genotype filtering, with an option to deactivate columns that do not pass filtering criteria, as well as an option to create a spreadsheet containing column statistics and filtering status.

This command returns a spreadsheet containing statistics and a filter status for each marker, unless the optional parameter outputMarkersSheet is set to zero. In that case, nothing is returned from this command.

All parameters are specified by keyword arguments and are optional. Four are general parameters. These are as follows:

  • outputMarkersSheet: Indicates whether or not to output a spreadsheet containing statistics and filter status for each marker.
    • 0: do not create a spreadsheet
    • 1 (default): output a spreadsheet
  • inactivateCols: Indicates whether or not to deactivate columns that do not meet filtering criteria.
    • 0: do not inactivate columns
    • 1 (default): inactivate columns
  • filterBasis: Indicates what samples to filter markers by.
    • ghi.const.CasesAndControls (default): filter using all data
    • ghi.const.ControlsOnly: filter using controls only
    • ghi.const.CasesOnly: filter using cases only
  • outputNegLog: If applicable, indicates whether or not to output negative log p-values in the filtering status spreadsheet.
    • 0 (default): do not output negative log P-Values
    • 1: output negative log P-Values

The following parameters indicate which test statistics to calculate. The use of any of these parameters specifies that the corresponding test statistic should be calculated. The values for each of the parameters should be a list where the first index contains an integer indicating what type of filter to use, and the second index should contain a floating point value representing the threshold to use when filtering.

Filtering types are represented as follows:

  • ghi.const.LessThan: drop values less than the selected threshold
  • ghi.const.LessOrEqual: drop values less than or equal to the selected threshold
  • ghi.const.GreaterThan: drop values greater than the selected threshold
  • ghi.const.GreaterOrEqual: drop values greater than or equal to the selected threshold

The parameters that require values in this form are as follows:

  • callRate: indicates whether or not to filter on call rates
  • maf: indicates whether or not to filter on minor allele frequency
  • hwep: indicates whether or not to filter on HWE P-Value
  • fisherHwep: indicates whether or not to filter on Fisher's Exact HWE P-Value
  • signedHweR: indicates whether or not to filter on Signed HWE R
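
Each criterion is a two-element list of filter type and threshold, and the filter type names the values that are dropped, not the values that are kept. The sketch below shows that rule in plain Python. The strings are stand-ins for the `ghi.const` filter type constants, whose actual values are not shown here, and `is_dropped` is a hypothetical helper.

```python
import operator

# String stand-ins for ghi.const.LessThan, LessOrEqual, GreaterThan, GreaterOrEqual.
DROP_TESTS = {
    "LessThan": operator.lt,
    "LessOrEqual": operator.le,
    "GreaterThan": operator.gt,
    "GreaterOrEqual": operator.ge,
}


def is_dropped(value, criterion):
    """Return True if a marker statistic fails a [filter type, threshold] criterion."""
    filter_type, threshold = criterion
    return DROP_TESTS[filter_type](value, threshold)


call_rate = ["LessOrEqual", 0.75]     # same shape as callRate in the example above
print(is_dropped(0.75, call_rate))    # True: a call rate of 0.75 is dropped
print(is_dropped(0.80, call_rate))    # False: a call rate of 0.80 is kept
```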

Genotype Association Tests

Syntax

list of objects = spreadsheet object.genotypeAssociationTests([Optional Parameters])

Example

myResultList = mySS.genotypeAssociationTests(corrTrend = 1, outputPPQQ = 1, usePca = 1, pcaMaxTopComponents = 15, genoCounts = 1)

This command performs genotype association tests on the current spreadsheet, creating a list of output spreadsheets dependent on the options used in the tests. The list of spreadsheets that is returned will contain 0 for indexes where the corresponding spreadsheet is not created. Note that the tests available when running this analysis are dependent on the type of the dependent spreadsheet column, the genetic model selected, and other parameters specified.

The returned list of spreadsheet objects is always the same size; as noted above, any index whose spreadsheet is not created holds 0. The following is an outline of the returned list of spreadsheets by list index:

  • 0: Association test output spreadsheet
  • 1: PCA-corrected input spreadsheet
  • 2: Principal component spreadsheet
  • 3: Principal component eigenvalues spreadsheet
  • 4: PCA outlier spreadsheet

All parameters are specified by keyword arguments and are optional. They are as follows:

  • geneticModel: Indicates the genetic model or tests used. The possible values are:
    • ghi.const.ModelAllelic: Basic Allele Tests
    • ghi.const.ModelGenotypic: Genotypic Tests
    • ghi.const.ModelAdditive (default): Additive model
    • ghi.const.ModelDominant: Dominant model
    • ghi.const.ModelRecessive: Recessive model

The following tests are available to be performed when applicable:

  • corrTrend: Indicates whether or not to perform the Correlation/Trend Test
  • chiSq: Indicates whether or not to perform a Chi-Squared Test
  • armitage: Indicates whether or not to perform a Cochran-Armitage Test
  • exactArmitage: Indicates whether or not to perform an exact form of the Cochran-Armitage Test
  • fishers: Indicates whether or not to perform Fisher's Exact Test
  • oddsRatio: Indicates whether or not to use Odds Ratio
  • analysisDev: Indicates whether or not to use Analysis of Deviance
  • regression: Indicates whether or not to perform regression
  • fTest: Indicates whether or not to perform an F-Test

For each of the above tests, use

  • 0 (default): do not perform the specified test
  • 1: perform the specified test

The following multiple testing/false positive adjustments are available:

  • bonferroni: Indicates whether or not to use Bonferroni adjustment
  • fdr: Indicates whether or not to use False Discovery Rate
  • singleValuePermutations: Indicates whether or not to perform single value permutation tests
  • fullScanPermutations: Indicates whether or not to perform full-scan permutation tests

For each of the above adjustments, use

  • 0 (default): do not use the specified adjustment
  • 1: use the specified adjustment

Other association test parameters are as follows:

  • numPermutations: If applicable, indicates the number of permutations to be used for the selected permutation tests. The possible values are:
    • 0 (default)
    • any positive integer ≥ 3
  • showInflationFactor: Indicates whether or not to show inflation factor, Chi-Squares and corrected values used in genomic control. The possible values are:
    • 0 (default): do not show the inflation factor
    • 1: show inflation factor and related values
  • specifyInflationFactor: If applicable, specify an inflation factor to use for genomic control. The possible values are:
    • 0 (default): do not specify an inflation factor
    • 1: specify an inflation factor
  • inflationFactor: If applicable, the inflation factor (lambda) to use for genomic control. The possible values are:
    • 0.0 (default)
    • any floating point value ≥ 1
  • useMissings: Indicates whether or not to use missing values. The possible values are:
    • 0 (default): drop missing values
    • 1: use missing values
  • outputPPQQ: Indicates whether to output data for P-P/Q-Q plots. The possible values are:
    • 0 (default): do not output P-P/Q-Q data
    • 1: output P-P/Q-Q data
  • usePca: Indicates whether or not to use PCA. The possible values are:
    • 0 (default): do not use PCA
    • 1: use PCA

The following PCA parameters are available if usePca = 1 was specified:

  • pcaPrecomputedSheet: Indicates the use of pre-computed components. The possible values are:
    • the node ID (spreadsheet number) of the pre-computed principal components spreadsheet
  • pcaMaxTopComponents: Indicates the maximum number of PCA components. The possible values are:
    • 10 (default)
    • any positive integer value
  • pcaOutputCorrected: Indicates whether or not to output a spreadsheet containing PCA corrected values. The possible values are:
    • 0 (default): do not output corrected input data
    • 1: output corrected input data
  • pcaOutputPcSheet: Indicates whether or not to output a principal components spreadsheet. The possible values are:
    • 0: do not output a spreadsheet
    • 1 (default): output a spreadsheet
  • pcaOutputEigenSheet: Indicates whether or not to output an eigenvalues spreadsheet. The possible values are:
    • 0: do not output spreadsheet
    • 1 (default): output spreadsheet
  • pcaNormalization: Indicates which method of PCA marker normalization to use for each marker. The possible values are:
    • ghi.const.PcaNormDefault (default): normalize by theoretical std deviation under HWE
    • ghi.const.PcaNormActual: normalize by actual standard deviation
    • ghi.const.PcaNormNone: do not normalize marker data
  • pcaRecompute: Indicates whether or not to remove outliers and recompute principal components. The possible values are:
    • 0 (default): do not recompute principal components
    • 1: recompute principal components

If pcaRecompute = 1 was specified, then the following PCA parameters are applicable:

  • pcaRecompStdDev: The number of std deviations to identify outliers. The possible values are:
    • 6.0 (default)
    • any floating point value
  • pcaRecompCount: Indicates the number of times to recompute PCs. The possible values are:
    • 0 (default)
    • any positive integer value
  • pcaRecompComponents: Indicates the number of PCs used in finding outliers. The possible values are:
    • 5 (default)
    • any positive integer value
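
Since the PCA options only apply when usePca = 1, and the recompute options only when pcaRecompute = 1, a script may prefer to assemble the keyword arguments conditionally. The sketch below shows one way to do that in plain Python. The function name pca_keyword_args and its argument names are hypothetical; only the dictionary keys and default values come from the parameter lists above.

```python
def pca_keyword_args(use_pca=0, recompute=0, max_components=10,
                     std_dev=6.0, recompute_count=0, outlier_components=5):
    """Build keyword arguments, including PCA options only when they apply."""
    kwargs = {"usePca": use_pca}
    if use_pca == 1:
        kwargs["pcaMaxTopComponents"] = max_components
        kwargs["pcaRecompute"] = recompute
        if recompute == 1:
            # These three are only applicable when pcaRecompute = 1.
            kwargs["pcaRecompStdDev"] = std_dev
            kwargs["pcaRecompCount"] = recompute_count
            kwargs["pcaRecompComponents"] = outlier_components
    return kwargs


args = pca_keyword_args(use_pca=1, recompute=1, recompute_count=2)
print(args)
```

In an SVS script, the resulting dictionary could be expanded with ** into the association test call, so that parameters that do not apply are never passed.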

Genotype statistics may be output using the following parameters:

  • maf: Indicates whether to output minor allele frequency.
  • callRate: Indicates whether or not to output call rates.
  • fisherHwep: Indicates whether to output Fisher's HWE P-Values.
  • hwep: Indicates whether to calculate HWE P-Values.
  • signedHweR: Indicates whether or not to output Signed HWE R.
  • genoCounts: Indicates whether or not to output genotype counts.
  • alleleCounts: Indicates whether or not to output allele counts.

For each of the above genotype statistics, use

  • 0 (default): do not output the specified value
  • 1: output the specified value

Genotypic Principal Components Analysis

Syntax

list of objects = spreadsheet object.genotypePCA([Optional Parameters])

Example

mySSList = mySS.genotypePCA(pcaMaxTopComponents = 35, pcaNormalization = ghi.const.PcaNormNone, pcaOutputCorrected = 1)

This command performs principal components analysis (PCA) and returns a list of spreadsheet objects, the contents of which vary depending on the options used.

The returned list of spreadsheet objects is always the same size. If the spreadsheet for a given index is not created, the list contains 0 at that index. The returned list is laid out by index as follows:

  • 0: PCA-corrected input spreadsheet
  • 1: Principal component spreadsheet
  • 2: Principal component eigenvalues spreadsheet
  • 3: PCA outlier spreadsheet

All parameters are specified by keyword arguments and are optional. They are as follows:

  • pcaPrecomputedSheet: Indicates the use of pre-computed components. The possible values are:
    • the node ID (spreadsheet number) of the pre-computed principal components spreadsheet
  • pcaGeneticModel: Indicates which model to perform PCA analysis under. The possible values are:
    • ghi.const.PcaModelAdditive (default): additive model
    • ghi.const.PcaModelDominant: dominant model
    • ghi.const.PcaModelRecessive: recessive model
  • pcaNormalization: Indicates which method of PCA marker normalization to use for each marker. The possible values are:
    • ghi.const.PcaNormDefault (default): normalize by theoretical std deviation under HWE
    • ghi.const.PcaNormActual: normalize by actual standard deviation
    • ghi.const.PcaNormNone: do not normalize marker data
  • pcaMaxTopComponents: Indicates the maximum number of PCA components. The possible values are:
    • 10 (default)
    • any positive integer value
  • pcaOutputCorrected: Indicates whether or not to output a spreadsheet containing PCA corrected values. The possible values are:
    • 0 (default): do not output corrected input data
    • 1: output corrected input data
  • pcaOutputPcSheet: Indicates whether or not to output a principal components spreadsheet. The possible values are:
    • 0: do not output a spreadsheet
    • 1 (default): output a spreadsheet
  • pcaOutputEigenSheet: Indicates whether or not to output an eigenvalues spreadsheet. The possible values are:
    • 0: do not output spreadsheet
    • 1 (default): output spreadsheet
  • pcaRecompute: Indicates whether or not to remove outliers and recompute principal components. The possible values are:
    • 0 (default): do not recompute principal components
    • 1: recompute principal components

If pcaRecompute = 1 was specified, then the following parameters are applicable:

  • pcaRecompStdDev: The number of std deviations to identify outliers. The possible values are:
    • 6.0 (default)
    • any floating point value
  • pcaRecompCount: Indicates the number of times to recompute PCs. The possible values are:
    • 0 (default)
    • any positive integer value
  • pcaRecompComponents: Indicates the number of PCs used in finding outliers. The possible values are:
    • 5 (default)
    • any positive integer value

Genotypic Statistics by Marker

Syntax

statistics spreadsheet object = spreadsheet object.genotypeStatsByMarker([Optional Parameters])

Example

mySS.genotypeStatsByMarker(maf = 1, callRate = 1, hwep = 1, signedHweR = 1)

This command creates an output spreadsheet containing the selected statistics.

All parameters are specified by keyword arguments and are optional. They are as follows:

  • maf: Indicates whether to output minor allele frequency.
  • callRate: Indicates whether or not to output call rates.
  • fisherHwep: Indicates whether to output Fisher's HWE P-Values.
  • hwep: Indicates whether to calculate HWE P-Values.
  • signedHweR: Indicates whether or not to output Signed HWE R.
  • genoCounts: Indicates whether or not to output genotype counts.
  • alleleCounts: Indicates whether or not to output allele counts.
  • outputNegLog: Indicates whether or not to output negative log p-values.

For each of the above genotype statistics, use

  • 0 (default): do not output the specified value
  • 1: output the specified value
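
When the set of statistics is chosen at run time, the 0/1 flags can be generated from a list of names. This is a small sketch in plain Python: the helper marker_stat_flags is hypothetical, while the keyword names are those listed above for genotypeStatsByMarker.

```python
# Statistic keywords documented for genotypeStatsByMarker.
MARKER_STATS = ("maf", "callRate", "fisherHwep", "hwep", "signedHweR",
                "genoCounts", "alleleCounts", "outputNegLog")


def marker_stat_flags(selected):
    """Return {keyword: 1} for each requested statistic, rejecting unknown names."""
    unknown = [name for name in selected if name not in MARKER_STATS]
    if unknown:
        raise ValueError("unknown statistic(s): " + ", ".join(unknown))
    # Statistics that are not listed are left out, so their default of 0 applies.
    return dict((name, 1) for name in selected)


flags = marker_stat_flags(["maf", "callRate", "hwep", "signedHweR"])
print(flags)
```

In an SVS script the result could be passed as mySS.genotypeStatsByMarker(**flags), which is equivalent to the example call shown above.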

Genotypic Statistics by Sample

Syntax

statistics spreadsheet object = spreadsheet object.genotypeStatsBySample([Optional Parameters])

Example

mySS.genotypeStatsBySample(callRate = 1, hwep = 1, outputNegLog = 1)

This command creates an output spreadsheet containing the selected statistics.

All parameters are specified by keyword arguments and are optional. They are as follows:

  • callRate: Indicates whether or not to output call rates.
  • hwep: Indicates whether to calculate HWE P-Values.
  • outputNegLog: Indicates whether or not to output negative log p-values. This parameter can only be used if hwep = 1.

For each of the above genotype statistics, use

  • 0 (default): do not output the specified value
  • 1: output the specified value
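
The dependency between outputNegLog and hwep can be checked before the command is called. The following sketch, in plain Python, assumes nothing about how SVS itself reports an invalid combination; the function name sample_stat_args is hypothetical.

```python
def sample_stat_args(callRate=0, hwep=0, outputNegLog=0):
    """Validate the genotypeStatsBySample flags and return them as a dict."""
    flags = {"callRate": callRate, "hwep": hwep, "outputNegLog": outputNegLog}
    for name, value in flags.items():
        if value not in (0, 1):
            raise ValueError("%s must be 0 or 1" % name)
    # outputNegLog can only be used if hwep = 1.
    if outputNegLog == 1 and hwep != 1:
        raise ValueError("outputNegLog = 1 requires hwep = 1")
    return flags


checked = sample_stat_args(callRate=1, hwep=1, outputNegLog=1)
print(checked)
```

The validated dictionary could then be passed as mySS.genotypeStatsBySample(**checked).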

Haplotype Association Tests

Syntax

result spreadsheet object = spreadsheet object.haplotypeAssociationTests([Optional Parameters])

Example

myResultsSS = mySS.haplotypeAssociationTests(blockDefinitionSource = ghi.const.HapBlockAllMarkers, chiSq = 1, oddsRatio = 1, fdr = 1)

This command runs haplotype association tests on the current spreadsheet and returns a reference to the results spreadsheet.

All parameters are specified by keyword arguments and are optional. Note that some parameters are only valid when other parameters are specified. They are as follows:

  • blockDefinitionSource: Indicates the source of the haplotype block definitions. The possible values are:
    • ghi.const.HapBlockPrecomputed (default): perform tests on precomputed blocks
    • ghi.const.HapBlockAllMarkers: perform tests on all markers as a single block
    • ghi.const.HapBlockMovingWindow: perform tests using a moving window
  • blockDefinitionSheet: If applicable, specifies a spreadsheet containing haplotype block information. The possible values are:
    • the node id of the haplotype block spreadsheet
    • the column number of the haplotype block column
  • movingWindowType: If applicable, indicates whether to use a fixed or dynamic moving window. The possible values are:
    • 1 (default) use a fixed window
    • 2 use a dynamic window
  • fixedWindowSize: If applicable, indicates the fixed window size. The possible values are:
    • 2 (default)
    • any integer value ≥ 1
  • dynamicWindowBasePairs: If applicable, indicates the dynamic window size in kilo-base pairs. The possible values are:
    • 10 (default)
    • any integer value ≥ 1
  • limitDynamicMaxCols: If applicable, indicates the maximum dynamic moving window size. The possible values are:
    • 20 (default)
    • any integer value ≥ 1
  • haploCalculationType: (Optional) indicates whether to calculate tests on a per haplotype or per block basis. The possible values are:
    • 1 (default) calculate tests per haplotype
    • 2 calculate tests per block
  • chiSq: (Optional) indicates whether to calculate chi-squared. The possible values are:
    • 0 (default) do not calculate chi-squared
    • 1 calculate chi-squared
  • oddsRatio: (Optional) indicates whether or not to calculate odds ratio. The possible values are:
    • 0 (default) do not calculate odds-ratio
    • 1 calculate odds-ratio
  • regression: (Optional) indicates whether or not to perform regression tests. The possible values are:
    • 0 (default) do not perform regression tests
    • 1 perform regression tests
  • haplotypeMethod: (Optional) indicates whether to use the EM or CHM algorithm. The possible values are:
    • 1 (default) calculate haplotype frequencies using the EM algorithm
    • 2 calculate haplotype frequencies using the CHM algorithm
  • maxEmIterations: (Optional) maximum number of EM iterations regardless of reaching convergence. The possible values are:
    • 50 (default)
    • any integer value ≥ 1
  • emConvergeTolerance: (Optional) desired tolerance the EM algorithm will use to determine when it reaches convergence. The possible values are:
    • 0.00001 (default)
    • any floating point value
  • frequencyThreshold: (Optional) minimum imputed frequency to determine whether a haplotype is used. The possible values are:
    • 0.01 (default)
    • any floating point value 0 < frequency < 1
  • imputeMissing: (Optional) indicates whether or not to impute missing values. The possible values are:
    • 0 (default) do not impute missing values
    • 1 impute missing values
  • bonferroni: (Optional) indicates whether or not to use Bonferroni adjustment. The possible values are:
    • 0 do not use Bonferroni adjustment
    • 1 (default) use Bonferroni adjustment
  • fdr: (Optional) indicates whether or not to use False Discovery Rate. The possible values are:
    • 0 do not use False Discovery Rate
    • 1 (default) use False Discovery Rate
  • singleValuePermutations: (Optional) indicates whether or not to perform single value permutation tests. The possible values are:
    • 0 (default) do not perform single value permutation tests
    • 1 perform single value permutation tests
  • fullScanPermutations: (Optional) indicates whether or not to perform full-scan permutation tests. The possible values are:
    • 0 (default) do not perform full scan permutation tests
    • 1 perform full scan permutation tests
  • numPermutations: If applicable, indicates the number of permutations to be used for the selected permutation tests. The possible values are:
    • 0 (default)
    • any positive integer value ≥ 3
  • outputPPQQ: (Optional) indicates whether to output data for P-P/Q-Q plots. The possible values are:
    • 0 (default) do not output P-P/Q-Q data
    • 1 output P-P/Q-Q data
  • outputHaplotypeFreq: (Optional) indicates whether or not to output haplotype frequencies. The possible values are:
    • 0 (default) do not output frequencies
    • 1 output frequencies
  • outputNegLog: (Optional) indicates whether or not to output -log(P) values. The possible values are:
    • 0 (default) do not output -log(p) values
    • 1 output -log(p) values
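
Several of the parameters above are only valid in combination, for example the window size parameters that accompany movingWindowType. The sketch below, in plain Python, selects the size keyword that matches the window type. The function name moving_window_args and the two module-level names are hypothetical; the keyword names, type codes, and defaults are taken from the list above.

```python
FIXED_WINDOW = 1    # movingWindowType value for a fixed window
DYNAMIC_WINDOW = 2  # movingWindowType value for a dynamic window


def moving_window_args(window_type=FIXED_WINDOW, size=None):
    """Return the moving-window keywords that match the chosen window type."""
    if window_type == FIXED_WINDOW:
        # fixedWindowSize defaults to 2 in the parameter list above.
        return {"movingWindowType": FIXED_WINDOW,
                "fixedWindowSize": 2 if size is None else size}
    if window_type == DYNAMIC_WINDOW:
        # dynamicWindowBasePairs defaults to 10 in the parameter list above.
        return {"movingWindowType": DYNAMIC_WINDOW,
                "dynamicWindowBasePairs": 10 if size is None else size}
    raise ValueError("movingWindowType must be 1 (fixed) or 2 (dynamic)")


window = moving_window_args(DYNAMIC_WINDOW, size=25)
print(window)
```

In an SVS script these keywords would be combined with blockDefinitionSource = ghi.const.HapBlockMovingWindow, for example mySS.haplotypeAssociationTests(blockDefinitionSource = ghi.const.HapBlockMovingWindow, chiSq = 1, **window).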

Haplotype Block Detection

Syntax

result spreadsheet object = spreadsheet object.haplotypeBlockDetection(method, [Optional Parameters])

Example

myHapBlocksSS = mySS.haplotypeBlockDetection(ghi.const.BlockDetectGabriel, lowerConfidenceBound = .75, imputeMissing = 1)

This command creates a haplotype blocks spreadsheet and returns the new sheet.

This command takes one required parameter, which is the method used in the block detection. The possible value is:

  • ghi.const.BlockDetectGabriel: minimize historical recombination (Gabriel et al.)

This command also takes the following optional keyword arguments:

  • upperConfidenceBound: The minimum upper bound of the D’ statistic. The possible values are:
    • 0.98 (default)
    • any floating point value 0.0001 < value < 0.999
  • lowerConfidenceBound: The minimum lower bound of the D’ statistic. The possible values are:
    • 0.70 (default)
    • any floating point value 0.0001 < value < 0.999
  • confidenceLevel: Statistical confidence level. The possible values are:
    • ghi.const.Confidence90 corresponds to a confidence of 0.90
    • ghi.const.Confidence95 (default) corresponds to a confidence of 0.95
    • ghi.const.Confidence99 corresponds to a confidence of 0.99
  • minUpperConfidence: The reject criteria based on the upper bound of the D’ statistic. The possible values are:
    • 0.90 (default)
    • any floating point value 0.0001 < value < 0.999
  • minMAF: Minor allele frequency to filter SNPs from blocks. The possible values are:
    • 0.05 (default)
    • any floating point value 0.00001 < value < 0.999
  • maxMarkers: Maximum number of SNPs in a block. The possible values are:
    • 30 (default)
    • any integer value ≥ 1
  • maxBlockLength: Maximum length of a block in kilobase pairs. The possible values are:
    • 160 (default)
    • any integer value ≥ 1
  • haplotypeMethod: Indicates the method used to estimate haplotype frequencies. The possible values are:
    • 1 (default) EM algorithm
    • 2 CHM algorithm
  • maxEmIterations: Maximum number of EM iterations regardless of reaching convergence. The possible values are:
    • 50 (default)
    • any integer value ≥ 1
  • emConvergeTolerance: Desired tolerance the EM algorithm will use to determine when it reaches convergence. The possible values are:
    • 0.00001 (default)
    • any floating point value
  • frequencyThreshold: Minimum imputed frequency to determine whether a haplotype is used. The possible values are:
    • 0.01 (default)
    • any floating point value 0 < frequency < 1
  • imputeMissing: Indicates whether or not to impute missing values. The possible values are:
    • 0 (default) do not impute missing values
    • 1 impute missing values
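
The floating point parameters above are each restricted to an open interval, which a script can verify before calling the command. This is a minimal sketch in plain Python; the helper check_block_params is hypothetical, and the intervals are copied from the list above.

```python
# Open intervals (low, high) documented for the floating point parameters.
BLOCK_PARAM_RANGES = {
    "upperConfidenceBound": (0.0001, 0.999),
    "lowerConfidenceBound": (0.0001, 0.999),
    "minUpperConfidence": (0.0001, 0.999),
    "minMAF": (0.00001, 0.999),
    "frequencyThreshold": (0.0, 1.0),
}


def check_block_params(**params):
    """Raise ValueError if a ranged parameter falls outside its open interval."""
    for name, value in params.items():
        if name not in BLOCK_PARAM_RANGES:
            continue  # parameters without a documented interval are passed through
        low, high = BLOCK_PARAM_RANGES[name]
        if not (low < value < high):
            raise ValueError("%s must satisfy %g < value < %g" % (name, low, high))
    return params


block_args = check_block_params(lowerConfidenceBound=0.75, imputeMissing=1)
print(block_args)
```

The checked keywords could then be passed as mySS.haplotypeBlockDetection(ghi.const.BlockDetectGabriel, **block_args).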

Numeric Association Tests

Syntax

list of objects = spreadsheet object.numericAssociationTests([Optional Parameters])

Example

myResultList = mySS.numericAssociationTests(tTest = 1, corrTrend = 1, bonferroni = 0)

This command performs numeric association tests using the current spreadsheet and creates a list of spreadsheets containing the test results. The list of spreadsheets will be of consistent size, and if a particular spreadsheet is not created during the tests, then the returned list will contain 0 at the corresponding index.

The following is an outline of the returned list of spreadsheets by list index:

  • 0: Association test output
  • 1: PCA-corrected input
  • 2: Principal component spreadsheet
  • 3: Principal component eigenvalues
  • 4: PCA outlier spreadsheet

All parameters are specified by keyword arguments and are optional. They are as follows:

  • tTest: Indicates whether or not to perform a T-Test. The possible values are:
    • 0 (default): do not perform a T-Test
    • 1: perform a T-Test
  • corrTrend: Indicates whether or not to perform the Correlation/Trend Test. The possible values are:
    • 0 (default): do not perform Correlation/Trend Test
    • 1: perform Correlation/Trend Test
  • regression: Indicates whether or not to perform regression. The possible values are:
    • 0 (default): do not perform regression
    • 1: perform regression
  • bonferroni: Indicates whether or not to use Bonferroni adjustment. The possible values are:
    • 0: do not use Bonferroni adjustment
    • 1 (default): use Bonferroni adjustment
  • fdr: Indicates whether or not to use False Discovery Rate. The possible values are:
    • 0: do not use False Discovery Rate
    • 1 (default): use False Discovery Rate
  • singleValuePermutations: Indicates whether or not to perform single value permutation tests. The possible values are:
    • 0 (default): do not perform single value permutation tests
    • 1: perform single value permutation tests
  • fullScanPermutations: Indicates whether or not to perform full-scan permutation tests. The possible values are:
    • 0 (default): do not perform full scan permutation tests
    • 1: perform full scan permutation tests
  • numPermutations: (if applicable) indicates the number of permutations to be used for the selected permutation tests. The possible values are:
    • 0 (default)
    • any positive integer value ≥ 3
  • outputPPQQ: Indicates whether to output data for P-P/Q-Q plots. The possible values are:
    • 0 (default): do not output P-P/Q-Q data
    • 1: output P-P/Q-Q data
  • dataCentering: Indicates whether to perform data centering by marker, sample, or both. The possible values are:
    • ghi.const.PcaUncentered (default): do not perform data centering
    • ghi.const.PcaCenteredByMarker: perform data centering by marker
    • ghi.const.PcaCenteredBySample: perform data centering by sample
    • ghi.const.PcaCenteredByMarkerAndSample: perform data centering by marker and by sample
  • usePca: Indicates whether or not to use PCA. The possible values are:
    • 0 (default): do not use PCA
    • 1: use PCA
  • usePcaForDependent: If applicable, indicates whether to use a PCA corrected dependent. The possible values are:
    • 0 (default): do not use a PCA corrected dependent
    • 1: use a PCA corrected dependent

The following PCA parameters are available if usePca = 1 was specified:

  • pcaPrecomputedSheet: Indicates the use of pre-computed components. The possible values are:
    • the node ID (spreadsheet number) of the pre-computed principal components spreadsheet
  • pcaMaxTopComponents: Indicates the maximum number of PCA components. The possible values are:
    • 10 (default)
    • any positive integer value
  • pcaOutputCorrected: Indicates whether or not to output a spreadsheet containing PCA corrected values. The possible values are:
    • 0 (default): do not output corrected input data
    • 1: output corrected input data
  • pcaOutputPcSheet: Indicates whether or not to output a principal components spreadsheet. The possible values are:
    • 0: do not output a spreadsheet
    • 1 (default): output a spreadsheet
  • pcaOutputEigenSheet: Indicates whether or not to output an eigenvalues spreadsheet. The possible values are:
    • 0: do not output spreadsheet
    • 1 (default): output spreadsheet
  • pcaRecompute: Indicates whether or not to remove outliers and recompute principal components. The possible values are:
    • 0 (default): do not recompute principal components
    • 1: recompute principal components

If pcaRecompute = 1 was specified, then the following PCA parameters are applicable:

  • pcaRecompStdDev: The number of std deviations to identify outliers. The possible values are:
    • 6.0 (default)
    • any floating point value
  • pcaRecompCount: Indicates the number of times to recompute PCs. The possible values are:
    • 0 (default)
    • any positive integer value
  • pcaRecompComponents: Indicates the number of PCs used in finding outliers. The possible values are:
    • 5 (default)
    • any positive integer value

Numeric Principal Components Analysis

Syntax

list of objects = spreadsheet object.numericPCA([Optional Parameters])

Example

mySSList = mySS.numericPCA(pcaMaxTopComponents = 27, pcaOutputCorrected = 1)

This command performs principal components analysis (PCA) and returns a list of spreadsheet objects, the contents of which vary depending on the options used.

The returned list of spreadsheet objects is always the same size. If the spreadsheet for a given index is not created, the list contains 0 at that index. The returned list is laid out by index as follows:

  • 0: PCA-corrected input spreadsheet
  • 1: Principal component spreadsheet
  • 2: Principal component eigenvalues spreadsheet
  • 3: PCA outlier spreadsheet

All parameters are specified by keyword arguments and are optional. They are as follows:

  • dataCentering: Indicates whether to perform data centering by marker, sample, or both. The possible values are:
    • ghi.const.PcaUncentered (default): do not perform data centering
    • ghi.const.PcaCenteredByMarker: perform data centering by marker
    • ghi.const.PcaCenteredBySample: perform data centering by sample
    • ghi.const.PcaCenteredByMarkerAndSample: perform data centering by marker and by sample
  • pcaPrecomputedSheet: Indicates the use of pre-computed components. The possible values are:
    • the node ID (spreadsheet number) of the pre-computed principal components spreadsheet
  • pcaMaxTopComponents: Indicates the maximum number of PCA components. The possible values are:
    • 10 (default)
    • any positive integer value
  • pcaOutputCorrected: Indicates whether or not to output a spreadsheet containing PCA corrected values. The possible values are:
    • 0 (default): do not output corrected input data
    • 1: output corrected input data
  • pcaOutputPcSheet: Indicates whether or not to output a principal components spreadsheet. The possible values are:
    • 0: do not output a spreadsheet
    • 1 (default): output a spreadsheet
  • pcaOutputEigenSheet: Indicates whether or not to output an eigenvalues spreadsheet. The possible values are:
    • 0: do not output spreadsheet
    • 1 (default): output spreadsheet
  • pcaRecompute: Indicates whether or not to remove outliers and recompute principal components. The possible values are:
    • 0 (default): do not recompute principal components
    • 1: recompute principal components

If pcaRecompute = 1 was specified, then the following parameters are applicable:

  • pcaRecompStdDev: The number of std deviations to identify outliers. The possible values are:
    • 6.0 (default)
    • any floating point value
  • pcaRecompCount: Indicates the number of times to recompute PCs. The possible values are:
    • 0 (default)
    • any positive integer value
  • pcaRecompComponents: Indicates the number of PCs used in finding outliers. The possible values are:
    • 5 (default)
    • any positive integer value

Numeric Regression

Syntax

list of objects = spreadsheet object.numericRegression([Optional Parameters])

Example

mySSList = mySS.numericRegression()

This command performs linear regression on the current spreadsheet object using the currently selected dependent column, and returns a list of objects.

The objects and their list indexes are as follows:

  • 0 = Regression results viewer
  • 1 = Regression results spreadsheet
  • 2 = Detailed regression results output
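
To avoid referring to the returned objects by bare index, the list can be wrapped in a named tuple. The sketch below uses plain Python with stand-in strings instead of real SVS objects. The field names are illustrative, and the sketch assumes the list always holds exactly the three entries listed above.

```python
from collections import namedtuple

# Field order follows the documented list indexes 0, 1 and 2.
RegressionOutput = namedtuple(
    "RegressionOutput", ["viewer", "results_sheet", "detailed_output"])


def unpack_regression(result_list):
    """Wrap the three-element regression result list in a named tuple."""
    if len(result_list) != 3:
        raise ValueError("expected 3 entries, got %d" % len(result_list))
    return RegressionOutput(*result_list)


# Stand-in strings take the place of the objects returned by numericRegression.
out = unpack_regression(["viewer object", "results sheet", "detailed output"])
print(out.results_sheet)
```

In an SVS script, the call would be unpack_regression(mySS.numericRegression()).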

All parameters are specified by keyword arguments and are optional. They are as follows:

  • fullModelRegressors: Indicates how full model regressors are selected. The possible values are:
    • ghi.const.RegressPerColumn (default) regress once on each data column
    • ghi.const.RegressMovingWindow use a moving window of regressors
    • ghi.const.RegressSelected perform regression on selected covariates only

The following parameters are applicable when a moving window of regressors is used (fullModelRegressors = ghi.const.RegressMovingWindow):

  • movingWindowType: Indicates the type of moving window to use. The possible values are:
    • ghi.const.WindowFixed (default) fixed window
    • ghi.const.WindowDynamic dynamic window defined by size in genetic distance
  • fixedWindowSize: Indicates the size of the fixed moving window. The possible values are:
    • 1 (default)
    • any positive integer value
  • dynamicWindowBasePairs: Indicates the size of the dynamic window in bp. The possible values are:
    • 10000 (default)
    • any positive integer value
  • limitDynamicMaxCols: Indicates whether to limit the number of columns that can be spanned by the dynamic moving window. The possible values are:
    • 0 (default): do not limit the moving window size
    • 1: limit the moving window size
  • dynamicWindowMaxColumns: Indicates the maximum number of columns that can be included in the dynamic moving window. The possible values are:
    • 20 (default)
    • any positive integer
  • outputResidualSheet: Indicates if spreadsheet of residual values is output. The possible values are:
    • 0 (default): do not output a residual spreadsheet
    • 1: output a residual spreadsheet
  • useStepwise: Indicates whether or not to use stepwise regression. The possible values are:
    • 0 (default): do not use stepwise regression
    • 1: use stepwise regression

The following parameters are applicable when useStepwise = 1 was specified:

  • stepwiseCutoff: Indicates the P-Value cutoff for stepwise regression. The possible values are:
    • 0.01 (default)
    • a floating point value 0 ≤ stepwiseCutoff ≤ 1
  • stepwiseMethod: Indicates stepwise regression method. The possible values are:
    • ghi.const.StepwiseBackward (default) use backward elimination
    • ghi.const.StepwiseForward use forward selection
  • fullModelCovariates: Indicates the columns to use as full model covariates. The possible values are:
    • [] (default) no columns as full model covariates
    • a Python list of column numbers for covariates, ex: [2,3]
  • fullModelInteractions: Indicates pairs of columns to use as interactions in the full model covariates. The possible values are:
    • [[]] (default) no pairs of columns for interaction terms
    • a Python list of lists where each sublist contains the column numbers for the two columns that make up the interaction ex: [[2,3],[3,4]]
  • computeFullVsReducedModel: Indicates full only or full vs. reduced model. The possible values are:
    • 0 (default): do not compute the full vs. reduced model
    • 1: compute the full vs. reduced model
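
The interaction parameters expect a list of two-element lists, which is easy to get wrong when the pairs are built programmatically. The following plain-Python sketch checks that shape before the value is passed on. The helper name check_interactions is hypothetical, and treating [[]] as "no interactions" follows the documented default.

```python
def check_interactions(pairs):
    """Validate a list of [column, column] pairs for an interactions parameter."""
    if pairs == [[]]:
        return pairs  # the documented default: no interaction terms
    for pair in pairs:
        is_pair = isinstance(pair, list) and len(pair) == 2
        if not is_pair or not all(
                isinstance(col, int) and not isinstance(col, bool) for col in pair):
            raise ValueError("each interaction must be a list of two "
                             "column numbers, got %r" % (pair,))
    return pairs


interactions = check_interactions([[2, 3], [3, 4]])
print(interactions)
```

The validated list could then be supplied as fullModelInteractions = interactions together with fullModelCovariates in the numericRegression call.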

The following parameters are applicable if computeFullVsReducedModel = 1:

  • reducedModelCovariates: Indicates columns for reduced model covariates. The possible values are:
    • [] (default): no reduced model covariates
    • a Python list of column numbers for covariates, ex: [4,6]
  • reducedModelInteractions: Indicates pairs of columns for interaction terms in the reduced model covariates. The possible values are:
    • [[]] (default): no pairs of columns for interaction terms
    • a Python list of lists where each sublist contains the column numbers for the two columns that make up the interaction ex: [[3,4],[4,5]]
  • singleValuePermutations: Indicator for single value permutation testing. The possible values are:
    • 0 (default): do not perform single value permutation tests
    • 1: perform single value permutation tests
  • fullScanPermutations: Indicator for full scan permutation testing. The possible values are:
    • 0 (default): do not perform full scan permutation tests
    • 1: perform full scan permutation tests
  • numPermutations: If applicable, the number of permutations to use. The possible values are:
    • 0 (default)
    • any positive integer ≥ 3
  • outputPPQQ: Indicates whether or not to output data for P-P/Q-Q plots. The possible values are:
    • 0 (default): do not output data for P-P/Q-Q plots
    • 1: output data for P-P/Q-Q plots
  • fdr: Indicates whether or not to calculate False Discovery Rate. The possible values are:
    • 0: do not use False Discovery Rate
    • 1 (default): use False Discovery Rate
  • bonferroni: Indicates whether or not to calculate Bonferroni Adjustment. The possible values are:
    • 0: do not use Bonferroni Adjustment
    • 1 (default): use Bonferroni Adjustment
  • outputDetailedResults: Indicates that detailed output should be included if the threshold criterion is satisfied. The possible values are:
    • [] (default)
    • a list with the following form:
      • index 0: value to threshold:
        • ghi.const.ThresholdMainP Main P-Value
        • ghi.const.ThresholdLogMainP Log main P-Value
        • ghi.const.ThresholdFullP Full model P-Value
        • ghi.const.ThresholdLogFullP Log full model P-Value
        • ghi.const.ThresholdRSquared R-Squared
      • index 1: type of threshold to be used:
        • ghi.const.LessThan <
        • ghi.const.LessOrEqual ≤
        • ghi.const.GreaterThan >
        • ghi.const.GreaterOrEqual ≥
      • index 2: a floating point value to be used as a threshold

      Note: Some threshold values may only be available for certain models and regressions.
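
The three-element form of outputDetailedResults pairs a statistic with a comparison and a cutoff. The sketch below illustrates that meaning in plain Python by mapping the comparison names to the standard operator functions. The string keys are stand-ins for the ghi.const constants, and the code only illustrates the comparison semantics; it is not a description of how SVS evaluates the criterion internally.

```python
import operator

# Plain-Python stand-ins for the ghi.const comparison constants.
COMPARISONS = {
    "LessThan": operator.lt,
    "LessOrEqual": operator.le,
    "GreaterThan": operator.gt,
    "GreaterOrEqual": operator.ge,
}


def meets_threshold(value, comparison, threshold):
    """Return True if value satisfies the named comparison against threshold."""
    return COMPARISONS[comparison](value, threshold)


# Same shape as the parameter: [value to threshold, comparison, cutoff].
# In SVS this would be [ghi.const.ThresholdMainP, ghi.const.LessThan, 0.01].
detailed = ["ThresholdMainP", "LessThan", 0.01]

selected = meets_threshold(0.003, detailed[1], detailed[2])
print(selected)
```

With this criterion, a regression whose main P-Value is 0.003 would qualify for detailed output, while one with a main P-Value of 0.05 would not.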

Runs of Homozygosity

Syntax

list of objects = spreadsheet object.roh([Optional Parameters])

Example

mySSList = mySS.roh(minLengthSnps = 100)

This command tests for runs of homozygosity in the current spreadsheet object. It returns a list of 6 spreadsheet objects which will vary depending on the parameters used and the results of the test. If a spreadsheet is not created for any reason, the list will contain 0 at that index.

The list of returned objects and their list indexes is as follows:

  • 0 = cluster of runs summary spreadsheet
  • 1 = spreadsheet containing each run per sample
  • 2 = spreadsheet containing the incidence of common runs per SNP
  • 3 = spreadsheet containing the binary ROH run status
  • 4 = cluster of runs spreadsheet (first column of each cluster)
  • 5 = cluster of runs spreadsheet (every column of each cluster)

All parameters are specified by keyword arguments and are optional. They are as follows:

  • minLengthKBase: Specify a run based on length in kilo-base pairs. The possible values are:
    • any real value >= 1.0
  • minSnpsKBase: Used in conjunction with minLengthKBase, sets the minimum number of SNPs in a run as determined by the length in k-bp. The possible values are:
    • 25 (default)
    • any integer value >= 2
  • minLengthSnps: Indicates the minimum number of SNPs that constitute a run. The possible values are:
    • any integer value >= 2
  • minSamples: Minimum number of samples that must contain a run to define a cluster of runs. The possible values are:
    • 20 (default)
    • any integer value >= 1
  • allowHetero: Indicates whether or not heterozygotes can be included in a run. The possible values are:
    • 0: do not allow runs to contain any heterozygotes
    • 1 (default): allow runs to contain heterozygotes
  • maxHetero: Used in conjunction with allowHetero = 1, sets the maximum number of allowed heterozygotes. The possible values are:
    • 1 (default)
    • any integer value >= 1
  • restrictMissing: Indicates whether the number of allowed missing genotypes in a run should be restricted or not. The possible values are:
    • 0: allow any number of missing genotypes
    • 1 (default): allow runs to contain up to ’n’ missing genotypes
  • maxMissing: Used in conjunction with restrictMissing, sets the max number of allowed missing genotypes in a run. The possible values are:
    • 5 (default)
    • any integer value >= 0
  • restrictGap: Indicates whether the maximum distance between SNPs in a run should be restricted or not. The possible values are:
    • 0: do not restrict max gap between SNPs in a run
    • 1 (default): restrict max gap between SNPs in a run
  • maxGap: Used in conjunction with restrictGap, specifies the max gap in a run. The possible values are:
    • 100 (default)
    • any real value >= 1.0
  • restrictDensity: Indicates whether the minimum density of SNPs in a run should be restricted or not. The possible values are:
    • 0 (default): do not restrict min density of SNPs in a run
    • 1: restrict min density of SNPs in a run
  • minDensity: Used in conjunction with restrictDensity, sets the min density of SNPs in a run. The possible values are:
    • 50 (default)
    • any real value >= 1.0
  • createRunsSheet: Indicates whether to output a spreadsheet containing each homozygous run per sample. The possible values are:
    • 0: do not output a spreadsheet
    • 1 (default): output a spreadsheet
  • createIncidenceSheet: Indicates whether to output a spreadsheet containing the incidence of common runs per SNP. The possible values are:
    • 0: do not output a spreadsheet
    • 1 (default): output a spreadsheet
  • createBinarySheet: Indicates whether or not to output a spreadsheet containing binary ROH run status. The possible values are:
    • 0: do not output a spreadsheet
    • 1 (default): output a spreadsheet
  • createClusterSheet: Indicates whether to output a spreadsheet containing summary information for clusters of runs if clusters were found. The possible values are:
    • 0: do not output a spreadsheet
    • 1 (default): output a spreadsheet
  • createFirstColClusterSheet: Indicates whether to output a spreadsheet containing cluster information for the first column of each cluster. The possible values are:
    • 0: do not output a spreadsheet
    • 1 (default): output a spreadsheet
  • createEveryColClusterSheet: Indicates whether to output a spreadsheet containing cluster information for every column of each cluster. The possible values are:
    • 0 (default): do not output a spreadsheet
    • 1: output a spreadsheet
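
Because the returned list holds 0 wherever a spreadsheet was not created, a script will usually want to check each entry before using it. The following is a minimal sketch of one way to do that; the label strings and the helper name label_roh_results are illustrative and not part of SVS.

```python
# Labels follow the order of the returned list described above.
ROH_RESULT_LABELS = [
    "cluster summary",
    "runs per sample",
    "incidence per SNP",
    "binary ROH status",
    "cluster (first column)",
    "cluster (every column)",
]

def label_roh_results(results):
    """Map each created spreadsheet to a label, skipping the 0 placeholders."""
    labelled = {}
    for label, sheet in zip(ROH_RESULT_LABELS, results):
        if sheet != 0:  # 0 marks a spreadsheet that was not created
            labelled[label] = sheet
    return labelled

# Inside SVS, with mySS an existing spreadsheet object, this might be used as:
#     sheets = label_roh_results(mySS.roh(minLengthSnps=100, maxHetero=2))
```

Looking results up by label rather than by index keeps later code readable when some of the output spreadsheets are switched off.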

Commands for Writing Spreadsheet Editor Scripts

These commands cannot be used in the Python Shell; they can only be used as part of a script whose method is named editData (for editing an entire spreadsheet) or editColumn (for editing a particular column). The first parameter of either method must be dataEditModel; for editing a column, the second parameter must be colIndex. All of the commands listed below are methods of dataEditModel.

Scripts for the Spreadsheet Editor need to be saved in a folder in the “SVS Data Folder/user/SpreadsheetEditor” directory. Below is an example script that counts the number of columns in the spreadsheet being edited. The script can be written in the Python Editor or a text editor, saved in this directory, and then run from the Scripts menu of the Spreadsheet Editor window.

'''
This script counts the number of columns in a spreadsheet being edited using the Spreadsheet Editor.
'''

def editData(dataEditModel):
    numberCols = dataEditModel.numCols()
    ghi.message("The number of columns currently in the editor is " + str(numberCols))
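
A column-level script follows the same pattern with editColumn. The sketch below is one possible shape, using only getColType and numRows from this section; the helper describe_column is a hypothetical name, and ghi is assumed to be available inside SVS as in the script above.

```python
def describe_column(dataEditModel, colIndex):
    """Build a one-line summary of a column from documented dataEditModel calls."""
    col_type = dataEditModel.getColType(colIndex)  # integer column type code
    n_rows = dataEditModel.numRows()
    return ("Column " + str(colIndex) + ": type code " + str(col_type)
            + ", " + str(n_rows) + " rows")

def editColumn(dataEditModel, colIndex):
    # ghi.message is only defined when the script runs inside SVS.
    ghi.message(describe_column(dataEditModel, colIndex))
```

Keeping the message text in a separate helper makes it possible to check the logic outside SVS with a stand-in object.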

Get the Column Type

Syntax

new variable = dataEditModel.getColType(column number)

Example

myVariable = dataEditModel.getColType(3)

Returns an integer value corresponding to the column type. See the paragraph Commands for Spreadsheet Objects.

Get the Number of Columns

Syntax

new variable = dataEditModel.numCols()

Example

numberCols = dataEditModel.numCols()

Returns the number of columns in the spreadsheet being edited.

Get the Number of Rows

Syntax

new variable = dataEditModel.numRows()

Example

numberRows = dataEditModel.numRows()

Returns the number of rows in the spreadsheet being edited.

Get a Column

Syntax

new variable = dataEditModel.col(column number)

Example

myColumn = dataEditModel.col(3)

Returns a column from the spreadsheet as a Python list.

Get a Row

Syntax

new variable = dataEditModel.row(row number)

Example

myRow = dataEditModel.row(4)

Returns a row from the spreadsheet as a Python list.

Get a Cell

Syntax

new variable = dataEditModel.cell(row number, column number)

Example

myCell = dataEditModel.cell(4,3)

Returns a data value corresponding to the entry from the intersection of the specified row and column.
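
Since col returns an ordinary Python list, standard list processing applies to it. As a small sketch, the hypothetical helper below tallies how often each value occurs in a column; it assumes the values in the list are hashable (strings or numbers).

```python
def tally_column(dataEditModel, col_number):
    """Count occurrences of each value in one column of the edit model."""
    counts = {}
    for value in dataEditModel.col(col_number):
        counts[value] = counts.get(value, 0) + 1
    return counts
```

Inside an editData or editColumn method this could be called as tally_column(dataEditModel, 3).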

Add a Column at a Specified Column Number

Syntax

dataEditModel.addColAt(column number, new column header, column type, fill value)

Example

dataEditModel.addColAt(3, "New Binary Column", ghi.const.TypeBinary, 0)

This command inserts a new column at the specified position and fills it with the specified value. The column previously at that position, and all columns after it, shift one position to the right. The fill values can be changed later.

Move a Column to a Specified Column Number

Syntax

dataEditModel.moveColToPosition(column number, column position)

Example

dataEditModel.moveColToPosition(3, 10)

To move a column to a new position, use this command. There are two required parameters: the number of the column to move and the column number to move it to.

Copy a Column to a Specified Column Number

Syntax

dataEditModel.copyColToPosition(column number, column position)

Example

dataEditModel.copyColToPosition(3, 10)

To copy a column to a specified column number, use this command. There are two required parameters: the number of the column to copy and the position at which to insert the copied column.

Delete a Column

Syntax

dataEditModel.deleteColAt(column number)

Example

dataEditModel.deleteColAt(5)

This command will delete the column corresponding to the specified column number.

Determine if a Column was Created in the Spreadsheet Editor

Syntax

new variable = dataEditModel.isColNew(column number)

Example

myVariable = dataEditModel.isColNew(3)

Returns a 1 if the column was added in the spreadsheet editor, and returns a 0 if the column existed before the data edit model was created.

Revert a Column to its Original State

Syntax

dataEditModel.revertColAt(column number)

Example

dataEditModel.revertColAt(3)

Reverts a specified column to the column that existed before the data edit model was created. A column created in the Spreadsheet Editor or by using the dataEditModel cannot be reverted.
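
Because new columns cannot be reverted, a script may want to consult isColNew before calling revertColAt. A minimal sketch, with the helper name revert_if_original being hypothetical:

```python
def revert_if_original(dataEditModel, col_number):
    """Revert a column only if it existed before the edit model was created.

    Returns True if the column was reverted, False if it was skipped.
    """
    if dataEditModel.isColNew(col_number) == 1:
        return False  # columns added in the editor cannot be reverted
    dataEditModel.revertColAt(col_number)
    return True
```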

Change the Data in the Spreadsheet Cells

Syntax

dataEditModel.setData(row number, column number, new value, check to see if the value is new indicator)

Example

dataEditModel.setData(312, 54, "A_A", 1)

This command changes the data in a specified cell. There are four required parameters: the row number, the column number, the new value for the cell, and an indicator parameter with the following values:

  • 0: Do not check to see if the specified value is new
  • 1 (default): Check to see if the specified value is new
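
Combined with col, setData can recode every matching value in a column. The sketch below is one way this might look. The helper recode_column is hypothetical, and the first_row argument is an assumption: it should be set to whatever row number SVS assigns to the first data row, which this section does not state.

```python
def recode_column(dataEditModel, col_number, old_value, new_value, first_row=0):
    """Replace old_value with new_value in one column; return the number changed.

    first_row is the row number of the first entry returned by col().
    The default of 0 is an assumption, not documented behavior.
    """
    changed = 0
    for offset, value in enumerate(dataEditModel.col(col_number)):
        if value == old_value:
            # Last argument 1: check whether the specified value is new.
            dataEditModel.setData(first_row + offset, col_number, new_value, 1)
            changed += 1
    return changed
```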

Change Column Headers

Syntax

dataEditModel.setHeader(column number, new column header string)

Example

dataEditModel.setHeader(4, "New Col Header")

This command changes the column header for the specified column. There are two required parameters: the column number and the new column header.

Change a Particular Row Label

Syntax

dataEditModel.setRowLabel(row number, new row label)

Example

dataEditModel.setRowLabel(56, "Sample56_R")

This command changes the row label for the specified row. There are two required parameters: the row number and the new row label.

Change the Row Label Header

Syntax

dataEditModel.setRowLabelHeader(new row label header, check to see if row label header has changed)

Example

dataEditModel.setRowLabelHeader("New Row Label Header", 1)

Change the row label header and check to make sure the new row label header is different from the old header. There is one required parameter and one optional parameter for this command. The first (required) parameter is the new row label header. The second (optional) parameter indicates if the new row label header should be checked against the old row label header to see if the two labels are different.

  • 0: Don’t check to see if the two labels are different
  • 1 (default): Check to see if the two labels are different

Change the Row Label Column

Syntax

dataEditModel.setRowLabelColumn(column number)

Example

dataEditModel.setRowLabelColumn(4)

Copies the values in the specified column to the row labels. The specified column is retained, not deleted.

Revert Row Labels to the Original Values

Syntax

dataEditModel.revertRowLabels()

Example

dataEditModel.revertRowLabels()

Revert the row labels to the labels that existed before the data edit model was created.

Make Generic Row Labels

Syntax

dataEditModel.makeGenericRowLabels()

Example

dataEditModel.makeGenericRowLabels()

Change the current row labels to integer numbers corresponding to the row number.

Expand a Categorical or a Genetic Column to Binary Columns

Syntax

dataEditModel.expandColToBool(column number, mode)

Example

dataEditModel.expandColToBool(5, 1)

This command takes a categorical or a genetic column and creates a binary column for every value in the column. There is one required parameter, the column number, and an optional second parameter to indicate how to handle missing data.

Each new column contains a 1 in every row where the original column holds the value that the new column represents, and a 0 otherwise. For example, if a categorical column “Level” had three values “Low”, “Medium”, and “High”, three binary columns would be created: “Level=Low?”, “Level=Medium?”, and “Level=High?”. Every instance of “Low” in the “Level” column would be indicated with a 1 in the column “Level=Low?”. All other values would be indicated with a 0. The other binary columns would be created in the same manner.

The allowable values for the second parameter are as follows:

  • 0 (default): Expand missing data into a separate column with a value of 1 for missing data rows.
  • 1: Missing data is NOT expanded into a separate column; instead, missing-value indicators (“?”) are placed in the missing data rows for every output column.

In the above example, if rows 3, 5, and 8 had missing data (“?”) and mode 0 is specified (or no mode is specified), a fourth binary column “Level=??” would be created, and in rows 3, 5, and 8, this fourth column would contain a 1 and the remaining columns would contain a 0.

Meanwhile, in the above example, if rows 3, 5, and 8 had missing data (“?”) and mode 1 is specified, only the three columns “Level=Low?”, “Level=Medium?”, and “Level=High?” would be created, and in rows 3, 5, and 8 of these three columns, a missing-value indicator (“?”) would be placed.
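
The two modes can be mimicked on a plain Python list to make the difference concrete. This is only an illustration of the behavior described above, not the SVS implementation; the function name expand_to_binary is hypothetical, and the sketch assumes that the "missing" column is created only when missing data is present.

```python
def expand_to_binary(header, values, mode=0, missing="?"):
    """Return {new column header: list of cell values} for the described expansion."""
    levels = []
    for v in values:
        if v != missing and v not in levels:
            levels.append(v)  # first-seen order, chosen for this sketch
    columns = {}
    for level in levels:
        cells = []
        for v in values:
            if v == missing and mode == 1:
                cells.append(missing)  # mode 1: keep the missing-value indicator
            else:
                cells.append(1 if v == level else 0)
        columns[header + "=" + str(level) + "?"] = cells
    if mode == 0 and missing in values:
        # mode 0: missing data gets its own indicator column
        columns[header + "=" + missing + "?"] = [1 if v == missing else 0 for v in values]
    return columns
```

For the “Level” example, mode 0 yields four columns including “Level=??”, while mode 1 yields three columns with “?” in the rows that had missing data.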

Convert a String Column to a Binary Column

Syntax

dataEditModel.convertStrColToBool(column number, list of strings to convert to 1, create new column indicator)

Example

dataEditModel.convertStrColToBool(4, ["Low", "Medium"], 0)

This command converts the indicated strings in a categorical or genetic column to ones and all other values in the column to zeros. There are two required parameters: the column number and a Python list of strings to set to 1 in the binary column. The optional third parameter indicates if a new column should be created or not. Its possible values are as follows:

  • 0 (default): Overwrite the original column with the new column
  • 1: Create a new column to the right of the original column
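
On a plain Python list, the conversion described above amounts to a membership test. The sketch below is illustrative only: strings_to_binary is a hypothetical name, and how SVS treats missing values in this conversion is not stated here, so they are simply mapped to 0 like any other unlisted value.

```python
def strings_to_binary(values, strings_for_one):
    """Map listed strings to 1 and every other value to 0."""
    ones = set(strings_for_one)
    return [1 if v in ones else 0 for v in values]
```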

Convert an Integer or Real Column to Binary By Threshold

Syntax

dataEditModel.convertColToBool(column number, threshold value, create new column indicator)

Example

dataEditModel.convertColToBool(5, 1.0, 0)

This command takes an integer or real column and converts the values to binary values as indicated by the threshold. There are two required parameters: the column number and the threshold value (which must have at least one decimal place). All values greater than or equal to the threshold are converted to ones, and all values less than the threshold are converted to zeros. The optional third parameter indicates if a new column should be created or not. Its possible values are as follows:

  • 0 (default): Overwrite the original column with the new column
  • 1: Create a new column to the right of the original column
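
The threshold rule is inclusive: a value equal to the threshold becomes 1. A short plain-Python illustration of the rule (threshold_to_binary is a hypothetical name, not an SVS function):

```python
def threshold_to_binary(values, threshold):
    """Values >= threshold become 1; values below it become 0."""
    return [1 if v >= threshold else 0 for v in values]
```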

Convert a Binary or Real Column to an Integer Column

Syntax

dataEditModel.convertColToInt(column number, round or truncated values indicator, create new column indicator)

Example

dataEditModel.convertColToInt(6, 1, 1)

This command takes a binary or a real valued column and converts the values to integers by either rounding or truncation. There is one required parameter, the column number. There are two optional parameters, described below:

  • Optional second parameter: Convert values by rounding or truncation (does not apply to binary values).
    • 0: Truncate real values
    • 1 (default): Round real values to nearest integer
  • Optional third parameter: Indicates if a new column should be created or not.
    • 0 (default): Overwrite the original column with the new column
    • 1: Create a new column to the right of the original column
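
The difference between the two settings of the second parameter shows up on values such as 2.7 and -1.7. The sketch below illustrates it on a plain list. The name to_int is hypothetical, truncation is taken to mean dropping the fractional part, and the tie-breaking rule for values ending in .5 (round half up here) is an assumption, since this section does not specify it.

```python
import math

def to_int(values, round_values=1):
    """Convert numbers to integers by rounding (1) or truncation (0)."""
    if round_values:
        # Round half up; how SVS breaks ties is not documented here.
        return [int(math.floor(v + 0.5)) for v in values]
    return [int(v) for v in values]  # int() drops the fractional part
```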

Convert a Binary or Integer Column to a Real Column

Syntax

dataEditModel.convertColToReal(column number, create new column indicator)

Example

dataEditModel.convertColToReal(11, 0)

This command takes a binary or integer column and converts it to a real valued column. The only required parameter is the column number. The optional second parameter indicates if a new column should be created or not. Its possible values are as follows:

  • 0 (default): Overwrite the original column with the new column
  • 1: Create a new column to the right of the original column

Convert a Numerical or Genetic Column to a Categorical (Nominal) Column

Syntax

dataEditModel.convertColToNominal(column number, create new column indicator)

Example

dataEditModel.convertColToNominal(7, 1)

This command takes a numerical or genetic column and converts the values to strings. The only required parameter is the column number. The optional second parameter indicates if a new column should be created or not. Its possible values are as follows:

  • 0 (default): Overwrite the original column with the new column
  • 1: Create a new column to the right of the original column

Convert a Categorical Column to a Genetic Column

Syntax

new variable = dataEditModel.convertColToGenotypes(column number, create a new column indicator)

Example

myVariable = dataEditModel.convertColToGenotypes(53,1)

Converts a categorical column to a genetic column. There is one required parameter and one optional parameter. The first (required) parameter is the column number corresponding to the column to convert. The second (optional) parameter indicates if a new column should be created or not. Its possible values are as follows:

  • 0 (default): Overwrite the original column with the new column
  • 1: Create a new column to the right of the original column

Determine if a Cell has been Edited

Syntax

new variable = dataEditModel.isEdited(row number, column number)

Example

myVariable = dataEditModel.isEdited(5,6)

Returns a 1 if the indicated cell has been edited, and returns a 0 if the indicated cell has not been edited.

Determine if a Specific Row Label has been Edited

Syntax

new variable = dataEditModel.isRowLabelEdited(row number)

Example

myVariable = dataEditModel.isRowLabelEdited(5)

Returns a 1 if the indicated row label has been edited, and returns a 0 if the row label has not been edited.

Determine if a Specific Column Header has been Edited

Syntax

new variable = dataEditModel.isColHeaderEdited(column number)

Example

myVariable = dataEditModel.isColHeaderEdited(6)

Returns a 1 if the indicated column header has been edited, and returns a 0 if the column header has not been edited.

Determine if the Row Label Header has been Edited

Syntax

new variable = dataEditModel.isRowLabelHeaderEdited()

Example

myVariable = dataEditModel.isRowLabelHeaderEdited()

Returns a 1 if the row label header has been edited, and 0 otherwise.
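
The edit-status queries can be combined to summarize what a script or user has changed. The sketch below collects the edited cells among a given set of rows and columns. The helper name edited_cells is hypothetical, and the caller supplies the row and column numbers so that no assumption is made about how SVS numbers them.

```python
def edited_cells(dataEditModel, row_numbers, col_numbers):
    """Return (row, column) pairs for which isEdited reports 1."""
    found = []
    for r in row_numbers:
        for c in col_numbers:
            if dataEditModel.isEdited(r, c) == 1:
                found.append((r, c))
    return found
```

The same pattern applies to isRowLabelEdited and isColHeaderEdited, looping over rows or columns only.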