
5.6 Scripting Reference

5.6.1 Project Related Commands

5.6.1.1 Creating a New Project

To create a new project that can later be viewed in GUI mode, use the following command. Once you create a project, all new HelixTree objects will be added to the project as if you were doing the same operations in GUI mode.

EXAMPLE

ghi.newProject('Discovery', '/projects')

SYNTAX

ghi.newProject(project name, project path)

Note that project path must be an existing folder on your file system, and project name will be created as a new folder inside the project path directory.

5.6.1.2 Creating a Temporary Project

To create a temporary project, use the following command. Projects created with this command will not be saved to disk, and will not be available once the script has completed. This command cannot be used in GUI mode.

Note: Projects created with this command cannot be saved using the saveProject() command.

EXAMPLE

ghi.newTempProject()

SYNTAX

ghi.newTempProject()

5.6.1.3 Open an Existing Project

To open a project previously created in either GUI mode or script mode, use this command.

EXAMPLE

ghi.openProject('/projects/Discovery/Discovery.ghp')

SYNTAX

ghi.openProject(path and name of project)

5.6.1.4 Saving a Project

When you reach a point in your workflow where you want to save the state of the current project, use this command.

EXAMPLE

ghi.saveProject()

SYNTAX

ghi.saveProject()

5.6.1.5 Closing a Project

The following command closes the current project without saving its state. To keep the current state, call the saveProject() command before closing the project.

EXAMPLE

ghi.closeProject()

SYNTAX

ghi.closeProject()

5.6.2 General GHI Commands

5.6.2.1 Allowing Viewers to Display

There may be times when you do not want viewers, such as progress dialogs, to appear while script commands are running. The following command suppresses or allows the display of GUI viewers while scripts execute. You can turn viewers on and off at any time while a script runs. Note that this command only affects scripts that are run from the Scripts menu of a viewer. If a script turns viewers off, they are turned back on when the script completes, so every new script starts with viewers turned on.

There are two possible settings: 0 = false, 1 = true.

EXAMPLE

ghi.enableNewViewers(1)

SYNTAX

ghi.enableNewViewers(viewer setting)

5.6.2.2 Allowing Log Messages to Be Created

There might be times when you do not want to have logging take place during the execution of script commands, but other times when you do want logging. The following command will either suppress or allow the logging of actions while executing scripts. Note that you can turn logging on and off at any time while running a script.

There are two possible settings: 0 = false, 1 = true.

EXAMPLE

ghi.enableLogging(1)

SYNTAX

ghi.enableLogging(logging setting)

5.6.2.3 Display a GUI Message

Sometimes you may want to pop up a GUI-based message to report status or other information. This command takes the text parameter and displays it in a standard message dialog.

EXAMPLE

ghi.message("my important message")

SYNTAX

ghi.message(message string)

5.6.2.4 Display a GUI Error Message

When you write a script that uses try/except syntax, you can put this command in the except clause, and any exception message will be displayed in a GUI error dialog.

EXAMPLE

ghi.error()

SYNTAX

ghi.error()
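A minimal sketch of the try/except pattern described above. The helper name `run_with_error_dialog` is hypothetical, and the scripting module is passed in as the `ghi` parameter only so the helper can be exercised with a stand-in object outside HelixTree.

```python
def run_with_error_dialog(ghi, task):
    """Run task(); if it raises, show the exception in a GUI error dialog.

    Returns the result of task(), or None if an exception was reported.
    `ghi` is the HelixTree scripting module, taken as a parameter here
    so the helper can be tried with a stand-in object.
    """
    try:
        return task()
    except Exception:
        # ghi.error() is called inside the except clause, as documented,
        # so it can report the exception currently being handled.
        ghi.error()
        return None
```

In a HelixTree script you might call `run_with_error_dialog(ghi, lambda: ghi.importDSF('/home/mydata.dsf'))`.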

5.6.2.5 Getting a Specific Navigator Node

When you know a navigator node's display name or ID, you can retrieve an object representing that node. The following command takes either an integer node ID or a string node name. Asking for a node by ID returns a single object; asking for a node by name returns a list of objects, because names are not guaranteed to be unique.

EXAMPLE

myList = ghi.getObject('name')

SYNTAX

object list = ghi.getObject(navigator node display name)

EXAMPLE

myObject = ghi.getObject(ID)

SYNTAX

object variable = ghi.getObject(navigator node ID)
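Because a lookup by name returns a list, a script that expects exactly one match should check the length of that list. A sketch with a hypothetical helper name; `ghi` is a parameter so the function can be tested without HelixTree:

```python
def get_unique_node(ghi, name):
    """Return the single navigator node whose display name is `name`.

    ghi.getObject(name) returns a list because display names are not
    guaranteed to be unique, so zero or several matches raise LookupError.
    """
    matches = ghi.getObject(name)
    if len(matches) != 1:
        raise LookupError("expected one node named %r, found %d"
                          % (name, len(matches)))
    return matches[0]
```

Once you have the node, storing its `getID()` value and later calling `ghi.getObject(id)` avoids the ambiguity of names altogether.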

5.6.2.6 Getting the Current Navigator Node

Another way to access navigator nodes is to ask for the currently highlighted node. If no node is highlighted, an error is displayed; otherwise, an object representing the current node is returned.

EXAMPLE

myObject = ghi.getCurrentObject()

SYNTAX

object variable = ghi.getCurrentObject()

5.6.2.7 Choosing a File

This method displays a dialog window for browsing to and selecting one or more files. There are two required parameters and one optional parameter. The first parameter defines a file extension mask; for example, if you pass in "*.txt", the dialog only displays files that have the .txt extension. The second parameter is the title to be displayed in the dialog's title bar. The optional third parameter controls multiple selection: pass 1 to allow several files to be selected; if it is omitted or set to anything other than 1, the chooser allows only a single file to be selected. If one or more files are selected, a tuple containing the complete path of each selected file is returned. If the dialog is cancelled, an empty tuple is returned.

EXAMPLE

myFilePaths = ghi.chooseFile("*.txt", "Choose A File Please", 1)

SYNTAX

file path list = ghi.chooseFile(file extension mask, dialog title, [allow multiple selection])
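Since cancelling the dialog yields an empty result, a script should test for that before using the paths. This sketch uses a hypothetical helper name and dialog title; the emptiness check works whether the paths come back as a tuple or a list.

```python
def choose_text_files(ghi):
    """Ask the user for one or more .txt files and return a list of paths.

    An empty list means the dialog was cancelled.
    """
    paths = ghi.chooseFile("*.txt", "Choose Input Files", 1)  # 1 = allow multiple
    if not paths:  # empty when the user cancels
        return []
    return list(paths)
```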

5.6.2.8 Choosing a Directory

The following method can be used to create a file browser for browsing to and selecting directories. This method has one required parameter and one optional parameter. The first parameter is the title to be displayed in the dialog’s title bar. The second parameter is optional and specifies the initial working directory of the browser. If this parameter is omitted, then the HelixTree application directory will be used as the initial working directory. If the dialog is cancelled, an empty string will be returned. Otherwise, the path of the selected directory will be returned.

EXAMPLE

myDirectory = ghi.chooseDirectory("Choose a Directory Please", "C:/HelixTree/example" )

SYNTAX

directory path = ghi.chooseDirectory(dialog title, [initial working directory])
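A small sketch of handling the optional starting directory and the empty-string cancel result. The helper name and title are illustrative only; returning None on cancel is a design choice of this sketch, not HelixTree behavior.

```python
def choose_output_directory(ghi, start_dir=None):
    """Return the chosen directory path, or None if the dialog was cancelled."""
    if start_dir is None:
        # Omitting the second argument starts in the HelixTree application directory.
        path = ghi.chooseDirectory("Choose an Output Directory")
    else:
        path = ghi.chooseDirectory("Choose an Output Directory", start_dir)
    return path or None  # an empty string signals cancellation
```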

5.6.2.9 Creating a Progress Bar

This method will create a progress bar which can be used to display the progress of a certain task and to signal the cancellation of a process. There are two required arguments for this method. The first argument specifies the text to be displayed on the progress bar. The second argument defines the number of progress increments for the progress bar. An object for the progress bar will be returned.

EXAMPLE

myProgressBar = ghi.progressBar("Please Wait", 100)

SYNTAX

progress bar object = ghi.progressBar(dialog caption, total number of progress increments )

5.6.2.10 Setting the Progress Bar’s Progress

The following method sets the progress displayed by the progress bar to the value passed in. The value is displayed as a percentage: the proportion of the specified progress to the total number of progress increments. For example, if a progress bar is defined as having 50 progress increments, setting the progress to 10 causes the progress bar to display 20 percent completion.

EXAMPLE

myProgressBar.setProgress(10)

SYNTAX

progress bar object.setProgress(progress value)

5.6.2.11 Checking if the Progress Bar Has Been Cancelled

The following method allows you to check whether a user has pressed the Cancel button on the progress bar. This information is useful when deciding whether to stop a process before it completes. If the progress bar has been cancelled, the method returns 1; otherwise, it returns 0.

EXAMPLE

myProgressBar.wasCancelled()

SYNTAX

integer variable = progress bar object.wasCancelled()

5.6.2.12 Disposing of a Progress Bar When Done

It is good practice to make sure that a progress bar is disposed of when the task is complete. After this method is called, the progress bar is no longer displayed, and calling methods on the progress bar object has no effect.

EXAMPLE

myProgressBar.finish()

SYNTAX

progress bar object.finish()
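The progress bar methods above are typically used together in one loop: create the bar, update it after each unit of work, stop if the user cancels, and always call finish(). A sketch under these assumptions: `process_items` and the `handle` callback are hypothetical, and `ghi` is passed in so the loop can be tested with stand-in objects.

```python
def process_items(ghi, items, handle):
    """Call handle(item) for each item while showing a progress bar.

    Stops early if the user presses Cancel and returns the number of
    items that were processed.
    """
    bar = ghi.progressBar("Processing items", len(items))
    done = 0
    try:
        for item in items:
            if bar.wasCancelled():  # 1 once Cancel has been pressed
                break
            handle(item)
            done += 1
            bar.setProgress(done)  # shown as done / len(items) percent
    finally:
        bar.finish()  # dispose of the bar even after an error or cancel
    return done
```

Putting finish() in a `finally` clause ensures the bar is disposed of even when `handle` raises an exception.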

5.6.2.13 Creating a Status Dialog

This method creates a status dialog that can display messages for a task that cannot incrementally update a progress bar. It is also useful for brief tasks that do not require the full weight of a progress bar. There is one argument for this method: the message to be displayed by the status dialog.

EXAMPLE

myStatusDialog = ghi.statusDialog("Doing something brief")

SYNTAX

status dialog object = ghi.statusDialog(dialog message )

5.6.2.14 Setting the Status Dialog’s Message

The following method allows you to change the message displayed by the status dialog. This may be useful when you have a series of tasks and you would like to inform the user which task is currently being worked on. The only argument is the message to update the dialog with.

EXAMPLE

myStatusDialog.setMessage("Now working on a very hard problem.")

SYNTAX

status dialog object.setMessage(new message )

5.6.2.15 Closing the Status Dialog When Done

To close the status dialog, call this method. Always finish each status dialog that you start, and use only one status dialog at a time.

EXAMPLE

myStatusDialog.finish()

SYNTAX

status dialog object.finish()
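For a series of short tasks, one status dialog can announce each step in turn. This sketch assumes each step is a Python callable paired with its message; the helper name `run_steps` and the initial message are hypothetical.

```python
def run_steps(ghi, steps):
    """Run (message, function) pairs in order under a single status dialog."""
    dialog = ghi.statusDialog("Starting...")
    try:
        for message, step in steps:
            dialog.setMessage(message)  # tell the user which step is running
            step()
    finally:
        dialog.finish()  # close the dialog even if a step raises
```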

5.6.3 Commands Common to All Objects

Some commands are available for all the HelixTree objects that you can access from the Python shell. These commands allow you to control GUI aspects of objects you create in scripting.

5.6.3.1 Change a Navigator Node Name

During the course of a script, you may create navigator node objects that will appear in the Navigator Window the next time you open the project in GUI mode. If the generic names assigned to new navigator nodes are not what you want, you can change a node's name with this command.

EXAMPLE

myNodeObject.setName('my node name')

SYNTAX

node object.setName(new navigator node name)

5.6.3.2 Getting a Navigator Node Name

If you need to know the name of a navigator node use this command with any Python object that corresponds to a navigator node.

EXAMPLE

myNodeName = myNodeObject.getName()

SYNTAX

navigator node name = node object.getName()

5.6.3.3 Getting a Navigator Node Type

If needed, you can get the navigator node type from an object with this command. The command returns a string describing the object's type.

EXAMPLE

myNodeType = myNodeObject.getType()

SYNTAX

navigator node type = node object.getType()

5.6.3.4 Getting a Navigator Node ID

If needed, you can get the navigator node ID from an object with this command. The command returns an integer representing a node’s ID.

EXAMPLE

myNodeID = myNodeObject.getID()

SYNTAX

navigator node ID = node object.getID()

5.6.3.5 Deleting a Navigator Node

To delete a navigator node, enter this command in the Python Shell window. If a node cannot be deleted (for example, the project node or a node that was used to create another node), a message is displayed and the node is not deleted. After this command runs, the variable that represented the node is no longer valid, and any attempt to use it displays a message saying that it is no longer valid.

EXAMPLE

myNodeObject.deleteObject()

SYNTAX

node object.deleteObject()

5.6.3.6 Closing a Navigator Viewer

To shut down the viewer for a navigator node, enter this command in the Python Shell window.

EXAMPLE

myNodeObject.close()

SYNTAX

node object.close()

5.6.3.7 Showing a Navigator Viewer

To display the viewer for a navigator node, enter this command in the Python Shell window.

EXAMPLE

myNodeObject.show()

SYNTAX

node object.show()

5.6.3.8 Finding a Node’s Parent

To get an object that represents a node’s parent enter this command in the Python Shell window and it will return an object representing the parent node. You can use the getType() command to test what type of object is returned.

EXAMPLE

newObject = myNodeObject.getParent()

SYNTAX

new node object = node object.getParent()

5.6.3.9 Finding a Node’s Secondary Parent

This command returns an object representing the secondary parent of a node. A secondary parent is another node that was used in combination with the current node’s parent to create the current node. If there is no secondary parent then nothing is returned. You can check the type of secondary parent returned by using the getType() command.

EXAMPLE

newObject = myNodeObject.getParentSecondary()

SYNTAX

new node object = node object.getParentSecondary()
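getParent() and getParentSecondary() together identify the node(s) a result was derived from. A sketch with a hypothetical helper name; it assumes that "nothing is returned" for a missing secondary parent means the Python value None.

```python
def input_nodes(node):
    """Return the node(s) used to create `node`: its parent, plus its
    secondary parent when one exists."""
    inputs = [node.getParent()]
    secondary = node.getParentSecondary()
    if secondary is not None:  # assumption: no secondary parent gives None
        inputs.append(secondary)
    return inputs
```

For example, `[n.getType() for n in input_nodes(myNodeObject)]` lists the types of the inputs.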

5.6.3.10 Getting a Node’s Annotations

This command returns a string with the current contents of the annotations window.

EXAMPLE

myAnnotations = myNodeObject.getAnnotations()

SYNTAX

annotations string = node object.getAnnotations()

5.6.3.11 Appending to a Node’s Annotations

This command will append a string to the end of the current contents of the annotations window.

EXAMPLE

myNodeObject.appendAnnotations("some text")

SYNTAX

node object.appendAnnotations(new annotations text)
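The naming and annotation commands can be combined to document script output as it is created. In this sketch the helper name, the leading newline, and the timestamp format are illustrative choices, not HelixTree conventions.

```python
import time


def label_result(node, name, note):
    """Rename a navigator node and append a timestamped note to its annotations.

    Returns the node's ID, which identifies it even if names are reused.
    """
    node.setName(name)
    stamp = time.strftime("%Y-%m-%d %H:%M")
    node.appendAnnotations("\n[%s] %s" % (stamp, note))
    return node.getID()
```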

5.6.4 Importing and Loading Data

The following commands allow you to import datasets into your open project.

5.6.4.1 Importing GHD-format Data Sets

This command can be used to import a (“Legacy”) GHD format dataset. The resulting spreadsheet is returned and may be assigned to a variable.

EXAMPLE

mySS = ghi.importGHD('/home/mydata.ghd')

SYNTAX

new spreadsheet object = ghi.importGHD(path and filename of GHD file)

5.6.4.2 Importing DSF Files

This command can be used to import a Dataset Storage Format (DSF) dataset. The resulting spreadsheet is returned and may be assigned to a variable.

EXAMPLE

mySS = ghi.importDSF('/home/mydata.dsf')

SYNTAX

new spreadsheet object = ghi.importDSF(path and filename of DSF file)

5.6.4.3 Importing Various File Formats

Corresponding to the Import Wizard (4.3.1), this command imports various file types into the project.

EXAMPLE

mySS = ghi.importData('/home/mydata.txt')

EXAMPLE

mySS = ghi.importData('/home/mydata.txt', 1, 1, '-', 2)

SYNTAX

new spreadsheet object = ghi.importData(path and filename of file, [optional column number to be used as spreadsheet row labels], [optional row to use as column headers], [the character to use as the allele delimiter], [the worksheet to use when applicable] )

5.6.4.4 Importing ASCII files

To import a text-based file specifically, use one of the following commands. In either case, the resulting spreadsheet is returned and may be assigned to a variable.

To import a space-separated text file, use

EXAMPLE

mySS = ghi.importASCII('/home/mydata.txt')

EXAMPLE

mySS = ghi.importASCII('/home/mydata.txt', 1)

SYNTAX

new spreadsheet object = ghi.importASCII(path and filename of text file, [optional column number to be used as spreadsheet row labels])

To import a comma-separated values (CSV) text file, use

EXAMPLE

mySS = ghi.importCSV('/home/mydata.txt')

EXAMPLE

mySS = ghi.importCSV('/home/mydata.txt', 1)

SYNTAX

new spreadsheet object = ghi.importCSV(path and filename of CSV file, [optional column number to be used as spreadsheet row labels])


5.6.4.5 Importing Family-Indexed Data

HelixTree can import pedigree and family-indexed data in a number of formats. (Currently, only the FBAT/PBAT pedigree and phenotype formats are supported. In the future, we plan to support other pedigree formats, along with the formats for their supporting data.)

To import an FBAT/PBAT Pedigree file, use the following command. The resulting spreadsheet is returned and may be assigned to a variable.

EXAMPLE

mySS = ghi.importFBATPedigree('/home/mydata.ped')

SYNTAX

new spreadsheet object = ghi.importFBATPedigree(path and filename of pedigree file)

To import an FBAT/PBAT Phenotype file, use the following command. The resulting spreadsheet is returned and may be assigned to a variable.

EXAMPLE

mySS = ghi.importFBATPhenotype('/home/mydata.phe')

SYNTAX

new spreadsheet object = ghi.importFBATPhenotype(path and filename of phenotype file)

5.6.4.6 Importing PED files

PED files can be imported using the importPED(...) scripting command. This command takes two string parameters for the PED and MAP file paths, as well as two optional keyword arguments that specify the missing values for genotypes and phenotypes. Note: The missing value for phenotypes defaults to -9, and the missing value for genotypes defaults to "0".

The optional keyword arguments are as follows:

  • missingPhenotype – This argument allows you to specify an integer value that should be treated as missing for phenotypes
  • missingGenotype – This argument allows you to specify a single character string that should be treated as missing for genotypes

EXAMPLE

mySS = ghi.importPED('C:/data/myData.ped', 'C:/data/myData.map', missingPhenotype=-9, missingGenotype="0")

SYNTAX

new spreadsheet object = ghi.importPED(path and filename of PED file, path and filename of MAP file [, missingPhenotype=missing phenotype value, missingGenotype=missing genotype value])

5.6.4.7 Importing BED files

Use the following command to import BED (binary PED) files. This command takes 3 arguments: the BED file path, the FAM file path, and the BIM file path.

EXAMPLE

mySS = ghi.importBED('C:/data/myData.bed', 'C:/data/myData.fam', 'C:/data/myData.bim')

SYNTAX

new spreadsheet object = ghi.importBED(path and filename of BED file, path and filename of FAM file, path and file name of BIM file)

5.6.4.8 Extracting Copy Number Values From Affymetrix CEL Files

Using the convertCelToCopyNumberDsf(...) command, you can create a DSF file containing the normalized copy number intensity values contained in the input CEL files. The resulting DSF file will be stored on disk, and may then be included in subsequent operations using HelixTree.

This command requires the following parameters in this order:

  • Input file list – This list should contain the paths and names for all CEL files that you want to use. This must be a Python List.
  • Output DSF file – Specifies the save location and file name of the output DSF file.
  • Ref status spreadsheet – An integer representing the node ID of the spreadsheet object that will contain the reference column.
  • Ref status column – An integer that represents the column number of the actual reference status column within the ref status spreadsheet.
  • Marker map spreadsheet – An integer representing the node ID for the marker map spreadsheet.

This command also allows you to specify other, optional parameters by using keyword arguments as follows:

  • affyLibDir – Allows you to specify the path to a directory containing Affymetrix library files.
  • tempDir – Allows you to specify a directory to use for temporary file creation.
  • mappingId – An integer representing the node ID of the NSP/STY mapping spreadsheet.

Note: If you omit the affyLibDir keyword argument, you must have the appropriate .gcdf files in the AffyLibraryFiles directory of your HelixTree directory. A .gcdf file is created from the corresponding Affymetrix .cdf file the first time it is used during the import process.

Note: The temporary directory used by the import process must be on a local disk. If your project is on a network drive, be sure to specify tempDir as a directory on your local disk.

Note: If you are using Affymetrix 500k data, and are including both NSP and STY CEL files, you must choose an NSP/STY mapping spreadsheet in order for the import to be successful.

See 4.4.1.1 for more information on the import process, including the requirements for the reference status and mapping spreadsheets.

EXAMPLE

resultVal = ghi.convertCelToCopyNumberDsf(['C:/file1.cel', 'C:/file2.cel'], 'C:/DsfOutput/CopyNumberVals.dsf', 3, 1, 6, affyLibDir='C:/GeneChip/Library', tempDir='C:/workSpace', mappingId=9)

SYNTAX

int = ghi.convertCelToCopyNumberDsf(CelFileList, outputDsfFile, CaseControlId, CaseControlColumn, markerMapId [, affyLibDir, tempDir, mappingId])
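The input file list must be a Python list, so when a study's CEL files sit in one folder it can be convenient to build that list with the standard `glob` module. A sketch under these assumptions: both helper names are hypothetical, the node IDs you pass must refer to your own reference status and marker map spreadsheets, and `ghi` is taken as a parameter so the code can be tried with a stand-in object.

```python
import glob
import os


def cel_file_list(directory):
    """Return a sorted Python list of the CEL file paths in `directory`.

    Both lower- and upper-case extensions are matched; the set removes
    duplicates on systems where glob matching is case-insensitive.
    """
    paths = glob.glob(os.path.join(directory, "*.cel"))
    paths += glob.glob(os.path.join(directory, "*.CEL"))
    return sorted(set(paths))


def convert_cel_directory(ghi, directory, output_dsf, ref_ss_id, ref_col,
                          marker_map_id, **options):
    """Convert every CEL file in `directory` into one copy number DSF file.

    `options` is passed through unchanged, so affyLibDir, tempDir and
    mappingId can be supplied as keyword arguments.
    """
    cel_files = cel_file_list(directory)
    if not cel_files:
        raise ValueError("no CEL files found in %s" % directory)
    return ghi.convertCelToCopyNumberDsf(cel_files, output_dsf, ref_ss_id,
                                         ref_col, marker_map_id, **options)
```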

5.6.4.9 Importing a Marker Map File

To import a genetic marker map, use the following command. The first two arguments are required, and a third argument is optional. The first is the path to the map file; the second is a format indicator, which should be the string "S" for space-delimited, "C" for comma-delimited, or "T" for tab-delimited ASCII files. Optionally, you can pass 1 as the third argument to specify that the first line of the map file should be ignored. These positional parameters can be followed by optional keyword parameters, in any combination. The value assigned to each keyword is the column number in the import file to associate with that field.

  1. markerID - The column number of SNP/Marker ID, default=1
  2. distance - The column number of Distance, default=2
  3. chromosome - The optional column number of Chromosome
  4. region - The optional column number of Region
  5. rsid - The optional column number of RSID
  6. gene - The optional column number of Gene

The importMarkerMap() command returns a spreadsheet object.

EXAMPLE

mySS = ghi.importMarkerMap('/home/mydata.txt', 'S')

EXAMPLE

mySS = ghi.importMarkerMap('/home/mydata.txt', 'C', markerID=1, distance=2, chromosome=3, region=5, rsid=4, gene=6)

EXAMPLE

mySS = ghi.importMarkerMap('/home/mydata.txt', 'T', 1, distance=2, markerID=1)

SYNTAX

new spreadsheet object = ghi.importMarkerMap(path and filename of map file, the format indicator, [ignore the first line=0], [markerID column number], [distance column number], [chromosome column number], [region column number], [rsid column number], [gene column number])
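Because the column numbers are ordinary keyword arguments, they can be kept in a dictionary and unpacked with `**`. In this sketch the dictionary reproduces the layout of the second example above, and the helper, its parameters, and the delimiter table are illustrative rather than part of the HelixTree API.

```python
# Example layout only: the column of the map file that holds each field.
MAP_COLUMNS = {"markerID": 1, "distance": 2, "chromosome": 3,
               "rsid": 4, "region": 5, "gene": 6}

# Format indicators accepted by importMarkerMap(), keyed by delimiter.
FORMAT_INDICATORS = {" ": "S", ",": "C", "\t": "T"}


def import_marker_map(ghi, path, delimiter, skip_first_line=False,
                      columns=MAP_COLUMNS):
    """Import a marker map, passing each field's column as a keyword argument."""
    fmt = FORMAT_INDICATORS[delimiter]
    if skip_first_line:
        # The optional third positional argument 1 ignores the first line.
        return ghi.importMarkerMap(path, fmt, 1, **columns)
    return ghi.importMarkerMap(path, fmt, **columns)
```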

5.6.5 Creating a New Data Set with Scripting

As you manipulate data in scripting there may be times when you would like to add a new dataset and its corresponding spreadsheet to a project. The following set of commands allows you to construct a dataset from Python lists and add the dataset to a project.

5.6.5.1 Getting a Dataset Builder Object

This command returns an object for use in building new datasets. The first parameter is the display name for the dataset when it is added to the Navigator Window. The next two parameters are the number of rows and columns, respectively. The last parameter indicates whether or not you want to add a column of row labels. Note that if you want a column of row labels, you must use the addRowLabels() command before adding any of your data columns.

EXAMPLE

myBuilderObject = ghi.startSpreadsheetBuilder("datasetName", 10, 10, 1)

SYNTAX

ss builder object = ghi.startSpreadsheetBuilder(dataset name, number of rows, number of columns, add row labels: 1 = yes, 0 = no)

5.6.5.2 Adding Row Labels

If you specified that your dataset will have row labels, you must use the following command to add the row label column before you add the data columns. This command has two parameters: the first is the column header, and the second is a list of strings that are the row labels.

EXAMPLE

myBuilderObject.addRowLabels("myLabels", ["label1", "label2", "label3", ...])

SYNTAX

ss builder object.addRowLabels(column header, list of strings)

5.6.5.3 Adding a Column of Boolean Values

The following command adds a column of boolean values to the new data set. Note that the values should be either 0 or 1.

EXAMPLE

myBuilderObject.addBoolColumn("myBools", [1, 1, 0, ...])

SYNTAX

ss builder object.addBoolColumn(column header, list of 0’s and 1’s)

5.6.5.4 Adding a Column of Integer Values

The following command adds a column of integer values to the new data set.

EXAMPLE

myBuilderObject.addIntColumn("myInts", [10, 12, 20, ...])

SYNTAX

ss builder object.addIntColumn(column header, list of integers)

5.6.5.5 Adding a Column of Double Values

The following command adds a column of double values to the new data set.

EXAMPLE

myBuilderObject.addDoubleColumn("myDoubles", [1.14, 2.5, 1.8, ...])

SYNTAX

ss builder object.addDoubleColumn(column header, list of doubles)

5.6.5.6 Adding a Column of Nominal Values

The following command adds a column of nominal values.

EXAMPLE

myBuilderObject.addNominalColumn("myNominals", ["green", "blue", "brown", ...])

SYNTAX

ss builder object.addNominalColumn(column header, list of strings)

5.6.5.7 Adding a Column of Genetic Values

The following command adds a column of genetic values.

EXAMPLE

myBuilderObject.addGeneticColumn("myGenetics", ["1_0", "1_1", "1_0", ...])

SYNTAX

ss builder object.addGeneticColumn(column header, list of strings)

5.6.5.8 Creating the Data Set

After you have added all the columns you want to the spreadsheet builder object, you are ready to add the data set to the current project. This command adds the data set as a child of the node whose ID you pass in as a parameter and returns a spreadsheet object representing the new data set. If no parameter is passed in, the builder places the data set under the project root node.

EXAMPLE

myNewSpreadsheet = myBuilderObject.finishSpreadsheet(5)

SYNTAX

new spreadsheet object = ss builder object.finishSpreadsheet([node ID])
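The builder commands above can be wrapped in one function that checks the lists before anything is added to the project. A sketch under stated assumptions: the helper name, the "Label" header, and the column kind keys are hypothetical, and the column count passed to startSpreadsheetBuilder() is assumed to cover data columns only, not the row label column.

```python
# Maps a column kind to the builder method that adds that kind of column.
ADD_METHODS = {
    "bool": "addBoolColumn",
    "int": "addIntColumn",
    "double": "addDoubleColumn",
    "nominal": "addNominalColumn",
    "genetic": "addGeneticColumn",
}


def build_dataset(ghi, name, row_labels, columns, parent_id=None):
    """Create a dataset from Python lists and add it to the current project.

    `columns` is a list of (header, kind, values) tuples, where kind is a
    key of ADD_METHODS and each values list has one entry per row label.
    """
    n_rows = len(row_labels)
    for header, kind, values in columns:
        if len(values) != n_rows:
            raise ValueError("column %r has %d values, expected %d"
                             % (header, len(values), n_rows))
    builder = ghi.startSpreadsheetBuilder(name, n_rows, len(columns), 1)
    builder.addRowLabels("Label", row_labels)  # must precede data columns
    for header, kind, values in columns:
        getattr(builder, ADD_METHODS[kind])(header, values)
    if parent_id is None:
        return builder.finishSpreadsheet()  # placed under the project root
    return builder.finishSpreadsheet(parent_id)
```

Validating every column length first means a mismatch is reported before any builder object is created.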

5.6.6 Spreadsheet Access and Manipulation

Once you have created a scripting spreadsheet object you can use the following commands to manipulate the spreadsheet.

5.6.6.1 Getting the Spreadsheet as a Dictionary

This function returns the entire spreadsheet as a dictionary of key-value pairs, where the key is a string containing the spreadsheet column label and the value is a list containing the contents of that column. If there is a label column, it is also included as a dictionary entry, with its column label as its key.

EXAMPLE

myDict = mySS.asDict()

SYNTAX

new dictionary = spreadsheet object.asDict()

5.6.6.2 Getting the Spreadsheet as a List of Lists

This function returns the entire spreadsheet as a list of column lists. The columns will be listed in spreadsheet column number order. If there is a label column it will be the first column.

EXAMPLE

myList = mySS.asList()

SYNTAX

new list of lists = spreadsheet object.asList()
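asList() is column-oriented. When one record per row is more convenient, the built-in zip() can transpose the column lists. This sketch assumes all column lists have the same length, since zip() stops at the shortest one; the helper name is hypothetical.

```python
def rows_from_columns(columns):
    """Turn a list of column lists (as returned by asList()) into row tuples."""
    return list(zip(*columns))
```

For example, `rows = rows_from_columns(mySS.asList())` gives one tuple per spreadsheet row.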

5.6.6.3 Getting a Spreadsheet Cell

This function returns the spreadsheet entry found at the intersection of the specified row and column. Row 0 holds the column headers and column 0 holds the row labels (if they exist). An invalid row or column index raises a RuntimeError exception.

EXAMPLE

myVariable = mySS.cell(1, 4)

SYNTAX

new variable = spreadsheet object.cell(row number, column number)

5.6.6.4 Getting a Spreadsheet Column by Column Number

This function returns the values in the selected column. Column 0 is the row label column (if it exists). An invalid column index raises an exception.

EXAMPLE

myList = mySS.col(3)

SYNTAX

new list = spreadsheet object.col(column number)

5.6.6.5 Getting a Spreadsheet Column by Column Name

This function returns the spreadsheet column values for the column with the specified name. An invalid name throws an exception.

NOTE: Calling mySS.col("Name") performs a linear search across the columns until a match is found or all columns have been checked. This can be extremely slow with high-dimensional datasets. If you need to look up many column positions by name, build a dictionary (hash table) that maps labels to column indexes once, and then look up each name in it. For example:

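labelDict = {}
for idx, label in enumerate(mySS.row(0)):
    # row 0 holds the column labels; data columns are numbered from 1
    labelDict[label] = idx + 1

myList = mySS.col(labelDict["Name"])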

EXAMPLE

myList = mySS.col("Name")

SYNTAX

new list = spreadsheet object.col(column name)

5.6.6.6 Determining if a Spreadsheet is a Marker Map

This function returns 1 if a spreadsheet is a marker map spreadsheet or 0 if it is not.

EXAMPLE

mySS.isMarkerMap()

SYNTAX

new variable = spreadsheet object.isMarkerMap()

5.6.6.7 Get a Spreadsheet Column Type

This function returns the column type as one of the following values.

  • 0 is Binary
  • 1 is Integer
  • 2 is Double
  • 3 is Categorical
  • 4 is Genetic

EXAMPLE

myVariable = mySS.getColType(3)

SYNTAX

new variable = spreadsheet object.getColType(column number)
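The numeric type codes can be turned into readable names, or used to collect every column of one type. This sketch assumes data columns are numbered 1 through numCols(), with column 0 reserved for row labels; the names `COLUMN_TYPES` and `columns_of_type` are hypothetical.

```python
# Type codes returned by getColType(), as listed above.
COLUMN_TYPES = {0: "Binary", 1: "Integer", 2: "Double",
                3: "Categorical", 4: "Genetic"}


def columns_of_type(ss, type_code):
    """Return the numbers of all columns whose getColType() equals type_code."""
    return [c for c in range(1, ss.numCols() + 1)
            if ss.getColType(c) == type_code]
```

For example, `columns_of_type(mySS, 4)` lists the genetic columns, and `COLUMN_TYPES[mySS.getColType(3)]` names the type of column 3.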

5.6.6.8 Get a Spreadsheet Column State

This function returns the column state as one of the following values.

  • 0 is Inactive
  • 1 is Independent
  • 2 is Dependent

EXAMPLE

myVariable = mySS.getColState(4)

SYNTAX

new variable = spreadsheet object.getColState(column number)

5.6.6.9 Selecting Columns by Chromosome, Region, or Gene Ranges

This method brings up the GUI dialog for selecting columns within one or more ranges of chromosomes, regions, or genes. This method can only be used on a marker-mapped spreadsheet; otherwise, an error is returned.

EXAMPLE

mySS.selectGeneticRegion()

SYNTAX

spreadsheet object.selectGeneticRegion()

5.6.6.10 Obtain Marker Map Information from a Spreadsheet

The following method returns a list of dictionaries containing marker map information for all columns or for the column passed in:

EXAMPLE

myMarkerMapDict = mySS.getMarkerMap()

EXAMPLE

myCol5MarkerMap = mySS.getMarkerMap(5)

SYNTAX

list of dictionaries = spreadsheet object.getMarkerMap([optional column number])

The following methods each return a list containing values of the appropriate type for all columns or a list containing the one value of the appropriate type for the column passed in:

EXAMPLE

myDistanceList = mySS.getMarkerMapDistance()
myChromosomeList = mySS.getMarkerMapChromosome()
myRegionList = mySS.getMarkerMapRegion()
myRsidList = mySS.getMarkerMapRsid()
myGeneList = mySS.getMarkerMapGene()

EXAMPLE

myCol2DistanceList = mySS.getMarkerMapDistance(2)
myCol3ChromosomeList = mySS.getMarkerMapChromosome(3)
myCol4RegionList = mySS.getMarkerMapRegion(4)
myCol5RsidList = mySS.getMarkerMapRsid(5)
myCol6GeneList = mySS.getMarkerMapGene(6)

SYNTAX

numeric distance list = spreadsheet object.getMarkerMapDistance([optional column number])
chromosome name list = spreadsheet object.getMarkerMapChromosome([optional column number])
region name list = spreadsheet object.getMarkerMapRegion([optional column number])
rsid (string) list = spreadsheet object.getMarkerMapRsid([optional column number])
gene name list = spreadsheet object.getMarkerMapGene([optional column number])
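The per-field lists can be combined, for instance to find the markers on one chromosome in map order. A sketch with a hypothetical helper name; it assumes the no-argument forms return one entry per column in column order, so list position i corresponds to spreadsheet column i + 1.

```python
def columns_on_chromosome(ss, chromosome):
    """Return the numbers of the columns mapped to `chromosome`, ordered by distance."""
    chromosomes = ss.getMarkerMapChromosome()
    distances = ss.getMarkerMapDistance()
    # Assumption: list position i corresponds to spreadsheet column i + 1.
    hits = [(distances[i], i + 1)
            for i, chrom in enumerate(chromosomes) if chrom == chromosome]
    return [col for _, col in sorted(hits)]
```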

5.6.6.11 Export a Spreadsheet to CSV File

This method writes the entire contents of the spreadsheet out to the specified comma-separated file. If an empty string is passed in, then the user is prompted for a file. If an error occurs in writing to the file in GUI mode, HelixTree shows an error message to the user.

EXAMPLE

mySS.exportCSV("results.csv")

SYNTAX

spreadsheet object.exportCSV(CSV file name)

5.6.6.12 Export a Spreadsheet to a DSF File

This method writes the entire contents of the spreadsheet out to the specified Dataset Storage Format (DSF) file. If an empty string is passed in, then the user is prompted for a file. If an error occurs in writing to the file in GUI mode, HelixTree shows an error message to the user.

EXAMPLE

mySS.exportDSF("results.dsf")

SYNTAX

spreadsheet object.exportDSF(DSF file name)

5.6.6.13 Export a Spreadsheet to FBAT Pedigree File

This method writes the entire contents of the spreadsheet out to the specified FBAT-format pedigree file.

EXAMPLE

mySS.exportAsFBATPedigree("results.ped")

SYNTAX

spreadsheet object.exportAsFBATPedigree(pedigree file name)

5.6.6.14 Export a Spreadsheet to FBAT Phenotype File

This method writes the entire contents of the spreadsheet out to the specified FBAT-format phenotype file.

EXAMPLE

mySS.exportAsFBATPhenotype("results.phe")

SYNTAX

spreadsheet object.exportAsFBATPhenotype(phenotype file name)

5.6.6.15 Export a Spreadsheet PED/MAP File

To export a marker mapped pedigree spreadsheet to a PED file and the corresponding MAP file, use the savePED(...) scripting command. This command takes one string parameter which specifies the save location and name of the PED file. The MAP file name will automatically be created to match the PED file name, and will be exported to the same directory.

EXAMPLE

mySS.savePED("myPedFile.ped")

SYNTAX

spreadsheet object.savePED(PED file name)

5.6.6.16 Finding a Column by Name

This method searches for a column in the spreadsheet whose column label is specified. It returns the index of that column, or throws an exception if no such column is found.

EXAMPLE

myColNum = mySS.findCol("name")

SYNTAX

column number = spreadsheet object.findCol(column name)

5.6.6.17 Finding a Row by Name

This method searches for a row in the spreadsheet whose row label is specified. It returns the index of that row, or throws an exception if no such row is found. The spreadsheet must have row labels; otherwise, this routine will throw an exception.

EXAMPLE

myRowNum = mySS.findRow("name")

SYNTAX

row number = spreadsheet object.findRow(row name)

5.6.6.18 Invert Row States

Calling this function inverts the state of all rows. That is, rows that were formerly active are made inactive, and rows that were formerly inactive are made active. This routine is useful in creating training and test sets.

EXAMPLE

mySS.invertRowState()

SYNTAX

spreadsheet object.invertRowState()

5.6.6.19 Getting the Number of Spreadsheet Columns

This method returns the number of columns in the spreadsheet (not including the label column).

EXAMPLE

myNum = mySS.numCols()

SYNTAX

number of columns = spreadsheet object.numCols()

5.6.6.20 Get the Number of Columns in a State

This method returns the number of columns in the given state. There are three states: 0=Inactive, 1=Independent, 2=Dependent.

EXAMPLE

myNumIndependent = mySS.numColsState(1)

SYNTAX

number of columns in state = spreadsheet object.numColsState(state)

5.6.6.21 Get the Number of Spreadsheet Rows

This method returns the number of rows in the spreadsheet (not including the column header row).

EXAMPLE

myNumRows = mySS.numRows()

SYNTAX

number of rows = spreadsheet object.numRows()

5.6.6.22 Get the Number of Rows in a State

This method returns the number of rows in the given state. There are two states: 0=Inactive, 1=Active.

EXAMPLE

myNumActive = mySS.numRowsState(1)

SYNTAX

number of rows in state = spreadsheet object.numRowsState(state)

5.6.6.23 Randomly Shuffle Rows

This method randomly permutes the rows in the spreadsheet by modifying the sort order at random. Subsequent calls to this method will give new permutations, based on the current random seed.
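The phrase "based on the current random seed" can be illustrated with Python's own random.Random: successive shuffles drawn from one generator differ, while restarting from the same seed replays the same sequence. This is only an analogy. HelixTree's random number generator and the way its seed is set are not modeled here, and permute_rows is a hypothetical name.

```python
import random

def permute_rows(rng, num_rows):
    """Return a random sort order for data rows 1..num_rows, drawn from rng."""
    order = list(range(1, num_rows + 1))
    rng.shuffle(order)
    return order

rng = random.Random(2024)          # stand-in for "the current random seed"
first = permute_rows(rng, 10)
second = permute_rows(rng, 10)     # later calls draw new permutations
replay = random.Random(2024)       # restarting from the same seed...
print(permute_rows(replay, 10) == first)  # ...reproduces the first permutation: True
```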

EXAMPLE

mySS.permuteRows()

SYNTAX

spreadsheet object.permuteRows()

5.6.6.24 Getting a Row of Data

This method returns a list of elements in a row given by the specified row number. Row 0 is the header row. All other rows contain the data elements of the spreadsheet. Note that row access is generally slower than column access. An exception is thrown if an invalid row number is specified.

EXAMPLE

myRowData = mySS.row(3)

SYNTAX

list of row elements = spreadsheet object.row(row number)

5.6.6.25 Change the State of a Single Column

This method sets the specified column to the specified state. There are three states: 0=Inactive, 1=Independent, 2=Dependent. Other column states remain unchanged.

EXAMPLE

mySS.setColState(1, 2)

SYNTAX

spreadsheet object.setColState(column number, state)

5.6.6.26 Change the State of a Range of Columns

This method sets a range of columns (inclusively) to the specified state. There are three states: 0=Inactive, 1=Independent, 2=Dependent. The states of other columns remain unchanged.

EXAMPLE

mySS.setColState(1, 50, 1)

SYNTAX

spreadsheet object.setColState(first column, last column, state)

5.6.6.27 Setting the State of a Single Row

This method sets a row to the specified state. There are two states: 0=Inactive, 1=Active. The states of other rows remain unchanged.

EXAMPLE

mySS.setRowState(3, 0)

SYNTAX

spreadsheet object.setRowState(row number, state)

5.6.6.28 Getting the State of a Single Row

This method returns the state of a row. There are two states: 0=Inactive, 1=Active.

EXAMPLE

myRowState = mySS.getRowState(3)

SYNTAX

row state = spreadsheet object.getRowState(row number)

5.6.6.29 Setting the State of a Range of Rows

This method sets a range of rows (inclusively) to the specified state. There are two states: 0=Inactive, 1=Active. Other row states remain unchanged.

EXAMPLE

mySS.setRowState(1, 50, 0)

SYNTAX

spreadsheet object.setRowState(first row number, last row number, state)

5.6.6.30 Setting the State of a Defined Set of Rows

This method of setting row state allows you to specify which rows to set to the specified state using a list of row numbers. Rows which correspond to the row numbers contained in the list will have their state changed to the specified state. There are two states: 0=Inactive, 1=Active. Rows with numbers that do not appear in the list will not be affected.

EXAMPLE

mySS.setRowState([1, 2, 4, 6], 0)

SYNTAX

spreadsheet object.setRowState(row number list, state)

5.6.6.31 Randomly Set a Number of Rows to a State

This method will set a number of randomly selected rows to the specified state. There are two states: 0=Inactive, 1=Active. The other rows will be set to the opposite state.

EXAMPLE

mySS.setRowStateRandom(25, 0)

SYNTAX

spreadsheet object.setRowStateRandom(number of rows, state)

5.6.6.32 Randomly Set a Percentage of Rows to a State

This method randomly sets a fraction of the total number of rows to the specified state. This is useful for selecting a certain percentage of the data regardless of its size. There are two states: 0=Inactive, 1=Active. The other rows will be set to the opposite state.
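Together with invertRowState() (section 5.6.6.18), this command supports a simple training/test split: in HelixTree terms, mySS.setRowStateRandom(0.75, 1) to activate a random 75% of rows for training, and later mySS.invertRowState() to activate the remaining rows for testing. The plain-Python model below mimics those two calls on a list of 0/1 row states so the bookkeeping is easy to follow. The function names are hypothetical, and rounding fraction * rows to the nearest integer is an assumption, since the rounding rule is not documented.

```python
import random

def set_row_state_random(num_rows, fraction, state, rng):
    """Model of setRowStateRandom(fraction, state) on a list of 0/1 row states.

    Index i of the result holds the state of data row i + 1. The number of rows
    set is fraction * num_rows rounded to the nearest integer (an assumption).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    count = round(fraction * num_rows)
    chosen = set(rng.sample(range(num_rows), count))
    # Chosen rows get the requested state; all others get the opposite state.
    return [state if i in chosen else 1 - state for i in range(num_rows)]

def invert_row_states(states):
    """Model of invertRowState(): 1 (active) becomes 0 (inactive) and vice versa."""
    return [1 - s for s in states]

rng = random.Random(42)
training = set_row_state_random(20, 0.75, 1, rng)  # 15 rows active for training
testing = invert_row_states(training)               # the other 5 rows become active
print(sum(training), sum(testing))  # 15 5
```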

EXAMPLE

mySS.setRowStateRandom(.5,0)

SYNTAX

spreadsheet object.setRowStateRandom(fraction of total rows, state)

5.6.6.33 Sort a Column in Ascending Order

This method sorts the spreadsheet by arranging the specified column in ascending order.

EXAMPLE

mySS.sortByColAscending(3)

SYNTAX

spreadsheet object.sortByColAscending(column number)

5.6.6.34 Sort a Column in Descending Order

This method sorts the spreadsheet by arranging the specified column in descending order.

EXAMPLE

mySS.sortByColDescending(3)

SYNTAX

spreadsheet object.sortByColDescending(column number)

5.6.6.35 Remove Column Sorting

To return the spreadsheet row sort order to the original, unsorted order, use this method.

EXAMPLE

mySS.unsort()

SYNTAX

spreadsheet object.unsort()

5.6.6.36 Remembering a Spreadsheet Page

When you make a change to a spreadsheet that has another viewer dependent on it, such as a tree model, the spreadsheet is first copied to a new sheet, and the change is made on the copy rather than the original. For convenience, your spreadsheet variable is automatically updated to refer to the changed copy. However, there are times when, after making such a change, you will want to reference the original spreadsheet page.

To be able to recall the original spreadsheet, use the following command before making the spreadsheet change (to “mySS” in this example):

EXAMPLE

myOriginalSS = mySS.thisPage()

SYNTAX

new spreadsheet object = spreadsheet object.thisPage()

Alternatively, you can run this command as shown below and then make your changes through the new spreadsheet variable (“myNewSS” in this example):

EXAMPLE

myNewSS = mySS.thisPage()

5.6.6.37 Joining Two Spreadsheets

Spreadsheets can be joined as long as some of the rows match in each spreadsheet. This is useful for adding columns to a spreadsheet. To join spreadsheets, first get a spreadsheet object in the Python shell, then pass the node ID of the second spreadsheet as the parameter of the joinSpreadsheet command. The spreadsheet object command can be a useful aid in specifying the second spreadsheet. A new spreadsheet object representing the joined spreadsheets will be returned. The joined spreadsheet will be added to the navigator window as a child of the spreadsheet object that was used to issue the join command.
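Conceptually, a join pairs up rows that match in both spreadsheets and places the second spreadsheet's columns beside the first one's. The sketch below models this in plain Python with dictionaries keyed by row label. It assumes rows are matched by row label and that rows found in only one spreadsheet are dropped (an inner join); neither detail is stated in this section, join_by_row_label is hypothetical, and family-indexed joins are not modeled.

```python
def join_by_row_label(left, right):
    """Join two tables on row label, keeping only labels present in both.

    Each table is a dict: {"columns": [names], "rows": {label: [values]}}.
    The right table's columns are appended after the left table's columns.
    """
    columns = left["columns"] + right["columns"]
    rows = {
        label: values + right["rows"][label]
        for label, values in left["rows"].items()
        if label in right["rows"]
    }
    return {"columns": columns, "rows": rows}

clinical = {"columns": ["Age"], "rows": {"P1": [34], "P2": [51], "P3": [47]}}
genotypes = {"columns": ["rs123"], "rows": {"P2": ["A/G"], "P3": ["G/G"], "P4": ["A/A"]}}
joined = join_by_row_label(clinical, genotypes)
print(joined["columns"])  # ['Age', 'rs123']
print(joined["rows"])     # {'P2': [51, 'A/G'], 'P3': [47, 'G/G']}
```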

NOTE: Two family-indexed spreadsheets may also be joined using this command.

EXAMPLE

myNewSS = mySS.joinSpreadsheet(7)

SYNTAX

new spreadsheet object = spreadsheet object.joinSpreadsheet(node ID of second spreadsheet)

5.6.6.38 Creating a Linkage Disequilibrium Plot

A linkage disequilibrium (LD) computational object and plot can be created as long as there are active genetic columns in the spreadsheet. Once an LD object is created, several functions may be used with it, including one to show the display. The first argument to the method is an options object and is required; the LD options are taken from this options object. The second argument can be set to either 0 = use even spacing or 1 = use marker map spacing. If the second argument is omitted, it defaults to 1 (use marker map spacing).

EXAMPLE

myLDPlot = mySS.ldPlot(ghi.getTreeOptions(), 0)

SYNTAX

LD plot object = spreadsheet object.ldPlot(options object, [plot spacing])

NOTE: When an LD plot is displayed, computation starts for all of the points on the plot. If this plot covers a large number of markers (more than a few hundred), you may find it advantageous to use

ghi.enableNewViewers(False)

before using the ldPlot() command. This will avoid automatic computation of values only for the sake of completing a display.

See 5.6.7 for a complete list of methods available from the Linkage Disequilibrium object.

5.6.6.39 Creating Hardy-Weinberg Plots

A Hardy-Weinberg equilibrium (HWE) plot and computational object can be created as long as there are active genetic columns in the spreadsheet. The one argument to this method is optional and can be set to either 0 = use even spacing or 1 = use marker map spacing. This argument defaults to 1 (use marker map spacing). Once an HWE object is created, several functions may be used with it. See 5.6.8 for a complete listing of methods available from the Hardy-Weinberg object.

EXAMPLE

myHWEPlot = mySS.hwePlot(0)

SYNTAX

HWE plot object = spreadsheet object.hwePlot([plot spacing])

5.6.6.40 Creating Two-Loci Genetic Plots

A two-loci genetic (TLG) computational object and plot can be created as long as there are active genetic columns in the spreadsheet. There are two parameters to this method. The first is an options object; the parameters for genetic splits are taken from it and used for the tentative splits determined by this object. The second parameter specifies the spacing method and can be set to either 0 = use even spacing or 1 = use marker map spacing. If no second parameter is provided, the method defaults to 1 (use marker map spacing).

EXAMPLE

myTLGPlot = mySS.tlgPlot(ghi.getTreeOptions(), 0)

SYNTAX

TLG plot object = spreadsheet object.tlgPlot(options object, [plot spacing])

NOTE: When a TLG plot is displayed, computation starts for all of the points on the plot. If this plot covers a large number of markers (more than a few hundred), you may find it advantageous to use

ghi.enableNewViewers(False)

before using the tlgPlot() command. This will avoid automatic computation of values only for the sake of completing a display.

See 5.6.9 for a complete listing of the methods available from the Two-Loci Genetic object.

5.6.6.41 Creating an Allele Test Plot

An Allele Test P-Value plot may be created from a spreadsheet which contains genetic data. This function takes one parameter specifying the type of plot to create: 1 = plot ordered by variable number or 2 = plot ordered by adjusted P-value.

EXAMPLE

myPVPlot = mySS.alleleTestPlot(1)

SYNTAX

p-value plot object = spreadsheet object.alleleTestPlot(type of plot)

5.6.6.42 Plotting Numeric Spreadsheet Columns

You can use the plotNumericColumns(...) command to create a plot object from which you can view plots for numeric columns in the spreadsheet. This function does not take any parameters.

EXAMPLE

myPlot = mySS.plotNumericColumns()

SYNTAX

plot object = spreadsheet object.plotNumericColumns()

5.6.6.43 Plotting a Spreadsheet Column

The plotColumnValues(...) command allows you to create a plot showing the values of a spreadsheet column. This function may only be used for numeric columns. This function takes one argument, an integer specifying the column to plot.

EXAMPLE

myPlot = mySS.plotColumnValues(2)

SYNTAX

plot object = spreadsheet object.plotColumnValues(column number)

5.6.6.44 Performing Genetic Association Tests

The associationTests(...) scripting command can be used to perform genetic association tests on any spreadsheet that contains genetic columns. This function requires two parameters and accepts 10 optional keyword arguments that specify multiple-testing corrections and other settings. Note: when using this command on a spreadsheet with a boolean response, using missing values as predictors precludes the Armitage Trend Test, the Exact form of the Armitage Test, and the Odds Ratio when including marker statistics.

Required arguments:

  • The genetic model or test to use
    • 1 = Basic Allele Tests
    • 2 = Genotypic Tests
    • 3 = Additive model
    • 4 = Dominant model
    • 5 = Recessive model
  • Whether or not to include missing values as predictors or drop them from analysis. 0 = drop missing values, 1 = use missings as predictors.

Optional parameters:

  • bonferroni – Bonferroni adjustment. 0 = no, 1 = yes.
  • fdr – False Discovery Rate. 0 = no, 1 = yes.
  • singlePerm – Single Value Permutations. 0 = no, 1 = yes.
  • fullPerm – Full Scan Permutations. 0 = no, 1 = yes.
  • numPerm – Number of permutations.
  • genoCounts – Genotype Counts. 0 = no, 1 = yes.
  • alleleCounts – Allele Counts. 0 = no, 1 = yes.
  • usePCA – Use principal component analysis. 0 = no, 1 = yes.
  • numComponents – The maximum number of components to find.
  • markerStatistics – Whether to include minor allele frequency, call rate, HWE P Value, Fisher’s Exact P Value, and signed HWE R. 0 = no, 1 = yes.

If analysis is successful, this function will return a reference to the newly created spreadsheet object.

EXAMPLE

mySpreadsheet = mySS.associationTests(1, 0, bonferroni=1, fdr=1, singlePerm=1, fullPerm=1, numPerm=1000, genoCounts=1, alleleCounts=1, usePCA=1, numComponents=10, markerStatistics=1)

SYNTAX

spreadsheet object = spreadsheet object.associationTests(genetic model, treatment of missings[, optional parameters])
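The bonferroni and fdr options adjust P-values for the number of markers tested. The sketch below shows the textbook Bonferroni adjustment and, as an assumption, the Benjamini-Hochberg step-up procedure for the false discovery rate. This section does not say which FDR procedure HelixTree uses, so treat the sketch as an illustration of the idea rather than a reproduction of its output.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values (assumed FDR method), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.5]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```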

5.6.6.45 Finding Runs of Homozygosity

If you have a marker mapped spreadsheet, the runsOfHomozygosity(...) command can be used to find runs of homozygosity within that spreadsheet.

This function takes the following parameters:

  • Minimum run length in SNPs: must be >= 2
  • Minimum number of subjects that must contain a run: must be at least 1
  • Create a spreadsheet with the incidence of runs per patient: 1 = yes, 0 = no
  • Create a spreadsheet describing each run found per patient: 1 = yes, 0 = no

This function will return a Python tuple. The tuple will contain the spreadsheets that were created during execution, or ’None’ for each spreadsheet that either was omitted in the function parameters, or could not be created during execution. The result tuple is as follows:

  • tuple[0] = Clusters of runs used for association analysis data set
  • tuple[1] = Incidence of common runs per SNP data set
  • tuple[2] = Description of runs per patient data set

EXAMPLE

myResultTuple = mySS.runsOfHomozygosity(10, 20, 1, 1)

SYNTAX

tuple = spreadsheet object.runsOfHomozygosity(minimum run length, minimum number of patients, create patient spreadsheet, create runs spreadsheet)
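The minimum run length parameter counts consecutive homozygous SNPs along the marker map. The sketch below finds such runs for a single subject's genotype calls; it does not model the clustering of runs across subjects or the minimum-subjects parameter. The "A/G" genotype string format, the helper name homozygous_runs, and the choice to let a missing call (None) end a run are assumptions of this sketch, not documented HelixTree behavior.

```python
def homozygous_runs(calls, min_length):
    """Return inclusive (start, end) index pairs of homozygous runs in one subject.

    calls: genotype strings such as "A/G" in marker-map order; None means missing.
    """
    if min_length < 2:
        raise ValueError("minimum run length must be >= 2")
    runs = []
    start = None
    for i, call in enumerate(list(calls) + [None]):  # trailing None closes a final run
        alleles = call.split("/") if call is not None else None
        homozygous = alleles is not None and alleles[0] == alleles[1]
        if homozygous and start is None:
            start = i
        elif not homozygous and start is not None:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    return runs

subject = ["A/A", "C/C", "A/G", "G/G", "T/T", "C/C"]
print(homozygous_runs(subject, 2))  # [(0, 1), (3, 5)]
print(homozygous_runs(subject, 3))  # [(3, 5)]
```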

5.6.6.46 Inferring Missing Values

If you have a spreadsheet containing genetic data with missing values, the following function may be used to attempt to infer the missing values. This function will return a new spreadsheet with the inferred values.

This function takes the following parameters:

  • Window size: must be >= 2
  • Number of markers: must be >= 2, <= window size
  • Diplotype threshold: must be between 0 and 1 (inclusive)
  • Maximum EM iterations: must be >= 1
  • Convergence tolerance: must be between 0 and 1
  • Use all spreadsheet rows and columns: 0 = no, use only active rows and columns; 1 = yes, use all rows and columns

EXAMPLE

myNewSS = mySS.inferMissingValues(15, 5, .7, 20, 0.001, 1)

SYNTAX

new spreadsheet object = spreadsheet object.inferMissingValues(window size, number of markers, diplotype threshold, maximum EM iterations, convergence tolerance, use all spreadsheet rows and columns (0 = false, 1 = true))
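Because inferMissingValues() places several constraints on its arguments, a script can check values before making the call. The hypothetical helper below encodes the constraints listed above. Treating the convergence tolerance bounds as exclusive is an assumption, since the list does not say whether 0 and 1 themselves are allowed.

```python
def check_infer_missing_args(window_size, num_markers, diplotype_threshold,
                             max_em_iterations, tolerance, use_all):
    """Raise ValueError if arguments violate the documented constraints."""
    if window_size < 2:
        raise ValueError("window size must be >= 2")
    if not 2 <= num_markers <= window_size:
        raise ValueError("number of markers must be >= 2 and <= window size")
    if not 0 <= diplotype_threshold <= 1:
        raise ValueError("diplotype threshold must be between 0 and 1 (inclusive)")
    if max_em_iterations < 1:
        raise ValueError("maximum EM iterations must be >= 1")
    if not 0 < tolerance < 1:  # assumption: the endpoints are excluded
        raise ValueError("convergence tolerance must be between 0 and 1")
    if use_all not in (0, 1):
        raise ValueError("use all rows and columns must be 0 or 1")

check_infer_missing_args(15, 5, .7, 20, 0.001, 1)  # the documented example passes
try:
    check_infer_missing_args(15, 20, .7, 20, 0.001, 1)
except ValueError as err:
    print(err)  # number of markers must be >= 2 and <= window size
```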

5.6.6.47 Creating Haplotype Frequency Tables

The command emHaplotypes(...) uses the Expectation/Maximization (EM) method to estimate haplotype frequencies for a selected set of columns, and then outputs the desired table or tables based on these estimates. This command returns the table(s) as a tuple. The four possible tables are:

  • Haplotype Frequency Table (all patients)
  • Diplotype Table (by patient and diplotype)
  • EM Haplotype Probability Table (by patient)
  • CHM Haplotype Probability Table (by patient)

The first example creates a haplotype frequency spreadsheet (and no other spreadsheets), showing estimates of the haplotypes of the markers in columns 2, 3, and 4, using defaults for all of the parameters that may be omitted.

The second example, which uses columns 2, 3, 4, and 5, outputs all four possible spreadsheet tables.

EXAMPLE

myHaplotypeTuple = mySS.emHaplotypes([2, 3, 4], "C", 100, 0.001, 0.001)

EXAMPLE

myHaplotypeTuple = mySS.emHaplotypes([2, 3, 4, 5], "E", 100, 0.0001, 0.0001, 1, 1, 90, 0.002, 1, 1, 1)

SYNTAX

spreadsheet tuple = spreadsheet object.emHaplotypes(list of markers, initialization type, number of iterations, tolerance for convergence, display threshold, [use data with missing values], [create haplotype frequency table], [confidence interval percentage], [confidence interval threshold], [create diplotype table], [create EM table], [create CHM table])

Parameter Details:

  1. List of (Spreadsheet) marker column numbers
  2. Initialization type – ’C’ (CHM), ’E’ (equal), or ’R’ (random)
  3. Maximum number of iterations to use (must be positive)
  4. Tolerance for convergence (must be positive)
  5. Display (output) threshold (must be nonnegative)
  6. 1 = use patient data with missing values, 0 = don’t use patient data with missing values (default = 0)
  7. 1 = create the All-Patient Haplotype Frequency Table, 0 = don’t create it (default = 1)
  8. Confidence interval percentage (90, 95, or 99). Default is 95 percent. Enter zero to avoid showing confidence intervals. (Applies only to the All-Patient Haplotype Frequency Table.)
  9. Confidence interval computation threshold (default = .001). (Applies only to the All-Patient Haplotype Frequency Table.)
  10. 1 = also create the Diplotype Table (not created by default)
  11. 1 = also create the EM Haplotype Probability Table (not created by default)
  12. 1 = also create the CHM Haplotype Probability Table (not created by default)
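To show what the EM step estimates, the sketch below is a textbook EM algorithm for haplotype frequencies at a few biallelic markers, starting from equal frequencies (as with initialization type 'E'). It is not HelixTree's implementation. It assumes each genotype is coded as the number of copies (0, 1, or 2) of a reference allele, ignores missing data, enumerates every haplotype pair (so it is only practical for a handful of markers), and produces no confidence intervals or CHM estimates. The function name em_haplotypes is reused only for readability.

```python
from itertools import product

def em_haplotypes(genotypes, max_iter=100, tol=1e-4):
    """Estimate haplotype frequencies by EM from unphased biallelic genotypes.

    genotypes: one tuple per subject; each entry is the count (0, 1, 2) of the
    reference allele at that marker. Haplotypes are tuples of 0/1 alleles.
    """
    k = len(genotypes[0])
    haps = list(product((0, 1), repeat=k))
    freq = {h: 1.0 / len(haps) for h in haps}  # equal starting frequencies
    # Every ordered haplotype pair that could produce each subject's genotype.
    compatible = [
        [(h1, h2) for h1 in haps for h2 in haps
         if all(a + b == g for a, b, g in zip(h1, h2, genotype))]
        for genotype in genotypes
    ]
    for _ in range(max_iter):
        # E-step: share each subject's two haplotypes among its compatible pairs.
        counts = {h: 0.0 for h in haps}
        for pairs in compatible:
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: new frequencies are expected counts over 2N haplotypes.
        new_freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq

# Three subjects whose phase is unambiguous, plus one double heterozygote.
estimates = em_haplotypes([(2, 2), (0, 0), (2, 0), (1, 1)])
for hap, f in sorted(estimates.items()):
    print(hap, round(f, 3))
```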

5.6.6.48 Applying a Marker Map

To apply a marker map to a spreadsheet, you must have first imported a marker map spreadsheet and created an object for that spreadsheet. Apply the marker map to a spreadsheet by using this command. The marker map object should be used as the one parameter. This method will verify compatibility and return the new mapped spreadsheet. The mapped spreadsheet will also be added to the project as a child of the spreadsheet that is being mapped.

EXAMPLE

newMappedSS = mySS.applyMarkerMap(mapSS)

SYNTAX

new marker mapped sheet = spreadsheet object.applyMarkerMap(marker map object)

5.6.7 Using the Linkage Disequilibrium Object

Once you have created an LD object, there are a number of functions available which use that object.

EXAMPLE

myLDPlot.show()

Shows the plot display. Computation and value displaying will start at the diagonal and work outward as would normally be the case.

5.6.7.1 Getting All LD Values in a Dictionary

Use this command to get a Python dictionary containing LD values for two markers. The arguments are the spreadsheet columns for the markers. The dictionary contains the following keys:

  • -log10 P
  • LD Correlation, R
  • P Value (Chi sq)
  • Chi Squared
  • D Prime
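To relate these keys to the underlying statistics, the sketch below computes the standard two-marker LD measures from one haplotype frequency and the two allele frequencies, reusing the key names listed above. It is a textbook illustration, not HelixTree's code: HelixTree estimates haplotype frequencies from genotype data, and its exact formulas (for example, the sign convention of D Prime) may differ. The sketch assumes both markers are polymorphic and computes the chi-squared statistic as N * r^2 with N haplotypes and 1 degree of freedom; ld_stats is a hypothetical name.

```python
import math

def ld_stats(p_ab, p_a, p_b, n_haplotypes):
    """Two-marker LD statistics from haplotype and allele frequencies.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequency of allele A at marker 1 and allele B at marker 2
    n_haplotypes: number of haplotypes (2 x number of subjects)
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max  # signed here; many tools report the absolute value
    r = d / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    chi2 = n_haplotypes * r * r
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return {
        "-log10 P": -math.log10(p_value) if p_value > 0 else float("inf"),
        "LD Correlation, R": r,
        "P Value (Chi sq)": p_value,
        "Chi Squared": chi2,
        "D Prime": d_prime,
    }

print(ld_stats(0.5, 0.5, 0.5, 100))   # complete LD: D Prime = 1, R = 1
print(ld_stats(0.25, 0.5, 0.5, 100))  # independence: D Prime = 0, P = 1
```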

EXAMPLE

myLDValues = myLDPlot.ld(5, 7)

SYNTAX

new dictionary = LD object.ld(first column number, second column number)

To get LD values for ranges of markers, call this method with two ranges. This usage returns a two-dimensional matrix of dictionaries.

EXAMPLE

myLDValues = myLDPlot.ld(5, 7, 8, 9)

SYNTAX

matrix of dictionaries = LD object.ld(first range start column, first range end column, second range start column, second range end column)
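This section does not spell out the layout of the matrix returned by the range form. The sketch below assumes the ranges are inclusive (as they are for setColState() and setRowState()) and that the matrix has one row per column of the first range and one entry per column of the second range. pairwise_matrix is a hypothetical helper that applies any pairwise statistic in that shape.

```python
def pairwise_matrix(stat, first_start, first_end, second_start, second_end):
    """Apply stat(a, b) to every column pair from two inclusive ranges.

    The result has one row per column in the first range and one entry per
    column in the second range (an assumed orientation).
    """
    return [
        [stat(a, b) for b in range(second_start, second_end + 1)]
        for a in range(first_start, first_end + 1)
    ]

# A stand-in statistic that simply records which pair it was given.
for row in pairwise_matrix(lambda a, b: (a, b), 5, 7, 8, 9):
    print(row)
```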

5.6.7.2 Getting R Squared for Specified Markers

The following function computes the LD value R Squared for two markers or a range of markers. The markers are represented by the column numbers in the spreadsheet from which the LD object was derived. When two markers are given, the method returns one value; when two ranges are given, it returns a matrix of values.

The first example computes the value for two markers and returns that value. The second example computes the value for a range of markers and returns a matrix of those values.

EXAMPLE

myRSquared = myLDPlot.rSquared(5, 7)

SYNTAX

new variable = LD object.rSquared(first column number, second column number)

EXAMPLE

myRSquareds = myLDPlot.rSquared(5, 7, 8, 9)

SYNTAX

new matrix variable = LD object.rSquared(first range start column, first range end column, second range start column, second range end column)

5.6.7.3 Getting D Prime for Specified Markers

The following function computes the LD value D Prime for two markers or a range of markers. The markers are represented by the column numbers in the spreadsheet from which the LD object was derived. When two markers are given, the method returns one value; when two ranges are given, it returns a matrix of values.

The first example computes the value for two markers and returns that value. The second example computes the value for a range of markers and returns a matrix of those values.

EXAMPLE

myDPrime = myLDPlot.dPrime(5, 7)

SYNTAX

new variable = LD object.dPrime(first column number, second column number)

EXAMPLE

myDPrimes = myLDPlot.dPrime(5, 7, 8, 9)

SYNTAX

new matrix variable = LD object.dPrime(first range start column, first range end column, second range start column, second range end column)

5.6.7.4 Getting Neg Log10 P for Specified Markers

The following function computes the LD value Neg Log10 P for two markers or a range of markers. The markers are represented by the column numbers in the spreadsheet from which the LD object was derived. When two markers are given, the method returns one value; when two ranges are given, it returns a matrix of values.

The first example computes the value for two markers and returns that value. The second example computes the value for a range of markers and returns a matrix of those values.

EXAMPLE

myLog10P = myLDPlot.negLog10P(5, 7)

SYNTAX

new variable = LD object.negLog10P(first column number, second column number)

EXAMPLE

myLog10P = myLDPlot.negLog10P(5, 7, 8, 9)

SYNTAX

new matrix variable = LD object.negLog10P(first range start column, first range end column, second range start column, second range end column)

5.6.7.5 Getting Chi Squared for Specified Markers

The following function computes the LD value Chi Squared for two markers or a range of markers. The markers are represented by the column numbers in the spreadsheet from which the LD object was derived. When two markers are given, the method returns one value; when two ranges are given, it returns a matrix of values.

The first example computes the value for two markers and returns that value. The second example computes the value for a range of markers and returns a matrix of those values.

EXAMPLE

myChiSquared = myLDPlot.chiSquared(5, 7)

SYNTAX

new variable = LD object.chiSquared(first column number, second column number)

EXAMPLE

myChiSquareds = myLDPlot.chiSquared(5, 7, 8, 9)

SYNTAX

new matrix variable = LD object.chiSquared(first range start column, first range end column, second range start column, second range end column)

5.6.7.6 Getting P Value for Specified Markers

The following function computes the LD value P Value for two markers or a range of markers. The markers are represented by the column numbers in the spreadsheet from which the LD object was derived. When two markers are given, the method returns one value; when two ranges are given, it returns a matrix of values.

The first example computes the value for two markers and returns that value. The second example computes the value for a range of markers and returns a matrix of those values.

EXAMPLE

myPValue = myLDPlot.pValue(5, 7)

SYNTAX

new variable = LD object.pValue(first column number, second column number)

EXAMPLE

myPValues = myLDPlot.pValue(5, 7, 8, 9)

SYNTAX

new matrix variable = LD object.pValue(first range start column, first range end column, second range start column, second range end column)

5.6.7.7 Getting Carlson SNP Tags

Based on the LD values that may be found in the LD Plot object, tagging SNPs for a range of markers may be found using Carlson’s method. The following function will put these SNP tags into a new spreadsheet and return that spreadsheet.

The first two parameters in this function are the first and last column markers in the range. The third through fifth parameters are optional; if they are omitted, the following defaults are used: .1 for the MAF threshold, .8 for the LD R-squared threshold, and -1 (examine all marker pairs) for the marker separation window.

EXAMPLE

myCarlsonSS = myLDPlot.carlsonTags(5,102,.1,.8,20)

SYNTAX

new spreadsheet object = LD object.carlsonTags(first column number, last column number, [minor allele frequency threshold], [LD R-squared threshold], [marker separation window])
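The idea behind Carlson's method is greedy binning: repeatedly choose the marker that is in strong LD (R-squared at or above the threshold) with the most markers not yet tagged, make it a tag, and remove its bin. The sketch below is a simplified version of that idea in plain Python. The handling of the marker separation window (measured in positions within the given marker list, with -1 meaning no limit), the tie-breaking rule, and the names carlson_tags and r2 are assumptions; HelixTree's implementation may select tags within a bin differently.

```python
def carlson_tags(markers, maf, r2, maf_threshold=0.1, r2_threshold=0.8, window=-1):
    """Greedy LD binning in the spirit of Carlson's method.

    markers: column numbers in marker-map order; maf: {marker: minor allele freq};
    r2: function (a, b) -> LD R-squared. Returns a list of (tag, bin members).
    """
    position = {m: i for i, m in enumerate(markers)}
    remaining = {m for m in markers if maf[m] >= maf_threshold}

    def linked(a, b):
        if a == b:
            return True
        if window >= 0 and abs(position[a] - position[b]) > window:
            return False  # pair lies outside the marker separation window
        return r2(a, b) >= r2_threshold

    bins = []
    while remaining:
        candidates = sorted(remaining, key=position.get)
        # The marker linked to the most untagged markers becomes the next tag;
        # max() keeps the first (leftmost) marker when there is a tie.
        tag = max(candidates, key=lambda m: sum(linked(m, o) for o in remaining))
        members = [o for o in candidates if linked(tag, o)]
        bins.append((tag, members))
        remaining.difference_update(members)
    return bins

PAIR_R2 = {(5, 6): 0.9, (6, 7): 0.9, (5, 7): 0.5}

def r2(a, b):
    return PAIR_R2.get((min(a, b), max(a, b)), 0.0)

maf = {5: 0.30, 6: 0.40, 7: 0.25, 8: 0.05}
print(carlson_tags([5, 6, 7, 8], maf, r2))  # [(6, [5, 6, 7])]; marker 8 fails the MAF filter
```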

5.6.8 Using the Hardy Weinberg Object

Once you have created an HWE object, there are a number of functions available that use that object.

EXAMPLE

myHWEPlot.show()

Shows the plot display.

5.6.8.1 Getting All HWE Values in a Dictionary

Use this command to get a Python dictionary containing HWE values. The dictionary contains the following keys:

  • -log10 P
  • HWE Correlation, R
  • P Value
  • Chi Squared

EXAMPLE

myHWEValues = myHWEPlot.hwe(5)

SYNTAX

new dictionary = HWE object.hwe(column number)
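The HWE statistics can be illustrated with the standard chi-squared goodness-of-fit test on the three genotype counts of a biallelic marker (1 degree of freedom). The sketch below reuses the key names above where they apply. It does not attempt to reproduce the "HWE Correlation, R" value, whose definition is not given in this section; instead it reports the heterozygote-deficit coefficient F under the hypothetical key "F", which satisfies Chi Squared = n * F^2 and may or may not correspond to HelixTree's R. The function name hwe_test is also hypothetical.

```python
import math

def hwe_test(n_aa, n_ab, n_bb):
    """Chi-squared HWE test from genotype counts (AA, AB, BB) at one marker."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2.0 * n)  # frequency of allele A
    q = 1.0 - p                         # assumes the marker is polymorphic (0 < p < 1)
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, chi-square with 1 df
    return {
        "Chi Squared": chi2,
        "P Value": p_value,
        "-log10 P": -math.log10(p_value) if p_value > 0 else float("inf"),
        "F": 1.0 - n_ab / (2.0 * n * p * q),  # hypothetical key: heterozygote deficit
    }

print(hwe_test(25, 50, 25))  # counts exactly at equilibrium
print(hwe_test(50, 0, 50))   # no heterozygotes at all
```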

5.6.8.2 Getting Neg Log10 P for a Marker

The following function computes the HWE value Neg Log10 P for one marker. The marker is represented by its column number in the spreadsheet from which the HWE object was derived.

EXAMPLE

myLog10P = myHWEPlot.negLog10P(5)

SYNTAX

new variable = HWE object.negLog10P(column number)

5.6.8.3 Getting R Squared for a Marker

The following function computes the HWE value R Squared for one marker. The marker is represented by its column number in the spreadsheet from which the HWE object was derived.

EXAMPLE

myRSquared = myHWEPlot.rSquared(5)

SYNTAX

new variable = HWE object.rSquared(column number)

5.6.8.4 Getting P Value for a Marker

The following function computes the HWE P Value for one marker. The marker is represented by its column number in the spreadsheet from which the HWE object was derived.

EXAMPLE

myPValue = myHWEPlot.pValue(5)

SYNTAX

new variable = HWE object.pValue(column number)

5.6.8.5 Getting Chi Squared for a Marker

The following function computes the HWE value Chi Squared for one marker. The marker is represented by its column number in the spreadsheet from which the HWE object was derived.

EXAMPLE

myChiSquared = myHWEPlot.chiSquared(5)

SYNTAX

new variable = HWE object.chiSquared(column number)

5.6.9 Using the Two-Loci Object

Once a TLG object is created, several functions may be used with it, including one to show the display.

EXAMPLE

myTLGPlot.show()

Computation and value displaying will start at the diagonal and work outward as would normally be the case.

5.6.9.1 Getting All Two-Loci P-Values in a Dictionary

Use this command to get a Python dictionary containing Two-Loci p-values. Each marker is represented by its column number in the spreadsheet from which the TLG object was derived. The dictionary contains the following keys:

  • P
  • Adjusted P
  • Bonferroni P

EXAMPLE

myTLGValues = myTLGPlot.allPValues(5, 7)

SYNTAX

new dictionary = TLG object.allPValues(first column number, second column number)

5.6.9.2 Getting P for Two Markers

The following function returns the genetic-split P Value for one combination of two markers. The markers are represented by their column numbers in the spreadsheet from which the TLG object was derived.

EXAMPLE

myPValue = myTLGPlot.pValue(5, 7)

SYNTAX

new variable = TLG object.pValue(first column number, second column number)

5.6.9.3 Getting Adjusted P for Two Markers

The following function returns the genetic-split Adjusted P Value for one combination of two markers. The markers are represented by their column numbers in the spreadsheet from which the TLG object was derived.

EXAMPLE

myAdjPValue = myTLGPlot.adjPValue(5, 7)

SYNTAX

new variable = TLG object.adjPValue(first column number, second column number)

5.6.9.4 Getting Bonferroni P for Two Markers

The following function returns the genetic-split Bonferroni P Value for one combination of two markers. The markers are represented by their column numbers in the spreadsheet from which the TLG object was derived.

EXAMPLE

myBonfPValue = myTLGPlot.bonfPValue(5, 7)

SYNTAX

new variable = TLG object.bonfPValue(first column number, second column number)

5.6.9.5 Getting Neg Log10 of All Two-Loci P Values in a Dictionary

Use this command to get a Python dictionary containing Two-Loci neg log10 P values. Each marker is represented by its column number in the spreadsheet from which the TLG object was derived. The dictionary contains the following keys:

  • -log10 P
  • -log10 Adjusted P
  • -log10 Bonferroni P

EXAMPLE

myTLGValues = myTLGPlot.allLogPValues(5, 7)

SYNTAX

new dictionary = TLG object.allLogPValues(first column number, second column number)
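The heading above implies that these values are the -log10 transforms of the P-values returned by allPValues(). Assuming that relationship holds, the hypothetical helper below converts one dictionary into the other using the key names listed in 5.6.9.1 and 5.6.9.5; a P-value of exactly 0 is mapped to infinity.

```python
import math

LOG_KEYS = {
    "P": "-log10 P",
    "Adjusted P": "-log10 Adjusted P",
    "Bonferroni P": "-log10 Bonferroni P",
}

def to_neg_log10(p_values):
    """Map a dictionary shaped like allPValues() output to -log10 form."""
    return {
        LOG_KEYS[key]: -math.log10(p) if p > 0 else float("inf")
        for key, p in p_values.items()
    }

print(to_neg_log10({"P": 0.001, "Adjusted P": 0.01, "Bonferroni P": 0.5}))
```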

5.6.9.6 Getting Neg Log10 P for Two Markers

The following function returns the genetic-split neg log10 P Value for one combination of two markers. The markers are represented by their column numbers in the spreadsheet from which the TLG object was derived.

EXAMPLE

myLog10P = myTLGPlot.logPValue(5, 7)

SYNTAX

new variable = TLG object.logPValue(first column number, second column number)

5.6.9.7 Getting Neg Log10 Adjusted P for Two Markers

The following function returns the genetic-split neg log10 Adjusted P Value for one combination of two markers. The markers are represented by their column numbers in the spreadsheet from which the TLG object was derived.

EXAMPLE

myLog10AdjP = myTLGPlot.logAdjPValue(5, 7)

SYNTAX

new variable = TLG object.logAdjPValue(first column number, second column number)

5.6.9.8 Getting Neg Log10 Bonferroni P for Two Markers

The following function returns the genetic-split neg log10 Bonferroni P Value for one combination of two markers. The markers are represented by their column numbers in the spreadsheet from which the TLG object was derived.

EXAMPLE

myLog10BonfP = myTLGPlot.logBonfPValue(5, 7)

SYNTAX

new variable = TLG object.logBonfPValue(first column number, second column number)

5.6.9.9 Getting Detailed Split Information for Two Markers

This function creates a new spreadsheet that shows the details of the genetic-split information for a given combination of two markers. The spreadsheet will contain the split partitions detailed in the header and the partitioning of the data among the splits detailed in the spreadsheet body.

EXAMPLE

myDetailSpread = myTLGPlot.splitDetails(5,7)

SYNTAX

new spreadsheet object = TLG object.splitDetails(first column number, second column number)

5.6.10 Using the P-Value Plot

Once you have created a P-Value plot object, there are a number of functions which can be run with that object.

5.6.10.1 Getting P-Values

Use this command to get a Python dictionary containing P, aP, and bP values. The dictionary contains the following keys:

  • P
  • aP
  • bP

EXAMPLE

myPValues = myPVPlot.pValue(9)

SYNTAX

new dictionary = P-Value object.pValue(column number)
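A common next step is to keep only the columns that pass a cutoff. This minimal sketch assumes pValue(column) returns a dictionary keyed by the strings 'P', 'aP' and 'bP', as listed above. The helper significant_columns and its 0.05 default cutoff are illustrative choices, not SVS defaults.

```python
def significant_columns(pv_plot, columns, key="aP", alpha=0.05):
    """Return {column: value} for columns whose chosen P value is below alpha.

    pv_plot is assumed to be a P-Value plot object whose pValue(column)
    returns a dictionary with the keys 'P', 'aP' and 'bP'.
    """
    if key not in ("P", "aP", "bP"):
        raise ValueError("key must be 'P', 'aP' or 'bP'")
    hits = {}
    for column in columns:
        values = pv_plot.pValue(column)
        if values[key] < alpha:
            hits[column] = values[key]
    return hits

# Usage inside SVS (column numbers are placeholders):
# hits = significant_columns(myPVPlot, range(1, 20), key="aP", alpha=0.01)
```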

5.6.10.2 Getting Simes P-Values

Use this command to get a Python dictionary containing Simes P and Simes aP values from a P-Value plot that is ordered by variable number. The dictionary contains the following keys:

  • Simes P
  • Simes aP

EXAMPLE

mySimesPValues = myPVPlot.simesValue(9)

SYNTAX

new dictionary = P-Value object.simesValue(column number)

5.6.10.3 Setting the Simes Window

The following command can be used to change the window size used in calculating Simes P-Values. The new window size must be greater than 0, odd, and less than or equal to the number of plot columns.

EXAMPLE

myPVPlot.setSimes(3)

SYNTAX

P-Value object.setSimes(new window size)
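Because setSimes() only accepts odd window sizes between 1 and the number of plot columns, a script that computes a window size can check it first. This is a sketch of the documented rule only. check_simes_window is a hypothetical helper, and the number of plot columns must be supplied by your script because this section does not document a call that returns it.

```python
def check_simes_window(size, num_columns):
    """Return size if it is a legal Simes window, otherwise raise ValueError.

    Documented rule: greater than 0, odd, and no larger than the number
    of plot columns.
    """
    if not isinstance(size, int) or size <= 0:
        raise ValueError("window size must be a positive integer")
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    if size > num_columns:
        raise ValueError("window size cannot exceed the number of plot columns")
    return size

# Usage inside SVS (num_columns comes from your own script):
# myPVPlot.setSimes(check_simes_window(3, num_columns))
```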

5.6.10.4 Getting FDR (aP)

To find the false discovery rate for a specific column in a P-Value plot ordered by aP, use the following command.

EXAMPLE

myVariable = myPVPlot.FDRValue(9)

SYNTAX

new variable = P-Value object.FDRValue(column number)

5.6.10.5 Getting all P-Values as a Spreadsheet

Use this command to get a spreadsheet object which contains P, aP, bP, Simes P, and Simes aP for all columns represented in the current P-Value plot.

EXAMPLE

mySpreadsheet = myPVPlot.pvalueSpreadsheet()

SYNTAX

new spreadsheet object = P-Value object.pvalueSpreadsheet()

5.6.10.6 Getting an Allele Frequency Spreadsheet for a Specific Column

To create a spreadsheet containing the allele frequencies for a specific column, use the following command:

EXAMPLE

mySpreadsheet = myPVPlot.alleleFreqTable(9)

SYNTAX

new spreadsheet object = P-Value object.alleleFreqTable(column number)

5.6.11 Getting and Setting Tree Options

In order to set parameters that affect how trees are built and what values are shown in GUI mode, you must first get a tree options object. This object works like a Python dictionary. Each setting is accessed using subscript notation where the name of the setting is put inside the subscript brackets. Each setting is described below with an example showing the subscript notation. To get a tree options object use the following command.

EXAMPLE

myOptionsObject = ghi.getTreeOptions()

SYNTAX

tree options object = ghi.getTreeOptions()

5.6.11.1 Setting the Minimum Elements for Splitting

Using the following command, you can get or change the minimum split size used when creating trees. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The new split size must be greater than or equal to 1.

EXAMPLE

myMinElements = myOptionsObject['minelements']

SYNTAX

new variable = tree options object['minelements']

EXAMPLE

myOptionsObject['minelements'] = 2

SYNTAX

tree options object['minelements'] = desired split size

5.6.11.2 Setting the Number of Threads

Using the following command, you can get or change the number of threads used when creating trees. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The new number of threads must be greater than or equal to 1.

EXAMPLE

myNumThreads = myOptionsObject['numthreads']

SYNTAX

new variable = tree options object['numthreads']

EXAMPLE

myOptionsObject['numthreads'] = 2

SYNTAX

tree options object['numthreads'] = desired number of threads

5.6.11.3 Setting the P Value Threshold

Using the following command, you can get or change the P value threshold. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The new P threshold must be greater than or equal to 0.

EXAMPLE

myPThreshold = myOptionsObject['pthreshold']

SYNTAX

new variable = tree options object['pthreshold']

EXAMPLE

myOptionsObject['pthreshold'] = 0.01

SYNTAX

tree options object['pthreshold'] = desired threshold

5.6.11.4 Setting the Pairwise Threshold

Using the following command, you can get or change the pairwise threshold. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The new pairwise threshold must be greater than or equal to 0.

EXAMPLE

myPairwiseThreshold = myOptionsObject['pairwisepthreshold']

SYNTAX

new variable = tree options object['pairwisepthreshold']

EXAMPLE

myOptionsObject['pairwisepthreshold'] = 0.01

SYNTAX

tree options object['pairwisepthreshold'] = desired pairwise threshold

5.6.11.5 Setting the P Threshold Type

Using the following command, you can get or change the P threshold type. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The new P threshold type must be one of three types: 0 = Raw P, 1 = Adjusted P, 2 = Bonferroni Adjusted P.

EXAMPLE

myPThresholdType = myOptionsObject['pthresholdtype']

SYNTAX

new variable = tree options object['pthresholdtype']

EXAMPLE

myOptionsObject['pthresholdtype'] = 2

SYNTAX

tree options object['pthresholdtype'] = desired threshold type

5.6.11.6 Setting the Minimum Haplotype Frequency

Using the following command, you can get or change the minimum haplotype frequency. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The new minimum haplotype frequency must be in the range of 0 to 1.

EXAMPLE

myMinHapFrequency = myOptionsObject['minhapfrequency']

SYNTAX

new variable = tree options object['minhapfrequency']

EXAMPLE

myOptionsObject['minhapfrequency'] = 0.01

SYNTAX

tree options object['minhapfrequency'] = desired min haplotype frequency

5.6.11.7 Setting the Haplotype Estimation Method

Using the following command, you can get or change the haplotype estimation method. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The new haplotype estimation method must be one of two types: 0 = EM, 1 = CHM.

EXAMPLE

myHapEstMethod = myOptionsObject['hapestmethod']

SYNTAX

new variable = tree options object['hapestmethod']

EXAMPLE

myOptionsObject['hapestmethod'] = 0

SYNTAX

tree options object['hapestmethod'] = desired haplotype estimation method

5.6.11.8 Setting the Segmenting Algorithm

Using the following command, you can get or change the segmenting algorithm. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The new segmenting algorithm must be one of two types: 0 = exact, 1 = approximate.

EXAMPLE

mySegAlgorithm = myOptionsObject['segalgorithm']

SYNTAX

new variable = tree options object['segalgorithm']

EXAMPLE

myOptionsObject['segalgorithm'] = 0

SYNTAX

tree options object['segalgorithm'] = desired algorithm

5.6.11.9 Setting the Maximum Segments

Using the following command, you can get or change the maximum segments. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The new maximum segments must be greater than or equal to 2.

EXAMPLE

myMaxSegments = myOptionsObject['maxsegments']

SYNTAX

new variable = tree options object['maxsegments']

EXAMPLE

myOptionsObject['maxsegments'] = 3

SYNTAX

tree options object['maxsegments'] = desired setting

5.6.11.10 Setting Resample Iterations

Using the following command, you can get or change the number of resample iterations. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting.

EXAMPLE

myResampleIterations = myOptionsObject['resample_iterations']

SYNTAX

new variable = tree options object['resample_iterations']

EXAMPLE

myOptionsObject['resample_iterations'] = 0

SYNTAX

tree options object['resample_iterations'] = desired setting

5.6.11.11 Setting Linear Regression

Using the following command, you can get or change the Linear Regression setting. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The new linear regression setting must be one of two settings: 0 = off, 1 = on.

EXAMPLE

myLinearRegression = myOptionsObject['linearregression']

SYNTAX

new variable = tree options object['linearregression']

EXAMPLE

myOptionsObject['linearregression'] = 0

SYNTAX

tree options object['linearregression'] = desired setting

5.6.11.12 Setting Use Missing Values Option

Using the following command, you can get or change the Use Missing Values setting. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The new missing values setting must be one of two settings: 0 = off, 1 = on.

EXAMPLE

myUseMissing = myOptionsObject['usemissing']

SYNTAX

new variable = tree options object['usemissing']

EXAMPLE

myOptionsObject['usemissing'] = 0

SYNTAX

tree options object['usemissing'] = desired setting

5.6.11.13 Setting Non-Genetic Splits

Using the following command, you can get or change the non-genetic splits setting. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The non-genetic splits setting must be one of two settings: 0 = off, 1 = on.

EXAMPLE

myNonGenetic = myOptionsObject['nongenetic']

SYNTAX

new variable = tree options object['nongenetic']

EXAMPLE

myOptionsObject['nongenetic'] = 0

SYNTAX

tree options object['nongenetic'] = desired setting

5.6.11.14 Setting Genotype Splits

Using the following command, you can get or change the genotype splits setting. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The genotype splits setting must be one of two settings: 0 = off, 1 = on.

EXAMPLE

myGenotype = myOptionsObject['genotype']

SYNTAX

new variable = tree options object['genotype']

EXAMPLE

myOptionsObject['genotype'] = 0

SYNTAX

tree options object['genotype'] = desired setting

5.6.11.15 Setting Haplotype Splits

Using the following command, you can get or change the haplotype splits setting. The first example demonstrates getting the current setting, and the second example demonstrates changing the setting. The haplotype splits setting can be one of three settings: 0 = off, 1 = allele test, any value greater than 2 = haplotype window size.

EXAMPLE

myHaplotype = myOptionsObject['haplotype']

SYNTAX

new variable = tree options object['haplotype']

EXAMPLE

myOptionsObject['haplotype'] = 0

SYNTAX

tree options object['haplotype'] = desired setting
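Since the tree options object is used like a dictionary, several settings can be checked and applied together. The sketch below encodes the ranges documented in 5.6.11.1 through 5.6.11.15 and validates every value before writing any of them, so a bad value leaves the object untouched. apply_tree_options is a hypothetical helper. It assumes the options object supports item assignment as shown above, and it reads the haplotype rule literally (0, 1, or a value greater than 2).

```python
# Documented constraints for each tree option key (section 5.6.11).
_TREE_OPTION_RULES = {
    "minelements": lambda v: v >= 1,
    "numthreads": lambda v: v >= 1,
    "pthreshold": lambda v: v >= 0,
    "pairwisepthreshold": lambda v: v >= 0,
    "pthresholdtype": lambda v: v in (0, 1, 2),
    "minhapfrequency": lambda v: 0 <= v <= 1,
    "hapestmethod": lambda v: v in (0, 1),
    "segalgorithm": lambda v: v in (0, 1),
    "maxsegments": lambda v: v >= 2,
    "resample_iterations": lambda v: True,  # no constraint is documented
    "linearregression": lambda v: v in (0, 1),
    "usemissing": lambda v: v in (0, 1),
    "nongenetic": lambda v: v in (0, 1),
    "genotype": lambda v: v in (0, 1),
    "haplotype": lambda v: v in (0, 1) or v > 2,
}


def apply_tree_options(options, **settings):
    """Validate all settings first, then write them into options and return it."""
    for key, value in settings.items():
        rule = _TREE_OPTION_RULES.get(key)
        if rule is None:
            raise KeyError("unknown tree option: %r" % key)
        if not rule(value):
            raise ValueError("invalid value for %s: %r" % (key, value))
    # Only reached when every value passed, so no partial update can happen.
    for key, value in settings.items():
        options[key] = value
    return options

# Usage inside SVS:
# myOptionsObject = apply_tree_options(ghi.getTreeOptions(), minelements=2, numthreads=2)
```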

5.6.12 Creating a Tree Model

To build a tree model, you must first get a tree options object with the getTreeOptions() command and set the options to the desired values (see 5.6.11). The options object is then passed as the first parameter to the buildTreeModel() command. In addition to the options object, there are three optional keyword parameters that you can specify in any combination:

  1. numtrees, 100 is default.
  2. randseed, 12345678 is default.
  3. numsplitters, 10 is default.

The buildTreeModel() command will return a tree model object.

EXAMPLE

myTreeModel = mySS.buildTreeModel(ghi.getTreeOptions(), numtrees=50, randseed=7839743, numsplitters=6)

SYNTAX

tree model object = spreadsheet object.buildTreeModel(tree options object [, numtrees=val, randseed=val, numsplitters=val])
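To see how sensitive a result is to the random seed, you might build the same model with several seeds. This sketch uses only the documented buildTreeModel() keywords. build_models_with_seeds is a hypothetical helper, and whether comparing seeds is useful for a particular analysis is a judgment call.

```python
def build_models_with_seeds(spreadsheet, options, seeds, **build_kwargs):
    """Build one tree model per random seed and return them in seed order.

    spreadsheet is assumed to expose buildTreeModel(options, numtrees=...,
    randseed=..., numsplitters=...); extra keywords are passed through unchanged.
    """
    if "randseed" in build_kwargs:
        raise ValueError("pass seeds through the seeds argument, not randseed")
    models = []
    for seed in seeds:
        models.append(spreadsheet.buildTreeModel(options, randseed=seed, **build_kwargs))
    return models

# Usage inside SVS (seed values are placeholders):
# models = build_models_with_seeds(mySS, ghi.getTreeOptions(), [1, 2, 3], numtrees=50)
```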

5.6.13 Importing a Legacy Tree Model

You can import an existing tree model using the following command. The tree model will be imported into the project as a child node of the current spreadsheet.

EXAMPLE

myTreeModel = mySS.importLegacyTreeModel("C:/HelixTree/myProject/trees/myTree.ght")

SYNTAX

tree model object = spreadsheet object.importLegacyTreeModel(path and filename of tree model)

5.6.14 Tree Model Commands

5.6.14.1 Get Variable Frequencies

This command creates a new spreadsheet with the variable frequencies of a multi-tree model. The new spreadsheet is added to the project as a child of the tree model and is returned as a spreadsheet object in the Python shell.

EXAMPLE

mySpreadsheetVariable = myTreeModel.variableFrequencies()

SYNTAX

new spreadsheet object = tree model object.variableFrequencies()

5.6.14.2 Get Tree Predictions

This command creates a new spreadsheet with the tree predictions. The spreadsheet is added to the project as a child of the tree model and is returned as a spreadsheet object in the Python shell.

EXAMPLE

mySpreadsheetVariable = myTreeModel.averageTreePredictions()

SYNTAX

new spreadsheet object = tree model object.averageTreePredictions()

5.6.14.3 Get Tree Variables

This command returns a list of variables used as splitters in creating the trees in a multi-tree model.

EXAMPLE

myVariable = myTreeModel.getTreeVariables()

SYNTAX

new list = tree model object.getTreeVariables()

5.6.14.4 Get Correlation Table

This command builds a spreadsheet of correlation interactions for a given tree model’s variables. The spreadsheet is added to the project as a child of the tree model and is returned as a spreadsheet object in the Python shell.

EXAMPLE

mySpreadsheetVariable = myTreeModel.correlationTable()

SYNTAX

new spreadsheet object = tree model object.correlationTable()

5.6.14.5 Get Correlation Plot

This command creates a new correlation interaction plot using the variables of a given tree model. The plot is added to the project as a child of the tree model, and a correlation plot object is returned in the Python shell.

EXAMPLE

myCorrelationPlot = myTreeModel.correlationPlot()

SYNTAX

new plot object = tree model object.correlationPlot()

5.6.14.6 Get Observation Distance Matrix Unsorted

This command creates an unsorted observation distance matrix plot using the tree model. The plot is added to the navigator window as a child of the tree model, and a distance matrix object is returned in the Python shell.

EXAMPLE

myDMPlot = myTreeModel.distMatrixUnsorted()

SYNTAX

new plot object = tree model object.distMatrixUnsorted()

5.6.14.7 Get Observation Distance Matrix Sorted by First Principal Component

This command creates an observation distance matrix plot, sorted by the first principal component, using the tree model. The plot is added to the navigator window as a child of the tree model and a distance matrix object is returned in the Python shell.

EXAMPLE

myDMPlot = myTreeModel.distMatrixSorted()

SYNTAX

new plot object = tree model object.distMatrixSorted()

5.6.14.8 Get Observation Distance Matrix Sorted by Similarity to One Observation

This command creates an observation distance matrix plot of the observations most similar to a selected observation, sorted by their distance to that observation, using the tree model. The first parameter is the number of similar observations to include, and the second is the spreadsheet row number of the selected observation. The plot is added to the navigator window as a child of the tree model and a distance matrix object is returned in the Python shell.

EXAMPLE

myDMPlot = myTreeModel.distMatrixSimSorted(37, 5)

SYNTAX

new plot object = tree model object.distMatrixSimSorted(number of similar observations, row (observation) number)

5.6.15 Using the Distance Matrix Object

Once you have created a distance matrix object, there are a number of functions available which use that object.

The functions translate between the ranking, or position, within the distance matrix plot, and the spreadsheet row name or number.

NOTE: For distance matrices sorted by first principal component, the earliest ranking (number one) corresponds to the eigenvector component with the greatest magnitude, the second ranking corresponds with the eigenvector component with the second-greatest magnitude, and so forth. For unsorted distance matrices, the ranking will be the same as the row number. For distance matrices sorted by similarity to a given observation, that observation will be number one, the closest observation to that one will be number two, and so forth.
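The ordering rule for matrices sorted by first principal component can be illustrated in plain Python. This sketch assumes that "magnitude" means absolute value and that the first-eigenvector components are already available as a list. SVS computes these internally and its tie-breaking is not documented, so treat this as an explanation rather than a reproduction of SVS.

```python
def rank_rows_by_magnitude(components):
    """Return spreadsheet row numbers (1-based) ordered by |component|, largest first.

    components[i] is assumed to be the first-eigenvector component for row i + 1,
    so the first returned row has rank 1, the second rank 2, and so on.
    """
    rows = range(1, len(components) + 1)
    return sorted(rows, key=lambda row: abs(components[row - 1]), reverse=True)

# rank_rows_by_magnitude([0.1, -0.9, 0.5]) returns [2, 3, 1]:
# row 2 has rank 1, row 3 has rank 2, and row 1 has rank 3.
```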

5.6.15.1 Get the Observation Label for a Distance Matrix Plot

This command returns the observation label for a desired rank in an observation distance matrix plot. This command takes one parameter, the desired rank index.

EXAMPLE

myLabel = myDM.getObsLabel(1)

SYNTAX

new label string = distance matrix object.getObsLabel(rank index)

5.6.15.2 Get the Observation Number for a Distance Matrix Plot

This command returns the spreadsheet row number for a desired rank in an observation distance matrix plot. This command takes one parameter, the desired rank index.

EXAMPLE

myObsNumber = myDM.getObsNumber(2)

SYNTAX

row number variable = distance matrix object.getObsNumber(rank index)

5.6.15.3 Get the Rank Index for a Distance Matrix Plot

This command returns the rank index for a specified spreadsheet row used in an observation distance matrix plot. This command takes one parameter, the desired row number.

EXAMPLE

myRankIndex = myDM.getRankIndex(5)

SYNTAX

rank index variable = distance matrix object.getRankIndex(row number)
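getObsNumber() and getRankIndex() should be inverses of each other, which makes a quick consistency check possible while building a lookup table. This sketch assumes ranks start at 1, as the note above describes, and that your script knows how many observations the plot contains. rank_table is a hypothetical helper.

```python
def rank_table(dm, num_observations):
    """Return (rank, row number, label) tuples for ranks 1..num_observations.

    dm is assumed to be a distance matrix object providing getObsNumber,
    getObsLabel and getRankIndex as documented above.
    """
    table = []
    for rank in range(1, num_observations + 1):
        row = dm.getObsNumber(rank)
        back = dm.getRankIndex(row)
        # getRankIndex should undo getObsNumber; flag any disagreement.
        if back != rank:
            raise RuntimeError("rank %d -> row %d -> rank %d" % (rank, row, back))
        table.append((rank, row, dm.getObsLabel(rank)))
    return table

# Usage inside SVS (the observation count is a placeholder):
# for rank, row, label in rank_table(myDM, 10):
#     print(rank, row, label)
```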

5.6.15.4 Get the Distance Values by Row Number for a Distance Matrix Plot

The following function returns the computed distance between observations, which are identified by spreadsheet row number. It can be called with two row numbers or with two ranges of rows. If two row numbers are specified, a single distance value is returned. If two ranges are specified (four parameters: the start and end row of each range), a matrix of distance values is returned.

EXAMPLE

myDistance = myDM.distance(5, 7)

SYNTAX

distance variable = distance matrix object.distance(first row number, second row number)

EXAMPLE

myMatrix = myDM.distance(5, 7, 8, 9)

SYNTAX

distance matrix = distance matrix object.distance(first range start row, first range end row, second range start row, second range end row)

5.6.15.5 Get the Distance Values by Rank Index for a Distance Matrix Plot

The following function returns the computed distance between observations, which are identified by rank index. It can be called with two rank indices or with two ranges of rank indices. If two rank indices are specified, a single distance value is returned. If two ranges are specified (four parameters: the start and end rank index of each range), a matrix of distance values is returned.

EXAMPLE

myDistance = myDM.distanceByRank(1, 3)

SYNTAX

distance variable = distance matrix object.distanceByRank(rank index of first observation, rank index of second observation)

EXAMPLE

myMatrix = myDM.distanceByRank(1, 3, 4, 6)

SYNTAX

distance matrix = distance matrix object.distanceByRank(first range start rank index, first range end rank index, second range start rank index, second range end rank index)
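The four-parameter forms return a matrix directly. To make the layout explicit, the sketch below assembles the same block from the documented two-parameter calls. It assumes both ranges are inclusive and arranges the result with one row per entry of the first range; the exact return type of the four-parameter form is not described here, so compare the two before relying on either layout. distance_block is a hypothetical helper.

```python
def distance_block(dm, first_range, second_range, by_rank=False):
    """Build a nested list of distances, one inner list per entry of first_range.

    first_range and second_range are (start, end) pairs, treated as inclusive.
    Only the two-parameter forms of distance() and distanceByRank() are used.
    """
    lookup = dm.distanceByRank if by_rank else dm.distance
    (a_start, a_end), (b_start, b_end) = first_range, second_range
    return [[lookup(i, j) for j in range(b_start, b_end + 1)]
            for i in range(a_start, a_end + 1)]

# Usage inside SVS, mirroring myDM.distance(5, 7, 8, 9):
# block = distance_block(myDM, (5, 7), (8, 9))
```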

5.6.16 Applying a Tree Model

Use this command to apply a previously built tree model to a spreadsheet. This command takes one parameter, an existing tree model object, and returns the new tree model.

EXAMPLE

myAppliedTreeModel = mySS.applyTreeModel(treeModelObject)

SYNTAX

new tree model object = spreadsheet object.applyTreeModel(existing tree model object)

5.6.17 Copy Number Analysis

To import and segment copy number data, use the copyNumberAnalysis(...) command. This function takes one required parameter, which specifies the LogR DSF file from which to import, and various optional keyword parameters that specify other segmenting options:

  • chromosomes – Python list of chromosomes to be imported
  • algorithm – 0 = univariate, 1 = multivariate
  • windowSize – Specifies the moving window size
  • maxSegments – Max segments per window
  • minSnps – Minimum number of SNPs per segment
  • maxP – Max pairwise permuted p-value
  • numThreads – Number of threads
  • wigFile – optional wiggle (WIG) output file for Genome Browser import
  • exclusionList – The path to a SNP exclusion file

NOTE: The wiggle file parameter must be a directory if using the univariate algorithm, but if using the multivariate algorithm it must represent a file.

This function will return a Python tuple containing references to both the covariates and segments spreadsheets created during analysis:

  • tuple[0] = covariates spreadsheet
  • tuple[1] = segments spreadsheet
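The wiggle-file rule in the note above is easy to get wrong, so a script can enforce it before calling copyNumberAnalysis() and then unpack the returned tuple. This sketch checks the path with os.path.isdir and assumes that, for the univariate algorithm, the directory must already exist. copy_number_kwargs is a hypothetical helper.

```python
import os


def copy_number_kwargs(algorithm, wig_path=None, **other_options):
    """Assemble keyword arguments for ghi.copyNumberAnalysis, enforcing the wigFile rule.

    algorithm 0 (univariate) needs an existing directory for wigFile;
    algorithm 1 (multivariate) needs a file path, not a directory.
    """
    if algorithm not in (0, 1):
        raise ValueError("algorithm must be 0 (univariate) or 1 (multivariate)")
    kwargs = dict(other_options, algorithm=algorithm)
    if wig_path is not None:
        if algorithm == 0 and not os.path.isdir(wig_path):
            raise ValueError("univariate analysis needs a wiggle output directory")
        if algorithm == 1 and os.path.isdir(wig_path):
            raise ValueError("multivariate analysis needs a wiggle output file, not a directory")
        kwargs["wigFile"] = wig_path
    return kwargs

# Usage inside SVS (paths are placeholders):
# kwargs = copy_number_kwargs(0, 'C:/data/wiggleDir', chromosomes=['1', '2'], numThreads=2)
# covariatesSS, segmentsSS = ghi.copyNumberAnalysis('C:/myData/copyNumber.dsf', **kwargs)
```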

EXAMPLE

returnTuple = ghi.copyNumberAnalysis('C:/myData/copyNumber.dsf', chromosomes=['1','2','3'], exclusionList='C:/myData/SNPExclusion.txt', algorithm=0, windowSize=10000, maxSegments=20, minSnps=1, maxP=0.01, numThreads=2, wigFile='C:/data/wiggleDir')

SYNTAX

tuple = ghi.copyNumberAnalysis(path and filename of dsf file [, chromosomes=chromosome list, exclusionList=exclusion list file name, algorithm=val, windowSize=val, maxSegments=val, minSnps=val, maxP=max pairwise p, numThreads=number of threads, wigFile=wiggle file])

5.6.18 Importing Raw Copy Number Values

You can import the raw copy number values from a LogR DSF file using the importRawCopyNumber(...) command. This function takes a LogR DSF file name as its parameter, along with optional keyword arguments for a chromosome list and a SNP exclusion list. The chromosomes keyword argument allows you to specify, as a Python list, which chromosomes you would like to import. If it is used, only the listed chromosomes will be imported; SNPs on chromosomes that are not listed are omitted from the import. The exclusionList keyword argument allows you to specify a SNP exclusion file to use with the import.

This function will return a reference to the newly imported spreadsheet.

EXAMPLE

mySS = ghi.importRawCopyNumber('C:/myData/copyNumber.dsf', chromosomes=['1','2','3'], exclusionList='C:/myData/SNPExclusion.txt')

SYNTAX

new spreadsheet object = ghi.importRawCopyNumber(path and filename of dsf file [, chromosomes=chromosome list, exclusionList=exclusion list file name])

5.6.19 Performing LogR Association Tests and PCA

The LogR Association Tests and PCA Window’s functionality (see 25.7) can be accessed through scripting by using the command:

  • ghi.logRAssociationTests( inputDsfFile, ... )

This command returns the association test result spreadsheet if association tests are done, or the PCA correction output if only PCA correction is done.

While inputDsfFile is the only required parameter, you must have at least one association test selected or have PCA correction enabled.

The full list of parameters is as follows:

  • inputDsfFile - The name of the dsf file to read from.
  • chrs - A list of strings representing which chromosomes to use in analysis. Defaults to using all chromosomes in the dsf.
  • corrTrend - Whether to use the Correlation/Trend association test. Default is 0 (off).
  • tTest - Whether to use the T-Test test. Default is 0 (off).
  • regression - Whether to use the Regression test. Default is 0 (off).
  • phenotypeId - The Node ID of the phenotype spreadsheet to use for association tests. Note that this is required if you are doing association testing.
  • trait - The column number of the trait to use as the dependent variable. Note that this is required if you are doing association testing.
  • usePCA - Whether to PCA correct the data. Default is 0 (off).
  • numComponents - The number of principal components to compute in PCA correction. Default is 10.
  • outputDsfFile - The name of the dsf file to write the corrected data to. The file will be overwritten if it exists.
  • useUncorrected - Whether to output data based on an uncorrected dependent variable. Default is 1 (on).
  • useCorrected - Whether to output data based on a corrected dependent variable. Default is 1 (on).
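Because the command only accepts certain keyword combinations, a small check can catch mistakes, including misspelled keywords, before a long run starts. This sketch encodes only the requirements stated above: at least one test or usePCA must be enabled, and phenotypeId and trait are needed whenever a test is enabled. check_logr_kwargs is a hypothetical helper, and SVS still performs its own checks.

```python
# Keyword names taken from the parameter list above.
_LOGR_KEYWORDS = {
    "chrs", "corrTrend", "tTest", "regression", "phenotypeId", "trait",
    "usePCA", "numComponents", "outputDsfFile", "useUncorrected", "useCorrected",
}


def check_logr_kwargs(**kwargs):
    """Return kwargs unchanged, or raise ValueError if a documented rule is broken."""
    unknown = set(kwargs) - _LOGR_KEYWORDS
    if unknown:
        # Catches misspellings such as numComponets.
        raise ValueError("unknown keyword(s): %s" % ", ".join(sorted(unknown)))
    tests_on = any(kwargs.get(name, 0) for name in ("corrTrend", "tTest", "regression"))
    if not tests_on and not kwargs.get("usePCA", 0):
        raise ValueError("enable at least one association test or usePCA")
    if tests_on and ("phenotypeId" not in kwargs or "trait" not in kwargs):
        raise ValueError("association tests need both phenotypeId and trait")
    return kwargs

# Usage inside SVS (values are placeholders):
# kw = check_logr_kwargs(corrTrend=1, phenotypeId=50, trait=2)
# result = ghi.logRAssociationTests('myLogR.dsf', **kw)
```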

The logRAssociationTests command should be used in the following manner:

EXAMPLE

myOutputSpreadsheet = ghi.logRAssociationTests( 'myLogR.dsf', chrs=['1','2','X'], corrTrend=1, tTest=1, regression=1, phenotypeId=50, trait=2, usePCA=1, numComponents=100, outputDsfFile='output.dsf')

SYNTAX

new spreadsheet = ghi.logRAssociationTests( inputDsfFile, keyword1=value1, keyword2=value2...)

For more details on the parameters used in this command, see chapter 25.7.

5.6.20 Performing Regression

There are two scripting commands which allow you to perform various regressions. These commands are:

  • performRegression(...) - to perform linear or logistic regression.
  • performStepwiseRegression(...) - to perform stepwise linear or logistic regression

In order to be able to use these commands, the spreadsheet object must have exactly one non-categorical column set as dependent.

These commands will return a tuple of result objects which will vary based on the options used. The results tuple will contain:

  • A P-value plot object at the first index, and None at the second index, if parameters are set to run HTR on a moving window.
  • A text viewer object at the first index, and None at the second index, if not running HTR on a moving window and no residual spreadsheet is requested.
  • A text viewer at the first index, and a residual spreadsheet at the second index, if parameters are not set to use a moving window and the parameter for whether or not to create a residual spreadsheet is set to 1.

The performRegression command requires one parameter which represents whether or not to perform HTR (0 = No, 1 = Yes).

The performStepwiseRegression command requires two parameters: whether or not to perform HTR (0 = No, 1 = Yes), and a p-value cutoff for the stepwise procedure.

Additionally, these commands can take a number of keyword arguments, each of which may be required or prohibited based on the other options used. These keywords and the parameters they represent are as follows:

  • numPermutations - represents the number of permutations to perform for permutation tests. To omit permutation testing, set this value to 0. Default value = 0
  • windowType - allows you to choose how markers will be used in the regression. 0 = fixed size moving window, 1 = moving window using genetic distance and window size, 2 = run for selected markers only. This parameter is only applicable if performing HTR. Default = Project options value, or 0 if the spreadsheet is not marker mapped.
  • ignoreMarkerMapping - allows you to choose whether to use spreadsheet ordering or marker map ordering for the moving window. 1 = use spreadsheet ordering, 0 = use marker map ordering (this is only applicable for marker mapped spreadsheets). Default = Project options value.
  • windowGeneticDistance - the maximum genetic distance that the moving window will span (only applicable for marker mapped spreadsheets). Default = Project options value.
  • windowSize - the maximum number of markers that will be considered within the moving window. Default = Project options value.
  • imputeMissingValues - if this is set to 0, samples that have missing values for haplotypes are excluded. If this parameter is set to 1, haplotype probabilities for missing values will be imputed. Default = Project options value.
  • minHaplotypeFrequency - the minimum haplotype frequency allows you to exclude haplotypes which have frequencies below this value. Default = Project options value.
  • estimationMethod - specifies whether to use EM or CHM to estimate haplotype frequencies. 0 = CHM, 1 = EM. Default = Project options value.
  • maxEmIterations - the maximum number of EM iterations (only applicable if you are using EM). Default = Project options value.
  • emConvergenceTolerance - EM convergence tolerance (only applicable if you are using EM). Default = Project options value.
  • fullVsReduced - this parameter allows you to specify whether to use the full model or, if you are performing HTR and have included non-genetic covariates or first-order interactions, the full vs. reduced model as a test of significance. The full model includes the haplotypes plus any covariates and interaction terms, while the reduced model contains just the non-genetic covariates and interaction terms themselves. 0 = use full model, 1 = use full vs. reduced model. Default = 1 if running HTR with non-genetic covariates, 0 otherwise.
  • selectedMarkers - if you have chosen to run HTR for a specific set of markers, this parameter is required and allows you to specify the set of markers as a python list of spreadsheet column numbers; i.e. selectedMarkers = [1,2,5,7,9] would denote that column numbers 1, 2, 5, 7, and 9 should be used in the regression. All column numbers in this list must represent active, genetic columns.
  • nonGeneticCovariates - This parameter represents the non-genetic covariates that should be included in the regression. This parameter must be specified as a python list of spreadsheet column numbers, i.e. nonGeneticCovariates = [3, 4, 6] would include columns 3, 4 and 6 in the regression. All column numbers must represent active, non-genetic columns.
  • firstOrderInteractions - defines the first-order interactions between non-genetic covariates which will be used in the regression. These interactions must be specified in the form of a python list of tuples which contain non-genetic spreadsheet column numbers, i.e. firstOrderInteractions = [(3, 4), (4, 4), (2, 3)]. All column numbers must represent active, non-genetic columns.
  • createResidualSpreadsheet - if set to 1, a residual spreadsheet will be created. Default value = 0.
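Several of these keywords depend on one another. The sketch below checks a few of the documented rules before calling performRegression() or performStepwiseRegression(). check_regression_kwargs is a hypothetical helper; treating windowType and selectedMarkers as errors when HTR is off is a conservative reading, since SVS may simply ignore them.

```python
def check_regression_kwargs(do_htr, **kwargs):
    """Return kwargs unchanged, or raise ValueError if a documented rule is broken."""
    if not do_htr:
        # windowType and selectedMarkers are described as HTR-only options.
        for key in ("windowType", "selectedMarkers"):
            if key in kwargs:
                raise ValueError("%s only applies when performing HTR" % key)
    if kwargs.get("windowType") == 2 and "selectedMarkers" not in kwargs:
        raise ValueError("windowType=2 requires selectedMarkers")
    for pair in kwargs.get("firstOrderInteractions", []):
        if not (isinstance(pair, tuple) and len(pair) == 2
                and all(isinstance(col, int) for col in pair)):
            raise ValueError("each interaction must be a pair of column numbers: %r" % (pair,))
    for key in ("selectedMarkers", "nonGeneticCovariates"):
        if not all(isinstance(col, int) for col in kwargs.get(key, [])):
            raise ValueError("%s must be a list of column numbers" % key)
    return kwargs

# Usage inside SVS:
# kw = check_regression_kwargs(1, windowType=0, windowSize=3, nonGeneticCovariates=[3, 5, 6])
# myTuple = mySpreadsheet.performRegression(1, **kw)
```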

The performRegression command should be used in the following manner:

EXAMPLE

myTuple = mySpreadsheet.performRegression( 1, windowType=0, windowSize=3, estimationMethod=1, maxEmIterations=50, nonGeneticCovariates=[3,5,6], firstOrderInteractions=[(3, 6),(5, 5)])

SYNTAX

new tuple = spreadsheet object.performRegression( doHTR, keyword1=value1, keyword2=value2...)

The performStepwiseRegression command can be used as follows:

EXAMPLE

myTuple = mySpreadsheet.performStepwiseRegression( 1, 0.01, windowType=0, windowSize=3, estimationMethod=1, maxEmIterations=50, nonGeneticCovariates=[3,5,6], firstOrderInteractions=[(3, 6),(5, 5)])

SYNTAX

new tuple = spreadsheet object.performStepwiseRegression( doHTR, pvalueCutoff, keyword1=value1, keyword2=value2...)

For further description of the parameters used in these commands please see 24.2.

5.6.21 Output a C File

To create a C program with the prediction rules of a tree model, use this command.

EXAMPLE

myTreeModel.outputCFile('/tmp/mycfile')

SYNTAX

tree model object.outputCFile(path and filename of C file)

5.6.22 Prompting the User for Input

In interactive scripts, it is often useful to prompt the user for data or parameters to be used in the analysis. We have provided an interface to create a simple modal dialog that prompts the user for a number of values and provides error checking on the input. The main ghi object provides a convenient Python function for this purpose. The function takes a list of dictionaries as its argument and returns a list of user inputs.

EXAMPLE

myUserInput = ghi.promptUser([{"label":"Enter string:", "type":"string", "tooltip":"Any string will do"},
{"label":"Enter integer:", "type":"integer","min":0, "max":100},
{"label":"Enter double:", "type":"double","min":-1},
{"label":"Select method:", "type":"combobox", 
"list":["method 1", "method 2", "method 3"]}])

SYNTAX

new list = ghi.promptUser(list of dictionaries)

This example would construct the following dialog:


Figure 5.6: An example promptUser() dialog

The promptUser() function takes a list of Python dictionary objects, with each object defining one data entry field. The fields are included in list order from top to bottom within the dialog. When the user hits the OK button, if the entries are valid, they are returned in a list.

There are a number of data entry methods available. Each entry method is defined using a dictionary. Every entry must have a "label" attribute and a "type" attribute. Each entry may have an optional tooltip attribute, which is a message that appears when the user hovers the mouse over the field. Labels are listed at the left of the data field. The "type" attribute defines which type of data entry field is to be constructed. The available types are as follows:

  • integer: prompts user for an integer. Does error checking to see that a valid integer has been entered.

    Optional attributes:

    • "min":<integer> specifies that integer must be greater than or equal to the specified minimum value.
    • "max":<integer> specifies that integer must be less than or equal to the specified maximum value.
  • double: prompts user for a double precision (64-bit) number. Does error checking to see that a valid double has been entered.

    Optional attributes:

    • "min":<double> specifies that the entered value must be greater than or equal to the specified minimum value.
    • "max":<double> specifies that the entered value must be less than or equal to the specified maximum value.
  • float: same as double, provided for convenience.
  • real: same as double, provided for convenience.
  • string: prompts the user for a string and checks that the string entered is not blank.
  • combobox: takes an additional, required "list" attribute containing a list of strings for the user to choose from. For example, the entry "list":["item1", "item2"] specifies a combobox with two choices, "item1" and "item2". The first item is selected by default.
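Because every field follows the same small schema, a script that builds its field list programmatically can check it before showing the dialog. The sketch below is a hypothetical pre-flight check, check_fields(), that mirrors the rules listed above; it is not the validation SVS itself performs, and the min-greater-than-max test is an added assumption rather than a documented rule.

```python
# Field types documented for promptUser(); float and real behave like double.
ALLOWED_TYPES = {"integer", "double", "float", "real", "string", "combobox"}
NUMERIC_TYPES = {"integer", "double", "float", "real"}


def check_fields(fields):
    """Return a list of problems found in promptUser() field dictionaries.

    Hypothetical pre-flight check mirroring the documented rules; it is not
    the validation SVS itself performs. An empty list means no problems.
    """
    problems = []
    for i, field in enumerate(fields):
        # Every entry needs a label and a type.
        for key in ("label", "type"):
            if key not in field:
                problems.append(f"field {i}: missing {key!r}")
        ftype = field.get("type")
        if ftype is not None and ftype not in ALLOWED_TYPES:
            problems.append(f"field {i}: unknown type {ftype!r}")
        # A combobox must supply its choices.
        if ftype == "combobox" and "list" not in field:
            problems.append(f"field {i}: combobox needs a 'list' attribute")
        # Assumption: a min above the max can never be satisfied, so flag it.
        if ftype in NUMERIC_TYPES:
            low, high = field.get("min"), field.get("max")
            if low is not None and high is not None and low > high:
                problems.append(f"field {i}: 'min' is greater than 'max'")
    return problems
```

Inside SVS, a script could call check_fields() on its field list and only call ghi.promptUser() when the returned list is empty, printing the problems otherwise.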

If the user cancels, a Python exception is raised. If the field definitions contain a syntax error, a Python exception is raised with a description of the error. If the user clicks OK and any input is invalid, the dialog tells the user what the problem is so that it can be corrected within the dialog. If there are no errors, a list of the user inputs is returned in the same order as the dictionaries passed in.
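Since cancelling raises an exception rather than returning a value, an interactive script usually wraps the call. The sketch below is a hypothetical wrapper, prompt_or_none(), that takes the ghi object as a parameter so it can also be exercised with a stand-in object. The manual does not name the exception class, so this sketch assumes catching Exception is appropriate; note that it also swallows syntax errors in the field list, so check the field dictionaries yourself first if you need to tell the two cases apart.

```python
def prompt_or_none(ghi, fields):
    """Show a promptUser() dialog and return None instead of raising.

    Hypothetical wrapper. Assumption: the exception class raised on cancel
    is undocumented, so any Exception is treated as "no values".
    """
    try:
        return ghi.promptUser(fields)
    except Exception as exc:
        # Covers both a cancelled dialog and a malformed field list.
        print(f"promptUser did not return values: {exc}")
        return None
```

Inside an SVS script you would call prompt_or_none(ghi, fields) and stop the analysis if it returns None.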

5.6.23 Text Viewer

5.6.23.1 Getting the text

This function will return a Python string with the contents of the text viewer.

EXAMPLE

myViewerContents = myTextViewer.getText()

SYNTAX

new string = text viewer object.getText()
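Because getText() returns an ordinary Python string, the viewer contents can be post-processed with standard string methods. The sketch below uses a hypothetical helper, matching_lines(), that keeps only the lines containing a keyword; it calls nothing on the viewer except the documented getText().

```python
def matching_lines(viewer, keyword):
    """Return the lines of a text viewer's contents that contain `keyword`.

    Hypothetical helper: uses only getText(), which returns the viewer's
    contents as a Python string.
    """
    return [line for line in viewer.getText().splitlines() if keyword in line]
```

In SVS this would be called as matching_lines(myTextViewer, "some text").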

5.6.23.2 Saving text to a file

This function will save the contents of the text viewer to a .txt file. “.txt” will be appended to the end of the file name if it is not already there.

EXAMPLE

myTextViewer.saveToFile("filename.txt")

SYNTAX

text viewer object.saveToFile( file name string )
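If a script needs to reopen the saved file, it has to predict the final file name. The sketch below, a hypothetical txt_file_name() function, mirrors the documented rule of appending ".txt" when it is missing; it is an illustration, not SVS's code, and because the manual does not say whether the extension check is case-sensitive, this version assumes it is.

```python
def txt_file_name(name):
    """Mirror the documented rule: append ".txt" unless already present.

    Illustration only, not SVS's implementation. Assumption: the check is
    case-sensitive, so "NOTES.TXT" would become "NOTES.TXT.txt".
    """
    return name if name.endswith(".txt") else name + ".txt"
```

For example, after myTextViewer.saveToFile(base), a script could read the result back with open(txt_file_name(base)), assuming both calls resolve relative paths against the same working directory.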

5.6.24 Regression Results

5.6.24.1 Getting the text

This function will return a Python string with the contents of the regression results.

EXAMPLE

myResultsString = myRegressionResults.getText()

SYNTAX

new string = regression results object.getText()

5.6.24.2 Saving text to a file

This function will save the contents of the regression results to a .txt file. “.txt” will be appended to the end of the file name if it is not already there.

EXAMPLE

myRegressionResults.saveToFile("filename.txt")

SYNTAX

regression results object.saveToFile( file name string )

5.6.24.3 Getting the covariates

This function will return the non-genetic covariates that were used in the regression as a Python list of spreadsheet column numbers.

EXAMPLE

myCovariates = myRegressionResults.nonGeneticCovariates()

SYNTAX

new list = regression results object.nonGeneticCovariates()

5.6.24.4 Getting the markers

This function will return the markers that were used in the regression as a Python list of spreadsheet column numbers.

EXAMPLE

myMarkers = myRegressionResults.selectedMarkers()

SYNTAX

new list = regression results object.selectedMarkers()

5.6.24.5 Getting the interactions

This function will return the interactions that were used in the regression as a Python list of tuples. Each tuple contains the spreadsheet column number of every term in one interaction.

EXAMPLE

myInteractions = myRegressionResults.interactionTerms()

SYNTAX

new list of tuples = regression results object.interactionTerms()
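The three accessors above return related data in two shapes: flat lists of column numbers and a list of tuples of column numbers. The sketch below shows a hypothetical helper, regression_columns(), that merges them into one sorted list of every column the regression used. It calls only nonGeneticCovariates(), selectedMarkers(), and interactionTerms(), and assumes column numbers are integers.

```python
def regression_columns(results):
    """Collect every spreadsheet column number used by a regression.

    Hypothetical helper built on the three documented accessors:
    nonGeneticCovariates() and selectedMarkers() return lists of column
    numbers, and interactionTerms() returns a list of tuples of them.
    """
    columns = set(results.nonGeneticCovariates())
    columns.update(results.selectedMarkers())
    # Each interaction tuple may repeat columns already seen; the set dedupes.
    for term in results.interactionTerms():
        columns.update(term)
    return sorted(columns)
```

In SVS this would be called as regression_columns(myRegressionResults).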

5.6.25 Navigator Object Selection

5.6.25.1 Selecting a Spreadsheet

This function constructs a dialog that lists all of the navigator nodes, with all the spreadsheet nodes highlighted in white. The navigator node ID is returned for the spreadsheet that the user selects. Nothing is returned if the user cancels.

EXAMPLE

mySSID = ghi.promptSpreadsheet()

SYNTAX

new variable = ghi.promptSpreadsheet()

5.6.25.2 Selecting a Tree model

This function constructs a dialog that lists all of the navigator nodes, with all the tree model nodes highlighted in white. The navigator node ID is returned for the tree model that the user selects. Nothing is returned if the user cancels.

EXAMPLE

myTreeID = ghi.promptTree()

SYNTAX

new variable = ghi.promptTree()
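Both navigator prompts return nothing when the user cancels, so a script should check the result before using it. The sketch below is a hypothetical helper, require_selection(), that accepts either prompt function and stops with a clear message on cancel. It assumes that "nothing is returned" means the call returns None; the check uses `is None` so that a node ID of 0, if such IDs exist, is not mistaken for a cancellation.

```python
def require_selection(prompt, what):
    """Call a navigator prompt and raise if the user cancelled.

    Hypothetical helper. `prompt` is a zero-argument callable such as
    ghi.promptSpreadsheet or ghi.promptTree. Assumption: a cancelled
    prompt returns None.
    """
    node_id = prompt()
    if node_id is None:
        raise RuntimeError(f"No {what} selected; stopping the script.")
    return node_id
```

In SVS this would be used as mySSID = require_selection(ghi.promptSpreadsheet, "spreadsheet") or myTreeID = require_selection(ghi.promptTree, "tree model").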