Tutorial 2: Performing the Basic Workflow in Scripting Mode

In this tutorial we perform exactly the same operations as in Tutorial 1, except that every operation is carried out in scripting mode. To save you typing, we have provided a completed script file called tutorial2_script.py, which is located in the example folder. You can view this file with your favorite text editor. We reproduce the contents of the script file here:

[Picture: numbered listing of tutorial2_script.py]

(Note: The numbers in the left column are line numbers that make it easy to refer to parts of the program. They are not part of the program. If you look at the actual script file with your text editor, you will not see those numbers.)

2.5.1 Scripting Language Basics

To make sense of the script file, we need to explain some of the constructs of the scripting language. The primary elements of the HelixTree scripting language are built-in objects, such as spreadsheets and tree models, and the methods that are used to manipulate those objects. One object, called ghi, is created for you and contains methods for manipulating projects.

In order to identify a specific object, you must use the name you assign it in your script. For example, ss refers to the spreadsheet object that was created when you imported GSIM.csv, treeModel refers to the random tree model object you created from the spreadsheet data, and so on.

Methods are identified by their name and a (possibly empty) list of parameters contained in parentheses. Parameters are separated by commas.

To invoke a method, first write the name of the object that you want to manipulate, then a period, then the method name and its list of parameters. For example, ghi.closeProject() will close the currently open project.
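The object, period, method pattern described above is ordinary Python syntax. As a minimal sketch, the stand-in class below is hypothetical (it is not the real ghi object) and only records calls, so the snippet can run outside HelixTree. The method name closeProject is the only part taken from the text.

```python
class FakeProjectManager:
    """Hypothetical stand-in for the built-in ghi object; it only logs calls."""

    def __init__(self):
        self.calls = []

    def closeProject(self):
        # Method name taken from the tutorial; this body is illustrative only.
        self.calls.append("closeProject")


ghi = FakeProjectManager()  # inside HelixTree, ghi is created for you
ghi.closeProject()          # object name, period, method name, empty parameter list
```

Inside HelixTree you would skip the class definition entirely and write only the last line.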

2.5.2 The Script Explained

As you can see from the script file above, the scripting language tends to be fairly readable and intuitive. One reason for this is that the names of the methods mirror the names of the equivalent actions you would perform in GUI mode. So, at the risk of appearing a bit redundant, let’s explain the above script.

The first and third lines are comments. Line 4 is the first executable statement. It uses the ghi object to set a flag that will prevent any viewers or progress bars from displaying while the script is executing. Line 7 demonstrates how to create a new project using the ghi object again. You provide the name of the project and the location where you want to create the project directory and files.

Line 10 uses the ghi object to import data into the current project and assigns the result of the method to ss, where ss is equivalent to a spreadsheet in GUI mode. The importCSV method has two parameters.

The first parameter of the importCSV method is the name of the data file that is being imported. The name can be either an absolute or a relative path name. Relative path names are always relative to the directory where the script is executing. In our script, the path is an absolute path.
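The difference between the two kinds of path name can be illustrated with the standard os.path module. This is only a sketch of the rule stated above, not of how HelixTree resolves paths internally; the function name resolve_data_path and the script_dir argument are assumptions made for illustration.

```python
import os


def resolve_data_path(name, script_dir):
    """Return an absolute path; relative names are resolved against script_dir."""
    if os.path.isabs(name):
        # Absolute names are used as given (after normalization).
        return os.path.normpath(name)
    # Relative names are interpreted from the directory where the script runs.
    return os.path.normpath(os.path.join(script_dir, name))


base = os.path.abspath(".")
relative_result = resolve_data_path("GSIM.csv", base)
absolute_result = resolve_data_path(os.path.join(base, "data", "GSIM.csv"), "ignored")
```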

The second parameter of the importCSV method identifies column one as being a label column for the rows.

In the GUI-based tutorial, we identified PATID as the label column. PATID is the first column of the data set, so we set the second parameter of the importCSV method to the integer value 1.
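Note that the label column is counted from 1, whereas Python lists are indexed from 0. The sketch below shows the conversion on a small made-up header; only the PATID column name comes from the tutorial, and the other column names, the sample rows, and the helper label_column_name are hypothetical.

```python
import csv
import io

# Hypothetical sample data; only the PATID column name comes from the tutorial.
SAMPLE = "PATID,AGE,OUTCOME\nP001,54,1\nP002,61,0\n"


def label_column_name(csv_text, label_col):
    """Return the header of the column identified by a 1-based column number."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return header[label_col - 1]  # convert 1-based column number to 0-based index


first_label = label_column_name(SAMPLE, 1)
```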

Lines 13 through 15 use the spreadsheet object, ss, to inactivate (exclude) two columns from the analysis and to select one column as the dependent variable, just as we did in GUI mode.

Line 18 uses the ghi object to get a tree options object. On line 19 this tree options object is used as input to the spreadsheet buildTreeModel method. This method has three parameters. The first is an options object, which contains the parameters needed to control the tree-building process. The second and third parameters, numtrees and randseed, are optional; if you don’t specify them, they default to 100 and 12345678 respectively. To mirror the steps we performed in the GUI tutorial, we make the buildTreeModel method less random by setting the second parameter, numtrees, to 1.
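The way optional parameters fall back to their defaults can be sketched in plain Python. The function below is a hypothetical stand-in, not the real buildTreeModel method: it builds nothing and simply reports the settings that would be in effect. The parameter names and default values are the ones given in the text.

```python
def build_tree_model_stub(options, numtrees=100, randseed=12345678):
    """Hypothetical stand-in: return the effective settings instead of building trees."""
    return {"options": options, "numtrees": numtrees, "randseed": randseed}


# Both optional values omitted: the defaults apply.
with_defaults = build_tree_model_stub("tree options")

# Second parameter supplied, third omitted: numtrees is 1, randseed keeps its default.
single_tree = build_tree_model_stub("tree options", 1)
```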

After the random trees are generated, they are assigned to the name treeModel for later reference.

On line 22 we use the treeModel object to get the tree predictions, which are saved to predictions, a spreadsheet object. On line 23 we use the predictions object to export its contents to a CSV file.

Line 25 wraps up the script by closing the current project. Note that you could later open this project in GUI mode, and it would contain the imported dataset, the generated tree model, and the various spreadsheets you created in the script.

2.5.3 Executing the Script

It is easy to execute the script. Open a console window and change directories to the example directory where the script is located. From the command prompt, execute the following line:

..\HelixTree.com -s tutorial2_script.py

The "-s" option is followed by the script that HelixTree is going to execute.
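If you prefer to launch the run from another Python program rather than typing the command, the same command line can be assembled with the standard subprocess module. This is a sketch under assumptions: the helper names are hypothetical, the executable path is the one used in the command above, and nothing in this tutorial documents what exit code HelixTree returns.

```python
import subprocess


def helixtree_command(script, executable=r"..\HelixTree.com"):
    """Build the argument list for running a script with the -s option."""
    return [executable, "-s", script]


def run_script(script, executable=r"..\HelixTree.com"):
    """Run the script and return the process exit code (meaning not documented here)."""
    return subprocess.run(helixtree_command(script, executable)).returncode


command = helixtree_command("tutorial2_script.py")
# To actually start HelixTree from the example directory:
# run_script("tutorial2_script.py")
```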

2.5.4 Comparing the Results

We now have two output files, tutorial1_results.csv and predictions.csv. Open both files in Excel or a text editor and note that they are identical. We point this out only to reinforce the idea that whatever can be done from the GUI can be done in exactly the same way through scripting.
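Instead of comparing the two files by eye, you can check them with a few lines of standard Python. This is a minimal sketch that compares the parsed rows of two CSV files; the helper names are hypothetical, and it assumes both files are small enough to read into memory.

```python
import csv


def read_rows(path):
    """Read a CSV file into a list of rows, each row a list of strings."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def csv_files_match(path_a, path_b):
    """Return True when both files contain the same rows in the same order."""
    return read_rows(path_a) == read_rows(path_b)
```

For the files produced in these tutorials, the check would be csv_files_match("tutorial1_results.csv", "predictions.csv"), run from the directory that contains both files.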