Contents

I  Installing Optimus RP and Acquiring Data
1 Installing and Initializing Optimus RP
 1.1 Installation Overview
  1.1.1 Installation Under Windows
  1.1.2 Installation Under Apple
  1.1.3 Installation Under Linux
  1.1.4 Software Registration and Activation
   1.1.4.1 Manually Setting Proxy Settings
   1.1.4.2 Adding Personal Information
  1.1.5 Alternate Software Activation via Email
 1.2 Release Notes
  1.2.1 Improvements in version 4.2
  1.2.2 Bug Fixes
  1.2.3 Known Bugs
2 Welcome to Optimus RP
 2.1 Goals for this Chapter
 2.2 Recursive Partitioning Primer
 2.3 The Optimus RP Basic Workflow
 2.4 Tutorial 1: Performing the Basic Workflow in GUI Mode
  2.4.1 Create a New Project
  2.4.2 Import a Data Set
  2.4.3 View Imported Data in a Spreadsheet Viewer
  2.4.4 Identify the Data Columns to be Used in Analysis
  2.4.5 Perform the Analysis
  2.4.6 Export Analysis Results for Publishing
 2.5 Tutorial 2: Performing the Basic Workflow in Scripting Mode
  2.5.1 Scripting Language Basics
  2.5.2 The Script Explained
  2.5.3 Executing the Script
  2.5.4 Comparing the Results
3 Navigating the Main Screen
 3.1 Main Screen Overview
  3.1.1 The Menu Bar
  3.1.2 New Project
 3.2 Project Viewer Window
  3.2.1 The Menu Bar
  3.2.2 The Tool Bar
  3.2.3 The Project Navigator Window
  3.2.4 The Node Change Log Window
  3.2.5 The Edit Annotations Window
 3.3 Navigator Nodes
  3.3.1 Viewer Windows
  3.3.2 Types of Navigator Nodes and Their Associated Viewers
   3.3.2.1 The Project Node
   3.3.2.2 The Dataset Node
   3.3.2.3 The Spreadsheet Node
   3.3.2.4 Tree Analysis Node
   3.3.2.5 Multitree Model Node
   3.3.2.6 Applied Tree Node
   3.3.2.7 Histogram Node
   3.3.2.8 Observation Distance Matrix Node
   3.3.2.9 Correlation Interaction Node
   3.3.2.10 P Value Node
   3.3.2.11 Text Node
 3.4 The File Menu
  3.4.1 Open Project
  3.4.2 Save Project
  3.4.3 Close Project
  3.4.4 Import Data
  3.4.5 Recent Project Files
  3.4.6 Quit
 3.5 Tools Menu
  3.5.1 Options for Updates and New Projects
  3.5.2 Current Project’s Options
  3.5.3 Project Option Settings Available
   3.5.3.1 Tree Options
   3.5.3.2 Node View Options
   3.5.3.3 Appearance
   3.5.3.4 Other
  3.5.4 View Project Log - Sorted by Node
  3.5.5 View Project Log - Chronological
  3.5.6 Run Script
  3.5.7 Run Python Shell
  3.5.8 Update Optimus RP
 3.6 The Help Menu
4 Importing Your Data Into Optimus RP
 4.1 General Considerations
 4.2 Mathematical Considerations
  4.2.1 P-Values and Dependent Variables
  4.2.2 Regression and Independent Variables
  4.2.3 P-Value Algorithms
  4.2.4 Preparing Data Example
 4.3 Importing Data
  4.3.1 The Import Wizard
   4.3.1.1 Importing Files
   4.3.1.2 Importing ODBC Data
  4.3.2 Importing ASCII Data Files
  4.3.3 Importing Legacy GHD Files
  4.3.4 Importing DSF Files
5 Scripting and Other Integrated Statistical Tools
 5.1 Integrated Tools Overview
 5.2 The Python Shell Window
  5.2.1 Using Shell Objects
  5.2.2 Using the Dir Command
  5.2.3 Using Python for Getting Help
 5.3 Running Scripts
  5.3.1 Command Line Invocation
  5.3.2 Python Shell Invocation
  5.3.3 Navigator Menu Invocation
  5.3.4 Script Server Invocation
 5.4 Selecting a Script Server
 5.5 Example Scripts
 5.6 Scripting Reference
  5.6.1 Project Related Commands
   5.6.1.1 Creating a New Project
   5.6.1.2 Creating a Temporary Project
   5.6.1.3 Open an Existing Project
   5.6.1.4 Saving a Project
   5.6.1.5 Closing a Project
  5.6.2 General GHI Commands
   5.6.2.1 Allowing Viewers to Display
   5.6.2.2 Allowing Log Messages to Be Created
   5.6.2.3 Display a GUI Message
   5.6.2.4 Display a GUI Error Message
   5.6.2.5 Getting a Specific Navigator Node
   5.6.2.6 Getting the Current Navigator Node
   5.6.2.7 Choosing a File
   5.6.2.8 Choosing a Directory
   5.6.2.9 Creating a Progress Bar
   5.6.2.10 Setting the Progress Bar’s Progress
   5.6.2.11 Checking if the Progress Bar Has Been Cancelled
   5.6.2.12 Disposing of a Progress Bar When Done
   5.6.2.13 Creating a Status Dialog
   5.6.2.14 Setting the Status Dialog’s Message
   5.6.2.15 Closing the Status Dialog When Done
  5.6.3 Commands Common to All Objects
   5.6.3.1 Change a Navigator Node Name
   5.6.3.2 Getting a Navigator Node Name
   5.6.3.3 Getting a Navigator Node Type
   5.6.3.4 Getting a Navigator Node ID
   5.6.3.5 Deleting a Navigator Node
   5.6.3.6 Closing a Navigator Viewer
   5.6.3.7 Showing a Navigator Viewer
   5.6.3.8 Finding a Node’s Parent
   5.6.3.9 Finding a Node’s Secondary Parent
   5.6.3.10 Getting a Node’s Annotations
   5.6.3.11 Appending to a Node’s Annotations
  5.6.4 Importing and Loading Data
   5.6.4.1 Importing GHD-format DataSets
   5.6.4.2 Importing DSF Files
   5.6.4.3 Importing Various File Formats
   5.6.4.4 Importing ASCII files
  5.6.5 Creating a New Data Set with Scripting
   5.6.5.1 Getting a Dataset Builder Object
   5.6.5.2 Adding Row Labels
   5.6.5.3 Adding a Column of Boolean Values
   5.6.5.4 Adding a Column of Integer Values
   5.6.5.5 Adding a Column of Double Values
   5.6.5.6 Adding a Column of Nominal Values
   5.6.5.7 Creating the Data Set
  5.6.6 Spreadsheet Access and Manipulation
   5.6.6.1 Getting the Spreadsheet as a Dictionary
   5.6.6.2 Getting the Spreadsheet as a List of Lists
   5.6.6.3 Getting a Spreadsheet Cell
   5.6.6.4 Getting a Spreadsheet Column by Column Number
   5.6.6.5 Getting a Spreadsheet Column by Column Name
   5.6.6.6 Determining if a Spreadsheet is a Marker Map
   5.6.6.7 Get a Spreadsheet Column Type
   5.6.6.8 Get a Spreadsheet Column State
   5.6.6.9 Export a Spreadsheet to CSV File
   5.6.6.10 Export a Spreadsheet to a DSF File
   5.6.6.11 Finding a Column by Name
   5.6.6.12 Finding a Row by Name
   5.6.6.13 Invert Row States
   5.6.6.14 Getting the Number of Spreadsheet Columns
   5.6.6.15 Get the Number of Columns in a State
   5.6.6.16 Get the Number of Spreadsheet Rows
   5.6.6.17 Get the Number of Rows in a State
   5.6.6.18 Randomly Shuffle Rows
   5.6.6.19 Getting a Row of Data
   5.6.6.20 Change the State of a Single Column
   5.6.6.21 Change the State of a Range of Columns
   5.6.6.22 Setting the State of a Single Row
   5.6.6.23 Getting the State of a Single Row
   5.6.6.24 Setting the State of a Range of Rows
   5.6.6.25 Randomly set a Number of Rows to a State
   5.6.6.26 Randomly Set a Percentage of Rows to a State
   5.6.6.27 Sort a Column in Ascending Order
   5.6.6.28 Sort a Column in Descending Order
   5.6.6.29 Remembering a Spreadsheet Page
   5.6.6.30 Joining Two Spreadsheets
  5.6.7 Using the P-Value Plot
   5.6.7.1 Getting P-Values
   5.6.7.2 Getting Simes P-Values
   5.6.7.3 Setting the Simes Window
   5.6.7.4 Getting FDR (aP)
   5.6.7.5 Getting all P-Values as a Spreadsheet
  5.6.8 Getting and Setting Tree Options
   5.6.8.1 Setting the Minimum Elements for Splitting
   5.6.8.2 Setting the Number of Threads
   5.6.8.3 Setting the P Value Threshold
   5.6.8.4 Setting the Pairwise Threshold
   5.6.8.5 Setting the P Threshold Type
   5.6.8.6 Setting the Segmenting Algorithm
   5.6.8.7 Setting the Maximum Segments
   5.6.8.8 Setting Resample Iterations
   5.6.8.9 Setting Linear Regression
   5.6.8.10 Setting RP Splits
   5.6.8.11 Setting Use Missing Values Option
  5.6.9 Creating a Tree Model
  5.6.10 Importing a Legacy Tree Model
  5.6.11 Tree Model Commands
   5.6.11.1 Get Variable Frequencies
   5.6.11.2 Get Tree Predictions
   5.6.11.3 Get Tree Variables
   5.6.11.4 Get Correlation Table
   5.6.11.5 Get Correlation Plot
   5.6.11.6 Get Observation Distance Matrix Unsorted
   5.6.11.7 Get Observation Distance Matrix Sorted by First Principal Component
   5.6.11.8 Get Observation Distance Sorted by Similarity to One Observation
  5.6.12 Using the Distance Matrix Object
   5.6.12.1 Get the Observation Label for a Distance Matrix Plot
   5.6.12.2 Get the Observation Number for a Distance Matrix Plot
   5.6.12.3 Get the Rank Index for a Distance Matrix Plot
   5.6.12.4 Get the Distance Values by Row Number for a Distance Matrix Plot
   5.6.12.5 Get the Distance Values by Rank Index for a Distance Matrix Plot
  5.6.13 Applying a Tree Model
  5.6.14 Performing Regression
  5.6.15 Output a C File
  5.6.16 Prompting the User for Input
  5.6.17 Text Viewer
   5.6.17.1 Getting the Text
   5.6.17.2 Saving Text to a File
  5.6.18 Regression Results
   5.6.18.1 Getting the Text
   5.6.18.2 Saving Text to a File
   5.6.18.3 Getting the Covariates
   5.6.18.4 Getting the Interactions
  5.6.19 Navigator Object Selection
   5.6.19.1 Selecting a Spreadsheet
   5.6.19.2 Selecting a Tree Model
 5.7 S-PLUS Integration
  5.7.1 S-PLUS Desktop Integration
  5.7.2 S-PLUS Client/Server Integration
 5.8 R Integration
6 Using the Spreadsheet Viewer
 6.1 Spreadsheet Overview
 6.2 Manipulating, Filtering and Preparing Data Using the Spreadsheet
  6.2.1 Dependent or Independent Variable?
  6.2.2 Selecting a Dependent
  6.2.3 Sorting Records
  6.2.4 Deactivating Unwanted Columns
  6.2.5 Activating - Deactivating Row Data
  6.2.6 Picking Random Record Sets
 6.3 Navigating the Spreadsheet Menus
  6.3.1 File Menu
   6.3.1.1 Save As - Exporting Data
   6.3.1.2 Save As Comma-Delimited Text File
   6.3.1.3 Save As DSF File
   6.3.1.4 Join Spreadsheet on Row Labels
   6.3.1.5 Join Spreadsheets With Uneven Row Numbers
   6.3.1.6 Join Spreadsheets by Sorting
   6.3.1.7 Import a Legacy Tree Model
   6.3.1.8 Closing the File
  6.3.2 Edit Menu
   6.3.2.1 Select Row Subset
   6.3.2.2 Activate All Rows
   6.3.2.3 Inverting the Records (Rows) Selected
   6.3.2.4 Inverting the Columns Selected
   6.3.2.5 Row Subset Spreadsheet
   6.3.2.6 Column Subset Spreadsheet
   6.3.2.7 Find Column Search Tool
   6.3.2.8 Inactivate All Columns/Activate All Columns
  6.3.3 Analysis Menu
   6.3.3.1 Interactive Tree Analysis
   6.3.3.2 Create a Multiple-Tree Model
   6.3.3.3 Apply a Tree Model
  6.3.4 Help Menu
II  Recursive Partitioning
7 Interactive Tree Analysis
 7.1 Tree Analysis Overview
 7.2 Setting Options for Tree Analysis
  7.2.1 The Tree Tab
   7.2.1.1 Minimum Elements per Child:
   7.2.1.2 Segmenting Algorithm:
   7.2.1.3 Max Segments:
   7.2.1.4 Parallel Threads:
   7.2.1.5 Resampling Iterations
   7.2.1.6 P Value Threshold:
   7.2.1.7 Use Missing Values as Predictors
   7.2.1.8 Linear/Logistic Regression:
   7.2.1.9 RP Splits:
  7.2.2 The Node View Tab
   7.2.2.1 What the Node Values Mean
 7.3 Working with Nodes
  7.3.1 Node Pop-up Menu Selections
   7.3.1.1 Split Node
   7.3.1.2 Manual Split
   7.3.1.3 Collapse/Expand
   7.3.1.4 Recursive Split
   7.3.1.5 Spreadsheet
   7.3.1.6 Resample
  7.3.2 Visualize Split Data
  7.3.3 Visualize Split Data->Multiple Tree Clustering
  7.3.4 Visualize Split Data->Histogram
  7.3.5 Showing Split Data
   7.3.5.1 Linear Regression Splits
   7.3.5.2 Logistic Regression Splits
   7.3.5.3 Splits on Continuous Predictors
   7.3.5.4 Splits on Categorical Predictors
 7.4 Manually Splitting Nodes
  7.4.1 P Value Plots
  7.4.2 Define Split
  7.4.3 Don’t Split
  7.4.4 Manual Split with Regression
  7.4.5 Using the Tree and Manual Split Window Together
 7.5 Defining Splits
  7.5.1 The Split Point
  7.5.2 The Split Point Controls the Node Information
  7.5.3 Smoothing the Data Points
  7.5.4 Refined or Coarse Data Points
  7.5.5 Zooming into a Specific Region
  7.5.6 Adding Linear Regression
  7.5.7 Categorical Predictors
 7.6 The File Menu
  7.6.1 File->Print tree
  7.6.2 File->Save Tree Image
  7.6.3 File->View Predictions (In-Sample)
  7.6.4 File->Output C Code
  7.6.5 File->Close
 7.7 The Tree Menu
  7.7.1 Tree->Options
  7.7.2 Tree->Subset Spreadsheet
  7.7.3 Tree->Subset Tree
  7.7.4 Tree->Extend Current Tree Randomly
  7.7.5 Tree->Search Tree
   7.7.5.1 Tree->SearchTree->Find Observation
   7.7.5.2 Tree->SearchTree->Find Node
   7.7.5.3 Tree->SearchTree->Select Node by Threshold
   7.7.5.4 Tree->SearchTree->Highlight All Nodes
   7.7.5.5 Tree->SearchTree->UnHighlight All Nodes
 7.8 The Font Menu - Resizing and Formatting Tree View
  7.8.1 Font->Size
  7.8.2 Font->Family
8 Prediction Recipes
 8.1 Training and Validation Recipe
 8.2 Predicting an Unknown Response
  8.2.1 Modifying a Copy of Data
  8.2.2 Converting to a GHD File
  8.2.3 Create a Subset With BP Readings
  8.2.4 Create Random Trees From Initial Dataset
  8.2.5 Invert Data to Create Dataset With Missing BP
  8.2.6 Applying a Multitree Model to Make Predictions
9 Random Tree Generation
 9.1 Random Tree Overview
 9.2 Creating a Random Tree Model
 9.3 Multitree Model Browsing - Tree View
  9.3.1 Multitree Model – Tree List
  9.3.2 Multitree Model – Variable List
   9.3.2.1 Sorting
   9.3.2.2 Subset
   9.3.2.3 Variables->View Variable Usage
   9.3.2.4 Variables->View Variable Frequency
   9.3.2.5 Viewing Variable Correlations
  9.3.3 File->Analyze Current Tree Tools
  9.3.4 Predictions (InSample)->View Average Tree In-Sample Predictions
  9.3.5 Predictions (InSample)->Save All Tree In-Sample Predictions to CSV File
  9.3.6 Save “C” Prediction Program
  9.3.7 Close
  9.3.8 Help
10 Multivariate Tree Analysis
 10.1 Multivariate Analysis Overview
 10.2 Using More Than One Dependent Variable
  10.2.1 Continuous Multivariate Response
  10.2.2 Binary and/or Categorical Multivariate Response
  10.2.3 Multivariate Multiple Tree Clustering
  10.2.4 File->Output C Code
11 Histogram Node Analysis
 11.1 Histogram Overview
 11.2 Viewing Split Data Histograms
  11.2.1 Creating Histograms
  11.2.2 Visualizing Node Relationships
  11.2.3 Changing Bins
  11.2.4 Zooming or Rubber Banding Data
  11.2.5 Menus
  11.2.6 File->Print
12 Viewing the Observation Distance Matrix
 12.1 Observation Distance Matrix Overview
  12.1.1 Creating an Observation Distance Matrix
  12.1.2 The Observation Distance Matrix
  12.1.3 Set Axes
  12.1.4 Stop Calculation/Stop Refresh and Restore Calculation/Restore Refresh
  12.1.5 Copy to Clipboard
  12.1.6 Creating a Spreadsheet or Tree view from the Matrix Plot
  12.1.7 Zoom Mode
  12.1.8 Modify Color Scaling
  12.1.9 Effect of Clicking on the Plot
  12.1.10 Color Drop Down Menu
 12.2 Viewing Observation Distance Matrix
  12.2.1 Viewing Spreadsheets or Trees of Subsets
  12.2.2 Zooming-In on a Subset of the Distance Matrix Plot
  12.2.3 Narrowing the Distance Range
  12.2.4 Menus
 12.3 Printing and Saving the Observation Distance Matrix
  12.3.1 Save Image or Print
  12.3.2 The Menus
  12.3.3 File->Save Obs. Distance Matrix (Sorted)
13 The Correlation Interaction View
 13.1 Correlation Interaction Overview
  13.1.1 Pick Wanted Variables
 13.2 Viewing Correlation Interactions
  13.2.1 The Correlation Interaction View
  13.2.2 The Upper Triangle
  13.2.3 The Lower Triangle
14 P-Value Plot
 14.1 Plotting P-Values
 14.2 P-Value Plot Types
  14.2.1 P-Value Plots Sorted by Var #
  14.2.2 P-Value Plots Sorted by Adjusted P-Value
 14.3 The P-Value Plot
 14.4 Reset View
 14.5 Copy to Clipboard
 14.6 Axis Selector
 14.7 Zooming into the Graph
 14.8 File Menu
 14.9 Create Bitmap
 14.10 Print Image
 14.11 P-Value Spreadsheet
15 Text Viewer
 15.1 Text Viewer Overview
 15.2 Navigating the Text Viewer Menus
  15.2.1 File Menu
   15.2.1.1 Save As Text File
16 Regression Analysis (Optional Module)
 16.1 Regression Analysis Overview
 16.2 Performing Analysis
  16.2.1 Covariates
  16.2.2 Type of Regression
  16.2.3 Permutation Tests
  16.2.4 Create Residual Spreadsheet With Covariates
  16.2.5 Output and Running the Regression
III  The Science Behind Optimus RP
17 Formulas and Theories
 17.1 Split-Prediction Methodology
 17.2 Normally Distributed Response Binomial Predictor
  17.2.1 Univariate Case
  17.2.2 Multivariate Case
 17.3 Normally Distributed Response Continuous-Ordinal Predictor
 17.4 Normally Distributed Response Categorical Predictor
 17.5 Linear Regression From a Tree Node
 17.6 Linear Regression with Continuous Response (Optional Module)
  17.6.1 Methodology
  17.6.2 Stepwise Regression
 17.7 Permutation Test Methodology (Optional Module)
 17.8 Results from Linear Regression (Optional Module)
  17.8.1 Residual Spreadsheet
  17.8.2 Linear Regression Statistical Output Viewer
  17.8.3 Overall Statistics
  17.8.4 Regressor Statistics
  17.8.5 Left-Out Regressors
  17.8.6 Parameters
 17.9 Binomially Distributed Response Binary Predictor
  17.9.1 Univariate Case
  17.9.2 Multivariate Case
 17.10 Binomially Distributed Response Continuous/Ordinal Predictor
 17.11 Binomially Distributed Response Categorical Predictor
 17.12 Logistic Regression From a Tree Node
 17.13 Logistic Regression with Binomial Response (Optional Module)
  17.13.1 Methodology
  17.13.2 Stepwise Regression
 17.14 Results from Logistic Regression (Optional Module)
  17.14.1 Residual Spreadsheet
  17.14.2 Logistic Regression Statistical Output Viewer
  17.14.3 Overall Statistics
  17.14.4 Regressor Statistics
  17.14.5 Left-Out Regressors
  17.14.6 Parameters
 17.15 Caveats
 17.16 Categorical Response
 17.17 The False Discovery Rate and the Simes Method
  17.17.1 The False Discovery Rate
  17.17.2 Simes’ Method
A EULA
B REFERENCES
C BUG FIX HISTORY
 C.1 Bugs Fixed in Version 4.2.0 of Optimus RP
 C.2 Bugs Fixed in Version 3.0.0 of Optimus RP
 C.3 Bugs Fixed in Version 2.1.2 of Optimus RP