‹‹ Back to SVS Home

6.3 Navigating the Spreadsheet Menus

6.3 Navigating the Spreadsheet Menus

6.3.1 File Menu


[Picture]
Figure 6.6: The spreadsheet File menu

The File menu has eight menu items: Save As, Save As Comma-Delimited Text File, Save As DSF, Join Spreadsheet on Row Labels (if applicable), Join Spreadsheets With Uneven Row Numbers, Join Spreadsheets by Sorting, Import a Legacy Tree Model and Close.

6.3.1.1 Save As - Exporting Data

The menu choice File->Save as opens an Export Data file dialog. The blank areas can not be filled in (they are place holders to show the selected file and format once chosen). Click the textbfBrowse button. This opens a Save As dialog window, entitled Choose a file and format to export to, that has the Save as type setting defaulting to the CSV file type. (For more information on CSV files see the topic Data Types.) The Save as type drop down menu allows the file to be saved in any number of popular data formats. When the file has been named and the file format designated, click the Save button. This returns you to the Export Data window Fig. 6.7 where you can change the Delimiter from Comma to Space or Other. When everything is set, click the Export button.


[Picture]
Figure 6.7: The Export Data window.

6.3.1.2 Save As Comma-Delimited Text File

Allows the saving of the spreadsheet as an ASCII, comma delimited (CSV) file. This menu choice opens a Save As dialog window. Use this dialog to navigate and name the new CSV file. This “.csv” file may then be imported by other spreadsheet programs.

6.3.1.3 Save As DSF File

Allows the saving of the spreadsheet as an Dataset Storage Format (DSF) file. This menu choice opens a Save As dialog window. Use this dialog to navigate and name the new DSF file. This “.dsf” file may then be imported by other users of Optimus RP. See 4.3.4 for a discussion of DSF as a storage format.

6.3.1.4 Join Spreadsheet on Row Labels

This feature allows two spreadsheets (data sets) with the same row labels (such as patient IDs) but different data in the columns to be merged. The columns of the second data set are appended to the right of the columns in the original data set. The merged spreadsheet creates a child Navigator Node under the first spreadsheet labeled "Joined spreadsheet".

From the spreadsheet view of the data to be added to, click the File->Join Spreadsheet on row labels menu option. A Navigator Widget Chooser dialog box opens with a list of all the available spreadsheets in the project file. Click on the spreadsheet to be joined to the opened spreadsheet. A window opens describing the success of the merger– click the OK button. The program opens the merged spreadsheet.

6.3.1.5 Join Spreadsheets With Uneven Row Numbers

This feature allows you to merge two spreadsheets which have common row labels without exluding rows which have labels that appear in only one of the spreadsheets. Upon running the join, the columns of the second data set are appended to the right of the columns from the original data set, with the following exceptions: Columns with names that appear in both data sets will be “merged” so that any would-be missing values in the columns from the original data set will be filled in with the corresponding values from the column in the second data set if possible. The column from the second data set is then discarded. Additionally, spreadsheets which do not have row labels that are present in the other spreadsheet in the join will have missing values inserted in all columns for those rows.

From the spreadsheet you wish to add to, click the File->Join Spreadsheets With Uneven Row Numbers menu option. This will open a Navigator Widget Chooser dialog box with list of all availible spreadsheets in the current project. Click on the spreadsheet to be joined to the original spreadsheet, and click OK. A window will open, reporting the success of the join, as well as the number of rows which were successfully joined. Click OK to close the message, and the program will open the newly merged spreadsheet.

6.3.1.6 Join Spreadsheets by Sorting

This feature allows joining of any two spreadsheets (data sets) regardless of row labels. The columns of the second data set are appended to the right of the columns in the original data set. The number of rows joined will be the minimum of the number of rows in the two datasets. If the rows are not in a matching order they can be reordered using the join dialog that is part of this feature.

From any spreadsheet viewer, click the File->Join Spreadsheet by Sorting menu option.

A Navigator Widget Chooser dialog box opens with a list of all the available spreadsheets in the project file listed. Click on the spreadsheet to be joined to the original spreadsheet then click the OK button. This will bring up the sorting dialog window show below. Note that the number of rows in the joined spreadsheet will be the minimum of the two selected spreadsheet. For example if one sheet has 50 rows and the second has 100 the merged spreadsheet will have 50 rows.

This window shows the original spreadsheet on the left and the selected spreadsheet on the right. The row numbers are listed by default and if there is a label column then it is also displayed. There is a drop-down list at the top of each spreadsheet that contains the first five columns of the spreadsheet in case you would like to order the rows using a different column. Selecting one of the drop down columns will add the column to the display box below the drop-down so that you can use any of the sorting tools on that column.

Using the column that you want to sort you can either sort the whole column in ascending or descending order by clicking on the title bar for that column or you can move individual rows up or down using the Move Up and Move Down buttons. These operation are done on both spreadsheet until the rows on the left and right are lined up in the order that you would like them to be joined.

The final radio button at the bottom indicates where the new dataset and spreadsheet will be put in the Project Navigator window. By default a new dataset will be created as a child of the root project node. Optionally you can have the new dataset as a child of the original spreadsheet. When everything is in order click the Begin Joining button.

When the join is complete a window opens describing the success of the merger, click the OK button. The program then opens the merged spreadsheet for your viewing.

6.3.1.7 Import a Legacy Tree Model

This option opens a file browser where you can select a previously created .ght file. The tree file you are importing must have the same number of columns, the same data types for each column and the same dependent variable. After importing the tree if the values displayed do not match in all three columns it indicates that the original data does not exactly match the current spreadsheet.

6.3.1.8 Closing the File

The File->Close menu selection closes the data file and its associated spreadsheet. A dialog window prompts for the saving of unsaved data.

6.3.2 Edit Menu

Using the Edit menu we can find content, create subsets within the current spreadsheet and spin off new spreadsheets.


[Picture]
Figure 6.8: The Edit menu allows for finding and selecting elements in the spreadsheet.

6.3.2.1 Select Row Subset


[Picture]
Figure 6.9: A spreadsheet view showing the Edit->Select Row Subset menu highlighted.

The menus choices Edit->Select Row Subset give you several ways to automatically pick subset records (rows). We can choose:
Random fraction to specify what percentage of the records to use (default = 0.5);
Random selection size to specify the number of records (default = all of them),
First N items to specify the first N records to use from the spreadsheet (default = all), and
Reset random seed to change the random seed although the default of 1 will do in most cases. Fig. 6.10 shows a choice of randomly selecting 50% of the records.


[Picture]
Figure 6.10: The dialog window showing the subsets that can be selected

Resetting the random seed to 1 enables you to pick the same random subset as would have occurred if you just started the program up and no random number generation had taken place.

500 Random Records


[Picture]
Figure 6.11: Spreadsheet view of randomly deactivated rows

The result of creating a Random fraction - 0.5 subset on a dataset with 1000 records is a subset of 500 randomly selected records These 500 records can be analyzed with less concern of a bias in their selection.

6.3.2.2 Activate All Rows

The easiest way to activate all rows is to use this command.

6.3.2.3 Inverting the Records (Rows) Selected


[Picture]
Figure 6.12: Notice that the row buttons on the left are inverted from Fig. 6.11

There may be times when you want to run two exclusive groups of data from the same data set. Using the menus Edit->Invert row selection flips the selected-de selected records. By inverting, all of the de-selected records become selected and all of the selected records become de-selected. The mechanism of using the Select Row Subset and Invert Row Selection routines is often used in building a training set, and then inverting to the holdout or test set to validate the model.

6.3.2.4 Inverting the Columns Selected

For convenience, there is also a way to invert the selected columns. Selecting Edit->Invert Column Selection will activate all inactive columns, and inactivate all active columns. Using this menu item will also clear dependent column status.

6.3.2.5 Row Subset Spreadsheet


[Picture]
Figure 6.13: A subset spreadsheet.

You can create a new spreadsheet from the selected records (rows) of another. If you look at the illustration and the Patient numbers in the second label column at left you can see this spreadsheet contains some, but not all, of the patients in the original spreadsheet.

This (row) subset spreadsheet displays in a separate viewer, allowing you to close the viewer of the original spreadsheet. However, if you delete the navigator node of the original spreadsheet, this subset spreadsheet and its navigator node will also be deleted.

You can use the subset spreadsheet activity to split a spreadsheet’s data set into several smaller data sets. After selecting the records you wish to place into each subset spreadsheet, you create that subset spreadsheet and then use File->Save As to create a new file (for example, save as a .txt, or .csv file) for importing into another project.

6.3.2.6 Column Subset Spreadsheet

It is also possible to create a new spreadsheet using a subset of columns from an existing spreadsheet. To do this, select the Edit->Column Subset Spreadsheet menu item. Activating this menu item will create a new spreadsheet containing all of the active columns from the original spreadsheet, but excluding all columns which were inactivated. Only rows from the current spreadsheet will be present in the new spreadsheet, e.g., if the original spreadsheet is a row subset spreadsheet of another spreadsheet, only the rows that are present in the subset spreadsheet will be used. Also, the current sort order from the original spreadsheet will be the default sort order of the new spreadsheet.

6.3.2.7 Find Column Search Tool


[Picture]
Figure 6.14: This dialog allows you to designate which column to search for.

The menu choice Edit->Find column allows you to search for a matching column name. The found column is placed as the first column (left) in the spreadsheet. Type in the name of the column you are looking for and click the OK button. Optimus RP makes the best possible match ignoring case and will make partial matches. For example, searching on “s” would find s, S, scented, or Scented, in whichever column a match is first found. Searching on “sc” would find scented or Scented.

6.3.2.8 Inactivate All Columns/Activate All Columns

If you have many columns, it is easier to deactivate (inactivate) all of them and then just activate the few you wish to analyze. You will, of course, have to select a column for the dependent variable and activate one or more independent variables.

Similarly, use Activate all columns in data sets with various columns set to inactive and you wish to have all or most activated.

These two commands are the same as the click and shift-click across all the column headings.

6.3.3 Analysis Menu

The Analysis menu is where we do our serious work in Optimus RP. Up to this point in this documentation we have been acquiring, converting and manipulating data to get it ready for recursive partitioning analysis.


[Picture]
Figure 6.15: The Analysis menu showing the various tools.

6.3.3.1 Interactive Tree Analysis

The menu choice Analysis-Interactive Tree Analysis opens up the tree view. This is covered in depth in Chapter 7, Interactive Tree Analysis.

6.3.3.2 Create a Multiple-Tree Model

This menu choice will create a forest of random trees for analysis. This is covered in depth in Chapter 9, Random Tree Generation.

6.3.3.3 Apply a Tree Model

This menu choice allows you to make predictions using a tree model. Using tree models for predictions is demonstrated in Chapter 8, Prediction Recipes.

6.3.4 Help Menu

The menu choice Help->Spreadsheet Help opens the on-line manual for help on using the menus on the spreadsheet.