The Edit Menu
Using the Edit menu, we can find content, create subsets within the current spreadsheet, and spin off new spreadsheets.
6.4.1 Row
6.4.1.1 Activate All
The easiest way to activate all rows is to use this command.
6.4.1.2 Invert Selection
There may be times when you want to run analyses on two mutually exclusive groups of data from the same data set. Choosing Edit->Row->Invert Selection flips the selected and de-selected records: all of the de-selected records become selected, and all of the selected records become de-selected. Select Row Subset followed by Invert Row Selection is often used to build a training set and then switch to the holdout or test set to validate the model.
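The training/holdout pattern described above can be sketched in a few lines of Python. This is only an illustrative model, not HelixTree's own code: the row labels, the `selected` flags, and the `invert_selection` helper are all hypothetical.

```python
# Hypothetical model: one boolean per row, True = selected.
def invert_selection(selected):
    """Return a new list with every selection flag flipped."""
    return [not flag for flag in selected]

rows = ["r1", "r2", "r3", "r4", "r5"]
selected = [True, False, True, True, False]   # rows chosen for training

training = [r for r, s in zip(rows, selected) if s]

# Inverting the selection yields exactly the rows that were left out.
holdout_flags = invert_selection(selected)
holdout = [r for r, s in zip(rows, holdout_flags) if s]

print(training)  # ['r1', 'r3', 'r4']
print(holdout)   # ['r2', 'r5']
```

Because the two groups come from one selection and its inverse, they never overlap and together cover every row.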
6.4.1.3 Select Subset
The menu choices Edit->Row->Select Subset give you several ways to automatically pick a subset of records (rows). We can choose:
Random fraction to specify what fraction of the records to use (default = 0.5);
Random selection size to specify the number of records (default = all of them);
First N items to specify the first N records to use from the spreadsheet (default = all); and
Reset random seed to change the random seed, although the default of 1 will do in most cases.
Fig. 6.15 shows a choice of randomly selecting 50% of the records.
Resetting the random seed to 1 enables you to pick the same random subset as would have occurred if you just started the program up and no random number generation had taken place.
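As a rough illustration of how a fixed seed makes a random subset repeatable, the sketch below models the three selection modes with Python's standard `random` module. The `select_subset` function and its parameter names are hypothetical, and Python's generator is not the one HelixTree uses, so the actual rows chosen will differ from the program's.

```python
import random

def select_subset(n_rows, fraction=None, size=None, first_n=None, seed=1):
    """Return sorted row indices chosen by one of the three modes."""
    if first_n is not None:
        # "First N items": no randomness involved.
        return list(range(min(first_n, n_rows)))
    # A fresh generator per call models resetting the seed before selecting.
    rng = random.Random(seed)
    if fraction is not None:
        size = round(n_rows * fraction)
    if size is None:
        size = n_rows                      # default: all of the records
    return sorted(rng.sample(range(n_rows), min(size, n_rows)))

a = select_subset(1000, fraction=0.5)
b = select_subset(1000, fraction=0.5)      # same seed, so the same subset
print(len(a), a == b)                      # 500 True
print(select_subset(1000, first_n=3))      # [0, 1, 2]
```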
Figure: 500 Random Records
The result of creating a Random fraction - 0.5 subset on a dataset with 1000 records is a subset of 500 randomly selected records. These 500 records can be analyzed with less concern about bias in their selection.
6.4.1.4 Subset Spreadsheet
You can create a new spreadsheet from the selected records (rows) of another. If you look at the illustration and the Patient numbers in the second label column at left, you can see that this spreadsheet contains some, but not all, of the patients in the original spreadsheet.
This (row) subset spreadsheet displays in a separate viewer, allowing you to close the viewer of the original spreadsheet. However, if you delete the navigator node of the original spreadsheet, this subset spreadsheet and its navigator node will also be deleted.
You can use the subset spreadsheet activity to split a spreadsheet’s data set into several smaller data sets. After selecting the records you wish to place into each subset spreadsheet, you create that subset spreadsheet and then use File->Save As to create a new file (for example, a .txt or .csv file) for importing into another project.
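The same split-and-save idea can be sketched outside the program with Python's standard `csv` module. The column names, the data, the grouping in `subsets`, and the file names are all made up for illustration, and the files are written to a temporary directory.

```python
import csv
import os
import tempfile

header = ["Patient", "Age"]
rows = [["P1", 34], ["P2", 51], ["P3", 47], ["P4", 29]]

# Hypothetical split: the row indices that go into each subset file.
subsets = {"group_a": [0, 2], "group_b": [1, 3]}

def read_csv(path):
    """Read a CSV file back as a list of rows (all values are strings)."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))

out_dir = tempfile.mkdtemp()
written = {}
for name, indices in subsets.items():
    path = os.path.join(out_dir, name + ".csv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)                       # keep the column names
        writer.writerows(rows[i] for i in indices)    # only this subset's rows
    written[name] = path

print(read_csv(written["group_a"]))
# [['Patient', 'Age'], ['P1', '34'], ['P3', '47']]
```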
6.4.2 Column
6.4.2.1 Activate All
Use Activate All in data sets to activate all columns.
6.4.2.2 Inactivate All
If you have many columns, it is easier to deactivate (inactivate) all of them and then just activate the few you wish to analyze. You will, of course, have to select a column for the dependent variable and activate one or more independent variables.
This command has the same effect as clicking and shift-clicking across all the column headings.
6.4.2.3 Invert Selection
For convenience, there is also a way to invert the selected columns. Selecting Edit->Column->Invert Column Selection will activate all inactive columns, and inactivate all active columns. Using this menu item will also clear dependent column status.
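A small model of this behavior is sketched below. The three state names and the `invert_columns` helper are hypothetical. The sketch also assumes that a dependent column is treated like an active one and therefore ends up inactive; the text above says only that dependent status is cleared.

```python
# Hypothetical model: each column is "active", "inactive", or "dependent".
def invert_columns(states):
    """Flip active/inactive for every column and clear dependent status."""
    inverted = {}
    for name, state in states.items():
        # Assumption: a dependent column counts as active, so it becomes
        # inactive and loses its dependent status.
        inverted[name] = "active" if state == "inactive" else "inactive"
    return inverted

states = {"Dose": "dependent", "Age": "active", "Weight": "inactive"}
print(invert_columns(states))
# {'Dose': 'inactive', 'Age': 'inactive', 'Weight': 'active'}
```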
6.4.2.4 Select Columns by Chromosome/Region Range
If you are using a genetically mapped spreadsheet, you can use the map information to quickly select columns within one or more desired ranges of chromosome(s), region(s), or gene(s). Use Select columns by chromosome/region range, and the selection dialog shown in Fig. 6.20 will pop up. All columns within the selected ranges will be activated, and the others will be inactivated. Dependent columns are the exception: they remain dependent.
This dialog is also useful simply for locating column numbers within these regions. Press “Cancel” when finished viewing and scroll to the desired column in the spreadsheet.
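The range selection rule can be illustrated with a minimal sketch. The marker names, positions, state names, and the `select_by_range` helper are hypothetical, and ranges are assumed to be inclusive at both ends.

```python
# Hypothetical marker map: column name -> (chromosome, base-pair position).
marker_map = {
    "rs1": ("1", 1500),
    "rs2": ("1", 98000),
    "rs3": ("2", 4200),
    "rs4": ("2", 75000),
}
states = {"rs1": "active", "rs2": "inactive", "rs3": "dependent", "rs4": "active"}

def select_by_range(states, marker_map, ranges):
    """ranges is a list of (chromosome, start, end) tuples, ends inclusive."""
    result = {}
    for column, state in states.items():
        if state == "dependent":
            result[column] = "dependent"   # dependent columns stay dependent
            continue
        chrom, pos = marker_map[column]
        inside = any(c == chrom and start <= pos <= end
                     for c, start, end in ranges)
        result[column] = "active" if inside else "inactive"
    return result

print(select_by_range(states, marker_map, [("1", 50000, 100000)]))
# {'rs1': 'inactive', 'rs2': 'active', 'rs3': 'dependent', 'rs4': 'inactive'}
```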
6.4.2.5 Find
The menu choice Edit->Column->Find allows you to search for a matching column name. Type in the name of the column you are looking for and click the OK button. The found column is placed as the first (leftmost) column in the spreadsheet. HelixTree makes the best possible match, ignoring case and allowing partial matches. For example, searching on “s” would find s, S, scented, or Scented, in whichever column a match is first found. Searching on “sc” would find scented or Scented.
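The matching behavior in the examples above can be approximated as follows. This sketch assumes that a partial match means the column name starts with the search text; how HelixTree actually ranks a "best possible" match is not specified here. The `find_column` helper and the column names are hypothetical.

```python
def find_column(columns, query):
    """Return the index of the first column whose name starts with
    query, ignoring case, or None if no column matches."""
    needle = query.lower()
    for index, name in enumerate(columns):
        if name.lower().startswith(needle):
            return index
    return None

columns = ["Age", "Weight", "Scented", "Color"]
print(find_column(columns, "sc"))   # 2
print(find_column(columns, "S"))    # 2
print(find_column(columns, "xyz"))  # None
```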
6.4.2.6 Subset Spreadsheet
It is also possible to create a new spreadsheet using a subset of columns from an existing spreadsheet. To do this, select the Edit->Column->Subset Spreadsheet menu item. This creates a new spreadsheet containing all of the active columns from the original spreadsheet and excluding all inactivated columns. Only rows from the current spreadsheet will be present in the new spreadsheet. For example, if the original spreadsheet is a row subset spreadsheet of another spreadsheet, only the rows that are present in the subset spreadsheet will be used. Also, the current sort order of the original spreadsheet will be the default sort order of the new spreadsheet.
Note: If the original spreadsheet was marker mapped, then the same marker map will be applied to the new spreadsheet. Similarly, if the original spreadsheet was a marker map, pedigree, or phenotype spreadsheet, the new spreadsheet will also carry that classification.
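The column subset operation described in this section can be pictured with a short sketch. The column names, the `active` flags, and the data are hypothetical; the point is that only active columns are copied, while the rows and their current order carry over unchanged.

```python
columns = ["Patient", "Age", "Weight", "Dose"]
active = {"Patient": True, "Age": False, "Weight": True, "Dose": True}

# Rows present in the current (possibly row-subset) spreadsheet,
# listed in its current sort order.
current_rows = [
    ["P3", 47, 80.5, 10],
    ["P1", 34, 62.0, 5],
]

# Keep only the positions of the active columns.
keep = [i for i, name in enumerate(columns) if active[name]]
new_columns = [columns[i] for i in keep]
new_rows = [[row[i] for i in keep] for row in current_rows]

print(new_columns)  # ['Patient', 'Weight', 'Dose']
print(new_rows)     # [['P3', 80.5, 10], ['P1', 62.0, 5]]
```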