12.1 Observation Distance Matrix Overview
The Observation Distance Matrix records the distance between each observation in the data set and every other observation. It is based on the idea that when two observations end up together in a small subset deep within a tree, they share the descriptors that drive their response and are therefore similar in model space.
Within a given tree, the distance between observations i and j is the number of observations in the deepest node that contains both i and j, divided by the total number of observations at the root of the tree. The overall distance between i and j is the average of these within-tree distances over all the trees in a multiple tree model.
Consider an example with 1000 observations in a tree, where observations i and j are split into different daughter nodes directly beneath the root node. In this case, the deepest node in which i and j appear together is the root node itself, so the distance between them is 1000/1000 = 1. If instead i and j ended up together in a node near the bottom of the tree that contained 10 observations, the distance would be d(i,j) = 10/1000 = 0.01.
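The definition and the two cases above can be sketched in Python. The tree representation is a hypothetical one chosen for illustration, not the SVS data format: each observation is described by its root-to-leaf path of node labels, and each tree by a dictionary of node sizes. The deepest shared node is the last label in the common prefix of the two paths. The two cases are treated here as two separate trees so that the averaging step is also visible.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation (not the SVS file format):
#   path:  root-to-leaf sequence of node labels for one observation in one tree
#   sizes: number of observations in each node of that tree
Path = Sequence[str]


def tree_distance(path_i: Path, path_j: Path, sizes: Dict[str, int]) -> float:
    """Size of the deepest node holding both observations, divided by the root size."""
    deepest = path_i[0]  # both paths start at the root, so the root is always shared
    for a, b in zip(path_i, path_j):
        if a != b:
            break
        deepest = a
    return sizes[deepest] / sizes[path_i[0]]


def forest_distance(trees: List[Tuple[Path, Path, Dict[str, int]]]) -> float:
    """Average the within-tree distance of one pair (i, j) over all trees."""
    return sum(tree_distance(pi, pj, sizes) for pi, pj, sizes in trees) / len(trees)


# The two cases from the text: split at the root, or together in a 10-observation node
sizes = {"root": 1000, "L": 600, "R": 400, "LL": 10, "LR": 590}
split = (["root", "L", "LL"], ["root", "R"], sizes)
together = (["root", "L", "LL"], ["root", "L", "LL"], sizes)
print(tree_distance(*split))               # 1.0
print(tree_distance(*together))            # 0.01
print(forest_distance([split, together]))  # 0.505
```

Note that under this definition d(i,i) is the size of i's leaf divided by the root size, so it need not be zero.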
Note that this metric differs from conventional similarity metrics in that the distance between observations is computed from the specific features that drive the response.
12.1.1 Creating an Observation Distance Matrix
From the Navigator Window view, select a project, or choose File->Open New Project and import the data to analyze. From the spreadsheet view, click Analysis->Create a Multiple Tree Model.
A Random Tree Creation window opens. Check or change any tree creation variables and click Go.
After the random trees are created, a Multitree Model window opens (Fig. 12.1) displaying a list of the random trees.
Figure 12.2 shows the menus used to generate an Observation Distance Matrix. The Observations->Plot Obs. Dist. Matrix menu offers the following options:
Unsorted: The symmetric matrix is displayed with the observations in the same order as they appear in the source spreadsheet. The first observation is at the lower left, and the last observation is at the upper right.
Sorted by 1st Principal Component: The matrix is sorted in ascending order by the 1st principal component of the distance matrix, which serves as a simple clustering approach. The 1st principal component is the eigenvector corresponding to the largest eigenvalue of the distance matrix. Because calculating it takes O(n^3) operations, where n is the number of observations, avoid creating this plot for more than a few thousand observations.
Sorted by Similarity to One Observation: This option is most useful for seeing which observations are most similar to a given observation. For instance, a scientist may wish to see the 100 patients most similar to a given patient of interest. A dialog pops up listing all the observations, from which you choose one. The distance matrix is then displayed with the chosen observation at the lower left and the k most similar observations, ordered from most to least similar, ascending along the diagonal.
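The two sorted orderings can be sketched with NumPy, assuming the full distance matrix is already available as a symmetric array. This is an illustration of the ideas described above, not SVS's code, and the function names are hypothetical. The symmetric eigensolver `np.linalg.eigh` costs O(n^3), consistent with the size limit noted for the principal-component sort.

```python
import numpy as np


def sort_by_first_pc(dist: np.ndarray) -> np.ndarray:
    """Order observations by the eigenvector of the largest eigenvalue (O(n^3))."""
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    _, vecs = np.linalg.eigh(dist)
    v = vecs[:, -1]
    if v.sum() < 0:  # an eigenvector's sign is arbitrary; fix it for a repeatable order
        v = -v
    return np.argsort(v, kind="stable")


def sort_by_similarity(dist: np.ndarray, ref: int, k=None) -> np.ndarray:
    """Put `ref` first, then the other observations from most to least similar."""
    others = np.delete(np.arange(len(dist)), ref)
    # a stable sort keeps spreadsheet order among ties (e.g. observations sharing a leaf)
    others = others[np.argsort(dist[ref, others], kind="stable")]
    if k is not None:
        others = others[:k]
    return np.concatenate(([ref], others))


def reorder(dist: np.ndarray, order: np.ndarray) -> np.ndarray:
    """Permute rows and columns together so the matrix stays symmetric."""
    return dist[np.ix_(order, order)]
```

The reference observation is placed first explicitly rather than by sorting, because under the tree-based definition d(i,i) is the size of i's leaf divided by the root size, so it need not be the smallest value in its row.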
12.1.2 The Observation Distance Matrix
Figure 12.4 shows the distance matrix created from the 1000 trees built from GSIM.ghd.
The following subsections describe the use and meaning of the pull-down menus and buttons in the Distance Matrix Plot window.
12.1.3 Set Axes
The top two pull-down menus set each axis to either Distance or Response. The Response option plots the dependent variable, so its menu entry shows the corresponding column heading from the spreadsheet rather than the word Response. For multivariate trees, several response headings are available. Both axes default to Distance, so the distance matrix is initially displayed symmetrically. Selecting a response for one axis helps locate clusters of observations that are not only similar but also fall within a desired response range.
12.1.4 Stop Calculation/Stop Refresh and Resume Calculation/Resume Refresh
In Fig. 12.4, the top button reads Refresh; it reads Stop Calculation only while the matrix is first being calculated. Pressing Stop Calculation interrupts the calculation and changes the button to Resume Calculation. Click Resume Calculation to continue the interrupted computation of the distance matrix. The result is stored in an internal matrix for faster re-plotting.
After the initial calculation finishes, the button reads Refresh, as in Fig. 12.4. Clicking Refresh recalculates the matrix, and during the recalculation the button changes to Stop Refresh. Pressing Stop Refresh interrupts the recalculation and changes the button to Resume Refresh. Click Resume Refresh to continue the interrupted computation of the distance matrix.
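One way a stop-and-resume calculation with a stored internal matrix could work is to fill the matrix row by row and remember the next row to compute. The class below is a hypothetical sketch of that idea, not SVS's implementation; `pair_distance` stands in for whatever computes d(i,j).

```python
import numpy as np


class ResumableDistanceMatrix:
    """Hypothetical sketch: fill a symmetric matrix row by row so work can stop and resume."""

    def __init__(self, n, pair_distance):
        self.n = n
        self.pair_distance = pair_distance  # callable (i, j) -> float
        self.matrix = np.zeros((n, n))      # internal store, reused for re-plotting
        self.next_row = 0                   # first row not yet computed

    def run(self, max_rows=None):
        """Compute up to max_rows more rows; return True once the matrix is complete."""
        done = 0
        while self.next_row < self.n and (max_rows is None or done < max_rows):
            i = self.next_row
            for j in range(i, self.n):  # upper triangle including diagonal, mirrored below
                d = self.pair_distance(i, j)
                self.matrix[i, j] = self.matrix[j, i] = d
            self.next_row += 1
            done += 1
        return self.next_row == self.n

    def refresh(self):
        """Discard stored results so the next run() recalculates from scratch."""
        self.matrix[:] = 0.0
        self.next_row = 0
```

Calling `run(max_rows=...)` in small batches stands in for "Stop", and calling it again continues from `next_row`, which is the "Resume" behavior.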
12.1.5 Copy to Clipboard
The Copy to Clipboard button copies the table of numbers shown at the lower left, if any, to the clipboard for pasting into other applications. This table is updated whenever you click on the plot or use the arrow keys to move from point to point. If no table is present, the button is greyed out, as in Fig. 12.4.
12.1.6 Creating a Spreadsheet or Tree view from the Matrix Plot
The center drop-down menu in the lower right corner selects what happens when you left-click and drag diagonally over a region of the matrix plot: Left click and drag for spreadsheet or Left click and drag for tree. Depending on your choice, a spreadsheet or tree window opens containing the observations in the selected region. See Section 12.2.1 for detailed examples.
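Selecting a region amounts to translating a dragged rectangle from screen pixels back to observation numbers. The sketch below assumes the matrix fills a `width` x `height` pixel area, that screen y grows downward, and that display position 0 sits at the lower left (as in the Unsorted plot); the function names are hypothetical. It returns the observations covered along each axis; which of these SVS places in the new window is not specified here.

```python
from typing import List, Sequence, Tuple


def pixel_span(a: int, b: int, size: int, n: int) -> range:
    """Display positions of the observations covered by pixels a..b (inclusive)."""
    lo, hi = sorted((max(0, min(a, size - 1)), max(0, min(b, size - 1))))
    first = lo * n // size              # first observation in pixel lo
    last = ((hi + 1) * n - 1) // size   # last observation in pixel hi
    return range(first, last + 1)


def region_to_observations(order: Sequence[int], x0: int, y0: int, x1: int, y1: int,
                           width: int, height: int) -> Tuple[List[int], List[int]]:
    """Map a dragged rectangle (screen pixels) to observations on the x and y axes."""
    n = len(order)
    cols = pixel_span(x0, x1, width, n)
    # flip y so that display position 0 is at the bottom of the plot
    rows = pixel_span(height - 1 - y0, height - 1 - y1, height, n)
    return [order[k] for k in cols], [order[k] for k in rows]
```

Passing the current sort order as `order` lets the same mapping work for the Unsorted and sorted views; a zoom window could reuse it to pick the sub-matrix to redraw.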
12.1.7 Zoom Mode
Right-click and drag diagonally across a region of the matrix plot to open a new window with a close-up plot of that region. A file tab labelling the zoomed-in view appears at the bottom of the window. To return to the original view, either click the Distance Matrix Plot file tab or close the zoomed-in window by clicking the “X” in the lower-right corner.
12.1.8 Modify Color Scaling
The color guides to the right of and above the plot define the mapping between colors and distance values. You can narrow the color window by clicking and dragging over a region of a color guide.
If you Shift-click and drag over a region of one axis, both axes change symmetrically (i.e., use the same measure) and have an identical range.
To undo or reset the color scales, use either Distance/Response pull-down menu, which offers three options: reset this scale, reset symmetrically, or reset all scales.
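A minimal sketch of how a narrowed color window could map distances onto the default Blue-red scheme described in 12.1.10. The linear ramp, the choice of blue for small values and red for large ones, and the function name are assumptions for illustration; the exact SVS color ramp is not documented here.

```python
from typing import Tuple


def blue_red(value: float, lo: float, hi: float) -> Tuple[int, int, int]:
    """Map value to an (r, g, b) color on a blue-to-red continuum over [lo, hi].

    Narrowing [lo, hi] spreads the full color range over fewer distances;
    values outside the window saturate at pure blue or pure red.
    """
    if hi <= lo:
        raise ValueError("color window must satisfy lo < hi")
    t = (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clip to the window
    return (round(255 * t), 0, round(255 * (1 - t)))
```

Under this sketch, resetting a scale would correspond to setting `lo` and `hi` back to the minimum and maximum of the plotted values.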
12.1.9 Effect of Clicking on the Plot
Click a point within the plot, or a patient along the bottom or side of the plot. A table of statistics for that patient pair appears in the lower left corner of the window (see Fig. 12.6).
In some plots, more than one patient falls within a single pixel. In this case, a star appears after the patient number in the lower left display, and the values shown are averaged over the pixel. Use the arrow keys on the keyboard to step through the individual values. For a clearer view in this situation, zooming with the right mouse button is recommended.
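Averaging over a pixel can be sketched as block-averaging the distance matrix down to the pixel grid. This assumes each pixel covers a contiguous, roughly equal block of observations and uses a plain mean; the function name is hypothetical, and the exact averaging SVS performs is not specified here.

```python
import numpy as np


def average_to_pixels(dist: np.ndarray, pixels: int) -> np.ndarray:
    """Downsample an n x n matrix to pixels x pixels by averaging each block.

    Assumes n >= pixels, so every pixel covers at least one observation.
    """
    n = dist.shape[0]
    if not 0 < pixels <= n:
        raise ValueError("need 0 < pixels <= n")
    edges = [p * n // pixels for p in range(pixels + 1)]  # block boundaries
    out = np.empty((pixels, pixels))
    for a in range(pixels):
        for b in range(pixels):
            block = dist[edges[a]:edges[a + 1], edges[b]:edges[b + 1]]
            out[a, b] = block.mean()
    return out
```

Zooming in reduces the number of observations per pixel, which is why it gives a sharper picture than these averaged values.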
12.1.10 Color Drop Down Menu
The bottom drop-down menu lets you exercise your artistic side by choosing one of the following color combinations for the Distance Matrix:
Multi-color displays the matrix with a full rainbow color scheme.
Blue-red is the default display and creates the matrix in a blue and red color continuum.
White-red is a colorful yet conservatively monotonic color combination.
Black-white is the least likely presentation of the Distance Matrix to fool someone with color blindness.
OK, we said exercise your artistic side, not necessarily fulfill all avenues of expression, but we apologize for any artistic frustration this menu may have engendered.