
10.2 Using More Than One Dependent Variable


Figure 10.1: This spreadsheet shows two columns set to be dependent variables.

From the spreadsheet, you can select more than one dependent variable, or a single categorical dependent variable with more than two categories, and then carry out a multivariate tree analysis.

For a continuous response, only binary partitioning on 0/1 binary predictors is possible. Continuous responses cannot be mixed with binary or categorical responses.

However, for multivariate binary response, where all dependent variables are 0/1 binary, the predictors may be binary, continuous, categorical, or genetic.

In addition, a categorical response with more than two categories, or several categorical responses, may be used alone or together with binary responses, and here too the predictors may be binary, continuous, categorical, or genetic. This works because each categorical response is first broken down into binary responses before the analysis proceeds.
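The breakdown of a categorical response into binary responses can be pictured as indicator (one-hot) encoding. The sketch below is a minimal illustration, assuming one 0/1 indicator column per category; the function name `categorical_to_binary` is hypothetical, and the exact encoding SVS uses internally is not described in this section.

```python
def categorical_to_binary(values):
    """Hypothetical sketch: expand one categorical response into 0/1 responses.

    Assumes one indicator column per category, in order of first appearance.
    """
    categories = list(dict.fromkeys(values))  # unique categories, first-seen order
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}
```

For example, `categorical_to_binary(["A", "B", "C", "A"])` returns `{"A": [1, 0, 0, 1], "B": [0, 1, 0, 0], "C": [0, 0, 1, 0]}`, so a three-category response becomes three binary responses.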

Multivariate versions of the histogram and manual split windows are also available.


Figure 10.2: A tree view showing the multivariate splits information in the nodes.

10.2.1 Continuous Multivariate Response

For a continuous multivariate response, only binary partitioning on 0/1 binary predictors is possible. Continuous responses cannot be mixed with binary or categorical responses.

Histograms and manual splits can also be created from trees based on this class of response.

The multivariate tree display shown above works much like the univariate tree display, except that there is more than one dependent variable. In the multivariate display, the means of the dependent variables are listed as u1, u2, and so forth. Similarly, the standard deviation, standard error, and node mean squared error are listed separately for each dependent variable.
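The per-variable node statistics can be sketched as follows. This is a minimal illustration assuming NumPy and a node's responses stored as an n × p array (one column per dependent variable); `node_statistics` is a hypothetical name, and the node mean squared error is assumed here to be the mean squared deviation about the node mean, which may differ from the exact definition SVS uses.

```python
import numpy as np

def node_statistics(node_y):
    """Hypothetical sketch: per-dependent-variable statistics for one tree node."""
    y = np.asarray(node_y, dtype=float)       # shape (n observations, p responses)
    n = y.shape[0]
    means = y.mean(axis=0)                    # u1, u2, ... one mean per response
    sds = y.std(axis=0, ddof=1)               # sample standard deviation per response
    ses = sds / np.sqrt(n)                    # standard error of each mean
    mse = ((y - means) ** 2).mean(axis=0)     # ASSUMED definition of node MSE
    return {"n": n, "mean": means, "sd": sds, "se": ses, "mse": mse}
```

Each entry is a vector with one value per dependent variable, mirroring how the tree lists the statistics separately for each response.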

In addition to listing the means of the two variables, the tree uses a glyph representation to display changes in response. In the glyph representation, the root node shows the means of the responses relative to one another. Each subsequent node shows the change in each response relative to the root response, in standard deviation units.
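The change displayed in each non-root glyph can be sketched as a standardized difference of means. This minimal sketch assumes NumPy and assumes the change is measured as (node mean − root mean) divided by the root node's sample standard deviation, separately for each response; the function name `glyph_changes` is hypothetical, and SVS may standardize differently.

```python
import numpy as np

def glyph_changes(root_y, node_y):
    """Hypothetical sketch: change of each response mean versus the root, in SD units.

    ASSUMPTION: standardization uses the root node's sample standard deviation.
    """
    root_y = np.asarray(root_y, dtype=float)  # all observations, one column per response
    node_y = np.asarray(node_y, dtype=float)  # observations falling in this node
    root_mean = root_y.mean(axis=0)
    root_sd = root_y.std(axis=0, ddof=1)
    return (node_y.mean(axis=0) - root_mean) / root_sd
```

For instance, a result of about 0.77 for a response means the node mean lies roughly 0.77 root standard deviations above the root mean, inside the one-standard-deviation dotted line.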

The dotted line marks one standard deviation on either side of zero change in response. In data with more pronounced effects, there may be 3 or 4 dotted lines, corresponding to 3 or 4 standard deviations. For multivariate continuous response, only binary independent descriptors are implemented as predictors. In this case, the dependent variables are modeled as coming from a multivariate normal distribution, and the p-values are computed using Hotelling's T² statistic (see Section 26.2.2).
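For a single split on a 0/1 binary predictor, the two-sample Hotelling T² test compares the mean response vectors of the two child nodes. The sketch below assumes NumPy and SciPy and uses the standard pooled-covariance form of the statistic with its exact F transformation; `hotelling_t2_split` is a hypothetical name, and it is not claimed to reproduce the computation described in Section 26.2.2, which may differ in detail.

```python
import numpy as np
from scipy import stats

def hotelling_t2_split(y, predictor):
    """Hypothetical sketch: two-sample Hotelling T^2 for a 0/1 predictor split."""
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, np.newaxis]                  # a single response becomes a p = 1 column
    mask = np.asarray(predictor) == 1
    g1, g2 = y[~mask], y[mask]                # child nodes: predictor == 0 and == 1
    n1, n2 = len(g1), len(g2)
    p = y.shape[1]
    diff = g1.mean(axis=0) - g2.mean(axis=0)
    # Pooled within-group covariance (unbiased, n - 1 denominators)
    s1 = np.atleast_2d(np.cov(g1, rowvar=False))
    s2 = np.atleast_2d(np.cov(g2, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    t2 = (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(pooled, diff)
    # Exact F transformation under multivariate normality
    df1, df2 = p, n1 + n2 - p - 1
    f_stat = t2 * df2 / (p * (n1 + n2 - 2))
    p_value = stats.f.sf(f_stat, df1, df2)
    return float(t2), float(f_stat), df1, df2, float(p_value)
```

With a single response (p = 1), T² reduces to the square of the pooled two-sample t statistic, which gives a quick check on the implementation.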

The node representation can be further customized from the options menu to present other statistics on the variables.


Figure 10.3: This tree shows the nodes with more information provided.

10.2.2 Binary and/or Categorical Multivariate Response

For a multivariate binary and/or categorical response, every dependent variable actually used in the analysis is binary, because categorical responses are broken down into binary responses before analysis.

A univariate categorical response with more than two categories is treated effectively as a multivariate (binary) response.

All predictor types are supported for multivariate binary and/or categorical response. In the plot above, where four binary responses are profiled, we see continuous, integer and binary splits.
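Because every response is binary after the breakdown, the per-node mean of each response is simply the proportion of 1s. The sketch below, assuming NumPy, shows how a threshold split on a continuous (or integer) predictor would partition a node and profile several binary responses in each child; the `<= threshold` rule and the name `child_proportions` are illustrative assumptions, and the statistic SVS uses to choose the split is not shown.

```python
import numpy as np

def child_proportions(binary_y, predictor, threshold):
    """Hypothetical sketch: proportion of 1s per binary response in each child node."""
    y = np.asarray(binary_y, dtype=float)     # shape (n observations, k binary responses)
    x = np.asarray(predictor, dtype=float)    # continuous or integer predictor
    left = x <= threshold                     # ASSUMED split rule
    return y[left].mean(axis=0), y[~left].mean(axis=0)
```

Profiling four binary responses this way gives two vectors of four proportions, one per child node.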

10.2.3 Multivariate Multiple Tree Clustering

Multiple tree clustering is also possible in multiple dimensions, and is fully analogous to the unidimensional case (see Section 9, Random Tree Generation). In the distance matrix plot, the user can choose either to view the pairwise observation distances against one another in multivariate multiple tree space, or to set one or both of the axes to one of the multivariate responses.

10.2.4 File->Output C Code

In the generated C code, the results variable is set up as an array so that it can handle multivariate responses.