Multiscale bootstrap cluster analysis. — clusterBoot • OpenRepGrid

p-values are calculated for each branch of the cluster dendrogram to indicate the stability of a specific partition. clusterBoot will yield the same clusters as the cluster() function (i.e. standard hierarchical clustering) with additional p-values. Two kinds of p-values are reported: bootstrap probabilities (BP) and approximately unbiased (AU) probabilities (see Details section for more information).

Usage

clusterBoot(
  x,
  along = 1,
  align = TRUE,
  dmethod = "euclidean",
  cmethod = "ward.D",
  p = 2,
  nboot = 1000,
  r = seq(0.8, 1.4, by = 0.1),
  seed = NULL,
  trim = NA,
  ...
)

Arguments

x: grid object
along: Along which dimension to cluster. 1 = constructs, 2 = elements.
align: Whether the constructs should be aligned before clustering (default is TRUE). If not, the grid matrix is clustered as is. See Details section for more information.
dmethod: The distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given. For details on the individual distance measures, see ?dist.
cmethod: The agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid".
p: Power of the Minkowski metric. Not yet passed on to pvclust!
nboot: The number of bootstrap replications. The default is 1000.
r: Numeric vector which specifies the relative sample sizes of bootstrap replications. For original sample size \(n\) and bootstrap sample size \(n'\), this is defined as \(r=n'/n\).
seed: Random seed for bootstrapping. Can be set for reproducibility (see set.seed()). Usually not needed.
trim: The number of characters a construct is trimmed to. If NA (default), no trimming is done. Trimming simply saves space when displaying the output.
...: Arguments to pass on to pvclust::pvclust().
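
To make the definition of r concrete, the following sketch lists the bootstrap sample sizes implied by the default r for a hypothetical original sample size of n = 10. Rounding n * r to the nearest integer is an assumption made here for illustration; the exact handling is left to pvclust::pvclust().

```r
# Default relative sample sizes of clusterBoot()
r <- seq(0.8, 1.4, by = 0.1)

# Hypothetical original sample size (number of resampled units)
n <- 10

# Bootstrap sample size n' = r * n (rounding is an assumption of this sketch)
n_boot <- round(n * r)

data.frame(r = r, n_boot = n_boot)
```

Values of r below 1 give bootstrap samples smaller than the original sample, and values above 1 give larger ones.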

Value

A pvclust object as returned by the function pvclust::pvclust().

Details

In standard (hierarchical) cluster analysis the question arises whether the identified structures are significant or merely emerged by chance. Several methods have been developed to test structures for robustness. One line of research in this area is based on resampling. The idea is to resample the rows or columns of the data matrix and to build the dendrogram for each bootstrap sample (Felsenstein, 1985). The resulting p-value (the bootstrap probability, BP) indicates the percentage of bootstrap samples in which a specific structure is identified. This p-value has been shown to be biased (Hillis & Bull, 1993; Zharkikh & Li, 1995), and several methods for bias correction have been proposed in the literature. clusterBoot uses a method based on the multiscale bootstrap to derive corrected (approximately unbiased, AU) p-values (Shimodaira, 2002, 2004). In conventional bootstrap analysis the size of the bootstrap sample is identical to the original sample size. The multiscale bootstrap varies the bootstrap sample size and uses the variation of the results across the different sample sizes to infer a correction formula for the biased p-value (Suzuki & Shimodaira, 2006).
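
The resampling idea described above can be sketched in base R. This is a simplified illustration, not the code used by clusterBoot or pvclust: the helper names cluster_sets() and bootstrap_probability() and the toy data are made up for this example, and only the uncorrected bootstrap probability (BP) is computed, without the AU correction. Items to be clustered are assumed to be in the rows, and the columns are resampled.

```r
# Member sets of all clusters in a dendrogram, as strings such as "1,2,5".
cluster_sets <- function(m, dmethod = "euclidean", cmethod = "ward.D") {
  hc <- hclust(dist(m, method = dmethod), method = cmethod)
  members <- vector("list", nrow(hc$merge))
  for (i in seq_len(nrow(hc$merge))) {
    # Negative entries are single items, positive entries refer to earlier merges.
    parts <- lapply(hc$merge[i, ], function(j) if (j < 0) -j else members[[j]])
    members[[i]] <- sort(unlist(parts))
  }
  vapply(members, paste, character(1), collapse = ",")
}

# Share of bootstrap dendrograms that contain each cluster of the original one.
bootstrap_probability <- function(m, nboot = 200, r = 1) {
  reference <- cluster_sets(m)
  n <- ncol(m)
  size <- round(n * r)  # bootstrap sample size n' = r * n (rounding assumed)
  hits <- numeric(length(reference))
  for (b in seq_len(nboot)) {
    cols <- sample.int(n, size, replace = TRUE)  # resample columns
    hits <- hits + (reference %in% cluster_sets(m[, cols, drop = FALSE]))
  }
  setNames(hits / nboot, reference)
}

# Toy data (made up): 6 items in rows, 12 observations in columns.
set.seed(42)
m <- matrix(rnorm(6 * 12), nrow = 6)
m[1:3, ] <- m[1:3, ] + 3  # shift items 1-3 to create a separated group

# Conventional bootstrap (r = 1)
bp <- bootstrap_probability(m, nboot = 200)
bp

# Multiscale idea: repeat the procedure for several relative sample sizes.
bp_by_scale <- sapply(c(0.8, 1, 1.4), function(r) {
  bootstrap_probability(m, nboot = 200, r = r)
})
bp_by_scale
```

The cluster containing all items appears in every dendrogram, so its BP is always 1. The columns of bp_by_scale show how the BP of each cluster changes with the bootstrap sample size, which is the information the multiscale bootstrap draws on for its correction.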

align: Aligning will reverse constructs if necessary to yield a maximal similarity between constructs. In a first step the constructs are clustered including both directions. In a second step the direction of a construct that yields smaller distances to the adjacent constructs is preserved and used for the final clustering. As a result, every construct is included once but with an orientation that guarantees optimal clustering. This approach is akin to the procedure used in FOCUS (Jankowicz & Thomas, 1982).
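
The effect of reversing a construct can be illustrated with a minimal pairwise sketch. The ratings and the 1 to 5 scale below are made up, and the code only compares one pair of constructs; it does not reproduce the two-step procedure described above.

```r
# Two hypothetical constructs rated on a 1-5 scale across five elements.
scale_min <- 1
scale_max <- 5
c1 <- c(1, 2, 4, 5, 5)
c2 <- c(5, 4, 2, 1, 2)  # close to the mirror image of c1

# Reversing a construct swaps its poles: a rating x becomes min + max - x.
c2_reversed <- scale_min + scale_max - c2

# Euclidean distance to c1 in both orientations
d_original <- as.numeric(dist(rbind(c1, c2)))           # 7
d_reversed <- as.numeric(dist(rbind(c1, c2_reversed)))  # 1

# Keep the orientation that is closer to c1.
c2_aligned <- if (d_reversed < d_original) c2_reversed else c2
```

Without alignment, c1 and c2 would appear dissimilar even though they carry almost the same information with swapped poles.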

References

Felsenstein, J. (1985). Confidence Limits on Phylogenies: An Approach Using the Bootstrap. Evolution, 39(4), 783-791. doi:10.2307/2408678

Hillis, D. M., & Bull, J. J. (1993). An Empirical Test of Bootstrapping as a Method for Assessing Confidence in Phylogenetic Analysis. Systematic Biology, 42(2), 182-192.

Jankowicz, D., & Thomas, L. (1982). An Algorithm for the Cluster Analysis of Repertory Grids in Human Resource Development. Personnel Review, 11(4), 15-22. doi:10.1108/eb055464

Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree selection. Systematic Biology, 51, 492-508.

Shimodaira, H. (2004). Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling. The Annals of Statistics, 32, 2616-2641.

Suzuki, R., & Shimodaira, H. (2006). Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics, 22(12), 1540-1542. doi:10.1093/bioinformatics/btl117

Zharkikh, A., & Li, W.-H. (1995). Estimation of confidence in phylogeny: the complete-and-partial bootstrap technique. Molecular Phylogenetics and Evolution, 4(1), 44-63.

Examples

if (FALSE) { # \dontrun{

# pvclust must be loaded
library(pvclust)

# p-values for construct dendrogram
s <- clusterBoot(boeker)
plot(s)
pvrect(s, max.only = FALSE)

# p-values for element dendrogram
s <- clusterBoot(boeker, along = 2)
plot(s)
pvrect(s, max.only = FALSE)
} # }