P-values are calculated for each branch of the cluster dendrogram to indicate the stability of a specific partition.
clusterBoot yields the same clusters as the cluster() function (i.e. standard hierarchical clustering), with
additional p-values. Two kinds of p-values are reported: bootstrap probabilities (BP) and approximately unbiased
(AU) probabilities (see Details section for more information).
Usage
clusterBoot(
x,
along = 1,
align = TRUE,
dmethod = "euclidean",
cmethod = "ward.D",
p = 2,
nboot = 1000,
r = seq(0.8, 1.4, by = 0.1),
seed = NULL,
...
)
Arguments
- x
grid object
- along
Along which dimension to cluster. 1 = constructs, 2 = elements.
- align
Whether the constructs should be aligned before clustering (default is TRUE). If not, the grid matrix is clustered as is. See Details section for more information.
- dmethod
The distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given. For additional information on the different types, see ?dist.
- cmethod
The agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid".
- p
Power of the Minkowski metric. Note: not yet passed on to pvclust::pvclust().
- nboot
The number of bootstrap replications. The default is 1000.
- r
Numeric vector that specifies the relative sample sizes of the bootstrap replications. For original sample size \(n\) and bootstrap sample size \(n'\), this is defined as \(r=n'/n\).
- seed
Random seed for bootstrapping. Can be set for reproducibility (see set.seed()). Usually not needed.
- ...
Arguments to pass on to pvclust::pvclust().
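The dmethod and cmethod values listed above match the method names accepted by base R's dist() and hclust(). The following sketch assumes the names carry the same meaning there; the small matrix m is made up for illustration. It shows an abbreviated method name and the role of the Minkowski power in dist() itself, not inside clusterBoot.

```r
# Hypothetical toy matrix: 4 objects (rows) rated on 3 attributes (columns)
m <- rbind(
  a = c(1, 5, 3),
  b = c(2, 4, 3),
  c = c(5, 1, 2),
  d = c(4, 1, 1)
)

# Distance measures as accepted by stats::dist()
d_euc  <- dist(m, method = "euclidean")
d_abbr <- dist(m, method = "euc")             # unambiguous substring
d_mink <- dist(m, method = "minkowski", p = 2) # power 2 is the Euclidean case

# Agglomeration methods as accepted by stats::hclust()
hc <- hclust(d_euc, method = "ward.D")
hc$merge   # merge order of the resulting tree
```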
Value
A pvclust object as returned by the function pvclust::pvclust().
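A short sketch of how the returned object might be inspected. It assumes that clusterBoot() and the example grid boeker are provided by the OpenRepGrid package and that pvclust is installed. The guard skips the example otherwise, and the reduced nboot is only there to keep the run short.

```r
# Assumption: clusterBoot() and the `boeker` grid come from OpenRepGrid.
res <- NULL
if (requireNamespace("OpenRepGrid", quietly = TRUE) &&
    requireNamespace("pvclust", quietly = TRUE)) {
  # Fewer replications than the default (1000) to keep the example fast
  res <- OpenRepGrid::clusterBoot(OpenRepGrid::boeker, nboot = 100, seed = 123)

  # Per-branch AU and BP values stored in the pvclust object
  print(head(res$edges))

  # Clusters whose AU p-value is at least 0.95
  picked <- pvclust::pvpick(res, alpha = 0.95, pv = "au")
  print(picked$clusters)

  if (interactive()) {
    plot(res)                           # dendrogram annotated with p-values
    pvclust::pvrect(res, alpha = 0.95)  # boxes around well-supported clusters
  }
}
```

Setting seed makes repeated runs return the same p-values, which helps when results are reported.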
Details
In standard (hierarchical) cluster analysis the question arises whether the identified structures are significant
or merely emerged by chance. Over the last decade several methods have been developed to test structures for
robustness. One line of research in this area is based on resampling. The idea is to resample the rows or columns of
the data matrix and to build the dendrogram for each bootstrap sample (Felsenstein, 1985). The p-value indicates
the percentage of bootstrap samples in which a specific structure is identified. This p-value has been shown to be
biased (Hillis & Bull, 1993; Zharkikh & Li, 1995), and several methods for bias correction have been proposed in the
literature. clusterBoot uses a method based on the multiscale bootstrap to derive corrected (approximately
unbiased) p-values (Shimodaira, 2002, 2004). In conventional bootstrap analysis the size of the bootstrap sample is
identical to the original sample size. The multiscale bootstrap varies the bootstrap sample size and infers a
correction formula for the biased p-value from how the results vary across the different sample
sizes (Suzuki & Shimodaira, 2006).
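The resampling idea can be sketched in base R. This is a simplified illustration, not the pvclust implementation: the toy data, the helper names (cluster_cols, has_cluster, bp) and the rounding of the bootstrap sample size are assumptions. It only computes the bootstrap probability (BP) of one cluster at several relative sample sizes. The AU correction, which is fitted from how BP changes with r, is left out.

```r
set.seed(1)

# Hypothetical toy data: 6 objects (columns) measured on 12 attributes (rows),
# built so that obj1-obj3 and obj4-obj6 form two groups
n <- 12
x <- cbind(
  matrix(rnorm(n * 3, mean = 0), n, 3),
  matrix(rnorm(n * 3, mean = 3), n, 3)
)
colnames(x) <- paste0("obj", 1:6)

# Hierarchical clustering of the columns of a data matrix
cluster_cols <- function(m) hclust(dist(t(m)), method = "average")

# Does the tree contain a branch made up of exactly `members`?
has_cluster <- function(hc, members) {
  for (k in 2:(length(hc$order) - 1)) {
    groups <- cutree(hc, k = k)
    for (g in unique(groups)) {
      if (setequal(names(groups)[groups == g], members)) return(TRUE)
    }
  }
  FALSE
}

# Bootstrap probability: share of resampled trees that recover the cluster.
# Rows are resampled with replacement; r scales the bootstrap sample size.
bp <- function(x, members, r = 1, nboot = 200) {
  n_prime <- round(r * nrow(x))   # assumption: round to the nearest integer
  hits <- replicate(nboot, {
    idx <- sample.int(nrow(x), size = n_prime, replace = TRUE)
    has_cluster(cluster_cols(x[idx, , drop = FALSE]), members)
  })
  mean(hits)
}

target   <- c("obj1", "obj2", "obj3")
r_values <- seq(0.8, 1.4, by = 0.1)
bp_by_r  <- sapply(r_values, function(r) bp(x, target, r = r))
names(bp_by_r) <- r_values
bp_by_r
```

With r = 1 this is the conventional bootstrap; the other values of r give the additional scales that the multiscale approach uses.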
align: Aligning reverses constructs where necessary to maximize the similarity between constructs. In a first step, the constructs are clustered with both directions included. In a second step, for each construct the direction that yields smaller distances to the adjacent constructs is retained and used for the final clustering. As a result, every construct is included once, with an orientation that guarantees optimal clustering. This approach is akin to the procedure used in FOCUS (Jankowicz & Thomas, 1982).
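The effect of reversing a construct can be illustrated with a deliberately simplified stand-in. Instead of the two-step procedure described above, the sketch below compares each construct with a single reference construct and keeps whichever orientation is closer. The ratings, the 1 to 5 scale and the helper names are made up for illustration.

```r
# Hypothetical ratings: constructs in rows, elements in columns, scale 1 to 5
ratings <- rbind(
  c1 = c(1, 2, 4, 5, 5),
  c2 = c(5, 4, 2, 1, 1),   # mirror image of c1
  c3 = c(2, 2, 5, 4, 5)
)
scale_min <- 1
scale_max <- 5

# Reversing a construct swaps its poles: 1 <-> 5, 2 <-> 4, 3 stays
reverse <- function(v) scale_min + scale_max - v
euclid  <- function(a, b) sqrt(sum((a - b)^2))

# Simplified alignment: flip a construct if its reversed form is closer
# to the reference construct than its original form
align_to_reference <- function(m, ref = 1) {
  out <- m
  for (i in seq_len(nrow(m))[-ref]) {
    if (euclid(reverse(m[i, ]), m[ref, ]) < euclid(m[i, ], m[ref, ])) {
      out[i, ] <- reverse(m[i, ])
    }
  }
  out
}

aligned <- align_to_reference(ratings)
dist(ratings)   # c1 and c2 look far apart before alignment
dist(aligned)   # after reversing c2 they coincide
```

Without alignment, c1 and c2 would end up in different clusters even though they describe the same distinction with swapped poles.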
References
Felsenstein, J. (1985). Confidence Limits on Phylogenies: An Approach Using the Bootstrap. Evolution, 39(4), 783-791. doi:10.2307/2408678
Hillis, D. M., & Bull, J. J. (1993). An Empirical Test of Bootstrapping as a Method for Assessing Confidence in Phylogenetic Analysis. Systematic Biology, 42(2), 182-192.
Jankowicz, D., & Thomas, L. (1982). An Algorithm for the Cluster Analysis of Repertory Grids in Human Resource Development. Personnel Review, 11(4), 15-22. doi:10.1108/eb055464
Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree selection. Systematic Biology, 51, 492-508.
Shimodaira, H. (2004). Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling. Annals of Statistics, 32, 2616-2641.
Suzuki, R., & Shimodaira, H. (2006). Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics, 22(12), 1540-1542. doi:10.1093/bioinformatics/btl117
Zharkikh, A., & Li, W.-H. (1995). Estimation of confidence in phylogeny: the complete-and-partial bootstrap technique. Molecular Phylogenetics and Evolution, 4(1), 44-63.