`getReferenceW()` generates multiple reference datasets and calculates the within-cluster dispersion \(W_b\) of each one for every number of clusters from 1 to `Kmax`.

Reference datasets can be generated in one of two ways:

- Uniformly within the original data range for each variable (`ref.gen="uniform"`)
- Using a principal components transformation to preserve variance structure (`ref.gen="PC"`)
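The two generation schemes can be sketched in a few lines of base R. This is a minimal illustration, not the package's own code: `gen_reference()` is a hypothetical helper, and the PC branch assumes the usual recipe of rotating the centred data onto its principal axes, sampling uniformly over the range along each axis, and rotating back.

```r
# Hypothetical helper for illustration; not part of the documented package.
gen_reference <- function(X, ref.gen = "uniform") {
  X <- as.matrix(X)

  # Draw each column uniformly between that column's minimum and maximum.
  runif_cols <- function(M) {
    apply(M, 2, function(v) runif(length(v), min = min(v), max = max(v)))
  }

  if (identical(ref.gen, "PC")) {
    centre <- colMeans(X)
    Xc <- sweep(X, 2, centre)            # centre each variable
    V <- svd(Xc)$v                       # principal axes (right singular vectors)
    Z <- runif_cols(Xc %*% V) %*% t(V)   # sample in rotated space, rotate back
    sweep(Z, 2, centre, "+")             # restore the original location
  } else {
    runif_cols(X)                        # any other value: uniform per variable
  }
}

set.seed(1)
X <- matrix(rnorm(40), nrow = 20, ncol = 2)
Z_unif <- gen_reference(X, "uniform")
Z_pc <- gen_reference(X, "PC")
```

Both calls return a matrix with the same dimensions as `X`. The sketch assumes `X` has at least two rows, since `apply()` would otherwise drop the result to a vector.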

This function is used in the computation of the Gap statistic to compare the observed clustering dispersion to what is expected under a null reference.
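Assuming the standard definition of Tibshirani, Walther and Hastie (2001) is the one intended, the comparison can be written as follows. Here \(W(k)\) is the observed dispersion for \(k\) clusters and \(W_b(k)\) is the dispersion of the \(b\)-th reference dataset; the argument \(k\) is added only for illustration.

\[
\mathrm{Gap}(k) = \frac{1}{B} \sum_{b=1}^{B} \log W_b(k) - \log W(k), \qquad k = 1, \dots, K_{\max}
\]

A large value of \(\mathrm{Gap}(k)\) means the observed dispersion is well below what the null reference would produce for the same number of clusters.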

getReferenceW(X, Kmax, B, ref.gen, ...)

Arguments

X

Numeric data matrix (observations × variables) to generate reference datasets from.

Kmax

Maximum number of clusters to compute W for.

B

Number of reference datasets to generate.

ref.gen

Reference generation method:

- `"PC"`: uses a PCA-based transformation that preserves the variance structure
- any other value: generates uniform reference data per variable

...

Additional arguments passed on to other functions (e.g., `dist.method` for distance calculation, `cl.method` for clustering method, `linkage`, `cor.method`, `nstart`).

Value

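The within-cluster dispersion \(W_b\) calculated for each of the `B` reference datasets and each number of clusters from 1 to `Kmax`.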
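As a rough picture of the result, the sketch below computes such dispersions with base R only. It rests on several assumptions that are not taken from the package: `reference_W()` and `within_dispersion()` are hypothetical names, clustering is done with `stats::kmeans`, dispersion is the pooled within-cluster sum of squares, reference data are uniform per variable, and the result is laid out as a `B` × `Kmax` matrix. The actual return shape of `getReferenceW()` is not specified here.

```r
# Hypothetical sketch; not the package's implementation.

# Pooled within-cluster sum of squares around the cluster means.
within_dispersion <- function(X, k, nstart = 10) {
  if (k == 1) {
    return(sum(scale(X, scale = FALSE)^2))  # one cluster: total sum of squares
  }
  kmeans(X, centers = k, nstart = nstart)$tot.withinss
}

reference_W <- function(X, Kmax, B, nstart = 10) {
  X <- as.matrix(X)
  W <- matrix(NA_real_, nrow = B, ncol = Kmax)  # assumed layout: B x Kmax
  for (b in seq_len(B)) {
    # Uniform reference dataset within each variable's observed range.
    Z <- apply(X, 2, function(v) runif(length(v), min(v), max(v)))
    for (k in seq_len(Kmax)) {
      W[b, k] <- within_dispersion(Z, k, nstart)
    }
  }
  W
}

set.seed(42)
X <- matrix(rnorm(60), nrow = 30, ncol = 2)
W <- reference_W(X, Kmax = 3, B = 4)
dim(W)
```

Row `b` of `W` holds the dispersions of the `b`-th reference dataset, so averaging `log(W)` over rows gives the expected log-dispersion under the null for each number of clusters.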