Compute Within-Cluster Dispersion for Multiple Clusterings

`findW()` calculates the within-cluster dispersion \(W_k\) from a distance matrix `dX` and a set of cluster labelings, one for each number of clusters. Within-cluster dispersion measures how tight each cluster is: smaller values indicate more compact clusters. It is used to decide whether clusters should be split further and to estimate the optimal number of clusters.

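For each `k = 1, …, K`, the function sums the pairwise distances between points within each cluster, scales each sum by the cluster size, and adds the results over the `k` clusters:

\[ W_k = \sum_{r=1}^{k} \frac{D_r}{2 n_r} \]

where \(D_r\) is the sum of the pairwise distances between the points in cluster \(r\) and \(n_r\) is the number of points in that cluster.

When `k = 1`, \(W_1\) is simply the sum of all pairwise distances divided by twice the number of points.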
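
As a rough illustration of the computation described above, the following sketch evaluates the dispersion for a single labeling. The helper `within_dispersion()` is hypothetical and is not part of the package. It assumes that each pair of points is counted in both orders (the convention usually used with the gap statistic); whether `findW()` counts each pair once or twice is not stated here.

```r
# Hypothetical helper, not the package's implementation.
# Assumption: each pair (i, j) is counted in both orders, so the sum over
# the cluster's block of the full distance matrix is divided by 2 * n_r.
within_dispersion <- function(dX, labels) {
  d <- as.matrix(dX)                        # accepts a dist object or a matrix
  sum(vapply(unique(labels), function(r) {
    idx <- which(labels == r)               # members of cluster r
    sum(d[idx, idx, drop = FALSE]) / (2 * length(idx))
  }, numeric(1)))
}

# Three points on a line at 0, 1 and 3 (pairwise distances 1, 3 and 2)
dX <- dist(c(0, 1, 3))

within_dispersion(dX, c(1, 1, 1))  # k = 1: 12 / (2 * 3) = 2
within_dispersion(dX, c(1, 1, 2))  # k = 2: 2 / (2 * 2) + 0 = 0.5
```

Under this convention a singleton cluster contributes zero, so splitting off the distant point lowers the dispersion from 2 to 0.5.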

This function is typically used within Gap statistic computations.

Usage

findW(dX, K, cl.lab)

Arguments

dX: A distance object (`dist`) or a symmetric matrix of pairwise distances between observations.
K: Maximum number of clusters to compute dispersions for.
cl.lab: A list of integer vectors, where `cl.lab[[k]]` contains cluster labels for `k` clusters.

Value

The within-cluster dispersion \(W_k\) for each `k = 1, …, K`.
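
Examples

The arguments could be prepared as in the sketch below. It is only one possible setup: the simulated data, the use of `hclust()` with average linkage, and `cutree()` to produce the labelings are assumptions made for illustration, not requirements of `findW()`. The final call is left commented out because it needs the package that provides `findW()` to be loaded.

```r
set.seed(1)
X  <- matrix(rnorm(40), ncol = 2)    # 20 simulated observations, 2 variables
dX <- dist(X)                        # pairwise Euclidean distances
K  <- 4                              # maximum number of clusters

# One labeling per k = 1, ..., K, obtained by cutting a hierarchical tree
hc     <- hclust(dX, method = "average")
cl.lab <- lapply(seq_len(K), function(k) cutree(hc, k = k))

# cl.lab[[k]] now holds the labels for k clusters
# W <- findW(dX, K, cl.lab)
```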