The `part()` function performs clustering using the PART algorithm (Partitioning Algorithm based on Recursive Thresholding). The method begins with a global clustering of the data and then recursively evaluates potential subclusters. At each recursive step, it applies the Gap statistic and a separation threshold (`minDist`) to determine whether a cluster should be further split. The procedure is designed to detect both major cluster structure and finer substructure while guarding against spurious splits.
The function returns the estimated number of clusters, cluster labels for all observations, and a list of samples flagged as potential outliers.
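To make the role of the Gap statistic in the description above more concrete, here is a minimal base R sketch of the general idea: compare the log within-cluster dispersion of the data with its average over reference datasets that have no cluster structure. This is not the implementation used by `part()`. The helper names (`within_dispersion`, `log_W`, `gap_sketch`), the average-linkage tree, and the uniform reference distribution (rather than the "PC" method) are all assumptions made for illustration.

```r
# Within-cluster dispersion W_k: squared distances to the cluster mean,
# summed over all clusters.
within_dispersion <- function(X, labels) {
  sum(sapply(split(as.data.frame(X), labels), function(d) {
    d <- as.matrix(d)
    sum(sweep(d, 2, colMeans(d))^2)
  }))
}

# log(W_k) for k = 1..Kmax, cutting one average-linkage tree at each k.
log_W <- function(X, Kmax) {
  hc <- hclust(dist(X), method = "average")
  sapply(seq_len(Kmax), function(k) log(within_dispersion(X, cutree(hc, k))))
}

# Gap(k) = mean over B reference datasets of log(W_k*) minus observed log(W_k).
gap_sketch <- function(X, Kmax = 5, B = 20) {
  obs <- log_W(X, Kmax)
  # Reference data (assumption): uniform draws over each column's range.
  ref <- matrix(replicate(B, {
    Xref <- apply(X, 2, function(col) runif(length(col), min(col), max(col)))
    log_W(Xref, Kmax)
  }), nrow = Kmax)
  rowMeans(ref) - obs
}

# Toy data (assumption): two well-separated groups of 30 observations each.
set.seed(1)
X <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 8), ncol = 2))
g <- gap_sketch(X, Kmax = 5, B = 20)
print(round(g, 2))
```

A larger Gap value at some `k` suggests that the data are more tightly clustered at that `k` than structureless data would be. The rule for selecting `k` from these values is omitted here.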
`part(X, Kmax = 10, minSize = 8, minDist = NULL, cl.lab = NULL, ...)`
`X`: A numeric data matrix with observations in rows and variables in columns. This is the dataset to be clustered.
`Kmax`: The maximum number of clusters to consider during the *initial*, global PART run. Recursive subdivision uses its own upper limit (`Kmax.rec`), supplied via `...`. Default is 10.
`minSize`: The minimum number of observations a cluster must contain to be considered for further recursive splitting. Clusters smaller than `minSize` are treated as terminal. Default is 8.
`minDist`: Minimum required separation (in dendrogram height units) between subclusters for a recursive split to be accepted. If `NULL`, the threshold is computed automatically by `get.threshold()` from the value of `q` (passed via `...`). Default is `NULL`.
`cl.lab`: Optional list of precomputed cluster label vectors. If supplied, the list should contain the cluster assignments for `k = 1, …, Kmax`, typically produced by hierarchical clustering. If `NULL`, cluster labels are generated internally. Default is `NULL`.
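The two arguments above both relate to a hierarchical clustering tree, so a short base R sketch may help. It builds one label vector for each `k = 1, …, Kmax` with `hclust()` and `cutree()`, which is the form described for `cl.lab`, and summarizes the merge heights, which are the scale on which `minDist` is expressed. The toy data, the Euclidean distance, and average linkage are assumptions chosen for illustration. Whether `part()` expects the labels in exactly this form should be checked against the package itself.

```r
set.seed(42)
# Toy data (assumption): 60 observations, 4 variables
X <- matrix(rnorm(60 * 4), nrow = 60)
Kmax <- 10

# Hierarchical clustering; distance and linkage are illustrative choices
hc <- hclust(dist(X, method = "euclidean"), method = "average")

# One label vector per k = 1, ..., Kmax, obtained by cutting the same tree
cl.lab <- lapply(seq_len(Kmax), function(k) cutree(hc, k = k))

# Merge heights of the tree: a manually chosen minDist lives on this scale
print(summary(hc$height))
```

Cutting a single tree at every `k` keeps the label vectors nested, so the solution for `k + 1` always refines the solution for `k`.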
`...`: Additional arguments controlling the PART algorithm. These override the internal defaults defined in the function. Common parameters include:
- `q`: proportion of dendrogram height used to compute the stopping threshold when `minDist` is not supplied. Default: `0.25`.
- `Kmax.rec`: maximum number of clusters evaluated at each recursive splitting step. Default: `5`.
- `B`: number of bootstrap samples used in the Gap statistic. Default: `100`.
- `ref.gen`: method used to generate reference datasets for the Gap statistic. Default: `"PC"`.
- `dist.method`: distance metric used for hierarchical clustering.
- `cl.method`: clustering method (`"hclust"` or `"kmeans"`).
- `linkage`: linkage method if hierarchical clustering is used.
- `cor.method`: correlation type if correlation distance is selected.
- `nstart`: number of random initializations if k-means is used.
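As a usage illustration, the following sketch passes some of the options listed above through `...`. The toy data are an assumption, and the call is guarded so that the script still runs when no `part()` function is available in the session. Because the names of the returned components are not given here, the result is inspected with `str()` rather than accessed by name.

```r
set.seed(1)
# Toy data (assumption): two groups of 30 observations in 5 variables
X <- rbind(matrix(rnorm(30 * 5, mean = 0), ncol = 5),
           matrix(rnorm(30 * 5, mean = 4), ncol = 5))

# Options forwarded through `...`; the names are those documented above
extra <- list(q = 0.25, Kmax.rec = 5, B = 50)

# Only call part() if it has been loaded into the session
if (exists("part", mode = "function")) {
  res <- do.call(part, c(list(X, Kmax = 10, minSize = 8), extra))
  str(res)  # inspect the returned list instead of assuming component names
}
```

Lowering `B` shortens the run time at the cost of a noisier Gap estimate, which can be convenient for a first exploratory pass.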