`skcyto`.ConsensusCluster

class skcyto.ConsensusCluster(k_min: int | None = None, k_max: int = 20, n_iter: int = 100, subsample_fraction: float = 0.9, random_state: int | None = None)

Consensus clustering

Choosing the number of clusters is a common problem. Consensus clustering addresses it by repeatedly subsampling the data, clustering each subsample for a range of candidate cluster counts, and measuring how consistently pairs of instances are assigned to the same cluster.

This implementation currently supports only hierarchical clustering as the underlying algorithm. It was chosen because the R FlowSOM implementation uses it, and because only hierarchical clustering guarantees that the area under the curve (AUC) of the consensus cumulative distribution function (CDF) increases as clusters are added.

Read more in Monti et al., Machine Learning 52, 91–118 (2003)

Limitations are described in Șenbabaoğlu et al., Sci Rep. 4:6207 (2014)
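
The procedure described above can be sketched as follows. This is a minimal illustration, not the skcyto implementation: the helper names `consensus_matrix` and `cdf_auc` are hypothetical, scikit-learn's `AgglomerativeClustering` with its default settings stands in for the hierarchical clustering step, and the AUC is taken as the exact integral of the empirical CDF over [0, 1], which may differ from the numerical approximation the library uses.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def consensus_matrix(X, k, n_iter=100, subsample_fraction=0.9, seed=None):
    """Fraction of shared subsamples in which each pair lands in one cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_sub = min(n, max(k, int(round(subsample_fraction * n))))
    together = np.zeros((n, n))  # times a pair was assigned the same label
    sampled = np.zeros((n, n))   # times a pair was drawn in the same subsample
    for _ in range(n_iter):
        idx = rng.choice(n, size=n_sub, replace=False)
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(X[idx])
        together[np.ix_(idx, idx)] += labels[:, None] == labels[None, :]
        sampled[np.ix_(idx, idx)] += 1
    # Pairs that were never drawn together keep a consensus of 0.
    return np.divide(together, sampled, out=np.zeros((n, n)), where=sampled > 0)


def cdf_auc(M):
    """Area under the empirical CDF of the off-diagonal consensus values."""
    values = M[np.triu_indices_from(M, k=1)]
    # For values in [0, 1], the integral of the CDF over [0, 1] is 1 - mean.
    return 1.0 - float(values.mean())


X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
auc = {k: cdf_auc(consensus_matrix(X, k, n_iter=50, seed=0)) for k in (2, 3)}
```

On this toy data the two groups are well separated, so for k = 2 every within-group pair has consensus 1 and every cross-group pair has consensus 0. Moving to k = 3 can only split pairs further, so the AUC does not decrease.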

Parameters:

k_min: int, optional: Lower bound of the number of clusters to try, by default None. If None, only k_max is evaluated.
k_max: int: Upper bound of the number of clusters to try, by default 20.
n_iter: int: Number of iterations, by default 100.
subsample_fraction: float: Fraction of instances to sample from the original data, by default 0.9.
random_state: int, RandomState instance or None: Determines random number generation for subsampling, by default None.

Attributes:

X_: NDArray: Input data
k_best_: int: Optimal number of clusters
cluster_: AgglomerativeClustering: Fitted cluster algorithm for k_best_
labels_: NDArray: Labels for each instance of X with the optimal number of clusters
AUC_: dict: Dictionary with CDF AUC for each evaluated k
AUC_delta_: dict: Dictionary with change in AUC compared to k-1 for each k
consensus_matrix_allk_: dict: Dictionary with consensus matrix for each evaluated k
cluster_allk_: dict: Dictionary with fitted cluster algorithm for each evaluated k
labels_allk_: dict: Dictionary with labels for each instance for each evaluated k
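
The relationship between the two AUC dictionaries can be illustrated with a short sketch. It assumes the change is a plain difference between consecutive values of k; the library may instead use a relative change, as in Monti et al. The function name `auc_delta` and the AUC values are hypothetical.

```python
def auc_delta(auc):
    """Change in AUC relative to k-1, for every k whose predecessor was evaluated."""
    return {k: auc[k] - auc[k - 1] for k in sorted(auc) if k - 1 in auc}


# Hypothetical AUC values keyed by the number of clusters k.
auc = {2: 0.5, 3: 0.75, 4: 0.875}
delta = auc_delta(auc)
```

The smallest evaluated k has no entry in `delta` because there is no k-1 value to compare against.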

Raises:

ValueError: When k_max < 2
ValueError: When k_max < k_min

Examples

>>> from skcyto.consensuscluster import ConsensusCluster
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> CClust = ConsensusCluster(k_max=2).fit(X)
>>> CClust.labels_
array([1, 1, 1, 0, 0, 0])

__init__(k_min: int | None = None, k_max: int = 20, n_iter: int = 100, subsample_fraction: float = 0.9, random_state: int | None = None)

fit(X: ndarray[Any, dtype[ScalarType]], y=None)

Fit multiple hierarchical clustering instances, one for each candidate k.

Parameters:

X: NDArray: Training data to cluster
y: Ignored: Not used; present for API consistency by convention.

Returns:

self: object: Returns the fitted instance

fit_predict(X: ndarray[Any, dtype[ScalarType]], y=None) → ndarray[Any, dtype[ScalarType]]

Fit and return each sample's best cluster assignment.

In addition to fitting, this method returns, for each sample in the training set, the cluster label obtained with the optimal number of clusters.

Parameters:

X: NDArray: Training instances to cluster
y: Ignored: Not used; present for API consistency by convention.

Returns:

NDArray: Cluster labels
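
The fit/fit_predict convention can be illustrated with scikit-learn's `AgglomerativeClustering`, used here only as a stand-in estimator that follows the same pattern. The sketch does not call skcyto; it assumes that `ConsensusCluster.fit_predict` relates to `fit` and `labels_` in the same way.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Two-step form: fit, then read the labels_ attribute.
model = AgglomerativeClustering(n_clusters=2).fit(X)
labels_from_fit = model.labels_

# One-step form: fit_predict returns the same labels directly.
labels_from_fit_predict = AgglomerativeClustering(n_clusters=2).fit_predict(X)
```

Both forms yield one label per row of X, in the same order as the input.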
