skcyto.FlowSOM

class skcyto.FlowSOM(nodes_x: int = 10, nodes_y: int = 10, n_iter: int = 10, learning_rate: float = 0.5, neighborhood_function: str = 'gaussian', sigma: float = 1.0, activation_distance: str = 'euclidean', k_min: int | None = None, k_max: int = 20, random_state: int | None = None)

FlowSOM algorithm to cluster cytometry data

Trains a FlowSOM algorithm on the given data. Follows the original R implementation as closely as possible.

See also the original publication:

Van Gassen et al., FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry A. (2015)

Parameters:
nodes_x: int

Number of SOM nodes in X direction, by default 10

nodes_y: int

Number of SOM nodes in Y direction, by default 10

n_iter: int

Number of iterations to train SOM, by default 10

learning_rate: float

Learning rate of the SOM, by default 0.5

neighborhood_function: str

Neighborhood function of the SOM, by default 'gaussian'

sigma: float

Standard deviation of the SOM’s neighborhood function, by default 1

activation_distance: str

Distance function for SOM, by default 'euclidean'

k_min: int, optional

Lower bound for metaclustering number of clusters, by default None

k_max: int

Upper bound for metaclustering number of clusters, by default 20

random_state: int, RandomState instance or None

Determines random number generation for SOM training and ConsensusClustering subsampling, by default None.

Attributes:
X_: NDArray

Training data

labels_

Labels for each instance of X after SOM assignment and metaclustering

importance_: NDArray

Importances, if specified during fit

n_nodes_: int

Number of SOM nodes

som_: MiniSOM

Trained SOM model

som_weights_: NDArray

Weights of SOM nodes. Shape (n_nodes_, n_features)

som_labels_: NDArray

Assigned SOM node for each sample in X

mst_: Graph

Minimum spanning tree of the SOM nodes

n_clusters_: int

Number of metaclusters

som_metacluster_labels_: NDArray

Metacluster label of each SOM node. Shape (n_nodes_,)

consensus_clustering_: ConsensusCluster

ConsensusCluster object used for metaclustering

Examples

>>> from skcyto import FlowSOM
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> fsom = FlowSOM(k_min=2, k_max=3, nodes_x=2, nodes_y=2)
>>> fsom.fit(X)
FlowSOM(k_max=3, k_min=2, nodes_x=2, nodes_y=2)
__init__(nodes_x: int = 10, nodes_y: int = 10, n_iter: int = 10, learning_rate: float = 0.5, neighborhood_function: str = 'gaussian', sigma: float = 1.0, activation_distance: str = 'euclidean', k_min: int | None = None, k_max: int = 20, random_state: int | None = None)

fit(X: ndarray[Any, dtype[ScalarType]], y=None, importance: ndarray[Any, dtype[ScalarType]] | None = None)

Fit FlowSOM to data

Parameters:
X: NDArray

Training data to cluster

y: Ignored

Not used, present here for API consistency by convention.

importance: NDArray, optional

Weights used to scale individual features so that they carry more importance during training, by default None

Returns:
self: object

Returns the fitted instance

Raises:
ValueError

If importance is specified and its shape does not match X
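The effect of importance can be illustrated with a minimal NumPy sketch. It is not taken from the skcyto source: the helper name apply_importance is hypothetical, and the sketch assumes that importance holds one multiplicative weight per feature (column) of X and that a mismatch in length is what triggers the ValueError.

```python
import numpy as np


def apply_importance(X, importance=None):
    """Hypothetical helper: scale each feature (column) of X by its weight."""
    X = np.asarray(X, dtype=float)
    if importance is None:
        # No weighting requested, use the data unchanged.
        return X
    importance = np.asarray(importance, dtype=float)
    # Assumption: one weight per feature, i.e. shape (n_features,).
    if importance.shape != (X.shape[1],):
        raise ValueError(
            f"importance has shape {importance.shape}, "
            f"expected ({X.shape[1]},)"
        )
    # Broadcasting multiplies every row of X by the weight vector.
    return X * importance


X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_scaled = apply_importance(X, importance=[2.0, 0.5])
```

With these weights the first feature contributes four times as much to a Euclidean distance as it would if both features were weighted by 0.5.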

fit_predict(X: ndarray[Any, dtype[ScalarType]], y: ndarray[Any, dtype[ScalarType]] | None = None, importance: ndarray[Any, dtype[ScalarType]] | None = None) → ndarray[Any, dtype[ScalarType]]

Fit and return the metacluster labels of each sample

Features can be weighted by a user-specified importance, if desired.

Parameters:
X: NDArray

Training instances to cluster

y: NDArray, optional

Not used, present here for API consistency by convention.

importance: NDArray, optional

Weights to scale each feature by during analysis, by default None

Returns:
NDArray

Metacluster labels

get_metacluster_CV() → dict

Calculates coefficient of variation for all channels of each metacluster

Returns:
dict

Dictionary with CV per channel and metacluster

get_metacluster_MFI() → dict

Calculates mean fluorescence intensity of each metacluster

Returns:
dict

Dictionary with mean intensity per metacluster
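The two statistics above can be sketched in plain NumPy. This is an illustration only, not the library's code: the function name per_cluster_mfi_and_cv is hypothetical, the CV is assumed to be the population standard deviation divided by the mean, and the layout of the returned dictionaries (cluster label mapped to one value per channel) is an assumption about how the results might be organised.

```python
import numpy as np


def per_cluster_mfi_and_cv(X, labels):
    """Hypothetical helper: mean intensity and CV per channel and cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mfi, cv = {}, {}
    for cluster in np.unique(labels):
        cells = X[labels == cluster]          # all cells in this cluster
        mean = cells.mean(axis=0)             # mean intensity per channel
        mfi[int(cluster)] = mean
        # Assumption: CV = std / mean, with the population std (ddof=0).
        # A channel whose mean is 0 would give inf or nan here.
        cv[int(cluster)] = cells.std(axis=0) / mean
    return mfi, cv


X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
labels = np.array([0, 0, 0, 1, 1, 1])
mfi, cv = per_cluster_mfi_and_cv(X, labels)
```

The same loop applies to SOM nodes when the node assignment is passed in place of the metacluster labels.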

get_metacluster_counts() → dict

Calculates cell counts per metacluster

Returns:
dict

Dictionary with cell count per metacluster

get_metacluster_percentages() → dict

Calculates percentage of total cells assigned to each metacluster

Returns:
dict

Dictionary with cell percentage per metacluster

get_outliers(X: ndarray[Any, dtype[ScalarType]], n_mad: float = 5) → ndarray[Any, dtype[ScalarType]]

Determine which cells are outliers to their assigned SOM node, based on MAD.

For each cell, the Euclidean distance to its assigned SOM node is calculated. For each SOM node, the median of these distances and their mean absolute deviation from the median (MAD) are calculated. A cell is considered an outlier if its distance to its assigned node center exceeds median + n_mad * mad.

Parameters:
X: NDArray

Array with measured cells

n_mad: float, optional

Number of MADs by which a cell's distance may exceed the median distance to its SOM node center, by default 5

Returns:
NDArray

Boolean array indicating whether a cell is an outlier.
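The outlier rule described above can be sketched as follows. This is a standalone NumPy illustration under stated assumptions, not the skcyto implementation: the function name mad_outliers is hypothetical, node weights and node assignments are passed in explicitly, and the MAD is computed as the mean absolute deviation from the median, following the wording of the description.

```python
import numpy as np


def mad_outliers(X, som_weights, som_labels, n_mad=5.0):
    """Hypothetical helper: flag cells far from their assigned SOM node."""
    X = np.asarray(X, dtype=float)
    som_weights = np.asarray(som_weights, dtype=float)
    som_labels = np.asarray(som_labels)
    # Euclidean distance of every cell to the center of its assigned node.
    dist = np.linalg.norm(X - som_weights[som_labels], axis=1)
    outlier = np.zeros(len(X), dtype=bool)
    for node in np.unique(som_labels):
        in_node = som_labels == node
        d = dist[in_node]
        med = np.median(d)
        # Mean absolute deviation from the median, as described above.
        mad = np.mean(np.abs(d - med))
        outlier[in_node] = d > med + n_mad * mad
    return outlier


# One SOM node at the origin; six cells near distance 1 and one at distance 10.
som_weights = np.array([[0.0, 0.0]])
X = np.array([[1.0, 0.0], [1.1, 0.0], [0.9, 0.0],
              [0.0, 1.0], [0.0, 1.05], [0.0, 0.95],
              [10.0, 0.0]])
som_labels = np.zeros(len(X), dtype=int)
outliers = mad_outliers(X, som_weights, som_labels, n_mad=5.0)
```

Because the threshold is computed per node, a cell is judged only against the spread of the other cells assigned to the same node.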

get_som_CV() → dict

Calculates coefficient of variation for all channels of each SOM node

If a node has no cells assigned to it, it is omitted from the returned dict.

Returns:
dict

Dictionary with CV per channel and SOM node

get_som_MFI() → dict

Calculates mean fluorescence intensity of each SOM node

If a node has no cells assigned to it, it is omitted from the returned dict.

Returns:
dict

Dictionary with mean intensity per SOM node

get_som_counts() → dict

Calculates cell counts per SOM node

If a node has no cells assigned to it, it is omitted from the returned dict.

Returns:
dict

Dictionary with cell count per SOM node

get_som_percentages() → dict

Calculates percentage of total cells assigned to each SOM node

If a node has no cells assigned to it, it is omitted from the returned dict.

Returns:
dict

Dictionary with cell percentage per SOM node
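Why empty nodes are missing from the count and percentage dictionaries can be seen in a small NumPy sketch. It is illustrative only: the variable names are made up, node labels are assumed to be flat integer indices, and percentages are assumed to be on a 0 to 100 scale rather than fractions.

```python
import numpy as np

# Node assignment for six cells on a 4-node SOM; node 2 receives no cells.
som_labels = np.array([0, 0, 3, 3, 3, 1])

# np.unique only reports labels that actually occur, so node 2 never appears.
nodes, counts = np.unique(som_labels, return_counts=True)
som_counts = {int(n): int(c) for n, c in zip(nodes, counts)}

# Assumption: percentages are expressed on a 0-100 scale.
som_percentages = {
    n: 100.0 * c / som_labels.size for n, c in som_counts.items()
}
```

Code that iterates over all nodes should therefore look up counts with a default, for example som_counts.get(node, 0).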

predict(X: ndarray[Any, dtype[ScalarType]]) → ndarray[Any, dtype[ScalarType]]

Predicts the metacluster label for each sample in X

Parameters:
X: NDArray

Measurements

Returns:
NDArray

Metacluster labels

predict_som(X: ndarray[Any, dtype[ScalarType]]) → ndarray[Any, dtype[ScalarType]]

Predicts only the SOM node label for each instance in X

Parameters:
X: NDArray

Measurements

Returns:
NDArray

SOM node labels
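The relationship between predict_som and predict can be sketched with NumPy alone. This is an assumption-based illustration rather than the library's code: the helper nearest_node is hypothetical, it assumes Euclidean distance and flat node indices from 0 to n_nodes_ - 1, and it assumes that the metacluster of a sample is obtained by looking up its node in an array shaped like som_metacluster_labels_.

```python
import numpy as np


def nearest_node(X, som_weights):
    """Hypothetical helper: index of the closest SOM node for each sample."""
    X = np.asarray(X, dtype=float)
    W = np.asarray(som_weights, dtype=float)
    # Pairwise Euclidean distances, shape (n_samples, n_nodes).
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return d.argmin(axis=1)


# Stand-ins for a fitted model: 4 node weight vectors and their metaclusters.
som_weights = np.array([[1.0, 2.0], [1.0, 3.0], [10.0, 2.0], [10.0, 3.0]])
som_metacluster_labels = np.array([0, 0, 1, 1])

X_new = np.array([[0.5, 2.1], [9.0, 3.5]])
node_labels = nearest_node(X_new, som_weights)            # SOM node per sample
metacluster_labels = som_metacluster_labels[node_labels]  # node -> metacluster
```

Under these assumptions the metacluster label is a pure lookup on the node label, so two samples assigned to the same node always share a metacluster.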