skcyto.FlowSOM

class skcyto.FlowSOM(nodes_x: int = 10, nodes_y: int = 10, n_iter: int = 10, learning_rate: float = 0.5, neighborhood_function: str = 'gaussian', sigma: float = 1.0, activation_distance: str = 'euclidean', k_min: int | None = None, k_max: int = 20, random_state: int | None = None)

FlowSOM algorithm for clustering cytometry data.

Trains a FlowSOM model on the given data, following the original R implementation as closely as possible.

See also the original publication:

Van Gassen et al., FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry A. (2015)

Parameters:

nodes_x (int): Number of SOM nodes in the x direction, by default 10
nodes_y (int): Number of SOM nodes in the y direction, by default 10
n_iter (int): Number of iterations used to train the SOM, by default 10
learning_rate (float): Learning rate of the SOM, by default 0.5
neighborhood_function (str): Neighborhood function of the SOM, by default 'gaussian'
sigma (float): Standard deviation of the SOM's neighborhood function, by default 1.0
activation_distance (str): Distance function used by the SOM, by default 'euclidean'
k_min (int, optional): Lower bound on the number of metaclusters, by default None
k_max (int): Upper bound on the number of metaclusters, by default 20
random_state (int, RandomState instance or None): Controls random number generation for SOM training and ConsensusClustering subsampling, by default None.
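
To make the roles of nodes_x, nodes_y, learning_rate and sigma more concrete, here is a minimal NumPy sketch of a single online SOM update with a Gaussian neighborhood. It is not skcyto's code: the function name som_update_step and the asymptotic decay schedule are assumptions chosen for illustration, and the library's actual training loop may differ.

```python
import numpy as np


def som_update_step(weights, x, t, n_iter, learning_rate=0.5, sigma=1.0):
    """One online SOM update with a Gaussian neighborhood (illustrative sketch).

    weights has shape (nodes_x, nodes_y, n_features) and is updated in place;
    x is a single sample of shape (n_features,); t is the current step index.
    """
    nodes_x, nodes_y, _ = weights.shape
    # Best-matching unit (BMU): the node whose weight vector is closest to x.
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Assumed asymptotic decay of learning rate and neighborhood width.
    decay = 1.0 / (1.0 + t / (n_iter / 2))
    eta = learning_rate * decay
    sig = sigma * decay
    # Gaussian neighborhood over grid coordinates, centered on the BMU.
    gx, gy = np.meshgrid(np.arange(nodes_x), np.arange(nodes_y), indexing="ij")
    grid_d2 = (gx - bmu[0]) ** 2 + (gy - bmu[1]) ** 2
    h = np.exp(-grid_d2 / (2 * sig**2))
    # Pull every node toward x, weighted by its neighborhood value.
    weights += eta * h[..., None] * (x - weights)
    return bmu
```

Nodes near the BMU on the grid move most; a larger sigma spreads the update over more of the nodes_x × nodes_y grid.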

Attributes:

X_ (NDArray): Training data
labels_: Label of each instance of X after SOM assignment and metaclustering
importance_ (NDArray): Feature importances, if specified during fit
n_nodes_ (int): Number of SOM nodes
som_ (MiniSOM): Trained SOM model
som_weights_ (NDArray): Weights of the SOM nodes. Shape (n_nodes_, n_features)
som_labels_ (NDArray): Assigned SOM node for each sample in X
mst_ (Graph): Minimum spanning tree of the SOM nodes
n_clusters_ (int): Number of metaclusters
som_metacluster_labels_ (NDArray): Metacluster label of each SOM node. Shape (n_nodes_,)
consensus_clustering_ (ConsensusCluster): ConsensusCluster object used for metaclustering
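
Since som_labels_ holds a node index per sample and som_metacluster_labels_ holds a metacluster per node, per-sample metacluster labels can plausibly be recovered with a single indexing step. The sketch below uses made-up arrays standing in for those attributes; that labels_ is computed exactly this way is an assumption.

```python
import numpy as np

# Hypothetical fitted attributes for a 2x2 SOM (4 nodes) and 6 samples.
som_labels = np.array([0, 0, 1, 2, 3, 3])        # stands in for som_labels_
som_metacluster_labels = np.array([0, 0, 1, 1])  # stands in for som_metacluster_labels_

# Each sample inherits the metacluster of its SOM node.
labels = som_metacluster_labels[som_labels]      # assumed to match labels_
print(labels)  # [0 0 0 1 1 1]
```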

Examples

>>> from skcyto import FlowSOM
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> fsom = FlowSOM(k_min=2, k_max=3, nodes_x=2, nodes_y=2)
>>> fsom.fit(X)
FlowSOM(k_max=3, k_min=2, nodes_x=2, nodes_y=2)

__init__(nodes_x: int = 10, nodes_y: int = 10, n_iter: int = 10, learning_rate: float = 0.5, neighborhood_function: str = 'gaussian', sigma: float = 1.0, activation_distance: str = 'euclidean', k_min: int | None = None, k_max: int = 20, random_state: int | None = None)

fit(X: ndarray[Any, dtype[ScalarType]], y=None, importance: ndarray[Any, dtype[ScalarType]] | None = None)

Fit FlowSOM to data

Parameters:

X (NDArray): Training data to cluster
y (Ignored): Not used, present only for API consistency by convention.
importance (NDArray, optional): Weights that scale individual features to give them more importance during training, by default None

Returns:

self (object): The fitted instance

Raises:

ValueError: If importance is specified and its shape does not match X
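
A minimal sketch of what feature weighting with importance could look like, including the shape check behind the ValueError. The helper name apply_importance is hypothetical, and the assumption that importance holds exactly one weight per feature (shape (n_features,)) is not confirmed by the text above.

```python
import numpy as np


def apply_importance(X, importance=None):
    """Scale each feature (column) of X by its importance weight.

    Hypothetical helper; assumes importance has shape (n_features,).
    """
    X = np.asarray(X, dtype=float)
    if importance is None:
        return X
    importance = np.asarray(importance, dtype=float)
    if importance.shape != (X.shape[1],):
        raise ValueError(
            f"importance has shape {importance.shape}, expected ({X.shape[1]},)"
        )
    # Broadcasting multiplies every column by its weight.
    return X * importance
```

Scaling a feature up stretches distances along that axis, so it has more influence on which SOM node a cell is assigned to.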

fit_predict(X: ndarray[Any, dtype[ScalarType]], y: ndarray[Any, dtype[ScalarType]] | None = None, importance: ndarray[Any, dtype[ScalarType]] | None = None) → ndarray[Any, dtype[ScalarType]]

Fit and return the metacluster labels of each sample

Features can be weighted by a user-specified importance, if desired.

Parameters:

X (NDArray): Training instances to cluster
y (NDArray, optional): Not used, present only for API consistency by convention.
importance (NDArray, optional): Weights to scale each feature by during analysis, by default None

Returns:

NDArray: Metacluster labels

get_metacluster_CV() → dict

Calculates the coefficient of variation (CV) of every channel for each metacluster

Returns:

dict: Dictionary with CV per channel and metacluster
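
The per-metacluster CV can be sketched in NumPy as the standard deviation divided by the mean of each channel within a cluster. The function name cv_per_cluster is hypothetical, and using the population standard deviation (ddof=0) and a plain ratio rather than a percentage are assumptions; the CV is undefined for a channel whose cluster mean is zero.

```python
import numpy as np


def cv_per_cluster(X, labels):
    """Coefficient of variation (std / mean) of every channel, per cluster.

    Sketch only: population std (ddof=0), returned as a ratio, not a percentage.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    result = {}
    for k in np.unique(labels):
        cells = X[labels == k]
        # One CV value per channel (column) for this cluster.
        result[k.item()] = cells.std(axis=0) / cells.mean(axis=0)
    return result
```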

get_metacluster_MFI() → dict

Calculates the mean fluorescence intensity (MFI) of each metacluster

Returns:

dict: Dictionary with mean intensity per metacluster
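
A vectorized way to compute per-cluster mean intensities is to accumulate row sums with np.add.at and divide by counts from np.bincount. This is a sketch under the assumption that labels are non-negative integers; mfi_per_cluster is a hypothetical name, and clusters without cells are skipped.

```python
import numpy as np


def mfi_per_cluster(X, labels):
    """Mean intensity of every channel per cluster (labels must be ints >= 0)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n_clusters = labels.max() + 1
    sums = np.zeros((n_clusters, X.shape[1]))
    # Unbuffered accumulation: sums[labels[i]] += X[i] for every row i.
    np.add.at(sums, labels, X)
    counts = np.bincount(labels, minlength=n_clusters)
    return {k: sums[k] / counts[k] for k in range(n_clusters) if counts[k] > 0}
```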

get_metacluster_counts() → dict

Calculates cell counts per metacluster

Returns:

dict: Dictionary with cell count per metacluster

get_metacluster_percentages() → dict

Calculates the percentage of all cells assigned to each metacluster

Returns:

dict: Dictionary with cell percentage per metacluster
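
Counts and percentages per metacluster follow directly from the label array. The sketch below uses np.unique with return_counts=True; the function name is hypothetical, and reporting percentages on a 0 to 100 scale (rather than as fractions) is an assumption.

```python
import numpy as np


def counts_and_percentages(labels):
    """Return ({label: count}, {label: percentage of all cells})."""
    values, counts = np.unique(labels, return_counts=True)
    total = counts.sum()
    count_dict = {v.item(): int(c) for v, c in zip(values, counts)}
    # Percentages on a 0-100 scale (assumed convention).
    pct_dict = {v.item(): float(100.0 * c / total) for v, c in zip(values, counts)}
    return count_dict, pct_dict
```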

get_outliers(X: ndarray[Any, dtype[ScalarType]], n_mad: float = 5) → ndarray[Any, dtype[ScalarType]]

Determine which cells are outliers relative to their assigned SOM node, based on the MAD.

For each cell, the Euclidean distance to its SOM node is calculated. For each SOM node, the median of these distances and the mean absolute deviation from that median (MAD) are calculated. A cell is considered an outlier if its distance to its assigned node center exceeds median + n_mad * mad.

Parameters:

XNDArray: Array with measured cells
n_mad (float, optional): Number of MADs by which a cell's distance may exceed its node's median distance before the cell is flagged as an outlier, by default 5

Returns:

NDArray: Boolean array indicating whether a cell is an outlier.
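
The outlier rule above can be sketched in NumPy as follows. mad_outliers is a hypothetical name; node_labels and node_weights stand in for som_labels_ and som_weights_. The sketch uses the mean absolute deviation from the median, as the text describes; if the implementation instead uses the median absolute deviation, swap np.mean for np.median on the marked line.

```python
import numpy as np


def mad_outliers(X, node_labels, node_weights, n_mad=5.0):
    """Flag cells lying unusually far from their assigned SOM node center."""
    X = np.asarray(X, dtype=float)
    node_labels = np.asarray(node_labels)
    node_weights = np.asarray(node_weights, dtype=float)
    # Euclidean distance from each cell to the center of its own node.
    dist = np.linalg.norm(X - node_weights[node_labels], axis=1)
    outliers = np.zeros(len(X), dtype=bool)
    for node in np.unique(node_labels):
        mask = node_labels == node
        d = dist[mask]
        med = np.median(d)
        mad = np.mean(np.abs(d - med))  # mean absolute deviation from the median
        outliers[mask] = d > med + n_mad * mad
    return outliers
```

For a node whose cells sit at distances 1, 1, 1, 1, 1 and 20, the median is 1 and the MAD is 19/6, so with n_mad=5 only the cell at distance 20 exceeds the threshold of about 16.8.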

get_som_CV() → dict

Calculates the coefficient of variation (CV) of every channel for each SOM node

Nodes with no assigned cells are omitted from the returned dict.

Returns:

dict: Dictionary with CV per channel and SOM node

get_som_MFI() → dict

Calculates the mean fluorescence intensity (MFI) of each SOM node

Nodes with no assigned cells are omitted from the returned dict.

Returns:

dict: Dictionary with mean intensity per SOM node

get_som_counts() → dict

Calculates cell counts per SOM node

Nodes with no assigned cells are omitted from the returned dict.

Returns:

dict: Dictionary with cell count per SOM node

get_som_percentages() → dict

Calculates the percentage of all cells assigned to each SOM node

Nodes with no assigned cells are omitted from the returned dict.

Returns:

dict: Dictionary with cell percentage per SOM node
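
The omission of empty nodes can be illustrated with np.bincount, which returns a zero for every unused node and therefore needs an explicit filter. The arrays below are made up (node 2 has no cells), and percentages on a 0 to 100 scale are an assumption.

```python
import numpy as np

n_nodes = 4                                # e.g. nodes_x * nodes_y for a 2x2 SOM
som_labels = np.array([0, 0, 1, 3, 3, 3])  # hypothetical som_labels_; node 2 is empty

counts = np.bincount(som_labels, minlength=n_nodes)
# Keep only nodes with at least one cell, mirroring the behavior described above.
som_counts = {node: int(c) for node, c in enumerate(counts) if c > 0}
som_percentages = {node: 100.0 * c / len(som_labels) for node, c in som_counts.items()}
print(som_counts)  # {0: 2, 1: 1, 3: 3}
```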

predict(X: ndarray[Any, dtype[ScalarType]]) → ndarray[Any, dtype[ScalarType]]

Predicts the metacluster label for each sample in X

Parameters:

X (NDArray): Measurements

Returns:

NDArray: Metacluster labels

predict_som(X: ndarray[Any, dtype[ScalarType]]) → ndarray[Any, dtype[ScalarType]]

Predicts only the SOM node label for each instance in X

Parameters:

X (NDArray): Measurements

Returns:

NDArray: SOM node labels
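
With the default activation_distance='euclidean', assigning a sample to a SOM node amounts to picking the node weight vector nearest to it. The sketch below assumes that behavior; nearest_node is a hypothetical name and som_weights stands in for som_weights_. Metacluster labels, as returned by predict, would then presumably follow by indexing som_metacluster_labels_ with these node indices.

```python
import numpy as np


def nearest_node(X, som_weights):
    """Index of the closest SOM node (Euclidean distance) for every row of X."""
    X = np.asarray(X, dtype=float)
    som_weights = np.asarray(som_weights, dtype=float)
    # Pairwise distances of shape (n_samples, n_nodes) via broadcasting.
    d = np.linalg.norm(X[:, None, :] - som_weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)
```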
