skcyto.FlowSOM
- class skcyto.FlowSOM(nodes_x: int = 10, nodes_y: int = 10, n_iter: int = 10, learning_rate: float = 0.5, neighborhood_function: str = 'gaussian', sigma: float = 1.0, activation_distance: str = 'euclidean', k_min: int | None = None, k_max: int = 20, random_state: int | None = None)
FlowSOM algorithm for clustering cytometry data.
Trains a FlowSOM model on the given data. Follows the original R implementation as closely as possible.
See also the original publication:
Van Gassen et al., FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry A. (2015)
- Parameters:
- nodes_x: int
Number of SOM nodes in X direction, by default 10
- nodes_y: int
Number of SOM nodes in Y direction, by default 10
- n_iter: int
Number of iterations to train SOM, by default 10
- learning_rate: float
Learning rate of the SOM, by default 0.5
- neighborhood_function: str
Neighborhood function of the SOM, by default 'gaussian'
- sigma: float
Standard deviation of the SOM’s neighborhood function, by default 1.0
- activation_distance: str
Distance function for SOM, by default 'euclidean'
- k_min: int, optional
Lower bound for the number of metaclusters, by default None
- k_max: int
Upper bound for the number of metaclusters, by default 20
- random_state: int, RandomState instance or None
Determines random number generation for SOM training and ConsensusClustering subsampling, by default None.
- Attributes:
- X_: NDArray
Training data
- labels_
Labels for each instance of X after SOM assignment and metaclustering
- importance_: NDArray
Importances, if specified during fit
- n_nodes_: int
Number of SOM nodes
- som_: MiniSOM
Trained SOM model
- som_weights_: NDArray
Weights of SOM nodes. Shape (n_nodes_, n_features)
- som_labels_: NDArray
Assigned SOM node for each sample in X
- mst_: Graph
Minimum spanning tree of the SOM nodes
- n_clusters_: int
Number of metaclusters
- som_metacluster_labels_: NDArray
Metacluster label of each SOM node. Shape (n_nodes_,)
- consensus_clustering_: ConsensusCluster
ConsensusCluster object used for metaclustering
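The attributes above can be read as a two-step lookup: each sample is assigned to its nearest SOM node, and each node carries a metacluster label. The following is a minimal sketch of that relationship in plain Python, not the library's implementation. The helper `nearest_node` and the toy weights are hypothetical, and the sketch assumes Euclidean distance and that per-sample labels are obtained by looking up each sample's node in the node-to-metacluster mapping.

```python
import math


def nearest_node(sample, node_weights):
    """Return the index of the node whose weight vector is closest to sample."""
    distances = [math.dist(sample, w) for w in node_weights]
    return distances.index(min(distances))


# Toy stand-ins for som_weights_ (4 nodes, 2 features) and
# som_metacluster_labels_ (one metacluster label per node).
node_weights = [[1.0, 1.0], [1.0, 4.0], [10.0, 1.0], [10.0, 4.0]]
node_metaclusters = [0, 0, 1, 1]

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

# Analogue of som_labels_: one node index per sample.
node_per_sample = [nearest_node(x, node_weights) for x in X]

# Analogue of labels_: the metacluster of each sample's node.
metacluster_per_sample = [node_metaclusters[n] for n in node_per_sample]
```

With these toy values, the first three samples end up in metacluster 0 and the last three in metacluster 1.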
Examples
>>> from skcyto import FlowSOM
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> fsom = FlowSOM(k_min=2, k_max=3, nodes_x=2, nodes_y=2)
>>> fsom.fit(X)
FlowSOM(k_max=3, k_min=2, nodes_x=2, nodes_y=2)
- __init__(nodes_x: int = 10, nodes_y: int = 10, n_iter: int = 10, learning_rate: float = 0.5, neighborhood_function: str = 'gaussian', sigma: float = 1.0, activation_distance: str = 'euclidean', k_min: int | None = None, k_max: int = 20, random_state: int | None = None)
- fit(X: ndarray[Any, dtype[ScalarType]], y=None, importance: ndarray[Any, dtype[ScalarType]] | None = None)
Fit FlowSOM to data
- Parameters:
- X: NDArray
Training data to cluster
- y: Ignored
Not used, present here for API consistency by convention.
- importance: NDArray, optional
Can be used to scale individual features to give them more importance during training, by default None
- Returns:
- self: object
Returns the fitted instance
- Raises:
- ValueError
If importance is specified and its shape does not match X
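To illustrate what scaling features by importance can look like, here is a minimal sketch in plain Python. The helper `scale_by_importance` is hypothetical. It assumes one weight per feature, applied multiplicatively, with a length check standing in for the shape validation. How the library applies and validates the weights internally is not stated here.

```python
def scale_by_importance(X, importance):
    """Multiply every feature column of X by its importance weight."""
    n_features = len(X[0])
    if len(importance) != n_features:
        # Assumed validation: one weight per feature is required.
        raise ValueError(
            f"importance has {len(importance)} entries, expected {n_features}"
        )
    return [[value * w for value, w in zip(row, importance)] for row in X]


X = [[1.0, 2.0], [10.0, 4.0]]

# Double the influence of the first feature, halve that of the second.
scaled = scale_by_importance(X, [2.0, 0.5])
```

A larger weight stretches that feature's axis, so differences along it contribute more to the distances used during SOM training.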
- fit_predict(X: ndarray[Any, dtype[ScalarType]], y: ndarray[Any, dtype[ScalarType]] | None = None, importance: ndarray[Any, dtype[ScalarType]] | None = None) -> ndarray[Any, dtype[ScalarType]]
Fit and return the metacluster labels of each sample
Features can be weighted by a user-specified importance, if desired.
- Parameters:
- X: NDArray
Training instances to cluster
- y: NDArray, optional
Not used, present here for API consistency by convention.
- importance: NDArray, optional
Weights to scale each feature by during analysis, by default None
- Returns:
- NDArray
Metacluster labels
- get_metacluster_CV() -> dict
Calculates coefficient of variation for all channels of each metacluster
- Returns:
- dict
Dictionary with CV per channel and metacluster
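As an illustration of a per-channel coefficient of variation, the sketch below groups rows by cluster label and divides each channel's standard deviation by its mean. The helper `cv_per_cluster` is hypothetical. The use of the population standard deviation and the shape of the returned dictionary are assumptions and may differ from what the library returns.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cv_per_cluster(X, labels):
    """Return {label: [CV of channel 0, CV of channel 1, ...]}."""
    rows_by_label = defaultdict(list)
    for row, label in zip(X, labels):
        rows_by_label[label].append(row)

    result = {}
    for label, rows in rows_by_label.items():
        channels = list(zip(*rows))  # transpose: one tuple per channel
        # CV = standard deviation / mean (undefined when the mean is 0).
        result[label] = [pstdev(ch) / mean(ch) for ch in channels]
    return result


X = [[2.0, 4.0], [4.0, 4.0], [10.0, 1.0], [10.0, 3.0]]
labels = [0, 0, 1, 1]
cv = cv_per_cluster(X, labels)
```

A channel with identical values inside a cluster has a CV of 0, while a channel whose mean is 0 would need special handling.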
- get_metacluster_MFI() -> dict
Calculates mean fluorescence intensity of each metacluster
- Returns:
- dict
Dictionary with mean intensity per metacluster
- get_metacluster_counts() -> dict
Calculates cell counts per metacluster
- Returns:
- dict
Dictionary with cell count per metacluster
- get_metacluster_percentages() -> dict
Calculates percentage of total cells assigned to each metacluster
- Returns:
- dict
Dictionary with cell percentage per metacluster
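Counts and percentages per metacluster can be derived directly from the per-sample labels. The sketch below uses only the standard library. It assumes percentages are expressed on a 0 to 100 scale, and the library may instead return fractions.

```python
from collections import Counter

# Toy per-sample metacluster labels.
labels = [0, 0, 0, 1, 1, 2]

# Number of cells per metacluster.
counts = dict(Counter(labels))

# Share of all cells per metacluster, in percent (assumed scale).
total = len(labels)
percentages = {label: 100.0 * n / total for label, n in counts.items()}
```

The same computation applies to SOM nodes when node indices are used in place of metacluster labels.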
- get_outliers(X: ndarray[Any, dtype[ScalarType]], n_mad: float = 5) -> ndarray[Any, dtype[ScalarType]]
Determine which cells are outliers to their assigned SOM node, based on MAD.
For each cell, the Euclidean distance to its assigned SOM node is calculated. For each SOM node, the median of these distances and their mean absolute deviation (MAD) from that median are calculated. A cell is considered an outlier if its distance to its assigned node center exceeds median + n_mad * mad.
- Parameters:
- X: NDArray
Array with measured cells
- n_mad: float, optional
Number of MADs beyond the median distance that a cell may lie from its SOM node center before it is flagged as an outlier, by default 5
- Returns:
- NDArray
Boolean array indicating whether a cell is an outlier.
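The outlier rule described above can be sketched as follows. This is a plain-Python illustration rather than the library's code. The helper `mad_outliers` and its arguments are hypothetical, and the deviation is computed as the mean absolute deviation from the median, following the wording of the description. Replacing `mean` with `median` in the marked line would give a median absolute deviation instead.

```python
import math
from collections import defaultdict
from statistics import mean, median


def mad_outliers(X, node_per_sample, node_weights, n_mad=5):
    """Flag cells lying farther than median + n_mad * MAD from their node."""
    # Euclidean distance from each cell to its assigned node.
    distances = [
        math.dist(x, node_weights[n]) for x, n in zip(X, node_per_sample)
    ]

    # Collect the distances belonging to each node.
    by_node = defaultdict(list)
    for d, n in zip(distances, node_per_sample):
        by_node[n].append(d)

    # Per-node threshold: median + n_mad * MAD.
    thresholds = {}
    for n, ds in by_node.items():
        med = median(ds)
        mad = mean([abs(d - med) for d in ds])  # assumption: mean, see lead-in
        thresholds[n] = med + n_mad * mad

    return [d > thresholds[n] for d, n in zip(distances, node_per_sample)]


node_weights = [[0.0, 0.0]]
X = [[1.0, 0.0]] * 5 + [[50.0, 0.0]]
node_per_sample = [0] * 6
is_outlier = mad_outliers(X, node_per_sample, node_weights)
```

In this toy data the five cells at distance 1 are kept and the cell at distance 50 is flagged.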
- get_som_CV() -> dict
Calculates coefficient of variation for all channels of each SOM node
If a node has no cells associated to it, it is not in the returned dict.
- Returns:
- dict
Dictionary with CV per channel and SOM node
- get_som_MFI() -> dict
Calculates mean fluorescence intensity of each SOM node
If a node has no cells associated to it, it is not in the returned dict.
- Returns:
- dict
Dictionary with mean intensity per SOM node
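The note about empty nodes follows naturally when statistics are built by grouping cells by their assigned node: a node that receives no cells never becomes a key. The sketch below shows this for mean intensities. The helper `mean_intensity_per_node` is hypothetical, and the dictionary layout is an assumption.

```python
from collections import defaultdict
from statistics import mean


def mean_intensity_per_node(X, node_per_sample):
    """Return {node index: [mean of channel 0, mean of channel 1, ...]}."""
    rows_by_node = defaultdict(list)
    for row, node in zip(X, node_per_sample):
        rows_by_node[node].append(row)
    # zip(*rows) transposes the rows so that each tuple holds one channel.
    return {
        node: [mean(channel) for channel in zip(*rows)]
        for node, rows in rows_by_node.items()
    }


X = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]]
node_per_sample = [0, 0, 3]  # nodes 1 and 2 receive no cells
mfi = mean_intensity_per_node(X, node_per_sample)
```

Callers that iterate over all node indices should therefore use a lookup with a default, such as `mfi.get(node)`, rather than direct indexing.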
- get_som_counts() -> dict
Calculates cell counts per SOM node
If a node has no cells associated to it, it is not in the returned dict.
- Returns:
- dict
Dictionary with cell count per SOM node
- get_som_percentages() -> dict
Calculates percentage of total cells assigned to each SOM node
If a node has no cells associated to it, it is not in the returned dict.
- Returns:
- dict
Dictionary with cell percentage per SOM node