SOMRegressor
- class susi.SOMRegressor(n_rows: int = 10, n_columns: int = 10, *, init_mode_unsupervised: str = 'random', init_mode_supervised: str = 'random', n_iter_unsupervised: int = 1000, n_iter_supervised: int = 1000, train_mode_unsupervised: str = 'online', train_mode_supervised: str = 'online', neighborhood_mode_unsupervised: str = 'linear', neighborhood_mode_supervised: str = 'linear', learn_mode_unsupervised: str = 'min', learn_mode_supervised: str = 'min', distance_metric: str = 'euclidean', learning_rate_start: float = 0.5, learning_rate_end: float = 0.05, nbh_dist_weight_mode: str = 'pseudo-gaussian', missing_label_placeholder: int | str | None = None, n_jobs: int | None = None, random_state=None, verbose: int | None = 0)
Bases: SOMEstimator, RegressorMixin
Supervised SOM for estimating continuous variables (= regression).
- Parameters:
n_rows (int, optional (default=10)) – Number of rows for the SOM grid
n_columns (int, optional (default=10)) – Number of columns for the SOM grid
init_mode_unsupervised (str, optional (default=”random”)) – Initialization mode of the unsupervised SOM
init_mode_supervised (str, optional (default=”random”)) – Initialization mode of the supervised SOM
n_iter_unsupervised (int, optional (default=1000)) – Number of iterations for the unsupervised SOM
n_iter_supervised (int, optional (default=1000)) – Number of iterations for the supervised SOM
train_mode_unsupervised (str, optional (default=”online”)) – Training mode of the unsupervised SOM
train_mode_supervised (str, optional (default=”online”)) – Training mode of the supervised SOM
neighborhood_mode_unsupervised (str, optional (default=”linear”)) – Neighborhood mode of the unsupervised SOM
neighborhood_mode_supervised (str, optional (default=”linear”)) – Neighborhood mode of the supervised SOM
learn_mode_unsupervised (str, optional (default=”min”)) – Learning mode of the unsupervised SOM
learn_mode_supervised (str, optional (default=”min”)) – Learning mode of the supervised SOM
distance_metric (str, optional (default=”euclidean”)) – Distance metric to compare on feature level (not SOM grid). Possible metrics: {“euclidean”, “manhattan”, “mahalanobis”, “tanimoto”, “spectralangle”}. Note that “tanimoto” tends to be slow.
New in version 1.1.1: Spectral angle metric.
learning_rate_start (float, optional (default=0.5)) – Learning rate start value
learning_rate_end (float, optional (default=0.05)) – Learning rate end value (only needed for some learning rate definitions)
nbh_dist_weight_mode (str, optional (default=”pseudo-gaussian”)) – Formula of the neighborhood distance weight. Possible formulas are: {“pseudo-gaussian”, “mexican-hat”}.
missing_label_placeholder (int or str or None, optional (default=None)) – Label placeholder for datapoints with no label. This is needed for semi-supervised learning.
n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
verbose (int, optional (default=0)) – Controls the verbosity.
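As a rough illustration of how learning_rate_start, learning_rate_end and a "pseudo-gaussian" neighborhood weight could behave during training, here is a minimal sketch. The exponential decay and the Gaussian-shaped weight are assumptions chosen for illustration, and the helper names learning_rate and pseudo_gaussian_weight are hypothetical. The formulas susi actually applies for a given learn_mode or nbh_dist_weight_mode may differ.

```python
import math


def learning_rate(step, n_iter, start=0.5, end=0.05):
    """Assumed schedule: decay exponentially from `start` to `end`."""
    return start * (end / start) ** (step / n_iter)


def pseudo_gaussian_weight(grid_distance, radius):
    """Assumed weight: 1.0 at the BMU, shrinking with grid distance."""
    return math.exp(-(grid_distance ** 2) / (2 * radius ** 2))


# Learning rate at the start, middle and end of a 1000-iteration run.
rates = [learning_rate(t, 1000) for t in (0, 500, 1000)]

# Weight of nodes 0, 1 and 4 grid steps away from the BMU.
weights = [pseudo_gaussian_weight(d, radius=2.0) for d in (0, 1, 4)]
```

Under these assumptions the learning rate starts at learning_rate_start and ends at learning_rate_end, and nodes far from the best matching unit are updated much less than the unit itself.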
- Variables:
node_list (np.ndarray of (int, int) tuples) – List of 2-dimensional coordinates of SOM nodes
radius_max (float, int) – Maximum radius of the neighborhood function
radius_min (float, int) – Minimum radius of the neighborhood function
unsuper_som (np.ndarray) – Weight vectors of the unsupervised SOM, with shape = (self.n_rows, self.n_columns, X.shape[1])
X (np.ndarray) – Input data
fitted (bool) – States if estimator is fitted to X
max_iterations (int) – Maximum number of iterations for the current training
bmus (list of (int, int) tuples) – List of best matching units (BMUs) of the dataset X
sample_weights (np.ndarray) – Sample weights.
n_regression_vars (int) – Number of regression variables. In most examples, this equals one.
n_features_in (int) – Number of input features
- fit(X: Sequence, y: Sequence | None = None)
Fit supervised SOM to the input data.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The training input samples.
y (array-like matrix of shape = [n_samples, 1]) – The labels (ground truth) of the input samples
- Returns:
self
- Return type:
Examples
Load the SOM and fit it to your input data X and the labels y with:
>>> import susi
>>> som = susi.SOMRegressor()
>>> som.fit(X, y)
- fit_transform(X: Sequence, y: Sequence | None = None) ndarray
Fit to the input data and transform it.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The training and prediction input samples.
y (array-like matrix of shape = [n_samples, 1]) – The labels (ground truth) of the input samples
- Returns:
Predictions including the BMUs of each datapoint
- Return type:
np.ndarray
Examples
Load the SOM, fit it to your input data X and transform your input data with:
>>> import susi
>>> som = susi.SOMRegressor()
>>> tuples = som.fit_transform(X, y)
- get_bmu(datapoint: ndarray, som_array: ndarray) Tuple[int, int]
Get best matching unit (BMU) for datapoint.
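To make the idea of a best matching unit concrete, the following sketch searches a small grid for the node whose weight vector is closest to a datapoint. It assumes the Euclidean metric and a nested-list grid. The function find_bmu and the toy weights are hypothetical and are not the library's implementation.

```python
import math


def find_bmu(datapoint, som_array):
    """Return (row, column) of the node whose weights are closest."""
    best_pos, best_dist = (0, 0), math.inf
    for r, row in enumerate(som_array):
        for c, node_weights in enumerate(row):
            dist = math.dist(datapoint, node_weights)  # Euclidean distance
            if dist < best_dist:
                best_pos, best_dist = (r, c), dist
    return best_pos


# Toy 2x2 SOM with two features per node (made-up weights).
toy_som = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
bmu = find_bmu([0.9, 0.2], toy_som)
```

Here the datapoint lies nearest to the weights [1.0, 0.0], so the sketch returns the grid position of that node.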
- get_bmus(X: ndarray, som_array: ndarray | None = None) List[Tuple[int, int]] | None
Get Best Matching Units for big datalist.
- Parameters:
X (np.ndarray) – List of datapoints
som_array (np.ndarray, optional (default=`None`)) – Weight vectors of the SOM, with shape = (self.n_rows, self.n_columns, X.shape[1])
- Returns:
bmus – Position of best matching units (row, column) for each datapoint
- Return type:
List[Tuple[int, int]] or None
Examples
Load the SOM, fit it to your input data X and get the BMU of each datapoint with:
>>> import susi
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> bmu_list = som.get_bmus(X)
>>> plt.hist2d([x[0] for x in bmu_list], [x[1] for x in bmu_list])
- get_clusters(X: ndarray) List[Tuple[int, int]] | None
Calculate the SOM nodes on the unsupervised SOM grid per datapoint.
- get_estimation_map() ndarray
Return SOM grid with the estimated value on each node.
- Returns:
super_som_ – Supervised SOM grid with estimated value on each node.
- Return type:
np.ndarray
Examples
Fit the SOM on your data X, y:
>>> import numpy as np
>>> import susi
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMRegressor()
>>> som.fit(X, y)
>>> estimation_map = som.get_estimation_map()
>>> plt.imshow(np.squeeze(estimation_map), cmap="viridis_r")
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
- get_quantization_error(X: Sequence | None = None) float
Get quantization error for X (or the training data).
- Parameters:
X (array-like matrix, optional (default=None)) – Samples of shape = [n_samples, n_features]. If None, the training data is used for the calculation.
- Returns:
Mean quantization error over all datapoints.
- Return type:
float
- Raises:
RuntimeError – Raised if the SOM is not fitted yet.
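A mean quantization error can be pictured as the average distance between each sample and the weight vector of its best matching unit. The sketch below assumes exactly that definition with the Euclidean metric. The function quantization_error and the toy data are hypothetical, and the library may use the configured distance_metric instead.

```python
import math


def quantization_error(X, som_array):
    """Mean Euclidean distance from each sample to its closest node."""
    nodes = [node for row in som_array for node in row]
    nearest = [min(math.dist(x, node) for node in nodes) for x in X]
    return sum(nearest) / len(X)


# Toy 2x2 SOM with two features per node (made-up weights).
toy_som = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
# Three samples: one on a node, one between two nodes, one far away.
samples = [[0.0, 0.0], [1.0, 0.5], [0.0, 4.0]]
qe = quantization_error(samples, toy_som)
```

A lower value means the node weights represent the data more closely. A sample far from every node, like the last one here, pulls the mean up.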
- get_u_matrix(mode: str = 'mean') ndarray
Calculate unified distance matrix (u-matrix).
- Parameters:
mode (str, optional (default=”mean”)) – Choice of the averaging algorithm
- Returns:
u_matrix – U-matrix containing the distances between all nodes of the unsupervised SOM. Shape = (n_rows*2-1, n_columns*2-1)
- Return type:
np.ndarray
Examples
Fit your SOM to input data X and then calculate the u-matrix with get_u_matrix(). You can plot the u-matrix then with e.g. pyplot.imshow().
>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> umat = som.get_u_matrix()
>>> plt.imshow(np.squeeze(umat))
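The shape (n_rows*2-1, n_columns*2-1) arises because a u-matrix has one cell per node plus one cell between each pair of neighboring nodes. The following is a simplified sketch of that layout, assuming Euclidean distances and plain averaging of the surrounding cells. The function u_matrix_sketch is hypothetical and is not the algorithm behind get_u_matrix().

```python
import math
from statistics import mean


def u_matrix_sketch(som):
    """Build a (2*n_rows-1) x (2*n_cols-1) grid of neighbor distances."""
    n_rows, n_cols = len(som), len(som[0])
    height, width = 2 * n_rows - 1, 2 * n_cols - 1
    u = [[0.0] * width for _ in range(height)]

    # Pass 1: a cell between two adjacent nodes holds their distance.
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:  # right-hand neighbor
                u[2 * r][2 * c + 1] = math.dist(som[r][c], som[r][c + 1])
            if r + 1 < n_rows:  # neighbor below
                u[2 * r + 1][2 * c] = math.dist(som[r][c], som[r + 1][c])

    # Pass 2: node cells and diagonal cells average the pass-1 cells around them.
    for i in range(height):
        for j in range(width):
            if (i + j) % 2 == 0:
                around = [
                    u[a][b]
                    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < height and 0 <= b < width
                ]
                u[i][j] = mean(around) if around else 0.0
    return u


# Toy 2x3 SOM with one feature per node (made-up weights).
toy_som = [
    [[0.0], [1.0], [3.0]],
    [[3.0], [3.0], [3.0]],
]
umat = u_matrix_sketch(toy_som)
```

For this 2x3 grid the result has 3 rows and 5 columns. Large values mark places where neighboring nodes have very different weights, which is what makes the u-matrix useful for spotting cluster borders.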
- predict(X: Sequence, y: Sequence | None = None) ndarray
Predict output of data X.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (None, optional) – Ignored.
- Returns:
y_pred – List of predicted values.
- Return type:
np.ndarray
Examples
Fit the SOM on your data X, y:
>>> import susi
>>> som = susi.SOMRegressor()
>>> som.fit(X, y)
>>> y_pred = som.predict(X)
- score(X, y, sample_weight=None)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
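The definition of \(R^2\) given for score can be checked by hand on a small example. The sketch below implements \(1 - \frac{u}{v}\) for a single output in plain Python. The function r_squared is a hypothetical helper for illustration, not the routine the estimator calls, and it ignores sample_weight.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination 1 - u/v for a single output."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares between truth and prediction.
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares of the truth around its mean.
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, y_true)                # predictions match exactly
constant = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
worse = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])     # reversed predictions
```

The three cases reproduce the statements above: a perfect fit scores 1.0, a constant prediction of the mean scores 0.0, and a sufficiently bad model scores below zero. This sketch does not handle a constant y_true, for which \(v\) is zero.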
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') SOMRegressor
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
SOMRegressor
- transform(X: Sequence, y: Sequence | None = None) ndarray
Transform input data.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (None, optional) – Ignored.
- Returns:
Predictions including the BMUs of each datapoint
- Return type:
np.ndarray
Examples
Load the SOM, fit it to your input data X and transform your input data with:
>>> import susi
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> X_transformed = som.transform(X)