Explainers#

XKM (eXplainable K-Medoids)#

class cxplain.xkm.XkmExplainer(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs, * num_features], ~numpy.floating], cluster_centers: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_clusters, * num_features], ~numpy.floating], flavour: str, distance_metric: str, cluster_predictions: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs], ~numpy.integer], feature_names: ~typing.List[str] | None = None)#

Explain clustering results using the eXplainable K-Medoids (XKM) method, as proposed by (TBD).

Variables:
  • data (NDArray[Shape["* num_obs, * num_features"], Floating]) – Input data for clustering.

  • cluster_centers (NDArray[Shape["* num_clusters, * num_features"], Floating]) – Cluster centers.

  • flavour (str) – Flavour of the XkMedoids method, “next_best” or “all”.

  • distance_metric (str) – Distance metric used for calculating feature-wise distances, either “euclidean” or “manhatten”.

  • cluster_predictions (NDArray[Shape["* num_obs"], Int]) – Cluster predictions for the input data.

  • feature_names (Optional[List[str]]) – Optional list of feature names.

  • num_features (int) – The number of features in the input data.

  • feature_wise_distance_matrix (NDArray[Shape["* num_obs, * num_clusters, * num_features"], Floating]) – Feature-wise distance matrix.

- fit(self)

Fits the explainer by calculating the feature-wise distance matrix, making it ready for use.

- _calculate_feature_wise_distance_matrix(self) -> NDArray[Shape["* num_obs, * num_clusters, * num_features"], Floating]

Calculates the feature-wise distance matrix.

- _calculate_pointwise_relevance(self) -> pd.DataFrame

Computes pointwise feature relevance scores.

- _calculate_cluster_relevance(self, pointwise_scores) -> pd.DataFrame

Computes cluster-wise feature relevance scores based on pointwise scores.

- _calculate_global_relevance(self, pointwise_scores) -> pd.Series

Computes global feature relevance scores based on pointwise scores.

- explain(self) -> ExplainedClustering

Explains clustering results by computing pointwise, cluster, and global feature relevance scores.

Example:

>>> # Create an XkmExplainer instance
>>> data = ...  # Input data for clustering
>>> cluster_centers = ...  # Cluster centers
>>> flavour = ...  # Flavour of the XkMedoids method
>>> distance_metric = ...  # Distance metric for calculating feature-wise distances
>>> cluster_predictions = ...  # Cluster predictions for the input data
>>> feature_names = ...  # Optional list of feature names
>>> explainer = XkmExplainer(data, cluster_centers, flavour, distance_metric,
...                          cluster_predictions, feature_names=feature_names)
>>> # Fit the explainer
>>> explainer.fit()
>>> # Explain clustering results
>>> explained_result = explainer.explain()
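_calculate_feature_wise_distance_matrix() ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs, * num_clusters, * num_features], ~numpy.floating]#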

Calculates the feature-wise distance matrix of every feature of every observation to the corresponding feature coordinate of every cluster.

Returns:

A distance tensor of shape num_observations x num_clusters x num_features.

Return type:

NDArray[Shape["* num_obs, * num_clusters, * num_features"], Floating]

Example:

>>> # Calculate the feature-wise distance matrix
>>> feature_wise_distance_matrix = explainer._calculate_feature_wise_distance_matrix()
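
The shape of the returned tensor is easiest to see on a tiny example. The following is a minimal sketch, not the library's implementation: it uses plain Python lists in place of NumPy arrays, the helper name `feature_wise_distances` is hypothetical, and it assumes the per-feature absolute difference as the distance.

```python
from typing import List

Matrix = List[List[float]]


def feature_wise_distances(data: Matrix, centers: Matrix) -> List[Matrix]:
    """Return a nested list indexed as [observation][cluster][feature]."""
    return [
        [
            # Assumption: per-feature absolute difference to the center coordinate.
            [abs(x_j - c_j) for x_j, c_j in zip(observation, center)]
            for center in centers
        ]
        for observation in data
    ]


data = [[0.0, 1.0], [4.0, 5.0], [1.0, 1.0]]  # 3 observations, 2 features
centers = [[0.0, 0.0], [4.0, 4.0]]           # 2 clusters, 2 features
distances = feature_wise_distances(data, centers)

# Distances of the third observation to both centers, feature by feature.
print(distances[2])
```

The outer, middle, and inner list levels correspond to num_obs, num_clusters, and num_features respectively.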
_calculate_pointwise_relevance() DataFrame#

Computes pointwise feature relevance scores based on the XKM method.

Returns:

Pointwise feature relevance scores.

Return type:

pd.DataFrame

Example:

>>> # Compute pointwise feature relevance scores
>>> pointwise_relevance = explainer._calculate_pointwise_relevance()
_calculate_cluster_relevance(pointwise_scores: DataFrame) DataFrame#

Computes cluster-wise feature relevance scores based on pointwise scores.

Parameters:

pointwise_scores – Pointwise feature relevance scores.

Returns:

Cluster-wise feature relevance scores.

Return type:

pd.DataFrame

Example:

>>> # Compute cluster-wise feature relevance scores
>>> cluster_relevance = explainer._calculate_cluster_relevance(pointwise_scores)
_calculate_global_relevance(pointwise_scores: DataFrame) Series#

Computes global feature relevance scores based on pointwise scores.

Parameters:

pointwise_scores – Pointwise feature relevance scores.

Returns:

Global feature relevance scores.

Return type:

pd.Series

Example:

>>> # Compute global feature relevance scores
>>> global_relevance = explainer._calculate_global_relevance(pointwise_scores)
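
How pointwise scores are aggregated is not stated here. As a rough sketch, assuming a simple mean per cluster and a mean over all observations (both assumptions, with hypothetical helper names and plain lists instead of DataFrames), the two aggregation steps could look like this:

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List


def cluster_relevance(scores: List[List[float]], labels: List[int]) -> Dict[int, List[float]]:
    """Average the pointwise scores of all observations sharing a cluster label."""
    grouped = defaultdict(list)
    for row, label in zip(scores, labels):
        grouped[label].append(row)
    # zip(*rows) iterates over feature columns within one cluster.
    return {
        label: [fmean(column) for column in zip(*rows)]
        for label, rows in sorted(grouped.items())
    }


def global_relevance(scores: List[List[float]]) -> List[float]:
    """Average the pointwise scores over all observations."""
    return [fmean(column) for column in zip(*scores)]


pointwise = [[1.0, 3.0], [3.0, 1.0], [2.0, 6.0], [2.0, 2.0]]
labels = [0, 0, 1, 1]
per_cluster = cluster_relevance(pointwise, labels)
overall = global_relevance(pointwise)
print(per_cluster, overall)
```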
fit()#

Fits the explainer by calculating the feature-wise distance matrix, making it ready for use.

Example:

>>> # Fit the explainer
>>> explainer.fit()
explain() ExplainedClustering#

Explains clustering results by computing pointwise, cluster, and global feature relevance scores.

Returns:

An instance of ExplainedClustering containing feature relevance scores.

Return type:

ExplainedClustering

Example:

>>> # Explain clustering results
>>> explained_result = explainer.explain()
cxplain.xkm._get_xkm_flavour(flavour: str, **kwargs)#

Factory function that returns an instance of the requested Xkm flavour.

Parameters:

flavour – The desired flavour of Xkm, either ‘next_best’ or ‘all’.

Returns:

An instance of the specified Xkm flavour.

Return type:

BaseXkmFlavour

Raises:

NonExistingXkmFlavourError – If the specified flavour does not exist.

Example:

>>> # Get an instance of the "next_best" Xkm flavour
>>> xkm_flavour = _get_xkm_flavour("next_best")
>>> # Get an instance of the "all" Xkm flavour
>>> xkm_flavour = _get_xkm_flavour("all")
>>> # Attempt to get an instance of a non-existing Xkm flavour (Raises an error)
>>> xkm_flavour = _get_xkm_flavour("invalid_flavour")
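
A factory of this kind is commonly built around a name-to-class registry. The sketch below illustrates that pattern only; every class, function, and exception name in it is a hypothetical stand-in, not the library's actual code.

```python
class UnknownFlavourError(ValueError):
    """Hypothetical stand-in for NonExistingXkmFlavourError."""


class NextBestSketch:
    """Hypothetical stand-in for XkmNextBestFlavour."""


class AllSketch:
    """Hypothetical stand-in for XkmAllFlavour."""


# Registry mapping each flavour name to the class that implements it.
_FLAVOURS = {"next_best": NextBestSketch, "all": AllSketch}


def get_flavour(flavour: str, **kwargs):
    """Look up the flavour class by name and instantiate it."""
    try:
        flavour_class = _FLAVOURS[flavour]
    except KeyError:
        raise UnknownFlavourError(
            f"Unknown flavour {flavour!r}; expected one of {sorted(_FLAVOURS)}"
        ) from None
    return flavour_class(**kwargs)


print(type(get_flavour("next_best")).__name__)
```

Adding a new flavour then only requires one more registry entry, and an unknown name fails with a message listing the valid options.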
class cxplain.xkm.BaseXkmFlavour#

Base class for different Xkm Flavours.

This is an abstract base class for different flavours of the eXplainable k-medoids (Xkm) method. Subclasses should implement the _calculate_pointwise_relevance method to calculate pointwise feature relevance.

- _calculate_pointwise_relevance(cls) -> pd.DataFrame

Abstract method to calculate pointwise feature relevance based on the XKM flavour.

class cxplain.xkm.XkmNextBestFlavour#

This class calculates pointwise feature relevance for k-medoids clustering by finding the “next best” alternative cluster for each feature and observation.

- _best_calc(feature_wise_distance_matrix, cluster_predictions) -> Tuple[NDArray, NDArray]

Find the “next best” alternative clusters for each feature and observation.

- _calculate_pointwise_relevance(feature_wise_distance_matrix, cluster_predictions) -> pd.DataFrame

Calculate pointwise feature relevance using the distance to the “next best” cluster for each feature and observation.

Example:

>>> # Create an instance of XkmNextBestFlavour
>>> xkm_flavour = XkmNextBestFlavour()
>>> # Calculate pointwise feature relevance using the "Next Best" method
>>> relevance_matrix = xkm_flavour._calculate_pointwise_relevance(
...     feature_wise_distance_matrix, cluster_predictions
... )
static _best_calc(feature_wise_distance_matrix: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs, * num_clusters, * num_features], ~numpy.floating], cluster_predictions: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs], ~numpy.integer]) Tuple[NDArray[Any, Any], NDArray[Any, Any]]#

Find the “next best” alternative clusters for each feature and observation.

Parameters:
  • feature_wise_distance_matrix (NDArray[Shape["* num_obs, * num_clusters, * num_features"], Floating]) – Feature-wise distance matrix of every feature to every cluster for each observation.

  • cluster_predictions (NDArray[Shape["* num_obs"], Int]) – Assigned clusters for each observation.

Returns:

A tuple of two NDArrays, where the first contains the feature-based distances to the assigned cluster and the second contains the feature-based distances to the “next best” alternative cluster.

Return type:

Tuple[NDArray, NDArray]
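
To make the two returned arrays concrete, here is a small sketch using plain lists. It assumes that, for each feature, the “next best” cluster is the non-assigned cluster with the smallest feature-wise distance; the helper name `next_best_distances` is hypothetical and the library may resolve ties or select clusters differently.

```python
from typing import List, Tuple

Tensor = List[List[List[float]]]  # indexed [observation][cluster][feature]


def next_best_distances(
    distances: Tensor, assigned: List[int]
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return per-feature distances to the assigned and to the next best cluster."""
    to_assigned, to_next_best = [], []
    for per_cluster, own in zip(distances, assigned):
        to_assigned.append(list(per_cluster[own]))
        others = [row for k, row in enumerate(per_cluster) if k != own]
        # For every feature, keep the smallest distance among the other clusters.
        to_next_best.append([min(column) for column in zip(*others)])
    return to_assigned, to_next_best


distances = [
    [[0.0, 1.0], [4.0, 3.0], [2.0, 5.0]],  # observation 0
    [[3.0, 3.0], [1.0, 0.5], [2.0, 0.1]],  # observation 1
]
assigned, next_best = next_best_distances(distances, [0, 1])
print(assigned, next_best)
```

Note that the next best cluster can differ from feature to feature for the same observation, as it does for observation 1 here.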

_calculate_pointwise_relevance(feature_wise_distance_matrix: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs, * num_clusters, * num_features], ~numpy.floating], cluster_predictions: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs], ~numpy.integer]) DataFrame#

Calculate pointwise feature relevance using the distance to the “next best” cluster for each feature and observation.

Parameters:
  • feature_wise_distance_matrix (NDArray[Shape["* num_obs, * num_clusters, * num_features"], Floating]) – Feature-wise distance matrix of every feature to every cluster for each observation.

  • cluster_predictions (NDArray[Shape["* num_obs"], Int]) – Assigned clusters for each observation.

Returns:

A DataFrame containing pointwise feature relevance scores.

Return type:

pd.DataFrame

Example:

>>> # Create an instance of XkmNextBestFlavour
>>> xkm_flavour = XkmNextBestFlavour()
>>> # Prepare feature-wise distance matrix and cluster predictions
>>> feature_wise_distance_matrix = np.array(...)  # Replace with your data
>>> cluster_predictions = np.array(...)  # Replace with your data
>>> # Calculate pointwise feature relevance using the "Next Best" method
>>> relevance_matrix = xkm_flavour._calculate_pointwise_relevance(
...     feature_wise_distance_matrix, cluster_predictions
... )
class cxplain.xkm.XkmAllFlavour#

This class calculates pointwise feature relevances for k-medoids clustering by comparing the distances to the actual assigned cluster with the complete distances over all clusters.

- _calculate_pointwise_relevance(feature_wise_distance_matrix, cluster_predictions) -> pd.DataFrame

Calculate pointwise feature relevance using the distance to all clusters for each feature and observation.

Example:

>>> # Create an instance of XkmAllFlavour
>>> xkm_flavour = XkmAllFlavour()
>>> # Calculate pointwise feature relevance using the "all" flavour
>>> feature_wise_distance_matrix = np.array(...)  # Replace with your data
>>> cluster_predictions = np.array(...)  # Replace with your data
>>> relevance_matrix = xkm_flavour._calculate_pointwise_relevance(
...     feature_wise_distance_matrix, cluster_predictions
... )
_calculate_pointwise_relevance(feature_wise_distance_matrix: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs, * num_clusters, * num_features], ~numpy.floating], cluster_predictions: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs], ~numpy.integer]) DataFrame#

Calculate pointwise feature relevances using the distance to all clusters for each feature and observation.

Parameters:
  • feature_wise_distance_matrix (NDArray[Shape["* num_obs, * num_clusters, * num_features"], Floating]) – Feature-wise distance matrix of every feature to every cluster for each observation.

  • cluster_predictions (NDArray[Shape["* num_obs"], Int]) – Assigned clusters for each observation.

Returns:

A DataFrame containing pointwise feature relevance scores.

Return type:

pd.DataFrame

Example:

>>> # Create an instance of XkmAllFlavour
>>> xkm_flavour = XkmAllFlavour()
>>> # Prepare feature-wise distance matrix and cluster predictions
>>> feature_wise_distance_matrix = np.array(...)  # Replace with your data
>>> cluster_predictions = np.array(...)  # Replace with your data
>>> # Calculate pointwise feature relevance using the "all" flavour
>>> relevance_matrix = xkm_flavour._calculate_pointwise_relevance(
...     feature_wise_distance_matrix, cluster_predictions
... )

NEON (neuralization-propagation)#

class cxplain.neon.NeonExplainer(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs, * num_features], ~numpy.floating], cluster_centers: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_clusters, * num_features], ~numpy.floating], predictions: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs], ~numpy.integer], feature_names: ~typing.List[str] | None = None)#

A base class for explaining clustering results using the Neon (neuralization-propagation) method suggested in https://arxiv.org/abs/1906.07633. Subclasses are required to implement the abstract method _init_network for custom network initialization.

Variables:
  • data (NDArray[Shape["* num_obs, * num_features"], Floating]) – Input data for clustering.

  • cluster_centers (NDArray[Shape["* num_clusters, * num_features"], Floating]) – Cluster centers for the input data.

  • predictions (NDArray[Shape["* num_obs"], Int]) – Cluster predictions for the input data.

  • feature_names (Optional[List[str]]) – Optional list of feature names.

  • num_clusters (int) – The number of clusters.

  • num_features (int) – The number of features in the input data.

  • networks (List[NeonNetwork]) – List of NeonNetwork instances.

- _init_network(self)

Abstract method to initialize the neural network for explaining clustering results. Subclasses must implement this method.

class cxplain.neon.KMeansNetwork(index_actual: int, weights: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_clusters, * num_features], ~numpy.floating], bias: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_clusters], ~numpy.floating], hidden_layer: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_clusters], ~numpy.floating] | None = None, output: float | None = None)#

A class representing a neuralized K-Means clustering, as described in https://arxiv.org/abs/1906.07633.

Variables:
  • index_actual (int) – The index of the actual cluster.

  • weights (NDArray[Shape["* num_clusters, * num_features"], Floating]) – Weights of the neural network.

  • bias (NDArray[Shape["* num_clusters"], Floating]) – Bias terms of the neural network.

  • hidden_layer (Optional[NDArray[Shape["* num_clusters"], Floating]]) – The hidden layer of the neural network.

  • output (Optional[float]) – The output of the neural network.

- forward(self, observation) -> KMeansNetwork

Performs a forward pass of the neural network with the given observation and computes the output.

- backward(self, observation, beta) -> NDArray[Shape["* num_features"], Floating]

Performs a backward pass of the neural network and computes feature relevance scores using LRP (Layer-wise Relevance Propagation).

- _check_forward_pass(self)

Checks if a forward pass has been conducted, raising an error if not.

Example:

>>> # Create a KMeansNetwork instance
>>> index_actual = 0
>>> weights = np.random.rand(3, 2)  # Weights for the neural network
>>> bias = np.random.rand(3)  # Bias terms for the neural network
>>> network = KMeansNetwork(index_actual, weights, bias)
>>> # Perform a forward pass
>>> observation = np.random.rand(2)  # Input observation
>>> network.forward(observation)
>>> # Perform a backward pass to compute feature relevances
>>> beta = 0.5  # Beta value for the backward pass
>>> feature_relevances = network.backward(observation, beta)
>>> # Check if a forward pass has been conducted
>>> network._check_forward_pass()
forward(observation: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_features], ~numpy.floating]) KMeansNetwork#

Performs a forward pass of the neuralized K-Means network with the given observation and computes the output.

Parameters:

observation (NDArray[Shape["* num_features"], Floating]) – Input observation for the forward pass.

Returns:

The KMeansNetwork instance after the forward pass.

Return type:

KMeansNetwork

Example:

>>> # Perform a forward pass
>>> observation = np.random.rand(2)  # Input observation
>>> network.forward(observation)
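
The forward pass can be illustrated without the class itself. The sketch below assumes the neuralization from the referenced paper, where each competing cluster k gets weights 2(mu_c - mu_k) and bias ||mu_k||^2 - ||mu_c||^2, and the output is the minimum over the hidden units. The function names are hypothetical, plain lists replace arrays, and the assigned cluster's own row is simply omitted, which may differ from how KMeansNetwork stores its weights.

```python
from typing import List, Tuple


def neuralize(centers: List[List[float]], index_actual: int) -> Tuple[List[List[float]], List[float]]:
    """Build one linear unit per competing cluster (assumed construction)."""
    mu_c = centers[index_actual]
    weights, bias = [], []
    for k, mu_k in enumerate(centers):
        if k == index_actual:
            continue  # the assigned cluster does not compete with itself
        weights.append([2.0 * (c - m) for c, m in zip(mu_c, mu_k)])
        bias.append(sum(m * m for m in mu_k) - sum(c * c for c in mu_c))
    return weights, bias


def forward(weights: List[List[float]], bias: List[float], observation: List[float]) -> Tuple[List[float], float]:
    """Return the hidden activations and the min-pooled output."""
    hidden = [
        sum(w * x for w, x in zip(w_k, observation)) + b_k
        for w_k, b_k in zip(weights, bias)
    ]
    return hidden, min(hidden)


centers = [[0.0, 0.0], [4.0, 0.0], [0.0, 6.0]]
weights, bias = neuralize(centers, index_actual=0)
hidden, output = forward(weights, bias, [1.0, 1.0])
print(hidden, output)
```

Each hidden value equals the squared distance to a competing center minus the squared distance to the assigned center, so a positive output means the observation is closest to its assigned cluster.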
backward(observation: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_features], ~numpy.floating], beta: float) ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_features], ~numpy.floating]#

Performs a backward pass of the neuralized K-Means network and computes feature relevance scores for one observation using LRP (Layer-wise Relevance Propagation).

Parameters:
  • observation (NDArray[Shape["* num_features"], Floating]) – Input observation for the backward pass.

  • beta – Beta value for the backward pass.

Returns:

Feature relevance scores computed during the backward pass.

Return type:

NDArray[Shape["* num_features"], Floating]

Example:

>>> # Perform a backward pass to compute feature relevances
>>> beta = 0.5  # Beta value for the backward pass
>>> feature_relevances = network.backward(observation, beta)
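
As a hedged illustration of what such a backward pass can look like, the sketch below assumes the two propagation rules described in the referenced paper: soft-min pooling with stiffness beta across competing clusters, followed by a redistribution onto features rooted at the midpoint between the assigned and the competing center. The function name `lrp_backward` and its signature (it takes the centers directly) are hypothetical, and the library's `backward` may differ in detail.

```python
import math
from typing import List


def lrp_backward(
    observation: List[float],
    centers: List[List[float]],
    index_actual: int,
    beta: float,
) -> List[float]:
    """Distribute the network output onto the input features (assumed rules)."""
    mu_c = centers[index_actual]
    others = [mu for k, mu in enumerate(centers) if k != index_actual]

    # contributions[k][i] = (x_i - midpoint_ki) * w_ki with w_k = 2 * (mu_c - mu_k);
    # summing over i gives the hidden activation of competing cluster k.
    contributions = [
        [(x - (c + m) / 2.0) * 2.0 * (c - m) for x, c, m in zip(observation, mu_c, mu_k)]
        for mu_k in others
    ]
    hidden = [sum(z_k) for z_k in contributions]
    output = min(hidden)

    # Soft-min pooling weights; shifting by the minimum keeps exp() stable.
    pool = [math.exp(-beta * (h - output)) for h in hidden]
    total = sum(pool)

    relevance = [0.0] * len(observation)
    for z_k, h_k, p_k in zip(contributions, hidden, pool):
        r_k = p_k / total * output  # relevance assigned to competing cluster k
        for i, z in enumerate(z_k):
            # Assumes h_k != 0, i.e. the observation is not on a decision boundary.
            relevance[i] += z / h_k * r_k
    return relevance


centers = [[0.0, 0.0], [4.0, 0.0], [0.0, 6.0]]
relevance = lrp_backward([1.0, 1.0], centers, index_actual=0, beta=0.5)
print(relevance, sum(relevance))
```

Under these rules the feature relevances sum to the network output, which is a useful sanity check for any implementation.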
_check_forward_pass()#

Checks if a forward pass has been conducted, raising an error if not.

Raises:

NotFittedError – Raised if no forward pass has been conducted beforehand.

Notes

This method is intended for internal use and should not be called directly.

Example:

>>> # Check if a forward pass has been conducted
>>> network._check_forward_pass()
class cxplain.neon.NeonKMeansExplainer(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs, * num_features], ~numpy.floating], cluster_centers: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_clusters, * num_features], ~numpy.floating], predictions: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs], ~numpy.integer], feature_names: ~typing.List[str] | None = None)#

This class uses the Neon (neuralization-propagation) method, suggested in https://arxiv.org/abs/1906.07633, to explain K-Means clustering results by computing pointwise, cluster, and global feature relevance scores.

Variables:
  • data (NDArray[Shape["* num_obs, * num_features"], Floating]) – Input data for clustering.

  • cluster_centers (NDArray[Shape["* num_clusters, * num_features"], Floating]) – Cluster centers for the input data.

  • predictions (NDArray[Shape["* num_obs"], Int]) – Cluster predictions for the input data.

  • feature_names (Optional[List[str]]) – Optional list of feature names.

  • num_clusters (int) – The number of clusters.

  • num_features (int) – The number of features in the input data.

  • networks (List[KMeansNetwork]) – List of KMeansNetwork instances for explaining clustering results.

- _init_network(self, index_observation) -> KMeansNetwork

Initializes and returns a KMeansNetwork for a specific observation.

- _calculate_pointwise_relevance(self) -> pd.DataFrame

Computes pointwise feature relevance scores for the input data.

- _calculate_cluster_relevance(self, pointwise_scores) -> pd.DataFrame

Computes cluster-wise feature relevance scores based on pointwise scores.

- _calculate_global_relevance(self, pointwise_scores) -> pd.Series

Computes global feature relevance scores based on pointwise scores.

- _get_beta(self) -> float

Computes the beta value for relevance calculations.

- fit(self)

Fits the explainer by initializing KMeansNetwork instances for each observation.

- explain(self) -> ExplainedClustering

Explains clustering results by computing pointwise, cluster, and global feature relevance scores.

Example:

>>> # Create a NeonKMeansExplainer instance
>>> data = ...  # Input data for clustering
>>> cluster_centers = ...  # Cluster centers for the input data
>>> predictions = ...  # Cluster predictions for the input data
>>> feature_names = ...  # Optional list of feature names
>>> explainer = NeonKMeansExplainer(data, cluster_centers, predictions, feature_names=feature_names)
>>> # Fit the explainer
>>> explainer.fit()
>>> # Explain clustering results
>>> explained_result = explainer.explain()
_init_network(index_observation: int) KMeansNetwork#

Initializes and returns a KMeansNetwork for a specific observation.

Parameters:

index_observation – Index of the observation for which to initialize the network.

Returns:

Initialized KMeansNetwork instance.

Return type:

KMeansNetwork

Example:

>>> # Initialize a KMeansNetwork for a specific observation
>>> network = explainer._init_network(index_observation)
_calculate_pointwise_relevance() DataFrame#

Computes pointwise feature relevance scores for the input data.

Returns:

Pointwise feature relevance scores.

Return type:

pd.DataFrame

Example:

>>> # Compute pointwise feature relevance scores
>>> pointwise_relevance = explainer._calculate_pointwise_relevance()
_calculate_cluster_relevance(pointwise_scores: DataFrame) DataFrame#

Computes cluster-wise feature relevance scores based on pointwise scores.

Parameters:

pointwise_scores – Pointwise feature relevance scores.

Returns:

Cluster-wise feature relevance scores.

Return type:

pd.DataFrame

Example:

>>> # Compute cluster-wise feature relevance scores
>>> cluster_relevance = explainer._calculate_cluster_relevance(pointwise_scores)
_calculate_global_relevance(pointwise_scores: DataFrame) Series#

Computes global feature relevance scores based on pointwise scores.

Parameters:

pointwise_scores – Pointwise feature relevance scores.

Returns:

Global feature relevance scores.

Return type:

pd.Series

Example:

>>> # Compute global feature relevance scores
>>> global_relevance = explainer._calculate_global_relevance(pointwise_scores)
_get_beta() float#

Computes the beta value for relevance calculations as 1 divided by the average of the outputs of the neuralized K-Means network, as described in https://arxiv.org/abs/1906.07633.

Returns:

The beta value.

Return type:

float

Example:

>>> # Compute the beta value
>>> beta = explainer._get_beta()
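
The rule stated above is a one-liner. A minimal sketch, with a hypothetical helper name and made-up output values:

```python
from statistics import fmean
from typing import List


def get_beta(outputs: List[float]) -> float:
    """Return 1 divided by the average network output."""
    return 1.0 / fmean(outputs)


outputs = [8.0, 2.0, 5.0, 1.0]  # illustrative outputs, one per observation
beta = get_beta(outputs)        # the mean is 4.0
print(beta)
```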
fit()#

Fits the explainer by initializing KMeansNetwork instances for each observation.

Example:

>>> # Fit the explainer
>>> explainer.fit()
explain() ExplainedClustering#

Explains clustering results by computing pointwise, cluster, and global feature relevance scores.

Returns:

An instance of ExplainedClustering containing feature relevance scores.

Return type:

ExplainedClustering

Example:

>>> # Explain clustering results
>>> explained_result = explainer.explain()

Tree-based explanations#

class cxplain.tree.DecisionTreeExplainer(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs, * num_features], ~numpy.floating], cluster_predictions: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs], ~numpy.integer], feature_names: ~typing.List[str] | None = None, **kwargs)#

A class for explaining clustering results using decision tree-based feature importance.

A decision tree is fitted on a given clustering in order to extract feature importance values. For this purpose, the DecisionTreeClassifier from the scikit-learn package is used. You can configure the surrogate tree by passing the corresponding keyword arguments when initializing the explainer.

Variables:
  • data (NDArray[Shape["* num_obs, * num_features"], Floating]) – Input data for clustering.

  • cluster_predictions (NDArray[Shape["* num_obs"], Int]) – Cluster predictions for the input data.

  • feature_names (Optional[List[str]]) – Optional list of feature names.

  • tree (DecisionTreeClassifier) – Decision tree classifier for feature importance computation.

  • num_features (tuple) – A tuple containing the number of features in the input data.

- fit(self) -> DecisionTreeExplainer

Fits the decision tree classifier to the input data and cluster predictions, making the explainer ready for use.

- explain(self) -> ExplainedClustering

Computes and returns global feature relevance scores based on the fitted decision tree.

Example:

>>> # Create a DecisionTreeExplainer instance
>>> data = ...  # Input data for clustering
>>> cluster_predictions = ...  # Cluster predictions for the input data
>>> feature_names = ...  # Optional list of feature names
>>> explainer = DecisionTreeExplainer(data, cluster_predictions, feature_names=feature_names)
>>> # Fit the explainer
>>> explainer.fit()
>>> # Explain the clustering results
>>> explained_result = explainer.explain()
fit()#

Fits the decision tree classifier to the input data and cluster predictions, making the explainer ready for use.

Returns:

The fitted DecisionTreeExplainer instance.

Return type:

DecisionTreeExplainer

Example:

>>> # Fit the explainer
>>> explainer.fit()
_calculate_global_relevance() DataFrame#

Calculates and returns global feature relevance scores based on the fitted decision tree.

Returns:

A DataFrame containing global feature relevance scores.

Return type:

pd.DataFrame

Notes

This method is intended for internal use and should not be called directly.

explain() ExplainedClustering#

Computes and returns global feature relevance scores based on the fitted decision tree.

Returns:

An instance of ExplainedClustering containing global feature relevance scores.

Return type:

ExplainedClustering

Raises:

NotFittedError – If the explainer is not fitted (fit method not called beforehand).

Example:

>>> # Explain the clustering results
>>> explained_result = explainer.explain()
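
The fit-before-explain contract can be sketched in a few lines. The class and exception below are hypothetical stand-ins that only mirror the documented behaviour (fit returns the instance, explain raises if fit was not called); they are not the library's code.

```python
class NotFittedSketchError(RuntimeError):
    """Hypothetical stand-in for the NotFittedError mentioned above."""


class SurrogateExplainerSketch:
    """Hypothetical explainer that only tracks whether fit() was called."""

    def __init__(self) -> None:
        self.is_fitted = False

    def fit(self) -> "SurrogateExplainerSketch":
        # A real explainer would train its surrogate model here.
        self.is_fitted = True
        return self

    def explain(self) -> str:
        if not self.is_fitted:
            raise NotFittedSketchError("Call fit() before explain().")
        return "explained"


try:
    SurrogateExplainerSketch().explain()
except NotFittedSketchError as error:
    print(error)

# Because fit() returns the instance, the two calls can be chained.
result = SurrogateExplainerSketch().fit().explain()
print(result)
```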
class cxplain.tree.RandomForestExplainer(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs, * num_features], ~numpy.floating], cluster_predictions: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs], ~numpy.integer], feature_names: ~typing.List[str] | None = None, **kwargs)#

A class for explaining clustering results using random forest-based feature importance. A random forest is fitted on a given clustering in order to extract feature importance values. For this purpose, the RandomForestClassifier from the scikit-learn package is used. You can configure the surrogate forest by passing the corresponding keyword arguments when initializing the explainer.

Variables:
  • data (NDArray[Shape["* num_obs, * num_features"], Floating]) – Input data for clustering.

  • cluster_predictions (NDArray[Shape["* num_obs"], Int]) – Cluster predictions for the input data.

  • feature_names (Optional[List[str]]) – Optional list of feature names.

  • forest (RandomForestClassifier) – Random forest classifier for feature importance computation.

  • num_features (int) – The number of features in the input data.

- fit(self) -> RandomForestExplainer

Fits the random forest classifier to the input data and cluster predictions, making the explainer ready for use.

- explain(self) -> ExplainedClustering

Computes and returns global feature relevance scores based on the fitted random forest.

Example:

>>> # Create a RandomForestExplainer instance
>>> data = ...  # Input data for clustering
>>> cluster_predictions = ...  # Cluster predictions for the input data
>>> feature_names = ...  # Optional list of feature names
>>> explainer = RandomForestExplainer(data, cluster_predictions, feature_names=feature_names)
>>> # Fit the explainer
>>> explainer.fit()
>>> # Explain the clustering results
>>> explained_result = explainer.explain()
fit()#

Fits the random forest classifier to the input data and cluster predictions, making the explainer ready for use.

Returns:

The fitted RandomForestExplainer instance.

Return type:

RandomForestExplainer

Example:

>>> # Fit the explainer
>>> explainer.fit()
_calculate_global_relevance() DataFrame#

Calculates and returns global feature relevance scores based on the fitted random forest.

Returns:

A DataFrame containing global feature relevance scores.

Return type:

pd.DataFrame

Notes

This method is intended for internal use and should not be called directly.

explain() ExplainedClustering#

Computes and returns global feature relevance scores based on the fitted random forest.

Returns:

An instance of ExplainedClustering containing global feature relevance scores.

Return type:

ExplainedClustering

Raises:

NotFittedError – If the explainer is not fitted (fit method not called).

Example:

>>> # Explain the clustering results
>>> explained_result = explainer.explain()
class cxplain.tree.ExKMCExplainer(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs, * num_features], ~numpy.floating], kmeans_fitted: ~sklearn.cluster._kmeans.KMeans, feature_names: ~typing.List[str] | None = None, **kwargs)#

A class for explaining clustering results using ExKMC (Explainable K-Means Clustering).

This class explains clustering results with the ExKMC approach suggested in https://arxiv.org/abs/2002.12538, relying on the implementation found in https://github.com/navefr/ExKMC. You can configure the method by passing the corresponding keyword arguments when initializing the explainer.

Variables:
  • data (NDArray[Shape["* num_obs, * num_features"], Floating]) – Input data for clustering.

  • kmeans (KMeans) – A fitted K-Means clustering model.

  • feature_names (Optional[List[str]]) – Optional list of feature names.

  • tree (Tree) – ExKMC explainer for feature importance computation.

  • num_features (int) – The number of features in the input data.

- fit(self) -> ExKMCExplainer

Fits the ExKMC explainer to the input data and the fitted K-Means model, making it ready for use.

- explain(self) -> ExplainedClustering

Computes and returns global feature relevance scores based on the fitted ExKMC explainer.

Example:

>>> # Create an ExKMCExplainer instance
>>> data = ...  # Input data for clustering
>>> kmeans_model = ...  # A fitted K-Means model
>>> feature_names = ...  # Optional list of feature names
>>> explainer = ExKMCExplainer(data, kmeans_fitted=kmeans_model, feature_names=feature_names)
>>> # Fit the explainer
>>> explainer.fit()
>>> # Explain the clustering results
>>> explained_result = explainer.explain()
fit()#

Fits the ExKMC explainer to the input data and the fitted K-Means model, making it ready for use.

Returns:

The fitted ExKMCExplainer instance.

Return type:

ExKMCExplainer

Example:

>>> # Fit the explainer
>>> explainer.fit()
_calculate_global_relevance() DataFrame#

Calculates and returns global feature relevance scores based on the fitted ExKMC explainer.

Returns:

A DataFrame containing global feature relevance scores.

Return type:

pd.DataFrame

Notes

This method is intended for internal use and should not be called directly.

explain() ExplainedClustering#

Computes and returns global feature relevance scores based on the fitted ExKMC explainer.

Returns:

An instance of ExplainedClustering containing global feature relevance scores.

Return type:

ExplainedClustering

Raises:

NotFittedError – If the explainer is not fitted (fit method not called).

Example:

>>> # Explain the clustering results
>>> explained_result = explainer.explain()

Shapley values#

class cxplain.shap.ShapExplainer(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs, * num_features], ~numpy.floating], cluster_predictions: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs], ~numpy.integer], feature_names: ~typing.List[str] | None = None, **kwargs)#

Explains clustering results using SHAP (SHapley Additive exPlanations) values. Currently, a Random Forest is supported as the base classifier and the TreeExplainer is used to compute the explanations. You can configure the base classifier by passing the corresponding keyword arguments when initializing the explainer.

Variables:
  • data (NDArray[Shape["* num_obs, * num_features"], Floating]) – Input data for clustering.

  • cluster_predictions (NDArray[Shape["* num_obs"], Int]) – Cluster predictions for the input data.

  • feature_names (Optional[List[str]]) – Optional list of feature names.

  • num_features (int) – The number of features in the input data.

  • forest (RandomForestClassifier) – Random Forest classifier used for SHAP value calculation.

  • explainer (TreeExplainer) – The SHAP explainer instance.

- fit(self)

Fits the explainer by training the Random Forest classifier and initializing the SHAP explainer.

- _calculate_pointwise_relevance(self) -> pd.DataFrame

Computes pointwise feature relevance scores using SHAP values.

- _get_relevant_shap_values(shap_values, cluster_predictions) -> NDArray[Shape["* num_obs, * num_features"], Floating]

Extracts only the SHAP values belonging to the assigned cluster of each observation.

- _calculate_cluster_relevance(self, pointwise_scores) -> pd.DataFrame

Computes cluster-wise feature relevance scores based on pointwise scores.

- _calculate_global_relevance(self, pointwise_scores) -> pd.Series

Computes global feature relevance scores based on pointwise scores.

- explain(self) -> ExplainedClustering

Explains clustering results by computing pointwise, cluster, and global feature relevance scores.

Example:

>>> # Create a ShapExplainer instance
>>> data = ...  # Input data for clustering
>>> cluster_predictions = ...  # Cluster predictions for the input data
>>> feature_names = ...  # Optional list of feature names
>>> explainer = ShapExplainer(data, cluster_predictions, feature_names=feature_names)
>>> # Fit the explainer
>>> explainer.fit()
>>> # Explain clustering results
>>> explained_result = explainer.explain()
fit()#

Fits the explainer by training the Random Forest classifier and initializing the SHAP explainer.

Example:

>>> # Fit the explainer
>>> explainer.fit()
_calculate_pointwise_relevance() -> DataFrame#

Computes pointwise feature relevance scores using SHAP values.

Returns:

Pointwise feature relevance scores based on SHAP values.

Return type:

pd.DataFrame

Example:

>>> # Compute pointwise feature relevance scores
>>> pointwise_relevance = explainer._calculate_pointwise_relevance()
static _get_relevant_shap_values(shap_values: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_cluster, * num_obs, * num_features], ~numpy.floating], cluster_predictions: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs], ~numpy.integer])#

Extracts only the SHAP values of the assigned cluster for each observation.

Parameters:
  • shap_values (NDArray[Shape["* num_cluster, * num_obs, * num_features"], Floating]) – SHAP values.

  • cluster_predictions (NDArray[Shape["* num_obs"], Int]) – Cluster predictions for the input data.

Returns:

Relevant SHAP values.

Return type:

NDArray[Shape["* num_obs, * num_features"], Floating]

Example:

>>> # Extract relevant SHAP values
>>> relevant_shap_values = explainer._get_relevant_shap_values(shap_values, cluster_predictions)
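The selection step can be sketched with NumPy integer-array indexing. This is a minimal illustration, assuming shap_values is laid out as (num_cluster, num_obs, num_features) as in the signature above. The helper name select_assigned_cluster_values is hypothetical and the library's own implementation may differ.

```python
import numpy as np


def select_assigned_cluster_values(shap_values, cluster_predictions):
    """Pick, for each observation, the SHAP row of its assigned cluster.

    Hypothetical helper for illustration only.
    """
    num_obs = shap_values.shape[1]
    # Pair each observation index with its predicted cluster index.
    return shap_values[cluster_predictions, np.arange(num_obs), :]


# Toy input: 2 clusters, 3 observations, 2 features.
shap_values = np.arange(12, dtype=float).reshape(2, 3, 2)
cluster_predictions = np.array([0, 1, 0])

relevant = select_assigned_cluster_values(shap_values, cluster_predictions)
print(relevant.shape)  # (3, 2)
```

Each output row i equals shap_values[cluster_predictions[i], i, :], so the cluster axis is dropped and one row per observation remains.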
_calculate_cluster_relevance(pointwise_scores: DataFrame) -> DataFrame#

Computes cluster-wise feature relevance scores based on pointwise scores.

Parameters:

pointwise_scores – Pointwise feature relevance scores.

Returns:

Cluster-wise feature relevance scores.

Return type:

pd.DataFrame

Example:

>>> # Compute cluster-wise feature relevance scores
>>> cluster_relevance = explainer._calculate_cluster_relevance(pointwise_scores)
_calculate_global_relevance(pointwise_scores: DataFrame) -> Series#

Computes global feature relevance scores based on pointwise scores.

Parameters:

pointwise_scores – Pointwise feature relevance scores.

Returns:

Global feature relevance scores.

Return type:

pd.Series

Example:

>>> # Compute global feature relevance scores
>>> global_relevance = explainer._calculate_global_relevance(pointwise_scores)
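The documentation above does not state how pointwise scores are aggregated. The following sketch assumes a plain mean, per cluster for the cluster-wise scores and over all observations for the global scores. The toy data and feature names are made up for illustration, and the library may use a different aggregation.

```python
import pandas as pd

# Toy pointwise scores: one row per observation, one column per feature.
pointwise_scores = pd.DataFrame(
    {"feature_a": [0.25, 0.75, 1.0, 2.0], "feature_b": [1.0, 1.0, 3.0, 3.0]}
)
cluster_predictions = pd.Series([0, 0, 1, 1], name="cluster")

# Assumption: cluster relevance is the per-cluster mean of pointwise scores.
cluster_relevance = pointwise_scores.groupby(cluster_predictions).mean()

# Assumption: global relevance is the mean over all observations.
global_relevance = pointwise_scores.mean()

print(cluster_relevance)
print(global_relevance)
```

Under this assumption the cluster-wise result is a DataFrame with one row per cluster and one column per feature, and the global result is a Series indexed by feature name, matching the return types listed above.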
explain() -> ExplainedClustering#

Explains clustering results by computing pointwise, cluster, and global feature relevance scores.

Returns:

An instance of ExplainedClustering containing feature relevance scores.

Return type:

ExplainedClustering

Example:

>>> # Explain clustering results
>>> explained_result = explainer.explain()

Gradient-based explanations#

class cxplain.gradient.GradientExplainer(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs, * num_features], ~numpy.floating], cluster_centers: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_clusters, * num_features], ~numpy.floating], cluster_predictions: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[* num_obs], ~numpy.integer], metric: ~typing.Type[~cxplain.metrics.Metric] | None = None, enable_abs_calculation: bool = True, feature_names: ~typing.List[str] | None = None)#

Explain clustering results using the pointwise gradient of the cluster dissimilarity measure.

Variables:
  • data (NDArray[Shape["* num_obs, * num_features"], Floating]) – Input data for clustering.

  • cluster_centers (NDArray[Shape["* num_clusters, * num_features"], Floating]) – Cluster centers for the input data.

  • cluster_predictions (NDArray[Shape["* num_obs"], Int]) – Cluster predictions for the input data.

  • metric (Optional[Type[Metric]]) – Metric class for distance and gradient calculations (default is EuclideanMetric).

  • enable_abs_calculation (bool) – Flag to enable/disable absolute value calculation for relevance scores.

  • feature_names (Optional[List[str]]) – Optional list of feature names.

  • num_features (int) – The number of features in the input data.

- fit(self)

Fits the explainer.

- _calculate_pointwise_relevance(self) -> pd.DataFrame

Computes pointwise feature relevance scores based on the pointwise gradient of cluster loss.

- _calculate_cluster_relevance(self, pointwise_scores) -> pd.DataFrame

Computes cluster-wise feature relevance scores based on pointwise scores.

- _calc_abs_value(self, df: pd.DataFrame, enabled) -> pd.DataFrame

Calculates absolute values for a DataFrame if enabled, or returns the original DataFrame.

- _calculate_global_relevance(self, pointwise_scores) -> pd.Series

Computes global feature relevance scores based on pointwise scores.

- explain(self) -> ExplainedClustering

Explains clustering results by computing pointwise, cluster, and global feature relevance scores.

Example:

>>> # Create a GradientExplainer instance
>>> data = ...  # Input data for clustering
>>> cluster_centers = ...  # Cluster centers for the input data
>>> cluster_predictions = ...  # Cluster predictions for the input data
>>> feature_names = ...  # Optional list of feature names
>>> explainer = GradientExplainer(data, cluster_centers, cluster_predictions, metric=EuclideanMetric,
...                               enable_abs_calculation=True, feature_names=feature_names)
>>> # Fit the explainer
>>> explainer.fit()
>>> # Explain clustering results
>>> explained_result = explainer.explain()
fit()#

Fits the explainer, making it ready for use.

Example:

>>> # Fit the explainer
>>> explainer.fit()
_calculate_pointwise_relevance() -> DataFrame#

Computes pointwise feature relevance scores based on the pointwise gradient of cluster loss.

Returns:

Pointwise feature relevance scores.

Return type:

pd.DataFrame

Example:

>>> # Compute pointwise feature relevance scores
>>> pointwise_relevance = explainer._calculate_pointwise_relevance()
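A minimal sketch of a pointwise gradient, assuming the cluster loss is the Euclidean distance between an observation x and its assigned center c, whose gradient with respect to x is (x - c) / ||x - c||. The function euclidean_gradient is hypothetical and does not use the library's Metric classes. Treating the undefined gradient at zero distance as zero is a choice of this sketch.

```python
import numpy as np


def euclidean_gradient(data, cluster_centers, cluster_predictions):
    """Gradient of ||x - c|| with respect to x, where c is the assigned center.

    Hypothetical helper for illustration only.
    """
    diff = data - cluster_centers[cluster_predictions]
    dist = np.linalg.norm(diff, axis=1, keepdims=True)
    # The gradient is undefined at dist == 0; this sketch returns zeros there.
    return np.divide(diff, dist, out=np.zeros_like(diff), where=dist > 0)


data = np.array([[3.0, 4.0], [1.0, 1.0], [0.0, -2.0]])
cluster_centers = np.array([[0.0, 0.0], [1.0, 1.0]])
cluster_predictions = np.array([0, 1, 0])

gradients = euclidean_gradient(data, cluster_centers, cluster_predictions)
# Mirrors enable_abs_calculation=True: keep magnitudes, drop the sign.
relevance = np.abs(gradients)
print(relevance)
```

Features along which an observation lies far from its center, relative to its total distance, receive the largest scores in this sketch.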
_calculate_cluster_relevance(pointwise_scores: DataFrame) -> DataFrame#

Computes cluster-wise feature relevance scores based on pointwise scores.

Parameters:

pointwise_scores – Pointwise feature relevance scores.

Returns:

Cluster-wise feature relevance scores.

Return type:

pd.DataFrame

Example:

>>> # Compute cluster-wise feature relevance scores
>>> cluster_relevance = explainer._calculate_cluster_relevance(pointwise_scores)
_calc_abs_value(df: DataFrame, enabled: bool) -> DataFrame#

Calculates absolute values for a DataFrame if enabled, or returns the original DataFrame.

Parameters:
  • df – The input DataFrame.

  • enabled – Flag to enable/disable absolute value calculation.

Returns:

The original or absolute value DataFrame.

Return type:

pd.DataFrame

Example:

>>> # Calculate absolute values
>>> df_abs = explainer._calc_abs_value(df, enabled=True)
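The described behaviour can be sketched as a standalone function. The name calc_abs_value and the toy DataFrame are illustrative assumptions, and the library method may differ in detail (for example in whether it copies the input).

```python
import pandas as pd


def calc_abs_value(df: pd.DataFrame, enabled: bool) -> pd.DataFrame:
    """Return element-wise absolute values if enabled, else the input as is."""
    return df.abs() if enabled else df


df = pd.DataFrame({"feature_a": [-0.5, 0.0], "feature_b": [0.75, -1.0]})

df_abs = calc_abs_value(df, enabled=True)    # signs removed
df_same = calc_abs_value(df, enabled=False)  # same object, unchanged
print(df_abs)
```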
_calculate_global_relevance(pointwise_scores: DataFrame) -> Series#

Computes global feature relevance scores based on pointwise scores.

Parameters:

pointwise_scores – Pointwise feature relevance scores.

Returns:

Global feature relevance scores.

Return type:

pd.Series

Example:

>>> # Compute global feature relevance scores
>>> global_relevance = explainer._calculate_global_relevance(pointwise_scores)
explain() -> ExplainedClustering#

Explains clustering results by computing pointwise, cluster, and global feature relevance scores.

Returns:

An instance of ExplainedClustering containing feature relevance scores.

Return type:

ExplainedClustering

Example:

>>> # Explain clustering results
>>> explained_result = explainer.explain()