Explanation Results

class cxplain.base_explainer.GlobalExplainedClustering(global_relevance: Series)

A class for representing global clustering explanations.

Variables: global_relevance (pd.Series) – A Pandas Series containing global feature relevance scores.

- __eq__(self, other) -> bool: Checks if two instances of GlobalExplainedClustering are equal.

- show_global_relevance(self): Visualizes global feature relevance using a bar plot.

Example:

>>> import pandas as pd
>>> from cxplain.base_explainer import GlobalExplainedClustering
>>> # Create a GlobalExplainedClustering instance with global feature relevance
>>> global_relevance = pd.Series([0.3, 0.5, 0.2], index=["feature_A", "feature_B", "feature_C"])
>>> global_explanation = GlobalExplainedClustering(global_relevance)
>>> # Check if two instances of GlobalExplainedClustering are equal
>>> another_global_relevance = pd.Series([0.5, 0.5, 0.2], index=["feature_A", "feature_B", "feature_C"])
>>> another_global_explanation = GlobalExplainedClustering(another_global_relevance)
>>> global_explanation == another_global_explanation
False
>>> # Visualize global feature relevance
>>> global_explanation.show_global_relevance()
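
The comparison rule behind `__eq__` is not spelled out in this reference. As a rough mental model only, and assuming equality means that the wrapped Series agree in index labels and values, the check behaves like pandas' own `Series.equals`. The sketch below uses plain pandas objects and does not call cxplain.

```python
import pandas as pd

# Two relevance Series that differ only in the score for feature_A.
first = pd.Series([0.3, 0.5, 0.2], index=["feature_A", "feature_B", "feature_C"])
second = pd.Series([0.5, 0.5, 0.2], index=["feature_A", "feature_B", "feature_C"])

# Series.equals compares shape, index labels and values in one call and
# returns a single bool, unlike `first == second`, which is element-wise.
same_as_itself = first.equals(first.copy())
same_as_other = first.equals(second)

print(same_as_itself)  # True
print(same_as_other)   # False
```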

show_global_relevance()

Visualizes global feature relevance using a bar plot.

Example:

>>> # Create a GlobalExplainedClustering instance with global feature relevance
>>> global_relevance = pd.Series([0.3, 0.5, 0.2], index=["A", "B", "C"])
>>> global_explanation = GlobalExplainedClustering(global_relevance)
>>> # Visualize global feature relevance
>>> global_explanation.show_global_relevance()

class cxplain.base_explainer.ClusterExplainedClustering(cluster_relevance: DataFrame)

A class for representing cluster-specific clustering explanations.

Variables: cluster_relevance (pd.DataFrame) – A Pandas DataFrame containing cluster feature relevance scores.

- __eq__(self, other) -> bool: Checks if two instances of ClusterExplainedClustering are equal.

- show_cluster_relevance(self, subset_index): Visualizes cluster-wise feature relevance using a heatmap.

- show_single_feature_relevance(self, feature, subset_index): Visualizes feature importance scores for a single feature across clusters using a bar plot.

- show_single_cluster_relevance(self, cluster_index): Visualizes feature importance scores for a single cluster using a bar plot.

Example:

>>> import pandas as pd
>>> from cxplain.base_explainer import ClusterExplainedClustering
>>> # Create a ClusterExplainedClustering instance with cluster feature relevance
>>> cluster_relevance_data = pd.DataFrame({
...     'feature_A': [0.3, 0.5, 0.2],
...     'feature_B': [0.2, 0.4, 0.6],
...     'feature_C': [0.4, 0.2, 0.5],
...     'assigned_clusters': [0, 1, 2]
... })
>>> cluster_relevance_data.set_index('assigned_clusters', inplace=True)
>>> cluster_explanation = ClusterExplainedClustering(cluster_relevance_data)
>>> # Check if two instances of ClusterExplainedClustering are equal
>>> another_cluster_relevance_data = pd.DataFrame({
...     'feature_A': [0.4, 0.4, 0.2],
...     'feature_B': [0.3, 0.3, 0.5],
...     'feature_C': [0.2, 0.1, 0.6],
...     'assigned_clusters': [0, 1, 2]
... })
>>> another_cluster_relevance_data.set_index('assigned_clusters', inplace=True)
>>> another_cluster_explanation = ClusterExplainedClustering(another_cluster_relevance_data)
>>> cluster_explanation == another_cluster_explanation
False
>>> # Visualize cluster-wise feature relevance
>>> cluster_explanation.show_cluster_relevance()
>>> # Visualize feature importance scores for a single feature across clusters
>>> cluster_explanation.show_single_feature_relevance("feature_A")
>>> # Visualize feature importance scores for a single cluster
>>> cluster_explanation.show_single_cluster_relevance(2)

show_cluster_relevance(subset_index: str | List[str] | List[int] | Index | None = None)

Visualizes cluster-wise feature relevance using a heatmap.

Parameters: subset_index – Optional list of cluster indices to subset the data.

Example:

>>> # Visualize cluster-wise feature relevance
>>> cluster_explanation.show_cluster_relevance([0, 1])
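
How `subset_index` is applied internally is not shown in this reference. One plausible reading, assumed here, is label-based row selection on the relevance DataFrame, which is what `DataFrame.loc` does. The helper `select_rows` is hypothetical and uses plain pandas only.

```python
import pandas as pd

# Cluster-wise relevance, indexed by cluster label as in the class example.
cluster_relevance = pd.DataFrame(
    {
        "feature_A": [0.3, 0.5, 0.2],
        "feature_B": [0.2, 0.4, 0.6],
        "feature_C": [0.4, 0.2, 0.5],
    },
    index=pd.Index([0, 1, 2], name="assigned_clusters"),
)


def select_rows(relevance, subset_index=None):
    """Hypothetical helper: return all rows, or only the requested labels."""
    if subset_index is None:
        return relevance
    # Wrap a single label in a list so the result is always a DataFrame.
    if isinstance(subset_index, (str, int)):
        subset_index = [subset_index]
    return relevance.loc[subset_index]


subset = select_rows(cluster_relevance, [0, 1])
print(subset.index.tolist())  # [0, 1]
print(subset.shape)           # (2, 3)
```

Passing `None` returns the full table, which mirrors the default of the documented parameter.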

show_single_feature_relevance(feature: str, subset_index: str | List[str] | List[int] | Index | None = None)

Visualizes feature importance scores for a single feature across clusters using a bar plot.

Parameters:

feature – The name of a feature.
subset_index – Optional list of cluster indices to subset the data.

Example:

>>> # Visualize feature importance scores for a single feature across clusters
>>> cluster_explanation.show_single_feature_relevance("feature_A", [0, 1, 2])

show_single_cluster_relevance(cluster_index: int)

Visualizes feature importance scores for a single cluster using a bar plot.

Parameters: cluster_index – The index of the cluster.

Example:

>>> # Visualize feature importance scores for a single cluster
>>> cluster_explanation.show_single_cluster_relevance(2)
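
The two single-slice plots correspond to one column and one row of the cluster relevance table. The sketch below shows those slices with plain pandas. It assumes that `cluster_index` is treated as an index label rather than a position, which this reference does not confirm.

```python
import pandas as pd

cluster_relevance = pd.DataFrame(
    {
        "feature_A": [0.3, 0.5, 0.2],
        "feature_B": [0.2, 0.4, 0.6],
        "feature_C": [0.4, 0.2, 0.5],
    },
    index=pd.Index([0, 1, 2], name="assigned_clusters"),
)

# One feature across all clusters: a column of the table.
feature_a_by_cluster = cluster_relevance["feature_A"]

# All features for one cluster: a row of the table.
# Assumption: cluster_index is an index label, hence .loc and not .iloc.
cluster_2_by_feature = cluster_relevance.loc[2]

print(feature_a_by_cluster.tolist())  # [0.3, 0.5, 0.2]
print(cluster_2_by_feature.tolist())  # [0.2, 0.6, 0.5]
```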

class cxplain.base_explainer.PointwiseExplainedClustering(pointwise_relevance: DataFrame)

A class for representing pointwise clustering explanations.

Variables: pointwise_relevance (pd.DataFrame) – A Pandas DataFrame containing pointwise feature relevance scores.

- __eq__(self, other) -> bool: Checks if two instances of PointwiseExplainedClustering are equal.

- show_pointwise_relevance(self, subset_index): Visualizes pointwise feature relevance using a heatmap.

- show_single_feature_relevance(self, feature: str, subset_index): Visualizes feature importance scores for a single feature across observations using a bar plot.

- show_single_observation_relevance(self, observation_index): Visualizes feature importance scores for a single observation using a bar plot.

Example:

>>> import pandas as pd
>>> from cxplain.base_explainer import PointwiseExplainedClustering
>>> # Create a PointwiseExplainedClustering instance with pointwise feature relevance
>>> pointwise_relevance_data = pd.DataFrame({
...     'feature_A': [0.3, 0.5, 0.2],
...     'feature_B': [0.2, 0.4, 0.6],
...     'feature_C': [0.4, 0.2, 0.5]
... })
>>> pointwise_explanation = PointwiseExplainedClustering(pointwise_relevance_data)
>>> # Check if two instances of PointwiseExplainedClustering are equal
>>> another_pointwise_relevance_data = pd.DataFrame({
...     'feature_A': [0.4, 0.4, 0.2],
...     'feature_B': [0.3, 0.3, 0.5],
...     'feature_C': [0.2, 0.1, 0.6]
... })
>>> another_pointwise_explanation = PointwiseExplainedClustering(another_pointwise_relevance_data)
>>> pointwise_explanation == another_pointwise_explanation
False
>>> # Visualize pointwise feature relevance
>>> pointwise_explanation.show_pointwise_relevance()
>>> # Visualize feature importance scores for a single feature across observations
>>> pointwise_explanation.show_single_feature_relevance("feature_A")
>>> # Visualize feature importance scores for a single observation
>>> pointwise_explanation.show_single_observation_relevance(2)

show_pointwise_relevance(subset_index: str | List[str] | List[int] | Index | None = None)

Visualizes pointwise feature relevance using a heatmap.

Parameters: subset_index – Optional list of observation indices to subset the data.

Example:

>>> # Visualize pointwise feature relevance
>>> pointwise_explanation.show_pointwise_relevance([0, 1, 2])

show_single_feature_relevance(feature: str, subset_index: str | List[str] | List[int] | Index | None = None)

Visualizes feature importance scores for a single feature across observations using a bar plot.

Parameters:

feature – The name of the feature.
subset_index – Optional list of observation indices to subset the data.

Example:

>>> # Visualize feature importance scores for a single feature across observations
>>> pointwise_explanation.show_single_feature_relevance("feature_A", [0, 1, 2])

show_single_observation_relevance(observation_index: int)

Visualizes feature importance scores for a single observation using a bar plot.

Parameters: observation_index – The index of the observation.

Example:

>>> # Visualize feature importance scores for a single observation
>>> pointwise_explanation.show_single_observation_relevance(2)
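
Whether `observation_index` is read as a row label or as a row position is not stated here. In the class example the DataFrame has the default index, so both readings select the same row. The sketch below, using plain pandas only, illustrates when the two would differ.

```python
import pandas as pd

pointwise = pd.DataFrame(
    {"feature_A": [0.3, 0.5, 0.2], "feature_B": [0.2, 0.4, 0.6]}
)

# Default RangeIndex: label 2 and position 2 address the same row.
by_label = pointwise.loc[2]
by_position = pointwise.iloc[2]
same_row = by_label.equals(by_position)

# With custom row labels the two notions diverge: position 2 still exists,
# but there is no row labelled 2 any more.
relabelled = pointwise.copy()
relabelled.index = [10, 11, 12]
third_row = relabelled.iloc[2].tolist()
has_label_2 = 2 in relabelled.index

print(same_row)     # True
print(third_row)    # [0.2, 0.6]
print(has_label_2)  # False
```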

class cxplain.base_explainer.ExplainedClustering(global_relevance: Series, pointwise_relevance: DataFrame | None = None, cluster_relevance: DataFrame | None = None)

This class is used to represent clustering explanations, including global, pointwise, and cluster feature relevance.

- __eq__(self, other) -> bool: Checks if two instances of ExplainedClustering are equal.

- pointwise_relevance(self) -> Optional[PointwiseExplainedClustering]: Returns the pointwise feature relevance, if available.

- cluster_relevance(self) -> Optional[ClusterExplainedClustering]: Returns the cluster feature relevance, if available.

- global_relevance(self) -> GlobalExplainedClustering: Returns the global feature relevance.

- pointwise_relevance_df(self) -> Optional[pd.DataFrame]: Returns the pointwise feature relevance as a DataFrame, if available.

- cluster_relevance_df(self) -> Optional[pd.DataFrame]: Returns the cluster feature relevance as a DataFrame, if available.

- global_relevance_df(self) -> pd.Series: Returns the global feature relevance as a Series.

- show_pointwise_relevance(self, subset_index): Visualizes pointwise feature relevance using a heatmap.

- show_pointwise_relevance_for_feature(self, feature: str, subset_index): Visualizes feature importance scores for a single feature across observations using a bar plot.

- show_pointwise_relevance_for_observation(self, observation_index): Visualizes feature importance scores for a single observation using a bar plot.

- show_cluster_relevance(self, subset_index): Visualizes cluster-wise feature relevance using a heatmap.

- show_cluster_relevance_for_feature(self, feature, subset_index): Visualizes feature importance scores for a single feature across clusters using a bar plot.

- show_cluster_relevance_for_cluster(self, cluster_index): Visualizes feature importance scores for a single cluster using a bar plot.

- show_global_relevance(self): Visualizes global feature relevance using a bar plot.

Example:

>>> import pandas as pd
>>> from cxplain.base_explainer import ExplainedClustering
>>> # Create an ExplainedClustering instance with global and pointwise feature relevance
>>> global_relevance = pd.Series([0.3, 0.5, 0.2], index=["feature_A", "feature_B", "feature_C"])
>>> pointwise_relevance_data = pd.DataFrame({
...     'feature_A': [0.3, 0.5, 0.2],
...     'feature_B': [0.2, 0.4, 0.6],
...     'feature_C': [0.4, 0.2, 0.5]
... })
>>> explained_clustering = ExplainedClustering(global_relevance, pointwise_relevance_data)
>>> # Check if two instances of ExplainedClustering are equal
>>> another_global_relevance = pd.Series([0.4, 0.5, 0.2], index=["feature_A", "feature_B", "feature_C"])
>>> another_pointwise_relevance_data = pd.DataFrame({
...     'feature_A': [0.4, 0.4, 0.2],
...     'feature_B': [0.3, 0.3, 0.5],
...     'feature_C': [0.2, 0.1, 0.6]
... })
>>> another_explained_clustering = ExplainedClustering(another_global_relevance, another_pointwise_relevance_data)
>>> explained_clustering == another_explained_clustering
False
>>> # Visualize pointwise feature relevance
>>> explained_clustering.show_pointwise_relevance()
>>> # Visualize feature importance scores for a single feature across observations
>>> explained_clustering.show_pointwise_relevance_for_feature("feature_A")
>>> # Visualize feature importance scores for a single observation
>>> explained_clustering.show_pointwise_relevance_for_observation(2)

property pointwise_relevance: PointwiseExplainedClustering | None – Returns the PointwiseExplainedClustering, or None if it does not exist.

property cluster_relevance: ClusterExplainedClustering | None – Returns the ClusterExplainedClustering, or None if it does not exist.

property global_relevance: GlobalExplainedClustering – Returns the GlobalExplainedClustering.

property pointwise_relevance_df: DataFrame | None – Returns a DataFrame containing the pointwise feature importances, or None if they do not exist.

property cluster_relevance_df: DataFrame | None – Returns a DataFrame containing the cluster-wise feature importances, or None if they do not exist.

property global_relevance_df: Series – Returns a Series containing the global feature importances.

static _check_relevance_exists(explained_clustering: PointwiseExplainedClustering | ClusterExplainedClustering | None = None)

Checks whether the provided clustering explanation exists.

Parameters: explained_clustering – The explained clustering object whose existence is checked.
Raises: NonExistingRelevanceError – Raised if the provided explained clustering does not exist.
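
The guard described above can be pictured as a plain `None` check. The following is a minimal sketch under that assumption, not the library's code: the exception class is a local stand-in that only shares its name with the documented `NonExistingRelevanceError`, and `check_relevance_exists` is a hypothetical free function.

```python
from typing import Optional


class NonExistingRelevanceError(Exception):
    """Local stand-in for the library's exception of the same name."""


def check_relevance_exists(explained_clustering: Optional[object] = None) -> None:
    """Raise if the optional explanation object was never provided."""
    if explained_clustering is None:
        raise NonExistingRelevanceError(
            "The requested relevance has not been computed."
        )


# A present object passes silently.
check_relevance_exists(object())

# A missing object raises.
try:
    check_relevance_exists(None)
    raised = False
except NonExistingRelevanceError:
    raised = True

print(raised)  # True
```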

show_pointwise_relevance(subset_index: str | List[str] | List[int] | Index | None = None)

Visualizes pointwise feature relevance using a heatmap.

Parameters: subset_index – Optional list of observation indices to subset the data.

Example:

>>> # Visualize pointwise feature relevance
>>> explained_clustering.show_pointwise_relevance([0, 1, 2])

show_pointwise_relevance_for_feature(feature: str, subset_index: str | List[str] | List[int] | Index | None = None)

Visualizes feature importance scores for a single feature across observations using a bar plot.

Parameters:

feature – The name of the feature.
subset_index – Optional list of observation indices to subset the data.

Example:

>>> # Visualize feature importance scores for a single feature across observations
>>> explained_clustering.show_pointwise_relevance_for_feature("feature_A", [0, 1, 2])

show_pointwise_relevance_for_observation(observation_index: int)

Visualizes feature importance scores for a single observation using a bar plot.

Parameters: observation_index – The index of the observation.

Example:

>>> # Visualize feature importance scores for a single observation
>>> explained_clustering.show_pointwise_relevance_for_observation(2)

show_cluster_relevance(subset_index: str | List[str] | List[int] | Index | None = None)

Visualizes cluster-wise feature relevance using a heatmap.

Parameters: subset_index – Optional list of cluster indices to subset the data.

Example:

>>> # Visualize cluster-wise feature relevance
>>> # (assumes cluster_relevance was passed to the constructor)
>>> explained_clustering.show_cluster_relevance([0, 1, 2])

show_cluster_relevance_for_feature(feature: str, subset_index: str | List[str] | List[int] | Index | None = None)

Visualizes feature importance scores for a single feature across clusters using a bar plot.

Parameters:

feature – The name of the feature.
subset_index – Optional list of cluster indices to subset the data.

Example:

>>> # Visualize feature importance scores for a single feature across clusters
>>> # (assumes cluster_relevance was passed to the constructor)
>>> explained_clustering.show_cluster_relevance_for_feature("feature_A", [0, 1, 2])

show_cluster_relevance_for_cluster(cluster_index: int)

Visualizes feature importance scores for a single cluster using a bar plot.

Parameters: cluster_index – The index of the cluster.

Example:

>>> # Visualize feature importance scores for a single cluster
>>> # (assumes cluster_relevance was passed to the constructor)
>>> explained_clustering.show_cluster_relevance_for_cluster(2)

show_global_relevance()

Visualizes global feature relevance using a bar plot.

Example:

>>> # Visualize global feature relevance
>>> explained_clustering.show_global_relevance()

class cxplain.base_explainer.BaseExplainer

This is the base class for all cluster explainers, providing a common interface and functionality for clustering explanation.

- fit(self): Abstract method for fitting the explainer. Subclasses must implement this method.

- explain(self): Abstract method for generating cluster explanations. Subclasses must implement this method.

- fit_explain(self): Convenience method that fits the explainer and immediately generates explanations.

Variables: is_fitted – Indicates whether the explainer has been fitted.
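
The split between `fit`, `explain` and `fit_explain` follows a common abstract-base-class pattern. The sketch below illustrates that pattern with hypothetical stand-in classes (`SketchExplainer`, `ConstantExplainer`). It assumes that `fit_explain` simply chains the two calls and that `fit` sets `is_fitted`; neither detail is confirmed by this reference.

```python
from abc import ABC, abstractmethod


class SketchExplainer(ABC):
    """Hypothetical stand-in mirroring the fit / explain / fit_explain split."""

    def __init__(self):
        self.is_fitted = False

    @abstractmethod
    def fit(self):
        """Subclasses learn whatever state they need and set is_fitted."""

    @abstractmethod
    def explain(self):
        """Subclasses return their explanation object."""

    def fit_explain(self):
        # Assumption: the convenience method simply chains the two calls.
        self.fit()
        return self.explain()


class ConstantExplainer(SketchExplainer):
    """Toy subclass that 'explains' with fixed scores."""

    def fit(self):
        self.is_fitted = True
        return self

    def explain(self):
        if not self.is_fitted:
            raise RuntimeError("Call fit() before explain().")
        return {"feature_A": 0.6, "feature_B": 0.4}


explainer = ConstantExplainer()
result = explainer.fit_explain()
print(explainer.is_fitted)  # True
print(result)               # {'feature_A': 0.6, 'feature_B': 0.4}
```

Because both `fit` and `explain` are abstract, instantiating the base class directly fails with a `TypeError`, so every concrete explainer has to implement both.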

static _rename_feature_columns(df: DataFrame, num_features: int, feature_names: List[str] | None = None) → DataFrame

Renames the feature columns of a DataFrame, using the supplied feature names when they are given.

If no feature names are provided, every column is renamed to ‘R<column number>’.

Parameters:

df – The DataFrame whose columns are renamed.
num_features – The number of feature columns.
feature_names – A list of feature names (if provided).

Returns:

The DataFrame with renamed columns.

Return type:

pd.DataFrame

Raises:

InconsistentNamingError – If the number of provided feature names does not match the number of features.

Example:

>>> df = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6]})
>>> num_features = 2
>>> feature_names = ["feature_A", "feature_B"]
>>> renamed_df = BaseExplainer._rename_feature_columns(df, num_features, feature_names)
>>> renamed_df
   feature_A  feature_B
0          1          4
1          2          5
2          3          6
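
The renaming rule, including the fallback when no names are given, can be sketched in a few lines of pandas. This is a hypothetical re-implementation for illustration only. It assumes that the fallback names count from 1 (`R1`, `R2`, ...) and that the feature columns are the first `num_features` columns; `ValueError` stands in for the library's `InconsistentNamingError`.

```python
from typing import List, Optional

import pandas as pd


def rename_feature_columns(
    df: pd.DataFrame,
    num_features: int,
    feature_names: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical re-implementation for illustration only."""
    if feature_names is None:
        # Assumption: fallback names count from 1 (R1, R2, ...).
        feature_names = [f"R{i + 1}" for i in range(num_features)]
    elif len(feature_names) != num_features:
        # The library raises InconsistentNamingError; ValueError stands in here.
        raise ValueError("Expected one name per feature column.")
    # Assumption: the feature columns are the first num_features columns.
    mapping = dict(zip(df.columns[:num_features], feature_names))
    return df.rename(columns=mapping)


df = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6]})
named = rename_feature_columns(df, 2, ["feature_A", "feature_B"])
fallback = rename_feature_columns(df, 2)

print(list(named.columns))     # ['feature_A', 'feature_B']
print(list(fallback.columns))  # ['R1', 'R2']
```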