Explanation Results#
- class cxplain.base_explainer.GlobalExplainedClustering(global_relevance: Series)#
A class for representing global clustering explanations.
- Variables:
global_relevance (pd.Series) – A Pandas Series containing global feature relevance scores.
- - __eq__(self, other) -> bool
Checks if two instances of GlobalExplainedClustering are equal.
- - show_global_relevance(self)
Visualizes global feature relevance using a bar plot.
Example:
>>> import pandas as pd
>>> from cxplain.base_explainer import GlobalExplainedClustering
>>> # Create a GlobalExplainedClustering instance with global feature relevance
>>> global_relevance = pd.Series([0.3, 0.5, 0.2], index=["feature_A", "feature_B", "feature_C"])
>>> global_explanation = GlobalExplainedClustering(global_relevance)
>>> # Check if two instances of GlobalExplainedClustering are equal
>>> another_global_relevance = pd.Series([0.5, 0.5, 0.2], index=["feature_A", "feature_B", "feature_C"])
>>> another_global_explanation = GlobalExplainedClustering(another_global_relevance)
>>> global_explanation == another_global_explanation
False
>>> # Visualize global feature relevance
>>> global_explanation.show_global_relevance()
- show_global_relevance()#
Visualizes global feature relevance using a bar plot.
Example:
>>> # Create a GlobalExplainedClustering instance with global feature relevance
>>> global_relevance = pd.Series([0.3, 0.5, 0.2], index=["A", "B", "C"])
>>> global_explanation = GlobalExplainedClustering(global_relevance)
>>> # Visualize global feature relevance
>>> global_explanation.show_global_relevance()
- class cxplain.base_explainer.ClusterExplainedClustering(cluster_relevance: DataFrame)#
A class for representing cluster-specific clustering explanations.
- Variables:
cluster_relevance (pd.DataFrame) – A Pandas DataFrame containing cluster feature relevance scores.
- - __eq__(self, other) -> bool
Checks if two instances of ClusterExplainedClustering are equal.
- - show_cluster_relevance(self, subset_index)
Visualizes cluster-wise feature relevance using a heatmap.
- - show_single_feature_relevance(self, feature, subset_index)
Visualizes feature importance scores for a single feature across clusters using a bar plot.
- - show_single_cluster_relevance(self, cluster_index)
Visualizes feature importance scores for a single cluster using a bar plot.
Example:
>>> # Create a ClusterExplainedClustering instance with cluster feature relevance
>>> cluster_relevance_data = pd.DataFrame({
...     'feature_A': [0.3, 0.5, 0.2],
...     'feature_B': [0.2, 0.4, 0.6],
...     'feature_C': [0.4, 0.2, 0.5],
...     'assigned_clusters': [0, 1, 2]
... })
>>> cluster_relevance_data.set_index('assigned_clusters', inplace=True)
>>> cluster_explanation = ClusterExplainedClustering(cluster_relevance_data)
>>> # Check if two instances of ClusterExplainedClustering are equal
>>> another_cluster_relevance_data = pd.DataFrame({
...     'feature_A': [0.4, 0.4, 0.2],
...     'feature_B': [0.3, 0.3, 0.5],
...     'feature_C': [0.2, 0.1, 0.6],
...     'assigned_clusters': [0, 1, 2]
... })
>>> another_cluster_relevance_data.set_index('assigned_clusters', inplace=True)
>>> another_cluster_explanation = ClusterExplainedClustering(another_cluster_relevance_data)
>>> cluster_explanation == another_cluster_explanation
False
>>> # Visualize cluster-wise feature relevance
>>> cluster_explanation.show_cluster_relevance()
>>> # Visualize feature importance scores for a single feature across clusters
>>> cluster_explanation.show_single_feature_relevance("feature_A")
>>> # Visualize feature importance scores for a single cluster
>>> cluster_explanation.show_single_cluster_relevance(2)
- show_cluster_relevance(subset_index: str | List[str] | List[int] | Index | None = None)#
Visualizes cluster-wise feature relevance using a heatmap.
- Parameters:
subset_index – Optional list of cluster indices to subset the data.
Example:
>>> # Visualize cluster-wise feature relevance
>>> cluster_explanation.show_cluster_relevance([0, 1])
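The subset_index argument accepts a single label, a list of labels, an Index, or None. The following is a minimal sketch of how such an argument might be normalized before plotting. It uses a plain dict as a stand-in for the cluster_relevance DataFrame, and the helper name select_clusters is hypothetical; the library's actual handling may differ.

```python
from typing import Dict, Hashable, Iterable, Optional, Union

Row = Dict[str, float]


def select_clusters(
    relevance: Dict[Hashable, Row],
    subset_index: Optional[Union[str, Iterable[Hashable]]] = None,
) -> Dict[Hashable, Row]:
    """Hypothetical helper: keep only the requested cluster rows."""
    if subset_index is None:
        # No subset requested: keep every cluster.
        return dict(relevance)
    if isinstance(subset_index, str):
        # A single label is treated like a one-element list.
        subset_index = [subset_index]
    # A missing label raises KeyError, which is one possible design choice.
    return {label: relevance[label] for label in subset_index}


# Same values as the ClusterExplainedClustering example, keyed by cluster label.
cluster_relevance = {
    0: {"feature_A": 0.3, "feature_B": 0.2, "feature_C": 0.4},
    1: {"feature_A": 0.5, "feature_B": 0.4, "feature_C": 0.2},
    2: {"feature_A": 0.2, "feature_B": 0.6, "feature_C": 0.5},
}
subset = select_clusters(cluster_relevance, [0, 1])
```

With a pandas DataFrame, the equivalent selection would typically be a label-based row lookup on the index.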
- show_single_feature_relevance(feature: str, subset_index: str | List[str] | List[int] | Index | None = None)#
Visualizes feature importance scores for a single feature across clusters using a bar plot.
- Parameters:
feature – The name of a feature.
subset_index – Optional list of cluster indices to subset the data.
Example:
>>> # Visualize feature importance scores for a single feature across clusters
>>> cluster_explanation.show_single_feature_relevance("feature_A", [0, 1, 2])
- show_single_cluster_relevance(cluster_index: int)#
Visualizes feature importance scores for a single cluster using a bar plot.
- Parameters:
cluster_index – The index of the cluster.
Example:
>>> # Visualize feature importance scores for a single cluster
>>> cluster_explanation.show_single_cluster_relevance(2)
- class cxplain.base_explainer.PointwiseExplainedClustering(pointwise_relevance: DataFrame)#
A class for representing pointwise clustering explanations.
- Variables:
pointwise_relevance (pd.DataFrame) – A Pandas DataFrame containing pointwise feature relevance scores.
- - __eq__(self, other) -> bool
Checks if two instances of PointwiseExplainedClustering are equal.
- - show_pointwise_relevance(self, subset_index)
Visualizes pointwise feature relevance using a heatmap.
- - show_single_feature_relevance(self, feature, subset_index)
Visualizes feature importance scores for a single feature across observations using a bar plot.
- - show_single_observation_relevance(self, observation_index)
Visualizes feature importance scores for a single observation using a bar plot.
Example:
>>> # Create a PointwiseExplainedClustering instance with pointwise feature relevance
>>> pointwise_relevance_data = pd.DataFrame({
...     'feature_A': [0.3, 0.5, 0.2],
...     'feature_B': [0.2, 0.4, 0.6],
...     'feature_C': [0.4, 0.2, 0.5]
... })
>>> pointwise_explanation = PointwiseExplainedClustering(pointwise_relevance_data)
>>> # Check if two instances of PointwiseExplainedClustering are equal
>>> another_pointwise_relevance_data = pd.DataFrame({
...     'feature_A': [0.4, 0.4, 0.2],
...     'feature_B': [0.3, 0.3, 0.5],
...     'feature_C': [0.2, 0.1, 0.6]
... })
>>> another_pointwise_explanation = PointwiseExplainedClustering(another_pointwise_relevance_data)
>>> pointwise_explanation == another_pointwise_explanation
False
>>> # Visualize pointwise feature relevance
>>> pointwise_explanation.show_pointwise_relevance()
>>> # Visualize feature importance scores for a single feature across observations
>>> pointwise_explanation.show_single_feature_relevance("feature_A")
>>> # Visualize feature importance scores for a single observation
>>> pointwise_explanation.show_single_observation_relevance(2)
- show_pointwise_relevance(subset_index: str | List[str] | List[int] | Index | None = None)#
Visualizes pointwise feature relevance using a heatmap.
- Parameters:
subset_index – Optional list of observation indices to subset the data.
Example:
>>> # Visualize pointwise feature relevance
>>> pointwise_explanation.show_pointwise_relevance([0, 1, 2])
- show_single_feature_relevance(feature: str, subset_index: str | List[str] | List[int] | Index | None = None)#
Visualizes feature importance scores for a single feature across observations using a bar plot.
- Parameters:
feature – The name of the feature.
subset_index – Optional list of observation indices to subset the data.
Example:
>>> # Visualize feature importance scores for a single feature across observations
>>> pointwise_explanation.show_single_feature_relevance("feature_A", [0, 1, 2])
- show_single_observation_relevance(observation_index: int)#
Visualizes feature importance scores for a single observation using a bar plot.
- Parameters:
observation_index – The index of the observation.
Example:
>>> # Visualize feature importance scores for a single observation
>>> pointwise_explanation.show_single_observation_relevance(2)
- class cxplain.base_explainer.ExplainedClustering(global_relevance: Series, pointwise_relevance: DataFrame | None = None, cluster_relevance: DataFrame | None = None)#
This class is used to represent clustering explanations, including global, pointwise, and cluster feature relevance.
- - __eq__(self, other) -> bool
Checks if two instances of ExplainedClustering are equal.
- - pointwise_relevance(self) -> Optional[PointwiseExplainedClustering]
Returns the pointwise feature relevance, if available.
- - cluster_relevance(self) -> Optional[ClusterExplainedClustering]
Returns the cluster feature relevance, if available.
- - global_relevance(self) -> GlobalExplainedClustering
Returns the global feature relevance.
- - pointwise_relevance_df(self) -> Optional[pd.DataFrame]
Returns the pointwise feature relevance as a DataFrame, if available.
- - cluster_relevance_df(self) -> Optional[pd.DataFrame]
Returns the cluster feature relevance as a DataFrame, if available.
- - global_relevance_df(self) -> pd.Series
Returns the global feature relevance as a Series.
- - show_pointwise_relevance(self, subset_index)
Visualizes pointwise feature relevance using a heatmap.
- - show_pointwise_relevance_for_feature(self, feature, subset_index)
Visualizes feature importance scores for a single feature across observations using a bar plot.
- - show_pointwise_relevance_for_observation(self, observation_index)
Visualizes feature importance scores for a single observation using a bar plot.
- - show_cluster_relevance(self, subset_index)
Visualizes cluster-wise feature relevance using a heatmap.
- - show_cluster_relevance_for_feature(self, feature, subset_index)
Visualizes feature importance scores for a single feature across clusters using a bar plot.
- - show_cluster_relevance_for_cluster(self, cluster_index)
Visualizes feature importance scores for a single cluster using a bar plot.
- - show_global_relevance(self)
Visualizes global feature relevance using a bar plot.
Example:
>>> # Create an ExplainedClustering instance with global and pointwise feature relevance
>>> global_relevance = pd.Series([0.3, 0.5, 0.2], index=["feature_A", "feature_B", "feature_C"])
>>> pointwise_relevance_data = pd.DataFrame({
...     'feature_A': [0.3, 0.5, 0.2],
...     'feature_B': [0.2, 0.4, 0.6],
...     'feature_C': [0.4, 0.2, 0.5]
... })
>>> explained_clustering = ExplainedClustering(global_relevance, pointwise_relevance_data)
>>> # Check if two instances of ExplainedClustering are equal
>>> another_global_relevance = pd.Series([0.4, 0.5, 0.2], index=["feature_A", "feature_B", "feature_C"])
>>> another_pointwise_relevance_data = pd.DataFrame({
...     'feature_A': [0.4, 0.4, 0.2],
...     'feature_B': [0.3, 0.3, 0.5],
...     'feature_C': [0.2, 0.1, 0.6]
... })
>>> another_explained_clustering = ExplainedClustering(another_global_relevance, another_pointwise_relevance_data)
>>> explained_clustering == another_explained_clustering
False
>>> # Visualize pointwise feature relevance
>>> explained_clustering.show_pointwise_relevance()
>>> # Visualize feature importance scores for a single feature across observations
>>> explained_clustering.show_pointwise_relevance_for_feature("feature_A")
>>> # Visualize feature importance scores for a single observation
>>> explained_clustering.show_pointwise_relevance_for_observation(2)
- property pointwise_relevance: PointwiseExplainedClustering | None#
Returns PointwiseExplainedClustering if it exists.
- property cluster_relevance: ClusterExplainedClustering | None#
Returns ClusterExplainedClustering if it exists.
- property global_relevance: GlobalExplainedClustering#
Returns GlobalExplainedClustering.
- property pointwise_relevance_df: DataFrame | None#
Returns a dataframe containing the pointwise feature importances if they exist.
- property cluster_relevance_df: DataFrame | None#
Returns a dataframe containing the cluster-wise feature importances if they exist.
- property global_relevance_df: Series#
Returns a Series containing the global feature importances.
- static _check_relevance_exists(explained_clustering: PointwiseExplainedClustering | ClusterExplainedClustering | None = None)#
Checks whether the provided clustering explanation exists.
- Parameters:
explained_clustering – The explained clustering object whose existence is checked.
- Raises:
NonExistingRelevanceError – Raised if provided explained clustering does not exist.
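A guard of this kind lets every accessor for optional relevance fail with a dedicated error instead of an AttributeError on None. The sketch below illustrates the pattern only. RelevanceHolder and its describe_pointwise method are hypothetical stand-ins, the exception class is redefined locally so the block is self-contained, and the error message is an assumption.

```python
from typing import Optional


class NonExistingRelevanceError(Exception):
    """Local stand-in for the exception named above (assumed to be a plain Exception)."""


class RelevanceHolder:
    """Hypothetical stand-in for ExplainedClustering, using dicts instead of pandas."""

    def __init__(self, global_relevance: dict, pointwise_relevance: Optional[dict] = None):
        self.global_relevance = global_relevance
        self.pointwise_relevance = pointwise_relevance

    @staticmethod
    def _check_relevance_exists(relevance: Optional[dict] = None) -> None:
        # Fail early when the optional relevance was never supplied.
        if relevance is None:
            raise NonExistingRelevanceError("The requested relevance was not computed.")

    def describe_pointwise(self) -> str:
        # Run the guard before touching the optional attribute.
        self._check_relevance_exists(self.pointwise_relevance)
        return f"{len(self.pointwise_relevance)} feature columns"


holder = RelevanceHolder({"feature_A": 0.3})
try:
    holder.describe_pointwise()
except NonExistingRelevanceError as err:
    message = str(err)
```

Callers that prefer not to catch the exception can test the corresponding property against None first.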
- show_pointwise_relevance(subset_index: str | List[str] | List[int] | Index | None = None)#
Visualizes pointwise feature relevance using a heatmap.
- Parameters:
subset_index – Optional list of observation indices to subset the data.
Example:
>>> # Visualize pointwise feature relevance
>>> explained_clustering.show_pointwise_relevance([0, 1, 2])
- show_pointwise_relevance_for_feature(feature: str, subset_index: str | List[str] | List[int] | Index | None = None)#
Visualizes feature importance scores for a single feature across observations using a bar plot.
- Parameters:
feature – The name of the feature.
subset_index – Optional list of observation indices to subset the data.
Example:
>>> # Visualize feature importance scores for a single feature across observations
>>> explained_clustering.show_pointwise_relevance_for_feature("feature_A", [0, 1, 2])
- show_pointwise_relevance_for_observation(observation_index: int)#
Visualizes feature importance scores for a single observation using a bar plot.
- Parameters:
observation_index – The index of the observation.
Example:
>>> # Visualize feature importance scores for a single observation
>>> explained_clustering.show_pointwise_relevance_for_observation(2)
- show_cluster_relevance(subset_index: str | List[str] | List[int] | Index | None = None)#
Visualizes cluster-wise feature relevance using a heatmap.
- Parameters:
subset_index – Optional list of cluster indices to subset the data.
Example:
>>> # Visualize cluster-wise feature relevance (requires cluster_relevance to have been provided)
>>> explained_clustering.show_cluster_relevance([0, 1, 2])
- show_cluster_relevance_for_feature(feature: str, subset_index: str | List[str] | List[int] | Index | None = None)#
Visualizes feature importance scores for a single feature across clusters using a bar plot.
- Parameters:
feature – The name of the feature.
subset_index – Optional list of cluster indices to subset the data.
Example:
>>> # Visualize feature importance scores for a single feature across clusters (requires cluster_relevance to have been provided)
>>> explained_clustering.show_cluster_relevance_for_feature("feature_A", [0, 1, 2])
- show_cluster_relevance_for_cluster(cluster_index: int)#
Visualizes feature importance scores for a single cluster using a bar plot.
- Parameters:
cluster_index – The index of the cluster.
Example:
>>> # Visualize feature importance scores for a single cluster (requires cluster_relevance to have been provided)
>>> explained_clustering.show_cluster_relevance_for_cluster(2)
- show_global_relevance()#
Visualizes global feature relevance using a bar plot.
Example:
>>> # Visualize global feature relevance
>>> explained_clustering.show_global_relevance()
- class cxplain.base_explainer.BaseExplainer#
This is the base class for all cluster explainers, providing a common interface and functionality for clustering explanation.
- - fit(self)
Abstract method for fitting the explainer. Subclasses must implement this method.
- - explain(self)
Abstract method for generating cluster explanations. Subclasses must implement this method.
- - fit_explain(self)
Convenience method that fits the explainer and immediately generates explanations.
- Variables:
is_fitted – Indicates whether the explainer has been fitted.
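The interface described above (two abstract methods, one convenience method, and an is_fitted flag) can be sketched with the standard abc module. This is an illustration under assumptions, not cxplain's implementation. SketchExplainer and RangeExplainer are hypothetical names, and the range-based relevance is a toy calculation chosen only to make the example runnable.

```python
from abc import ABC, abstractmethod


class SketchExplainer(ABC):
    """Hypothetical mirror of the documented interface."""

    def __init__(self):
        self.is_fitted = False

    @abstractmethod
    def fit(self):
        """Subclasses compute whatever state explain() needs."""

    @abstractmethod
    def explain(self):
        """Subclasses return the explanation."""

    def fit_explain(self):
        # Convenience: fit, then immediately explain.
        self.fit()
        return self.explain()


class RangeExplainer(SketchExplainer):
    """Toy subclass: a feature's relevance is its share of the total value range."""

    def __init__(self, rows):
        super().__init__()
        self.rows = rows
        self.ranges = None

    def fit(self):
        columns = list(zip(*self.rows))  # transpose rows into feature columns
        self.ranges = [max(col) - min(col) for col in columns]
        self.is_fitted = True

    def explain(self):
        if not self.is_fitted:
            raise RuntimeError("Call fit() before explain().")
        total = sum(self.ranges)
        return [r / total for r in self.ranges]


explainer = RangeExplainer([(1.0, 10.0), (2.0, 10.0), (3.0, 14.0)])
relevance = explainer.fit_explain()
```

Because fit and explain are abstract, instantiating SketchExplainer directly raises a TypeError, which is how the abc module enforces that subclasses implement both methods.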
- static _rename_feature_columns(df: DataFrame, num_features: int, feature_names: List[str] | None = None) -> DataFrame#
Renames the feature columns of a DataFrame, giving them more informative names when feature names are provided.
If no feature names are provided, every column is renamed to ‘R<column number>’.
- Parameters:
df – The DataFrame to rename columns.
num_features – The number of feature columns.
feature_names – A list of feature names (if provided).
- Returns:
The DataFrame with renamed columns.
- Return type:
pd.DataFrame
- Raises:
InconsistentNamingError – If the number of provided feature names does not match the number of features.
Example:
>>> df = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6]})
>>> num_features = 2
>>> feature_names = ["feature_A", "feature_B"]
>>> renamed_df = BaseExplainer._rename_feature_columns(df, num_features, feature_names)
>>> renamed_df
   feature_A  feature_B
0          1          4
1          2          5
2          3          6
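The naming rule itself can be sketched without pandas by working on the column labels alone. The function name sketch_feature_labels is hypothetical, the exception is redefined locally as a stand-in, and the 1-based numbering of the fallback labels is an assumption, since the description only states the ‘R<column number>’ pattern.

```python
from typing import List, Optional


class InconsistentNamingError(ValueError):
    """Local stand-in for the exception named above (base class is an assumption)."""


def sketch_feature_labels(num_features: int, feature_names: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper returning the labels the feature columns would receive."""
    if feature_names is None:
        # Assumption: fallback labels are "R" followed by a 1-based column number.
        return [f"R{i}" for i in range(1, num_features + 1)]
    if len(feature_names) != num_features:
        raise InconsistentNamingError(
            f"Expected {num_features} feature names, got {len(feature_names)}."
        )
    return list(feature_names)


named = sketch_feature_labels(2, ["feature_A", "feature_B"])
fallback = sketch_feature_labels(3)
```

Validating the length before renaming keeps a mismatched name list from silently mislabelling columns.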