datavizml package
Submodules
datavizml.exploratorydataanalysis module
- class datavizml.exploratorydataanalysis.ExploratoryDataAnalysis(data: Any, ncols: int, data_deskew: Union[bool, list, str] = False, target: Optional[Any] = None, target_rebalance: bool = False, metric: str = 'prop', prediction_matrix_full: bool = False, figure_width: Union[int, float] = 18, axes_height: Union[int, float] = 3)
Bases: object
A graphical summary of all given features and their relationship to a target
- Parameters:
data (pandas Series or pandas DataFrame) – Features to be analysed
ncols (int) – Number of columns to use in figure
data_deskew (bool, string or list of strings, optional) – Reduce data skew, trialling squaring, rooting, logging, exponents and Yeo-Johnson. Pass a bool to apply to all features, or a string or list of strings to deskew selected features only
target (pandas Series, optional) – Target to be predicted
target_rebalance (bool, optional) – Rebalance target
metric (string, optional) – Metric used for prevalence, “count” or “prop” (default)
prediction_matrix_full (bool, optional) – Full or reduced prediction matrix
figure_width (int or float, optional) – Width of figure, defaults to 18
axes_height (int or float, optional) – Height of axes, defaults to 3
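The deskewing described for data_deskew can be pictured as trying each candidate transform and keeping whichever leaves the least skew. The following is only a minimal standard-library sketch of that idea, not the package's implementation: the helper names skewness and least_skewed are hypothetical, the selection rule (smallest absolute skewness) is an assumption, and Yeo-Johnson is left out because it needs a fitted parameter.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness (third standardised moment) of a list of numbers."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0
    return mean([((v - mu) / sigma) ** 3 for v in values])


# Candidate transforms to trial (hypothetical set; Yeo-Johnson omitted here).
CANDIDATES = {
    "none": lambda v: v,
    "square": lambda v: v ** 2,
    "root": math.sqrt,
    "log": math.log,
    "exp": math.exp,
}


def least_skewed(values):
    """Return (name, transformed values) for the transform with the smallest |skew|."""
    best_name, best_values, best_skew = None, None, math.inf
    for name, fn in CANDIDATES.items():
        try:
            transformed = [fn(v) for v in values]
        except (ValueError, OverflowError):
            continue  # transform undefined for this data, e.g. log of a non-positive value
        score = abs(skewness(transformed))
        if score < best_skew:
            best_name, best_values, best_skew = name, transformed, score
    return best_name, best_values


# Exponentiated symmetric values are right-skewed; taking logs restores symmetry.
right_skewed = [math.exp(x) for x in (-2, -1, 0, 1, 2)]
name, deskewed = least_skewed(right_skewed)
print(name)
```

Transforms that are undefined for the data are skipped rather than raising, which is one plausible way to handle mixed-sign features.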
- property data: Union[Series, DataFrame]
The feature data
- property prediction_matrix: DataFrame
The prediction matrix data
- prediction_score_plot(ax: Axes) → Axes
Plot the prediction scores as a heatmap
- Parameters:
ax (matplotlib Axes) – Axes to plot on
- Returns:
The heatmap plot
- Return type:
matplotlib Axes
- summary() → DataFrame
Summarise analysis
- Returns:
A dataframe summarising each of the features and their relationship to the target
- Return type:
pandas DataFrame
- property target: Series
The target data
datavizml.singledistribution module
- class datavizml.singledistribution.SingleDistribution(feature: Any, ax: Any, feature_deskew: bool = False, target: Optional[Any] = None, target_score: Optional[float] = None, target_rebalance: bool = False, binning_threshold: Optional[int] = None, metric: str = 'prop')
Bases: object
A graphical summary of a given feature and its relationship to a target
- Parameters:
feature (pandas Series) – Feature to be analysed
ax (matplotlib Axes) – Axes to plot on
feature_deskew (bool, optional) – Reduce feature skew, trialling squaring, rooting, logging, exponents and Yeo-Johnson
target (pandas Series, optional) – Target to be predicted
target_score (float, optional) – Precomputed score to avoid recalculation
target_rebalance (bool, optional) – Reduce class imbalance in target score
binning_threshold (int, optional) – Maximum number of distinct values in the column before binning, defaults to 12
metric (string, optional) – Metric used for prevalence, “count” or “prop” (default)
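The two metric options can be illustrated with a small standard-library sketch. It assumes that "prevalence" means how often each distinct value occurs, reported either as a raw count or as a proportion of all observations; the function name prevalence is hypothetical and is not part of datavizml.

```python
from collections import Counter


def prevalence(values, metric="prop"):
    """Return the prevalence of each distinct value as a count or a proportion."""
    counts = Counter(values)
    if metric == "count":
        return dict(counts)
    if metric == "prop":
        total = sum(counts.values())
        return {value: n / total for value, n in counts.items()}
    raise ValueError('metric must be "count" or "prop"')


feature = ["a", "b", "a", "c", "a", "b", "a", "a"]
as_count = prevalence(feature, metric="count")
as_prop = prevalence(feature)  # "prop" is the default, as in the signature above
print(as_count, as_prop)
```

Proportions make features with different numbers of observations comparable, which may be why "prop" is the default.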
- calculate_target_score() → None
Calculate the score for the feature based on its predictive power
- property feature: Series
The feature data
- summarise_feature() → None
Summarise the feature by calculating summary statistics for each distinct value and binning if there are too many distinct values
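As a rough picture of this summarise-then-bin behaviour, the sketch below groups a numeric feature by distinct value and falls back to equal-width bins once the number of distinct values exceeds a threshold. It is an illustration under stated assumptions rather than the method's actual logic: the function name summarise, the equal-width binning, the use of the threshold as the bin count, and the choice of statistics (count and mean target) are all hypothetical.

```python
from collections import defaultdict


def summarise(feature, target, binning_threshold=12):
    """Group target values by feature value, binning if there are too many distinct values."""
    distinct = sorted(set(feature))
    if len(distinct) > binning_threshold:
        low, high = distinct[0], distinct[-1]
        width = (high - low) / binning_threshold

        def key(value):
            # Equal-width bin index; clamp so the maximum lands in the last bin.
            return min(int((value - low) / width), binning_threshold - 1)
    else:
        def key(value):
            return value  # few distinct values: keep each one as its own group

    groups = defaultdict(list)
    for value, outcome in zip(feature, target):
        groups[key(value)].append(outcome)

    return {
        group: {"count": len(outcomes), "target_mean": sum(outcomes) / len(outcomes)}
        for group, outcomes in sorted(groups.items())
    }


# Four observations with two distinct values: no binning needed.
small = summarise([1, 1, 2, 2], [0, 1, 1, 1])

# Twenty-four distinct values exceed the threshold of 12, so they are binned.
wide_feature = list(range(24))
binned = summarise(wide_feature, [v % 2 for v in wide_feature])
print(small, len(binned))
```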
- property target: Series
The target data