datavizml package

Submodules

datavizml.exploratorydataanalysis module

class datavizml.exploratorydataanalysis.ExploratoryDataAnalysis(data: Any, ncols: int, data_deskew: Union[bool, list, str] = False, target: Optional[Any] = None, target_rebalance: bool = False, metric: str = 'prop', prediction_matrix_full: bool = False, figure_width: Union[int, float] = 18, axes_height: Union[int, float] = 3)

Bases: object

A graphical summary of all given features and their relationship to a target

Parameters:
  • data (pandas Series or pandas DataFrame) – Features to be analysed

  • ncols (int) – Number of columns to use in figure

  • data_deskew (bool, string or list of strings, optional) – Reduce data skew by trialling squaring, rooting, logging, exponentiation and Yeo-Johnson transformations. Pass a bool to apply to all features, or a feature name or list of feature names to apply selectively

  • target (pandas Series, optional) – Target to be predicted

  • target_rebalance (bool, optional) – Reduce class imbalance in the target

  • metric (string, optional) – Metric used for prevalence, “count” or “prop” (default)

  • prediction_matrix_full (bool, optional) – Whether to use the full prediction matrix rather than the reduced one

  • figure_width (int or float, optional) – Width of figure

  • axes_height (int or float, optional) – Height of axes
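The constructor call can be sketched as follows, using only the keyword arguments and defaults listed in the signature above. The helper name build_eda and the choice of ncols=3 are hypothetical, introduced purely for illustration; the sketch does not show what the class does after construction.

```python
from typing import Any


def build_eda(features: Any, target: Any = None) -> Any:
    """Hypothetical helper: construct the analysis with the documented parameters."""
    # Imported inside the function so that the helper can be defined
    # even where datavizml is not installed.
    from datavizml.exploratorydataanalysis import ExploratoryDataAnalysis

    return ExploratoryDataAnalysis(
        data=features,                 # pandas Series or DataFrame
        ncols=3,                       # illustrative value, not a library default
        data_deskew=False,             # documented default
        target=target,                 # pandas Series, or None
        target_rebalance=False,        # documented default
        metric="prop",                 # "count" or "prop"
        prediction_matrix_full=False,  # documented default
        figure_width=18,               # documented default
        axes_height=3,                 # documented default
    )
```

Passing every argument by keyword keeps the call readable and guards against mixing up the positional order of data and ncols.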

property data: Union[Series, DataFrame]

The feature data

property prediction_matrix: DataFrame

The prediction matrix data

prediction_score_plot(ax: Axes) → Axes

Plot the prediction scores as a heatmap

Parameters:

ax (matplotlib Axes) – Axes to plot on

Returns:

The heatmap plot

Return type:

matplotlib Axes

summary() → DataFrame

Summarise analysis

Returns:

A dataframe summarising each of the features and their relationship to the target

Return type:

pandas DataFrame
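A minimal sketch of how the two methods above might be used together, assuming an already constructed ExploratoryDataAnalysis instance and an existing matplotlib Axes. The function name report is hypothetical, and whether a target must have been supplied for the heatmap to be meaningful is not stated here.

```python
from typing import Any, Tuple


def report(eda: Any, ax: Any) -> Tuple[Any, Any]:
    """Hypothetical helper: collect the summary table and draw the score heatmap."""
    # summary() is documented to return a DataFrame describing each feature.
    summary = eda.summary()
    # prediction_score_plot() is documented to draw on, and return, the given Axes.
    heatmap_ax = eda.prediction_score_plot(ax)
    return summary, heatmap_ax
```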

property target: Series

The target data

datavizml.singledistribution module

class datavizml.singledistribution.SingleDistribution(feature: Any, ax: Any, feature_deskew: bool = False, target: Optional[Any] = None, target_score: Optional[float] = None, target_rebalance: bool = False, binning_threshold: Optional[int] = None, metric: str = 'prop')

Bases: object

A graphical summary of a given feature and its relationship to a target

Parameters:
  • feature (pandas Series) – Feature to be analysed

  • ax (matplotlib Axes) – Axes to plot on

  • feature_deskew (bool, optional) – Reduce feature skew by trialling squaring, rooting, logging, exponentiation and Yeo-Johnson transformations

  • target (pandas Series, optional) – Target to be predicted

  • target_score (float, optional) – Precomputed score to avoid recalculation

  • target_rebalance (bool, optional) – Reduce class imbalance in target score

  • binning_threshold (int, optional) – Maximum number of distinct values in the column before binning, defaults to 12

  • metric (string, optional) – Metric used for prevalence, “count” or “prop” (default)

calculate_feature_score() → None

Calculate the score for the feature based on its skewness
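To illustrate the idea of scoring by skewness and trialling transformations to reduce it, here is a standard-library sketch. It is not the library's implementation: the moment-based skewness formula, the selection rule (smallest absolute skewness) and the names skewness, CANDIDATES and least_skewed are all assumptions, and Yeo-Johnson is omitted for brevity.

```python
import math
from statistics import fmean
from typing import Callable, Dict, List, Sequence, Tuple


def skewness(values: Sequence[float]) -> float:
    """Moment-based skewness: third central moment divided by variance ** 1.5."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


# Candidate transformations (assumed forms; Yeo-Johnson is left out).
CANDIDATES: Dict[str, Callable[[float], float]] = {
    "square": lambda v: v ** 2,
    "root": math.sqrt,
    "log": math.log,
    "exp": math.exp,
}


def least_skewed(values: Sequence[float]) -> Tuple[str, List[float]]:
    """Return the name and result of the transformation with the smallest |skewness|."""
    best_name, best_values = "none", [float(v) for v in values]
    best_skew = abs(skewness(best_values))
    for name, transform in CANDIDATES.items():
        try:
            transformed = [transform(v) for v in values]
        except (ValueError, OverflowError):
            continue  # e.g. log or root of a non-positive value, or exp overflow
        skew = abs(skewness(transformed))
        if skew < best_skew:
            best_name, best_values, best_skew = name, transformed, skew
    return best_name, best_values
```

For example, least_skewed([1, 2, 4, 8, 16, 32, 64]) selects "log", because the logarithms of a geometric sequence are evenly spaced and therefore symmetric.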

calculate_target_score() → None

Calculate the score for the feature based on its predictive power

property feature: Series

The feature data

summarise_feature() → None

Summarise the feature by calculating summary statistics for each distinct value and binning if there are too many distinct values
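The combination of binning and the metric parameter can be pictured with a small standard-library sketch. The equal-width binning scheme, the number of bins and the function name summarise are assumptions made for illustration; the library may bin differently and computes further statistics not shown here.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def summarise(
    values: Sequence, binning_threshold: int = 12, metric: str = "prop"
) -> Dict[Hashable, float]:
    """Count (or take the proportion of) each distinct value, binning if there are too many."""
    if len(set(values)) > binning_threshold:
        # Assumed scheme: equal-width bins, as many bins as the threshold.
        low, high = min(values), max(values)
        width = (high - low) / binning_threshold
        # The maximum value is clamped into the last bin.
        keys = [min(int((v - low) / width), binning_threshold - 1) for v in values]
    else:
        keys = list(values)

    counts = Counter(keys)
    if metric == "count":
        return dict(counts)
    if metric == "prop":
        total = len(values)
        return {key: count / total for key, count in counts.items()}
    raise ValueError('metric must be "count" or "prop"')
```

With metric="count" the result holds raw frequencies; with metric="prop" the same frequencies are divided by the number of observations, so the values sum to 1.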

property target: Series

The target data

to_dict() → dict

Summarise as a dictionary

Module contents