Distribution scoring metrics
- Predictive Power Score (PPS)
- The improvement in an evaluation metric from a naive model to a model based on the selected feature, relative to the potential improvement achieved from a perfect feature
For classifications, the naive model is the mode and the metric is F1 Score
For regressions, the naive model is the median and the metric is Mean Absolute Error
e.g. a model that uses the modal classes achieves an F1 score of 0.6 and a model using the selected feature achieves an F1 score of 0.9, the PPS score is 0.75
- Categorical skew (categorical features)
The proportion of data points with the modal value
e.g. a dataset of 6 class A, 3 class B and one class C has 60% of data points as the modal value
- Inter-quartile skew (numerical features)
The difference between the median and the midpoint of the upper and lower quantiles, as a proportion of half the inter-quartile range
e.g. if the lower and upper quartile were 1 and 3 and the median was 1.5, the inter-quartile skew would be 0.5