CatBoost for tabular data

CatBoost is an open-source machine learning library for Gradient Boosted Decision Trees (GBDT). Its name comes from the terms “Category” and “Boosting.” It was developed by Yandex (often described as the Russian counterpart of Google) and released in 2017.

Key attributes of CatBoost:

  • ranking objective function
  • native categorical features preprocessing
  • model analysis
  • very fast prediction time
    • reported to be 30-60x faster in production use at real companies
    • on GPUs, reported to be 50-100x faster than XGBoost
  • performs remarkably well with default parameters, and improves significantly further when tuned
  • uses Ordered Target Statistics, an idea borrowed from online learning: CatBoost randomly permutes the training examples and treats the permuted order as a sequence in time, so each example's categorical encoding is computed only from the examples that come before it
    • By creating this concept of artificial time 🕰️, CatBoost reduces Prediction Shift, a bias inherent in traditional Gradient Boosting implementations such as XGBoost and LightGBM.
  • reported 8x faster inference than XGBoost
    • builds better-structured trees 🌲 (symmetric by default), which gives better regularisation and speed, especially during inference
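The Ordered Target Statistics idea from the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not CatBoost's actual implementation: the function name ordered_target_statistics and the parameters prior, prior_weight and seed are assumptions made up for this example. Each row is encoded using only the rows that precede it in a random permutation, so a row's own target never leaks into its encoding.

```python
import random


def ordered_target_statistics(categories, targets, prior=0.5, prior_weight=1.0, seed=0):
    """Encode one categorical column using only each row's 'history'.

    Hypothetical helper for illustration; prior and prior_weight are
    assumed smoothing parameters, not CatBoost option names.
    """
    n = len(categories)
    order = list(range(n))
    # The random permutation plays the role of "artificial time".
    random.Random(seed).shuffle(order)

    sums = {}    # running sum of targets seen so far, per category
    counts = {}  # running number of rows seen so far, per category
    encoded = [0.0] * n
    for idx in order:
        cat = categories[idx]
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        # Smoothed mean of the targets of *earlier* rows with the same category.
        encoded[idx] = (s + prior_weight * prior) / (c + prior_weight)
        # Only now does the current row become part of the history.
        sums[cat] = s + targets[idx]
        counts[cat] = c + 1
    return encoded


colors = ["red", "blue", "red", "red", "blue"]
clicked = [1, 0, 1, 0, 1]
print(ordered_target_statistics(colors, clicked))
```

The first row of each category in the permutation has no history and simply receives the prior. A plain (non-ordered) target mean would instead use every row, including the one being encoded, which is one source of the Prediction Shift mentioned above.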
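One commonly cited reason for CatBoost's inference speed is that its default trees are symmetric (oblivious): every node at the same depth uses the same split. The sketch below shows why prediction on such a tree is cheap. It is an illustration under stated assumptions, not CatBoost's own code: the function predict_symmetric_tree and the (feature_index, threshold) layout of the splits are hypothetical.

```python
def predict_symmetric_tree(features, splits, leaf_values):
    """Evaluate one symmetric (oblivious) tree on a single example.

    splits: list of (feature_index, threshold), one pair per depth level.
    leaf_values: list of 2 ** len(splits) leaf predictions.
    Hypothetical layout chosen for this illustration.
    """
    leaf_index = 0
    for level, (feature_index, threshold) in enumerate(splits):
        # Each level contributes exactly one bit of the leaf index.
        if features[feature_index] > threshold:
            leaf_index |= 1 << level
    return leaf_values[leaf_index]


# A depth-2 tree: level 0 tests feature 0, level 1 tests feature 1.
example_splits = [(0, 0.5), (1, 10.0)]
example_leaf_values = [0.1, 0.2, 0.3, 0.4]

print(predict_symmetric_tree([1.0, 5.0], example_splits, example_leaf_values))
```

Because the same comparisons are made no matter which path an example takes, the leaf is found with a fixed number of comparisons and a table lookup, which is easy to vectorise. Restricting every level to a single split also limits how closely the tree can fit the training data, which acts as a form of regularisation.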

11/2/2023