
WildWood

Scikit-Learn compatible Random Forest algorithms

Documentation | Reproduce experiments

Installation

The easiest way to install wildwood is using pip:

pip install wildwood

You can also install the latest development version directly from GitHub with

pip install git+https://github.com/pyensemble/wildwood.git
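
Since WildWood is scikit-learn compatible, a trained forest behaves like any other scikit-learn estimator. A minimal usage sketch (the ForestClassifier name and its parameters are assumptions here; check the documentation for the exact API):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# ForestClassifier is assumed to be the scikit-learn compatible entry point;
# see the WildWood documentation for the exact estimator names and parameters
from wildwood import ForestClassifier

# toy binary classification problem
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = ForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(clf.predict_proba(X_test)[:5])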

Experiments

Experiments with hyperparameter optimization

To run experiments with hyperparameter optimization, from the experiments/ directory, use

python run_hyperopt_classfiers.py --clf_name WildWood --dataset_name adult

(here with WildWood on the adult dataset).

Some options are:

  • --n_estimators or -t sets the number of estimators (the maximal number of boosting iterations for gradient boosting algorithms); default 100.
  • --hyperopt_evals or -n sets the number of hyperopt steps; default 50.
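
For example, to optimize WildWood on the adult dataset with 10 trees and 100 hyperopt steps (illustrative values):

python run_hyperopt_classfiers.py --clf_name WildWood --dataset_name adult --n_estimators 10 --hyperopt_evals 100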

Experiments with default parameters

To run experiments with default parameters, from the experiments/ directory, use

python run_benchmark_default_params_classifiers.py --clf_name WildWood --dataset_name adult

(here with WildWood on the adult dataset).

Datasets and classifiers

For both run_hyperopt_classfiers.py and run_benchmark_default_params_classifiers.py, the available options for dataset_name are:

  • adult
  • bank
  • breastcancer
  • car
  • cardio
  • churn
  • default-cb
  • letter
  • satimage
  • sensorless
  • spambase
  • amazon
  • covtype
  • internet
  • kick
  • kddcup
  • higgs

while the available options for clf_name are:

  • LGBMClassifier
  • XGBClassifier
  • CatBoostClassifier
  • RandomForestClassifier
  • HistGradientBoostingClassifier
  • WildWood

Experiments presented in the paper

All the scripts needed to reproduce the experiments from the paper are available in the experiments/ folder.

  1. Figure 1 is produced using fig_aggregation_effect.py.

  2. Figure 2 is produced using n_tree_experiment.py.

  3. Tables 1 and 3 from the paper are produced using run_hyperopt_classfiers.py, with n_estimators=5000 for gradient boosting algorithms and n_estimators=n for RFn and WWn:

    • call
    python run_hyperopt_classfiers.py --clf_name <classifier> --dataset_name <dataset> --n_estimators <n_estimators>
    

    for each pair (<classifier>, <dataset>) to run the hyperparameter optimization experiments;

    • use, for example,
    # load the pickled results written by run_hyperopt_classfiers.py
    import pickle as pkl
    filename = "exp_hyperopt_xxx.pickle"  # one of the pickle files written by the script
    with open(filename, "rb") as f:
        results = pkl.load(f)
    # the "results" entry holds a dataframe with the experiment metrics
    df = results["results"]

    to retrieve experiment information, such as AUC, log loss and their standard deviations.

  4. Tables 2 and 4 are produced using run_benchmark_default_params_classifiers.py:

    • call
    python run_benchmark_default_params_classifiers.py --clf_name <classifier> --dataset_name <dataset>
    

    for each pair (<classifier>, <dataset>) to run experiments with default parameters;

    • use similar commands to retrieve experiment information.
  5. Figure 3 is produced from the experiment results (AUC and fit time) generated by run_hyperopt_classfiers.py: concatenate the resulting dataframes (as sketched below) and run fig_auc_fit_time.py.
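
A minimal sketch of this concatenation step, assuming two hypothetical result files with the pickle structure shown above:

import pickle as pkl
import pandas as pd

# hypothetical filenames; use the pickle files actually written by
# run_hyperopt_classfiers.py
filenames = ["exp_hyperopt_adult.pickle", "exp_hyperopt_bank.pickle"]

dfs = []
for filename in filenames:
    with open(filename, "rb") as f:
        results = pkl.load(f)
    dfs.append(results["results"])

# a single dataframe gathering all runs, ready for fig_auc_fit_time.py
df = pd.concat(dfs, ignore_index=True)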
