
WildWood

Scikit-Learn compatible Random Forest algorithms

Documentation | Reproduce experiments

Installation

The easiest way to install wildwood is using pip:

pip install wildwood

You can also install the latest development version directly from GitHub with

pip install git+https://github.com/pyensemble/wildwood.git
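
Since WildWood is scikit-learn compatible, a trained forest behaves like any other scikit-learn estimator. A minimal usage sketch (the ForestClassifier name and its parameters are assumptions here; check the documentation for the exact API):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# ForestClassifier is assumed to be the scikit-learn compatible entry point;
# see the WildWood documentation for the exact estimator names and parameters
from wildwood import ForestClassifier

# toy binary classification problem
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = ForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(clf.predict_proba(X_test)[:5])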

Experiments

Experiments with hyperparameter optimization

To run experiments with hyperparameter optimization, from the experiments/ directory, use

python run_hyperopt_classfiers.py --clf_name WildWood --dataset_name adult

(here with WildWood on the adult dataset).

Some options are:

  • --n_estimators or -t sets the number of estimators (the maximal number of boosting iterations for gradient boosting algorithms); default 100.
  • --hyperopt_evals or -n sets the number of hyperopt steps; default 50.
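
For example, to optimize WildWood on the adult dataset with 10 trees and 100 hyperopt steps (illustrative values):

python run_hyperopt_classfiers.py --clf_name WildWood --dataset_name adult --n_estimators 10 --hyperopt_evals 100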

Experiments with default parameters

To run experiments with default parameters, from the experiments/ directory, use

python run_benchmark_default_params_classifiers.py --clf_name WildWood --dataset_name adult

(here with WildWood on the adult dataset).

Datasets and classifiers

For both run_hyperopt_classfiers.py and run_benchmark_default_params_classifiers.py, the available options for dataset_name are:

  • adult
  • bank
  • breastcancer
  • car
  • cardio
  • churn
  • default-cb
  • letter
  • satimage
  • sensorless
  • spambase
  • amazon
  • covtype
  • internet
  • kick
  • kddcup
  • higgs

while the available options for clf_name are:

  • LGBMClassifier
  • XGBClassifier
  • CatBoostClassifier
  • RandomForestClassifier
  • HistGradientBoostingClassifier
  • WildWood

Experiments presented in the paper

All the scripts needed to reproduce the experiments from the paper are available in the experiments/ folder.

  1. Figure 1 is produced using fig_aggregation_effect.py.

  2. Figure 2 is produced using n_tree_experiment.py.

  3. Tables 1 and 3 from the paper are produced using run_hyperopt_classfiers.py, with n_estimators=5000 for gradient boosting algorithms and n_estimators=n for RFn and WWn:

    • call
    python run_hyperopt_classfiers.py --clf_name <classifier> --dataset_name <dataset> --n_estimators <n_estimators>
    

    for each pair (<classifier>, <dataset>) to run the hyperparameter optimization experiments;

    • use, for example,
    # load the pickled results written by run_hyperopt_classfiers.py
    import pickle as pkl
    filename = "exp_hyperopt_xxx.pickle"  # one of the pickle files written by the script
    with open(filename, "rb") as f:
        results = pkl.load(f)
    # the "results" entry holds a dataframe with the experiment metrics
    df = results["results"]

    to retrieve experiment information, such as AUC, log loss and their standard deviations.

  4. Tables 2 and 4 are produced using run_benchmark_default_params_classifiers.py:

    • call
    python run_benchmark_default_params_classifiers.py --clf_name <classifier> --dataset_name <dataset>
    

    for each pair (<classifier>, <dataset>) to run experiments with default parameters;

    • use similar commands to retrieve experiment information.
  5. Figure 3 is produced from the experiment results (AUC and fit time) generated by run_hyperopt_classfiers.py: concatenate the resulting dataframes (as sketched below) and run fig_auc_fit_time.py.
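
A minimal sketch of this concatenation step, assuming two hypothetical result files with the pickle structure shown above:

import pickle as pkl
import pandas as pd

# hypothetical filenames; use the pickle files actually written by
# run_hyperopt_classfiers.py
filenames = ["exp_hyperopt_adult.pickle", "exp_hyperopt_bank.pickle"]

dfs = []
for filename in filenames:
    with open(filename, "rb") as f:
        results = pkl.load(f)
    dfs.append(results["results"])

# a single dataframe gathering all runs, ready for fig_auc_fit_time.py
df = pd.concat(dfs, ignore_index=True)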
