This project applies the Random Forest approach with Monte Carlo-based dynamic tree selection.

Random Forest with Dynamic Tree Selection Monte Carlo Based (RF-TSMC)

This project applies the Random Forest approach to multiclass classification, using Monte Carlo-based dynamic tree selection. The first implementation can be found in [2] (written in Common Lisp).

Development status: `Release Candidate`.

Install:

Install using pip:

$ pip3 install random-forest-mc

Install from this repo:

$ git clone https://github.com/ysraell/random-forest-mc.git
$ cd random-forest-mc
$ pip3 install .

Usage:

Example of a full cycle using titanic.csv:

import numpy as np
import pandas as pd

from random_forest_mc.model import RandomForestMC
from random_forest_mc.utils import LoadDicts

# Load the dataset metadata (CSV path, feature columns, target column) from the JSON files in tests/
dicts = LoadDicts("tests/")
dataset_dict = dicts.datasets_metadata
ds_name = "titanic"
params = dataset_dict[ds_name]
# Keep only the feature and target columns and drop rows with missing values
dataset = (
    pd.read_csv(params["csv_path"])[params["ds_cols"] + [params["target_col"]]]
    .dropna()
    .reset_index(drop=True)
)
# Cast numeric columns to unsigned integers (fractional values are truncated);
# Pclass is treated as a categorical (string) feature
dataset["Age"] = dataset["Age"].astype(np.uint8)
dataset["SibSp"] = dataset["SibSp"].astype(np.uint8)
dataset["Pclass"] = dataset["Pclass"].astype(str)
dataset["Fare"] = dataset["Fare"].astype(np.uint32)
# Build and train a forest of 8 trees
cls = RandomForestMC(
    n_trees=8, target_col=params["target_col"], max_discard_trees=4
)
cls.process_dataset(dataset)
cls.fit()
# Accuracy with hard and soft voting (note: measured on the same data used for training)
y_test = dataset[params["target_col"]].to_list()
y_pred = cls.testForest(dataset)
accuracy_hard = sum(v == p for v, p in zip(y_test, y_pred)) / len(y_pred)
y_pred = cls.testForest(dataset, soft_voting=True)
accuracy_soft = sum(v == p for v, p in zip(y_test, y_pred)) / len(y_pred)

LoadDicts:

LoadDicts loads all the JSON files inside a given path and creates a helper object that exposes each file as a dictionary attribute.

For example:

>>> from random_forest_mc.utils import LoadDicts
>>> # JSONs: path/data.json, path/metadata.json
>>> dicts = LoadDicts("path/")
>>> # you now have dicts.data and dicts.metadata as dictionaries,
>>> # and the list of loaded dictionaries in:
>>> dicts.List
['data', 'metadata']
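The behavior described above can be approximated with the standard library alone. This is a hypothetical sketch, not the package's code: the class name `SimpleLoadDicts` is invented here, and the real `LoadDicts` may differ in details such as file ordering, subfolder handling, or error handling.

```python
import json
from pathlib import Path


class SimpleLoadDicts:
    """Hypothetical re-implementation of the LoadDicts idea (illustration only)."""

    def __init__(self, path):
        self.List = []
        # Load every *.json file directly inside `path`, sorted for a stable order.
        for json_file in sorted(Path(path).glob("*.json")):
            name = json_file.stem  # "data.json" -> "data"
            with open(json_file, encoding="utf-8") as fh:
                # Expose the parsed content as an attribute named after the file.
                setattr(self, name, json.load(fh))
            self.List.append(name)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "data.json").write_text('{"a": 1}')
        Path(tmp, "metadata.json").write_text('{"source": "example"}')
        dicts = SimpleLoadDicts(tmp)
        print(dicts.List)  # ['data', 'metadata']
        print(dicts.data)  # {'a': 1}
```

Attribute access (`dicts.data`) only works for file names that are valid Python identifiers; other names would still be reachable through `getattr`.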

Fundamentals:

Based on the Random Forest principles: an ensemble of models (decision trees).
In the bootstrap process:
- the sampled data are balanced between classes, for both training and validation;
- the list of features used is randomly sampled (with a random number of features and a random order).
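The bootstrap step can be sketched in plain Python. The function names, the rows-as-dicts layout, the `n_per_class` parameter, and the choice to bootstrap training and validation rows from disjoint halves of each class are assumptions made for illustration; the package's own sampling may work differently.

```python
import random
from collections import defaultdict


def balanced_train_val(rows, target_col, n_per_class, rng):
    """Draw a class-balanced training set and validation set (hypothetical sketch).

    Each class is shuffled and cut into two disjoint halves; training rows are
    bootstrapped (sampled with replacement) from the first half and validation
    rows from the second, so no row ends up in both sets.
    """
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target_col]].append(row)
    train, val = [], []
    for label, class_rows in by_class.items():
        if len(class_rows) < 2:
            raise ValueError(f"class {label!r} needs at least 2 rows")
        shuffled = rng.sample(class_rows, len(class_rows))
        half = len(shuffled) // 2
        # The same number of rows per class in both sets keeps the classes balanced.
        train.extend(rng.choices(shuffled[:half], k=n_per_class))
        val.extend(rng.choices(shuffled[half:], k=n_per_class))
    rng.shuffle(train)
    rng.shuffle(val)
    return train, val


def random_feature_list(features, rng):
    """A random number of features (at least one), in a random order."""
    k = rng.randint(1, len(features))
    return rng.sample(features, k)


if __name__ == "__main__":
    rng = random.Random(42)
    rows = [{"Age": 20 + i, "Survived": i % 2} for i in range(10)]
    train, val = balanced_train_val(rows, "Survived", n_per_class=3, rng=rng)
    print(len(train), len(val))  # 6 6
    print(random_feature_list(["Age", "Fare", "Pclass", "SibSp"], rng))
```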
For each tree:
- following the sequence of the given list of features, the data are split half/half based on the median value;
- the splitting process ends when the samples contain only one class;
- a validation process based on a dynamic threshold can discard the tree.
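The per-tree procedure can be sketched under several assumptions: numeric features only, trees stored as nested dicts, each feature used at most once per path (so growth also stops when the list runs out), and a hypothetical threshold rule in which `th_start` is lowered by `delta_th` after `max_discard_trees` consecutive rejected trees. Only `max_discard_trees` appears in the usage example; `th_start` and `delta_th` are invented names, and the exact rule used by RandomForestMC may differ.

```python
from collections import Counter
from statistics import median


def grow_tree(rows, features, target_col):
    """Grow one tree by median splits, following `features` in order."""
    labels = Counter(row[target_col] for row in rows)
    # Stop when the node holds a single class or (assumption) when the
    # feature list is exhausted; the leaf keeps its class counts.
    if len(labels) == 1 or not features:
        return {"leaf": dict(labels)}
    feature, rest = features[0], features[1:]
    m = median(row[feature] for row in rows)
    left = [row for row in rows if row[feature] <= m]
    right = [row for row in rows if row[feature] > m]
    if not left or not right:
        # The median did not separate the rows; move on to the next feature.
        return grow_tree(rows, rest, target_col)
    return {
        "feature": feature,
        "median": m,
        "left": grow_tree(left, rest, target_col),
        "right": grow_tree(right, rest, target_col),
    }


def tree_proba(tree, row):
    """Walk down to a leaf and turn its class counts into probabilities."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["median"] else tree["right"]
    total = sum(tree["leaf"].values())
    return {label: n / total for label, n in tree["leaf"].items()}


def select_trees(make_tree, score, n_trees, th_start, delta_th, max_discard_trees):
    """Keep trees whose validation score reaches a dynamic threshold.

    Assumed rule (hypothetical th_start / delta_th parameters): after
    `max_discard_trees` consecutive rejections the threshold is lowered
    by `delta_th`; delta_th must be > 0 for the loop to terminate.
    """
    forest, th, discarded = [], th_start, 0
    while len(forest) < n_trees:
        tree = make_tree()
        if score(tree) >= th:
            forest.append(tree)
            discarded = 0
        else:
            discarded += 1
            if discarded >= max_discard_trees:
                th -= delta_th
                discarded = 0
    return forest
```

In a full pipeline, `make_tree` would bootstrap a training sample and a random feature list and call `grow_tree`, while `score` would measure accuracy on the matching validation sample.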
To use the forest:
- the predictions of all trees are combined as a vote;
- either soft or hard voting can be used.
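The two voting schemes differ in what each tree contributes: hard voting counts one label per tree, while soft voting adds up the trees' class probabilities. In this sketch the helper names are hypothetical and the per-tree probabilities are given as dicts; the example shows a case where the two schemes disagree.

```python
from collections import Counter


def hard_vote(tree_probas):
    """Each tree votes for its most probable class; the majority wins."""
    votes = Counter(max(p, key=p.get) for p in tree_probas)
    return votes.most_common(1)[0][0]


def soft_vote(tree_probas):
    """Sum the trees' class probabilities and return the top class.

    Summing picks the same winner as averaging, because every class total
    would be divided by the same number of trees.
    """
    totals = Counter()
    for p in tree_probas:
        totals.update(p)
    return max(totals, key=totals.get)


# Per-tree class probabilities for one sample (illustrative numbers).
probas = [{"a": 0.6, "b": 0.4}, {"a": 0.6, "b": 0.4}, {"b": 1.0}]
print(hard_vote(probas))  # a  (two of three trees pick "a")
print(soft_vote(probas))  # b  (totals: a = 1.2, b = 1.8)
```

Soft voting lets a single very confident tree outweigh several hesitant ones, which is exactly what happens here.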
Positive side effects:
- possibly better generalization from combining overfitted trees, since each tree is highly specialized in a smaller and different set of features;
- robustness to unbalanced and missing data: in case of missing data, the feature can be skipped without degrading the optimization process;
- in the prediction process, a missing value can be handled by replicating the tree, considering both possible paths;
- the surviving trees carry potential information about feature importance.
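The missing-value idea, exploring both branches when the split feature is absent and combining the classes found at the leaves, can be sketched on a nested-dict tree. The tree format (`feature`/`median`/`left`/`right` for splits, `leaf` holding class counts) and the function names are hypothetical. Adding raw leaf counts weights each branch by its number of training samples; that weighting is an assumption, not necessarily what the package does.

```python
from collections import Counter


def leaf_counts(tree, row):
    """Class counts reached by `row`, branching on missing values.

    If the split feature is absent from `row` or is None, both subtrees are
    explored and their leaf counts are added together.
    """
    if "leaf" in tree:
        return Counter(tree["leaf"])
    value = row.get(tree["feature"])
    if value is None:
        return leaf_counts(tree["left"], row) + leaf_counts(tree["right"], row)
    branch = "left" if value <= tree["median"] else "right"
    return leaf_counts(tree[branch], row)


def proba_with_missing(tree, row):
    """Normalize the combined leaf counts into class probabilities."""
    counts = leaf_counts(tree, row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical one-split tree: leaves store training class counts.
tree = {
    "feature": "Age",
    "median": 30,
    "left": {"leaf": {"yes": 3, "no": 1}},
    "right": {"leaf": {"no": 4}},
}
print(proba_with_missing(tree, {"Age": 20}))  # {'yes': 0.75, 'no': 0.25}
print(proba_with_missing(tree, {}))           # {'yes': 0.375, 'no': 0.625}
```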

References

[2] Laboratory of Decision Tree and Random Forest (github/ysraell/random-forest-lab). GitHub repository.

[3] Credit Card Fraud Detection: anonymized credit card transactions labeled as fraudulent or genuine. Kaggle. Available at: https://www.kaggle.com/mlg-ulb/creditcardfraud.

Development Framework (optional)

My data science Docker image.

With this image, you can run all the notebooks and Python scripts in this repository.

TODO v0.2:

Add parallel processing, using either TQDM or the csv2es style.
Prediction with missing values: useTree must be functional and branch when a value is missing, combining the classes at the leaves with their probabilities.
[Plus] Add a method to return the list of features and their degrees of importance.
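One simple way to read feature importance from the surviving trees is to count how often each feature is used as a split. This is a hedged sketch using a hypothetical nested-dict tree format (`feature`/`median`/`left`/`right` for splits, `leaf` for leaves); the planned method may weight features differently, for example by split depth or by each tree's validation score.

```python
from collections import Counter


def split_features(tree):
    """Yield the feature used at every split node of a nested-dict tree."""
    if "leaf" in tree:
        return
    yield tree["feature"]
    yield from split_features(tree["left"])
    yield from split_features(tree["right"])


def feature_importance(forest):
    """Share of split nodes that use each feature, across the surviving trees."""
    counts = Counter(f for tree in forest for f in split_features(tree))
    total = sum(counts.values())
    return sorted(
        ((feature, n / total) for feature, n in counts.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Two hypothetical surviving trees.
forest = [
    {"feature": "Age", "median": 30,
     "left": {"leaf": {"yes": 3}},
     "right": {"feature": "Fare", "median": 10,
               "left": {"leaf": {"no": 2}}, "right": {"leaf": {"yes": 1}}}},
    {"feature": "Age", "median": 25,
     "left": {"leaf": {"yes": 2}}, "right": {"leaf": {"no": 2}}},
]
print(feature_importance(forest))  # Age first (2 of 3 splits), then Fare
```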

TODO v0.3:

Add docstrings.

Release history

- 1.1.1 (Mar 17, 2024)
- 1.1.0 (Sep 5, 2023)
- 1.0.3 (Nov 28, 2022)
- 1.0.2 (Aug 16, 2022)
- 1.0.1 (Aug 2, 2022)
- 1.0.0 (Aug 2, 2022)
- 0.3.7 (Jan 15, 2022)
- 0.3.6 (Nov 12, 2021)
- 0.3.5 (Oct 19, 2021)
- 0.3.4 (Sep 16, 2021)
- 0.3.3 (Sep 14, 2021)
- 0.3.2 (Sep 13, 2021)
- 0.3.1 (Sep 12, 2021)
- 0.3.0 (Sep 12, 2021)
- 0.2.1 (Sep 7, 2021)
- 0.2.0 (Aug 25, 2021)
- 0.1.1 (Aug 22, 2021), this version
- 0.0.1 (Aug 20, 2021)

Download files

Source Distribution

random-forest-mc-0.1.1.tar.gz (7.8 kB), uploaded Aug 22, 2021

Built Distribution

random_forest_mc-0.1.1-py3-none-any.whl (8.3 kB), Python 3, uploaded Aug 22, 2021

Hashes for random-forest-mc-0.1.1.tar.gz:
- SHA256: `82c1e745a78ab4cc8875d4b2ec26616c61eedf281f6678856c9e4d72f4729d37`
- MD5: `cc1f8c39774ccc4795d905c914ae60b9`
- BLAKE2b-256: `7e92f3000f84ebf5f1f17f5506b08eaa9860a362f383fde7bbb1aa4128244216`

Hashes for random_forest_mc-0.1.1-py3-none-any.whl:
- SHA256: `4dd0f397a8a05a0f4fa23dfedfe6bd0c14320cac166826031707d0810c17a425`
- MD5: `968401be3732b3ab3fc22619773998ad`
- BLAKE2b-256: `c1e1eaec1607f4a90cca89b98673f9f76bf9a777c542305c493263cb0a90c76f`
