Frequent itemsets -- fp-tree naeseth

og

To install: pip install og

Overview

The og package provides a Python implementation of the FP-growth algorithm for finding frequent itemsets in transactional datasets. FP-growth first builds a compressed representation of the dataset, the FP-tree, and then extracts frequent itemsets directly from it. Because it avoids the candidate generation and repeated dataset scans of Apriori-style methods, it is suited to large datasets where those methods may be too slow.
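To make the "compressed representation" idea concrete, here is a minimal sketch of the first pass that FP-growth implementations typically perform before building the tree: count each item, drop items below the threshold, and sort every transaction by descending item frequency so that common items share tree prefixes. The function name order_transactions is hypothetical and is not part of og; the package's internals may differ.

```python
from collections import Counter


def order_transactions(transactions, minimum_support):
    """Count items, prune infrequent ones, and reorder each transaction.

    Hypothetical helper for illustration; not part of the og package.
    """
    # Materialize the input and drop duplicate items within a transaction.
    transactions = [set(t) for t in transactions]
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= minimum_support}

    ordered = []
    for t in transactions:
        kept = [i for i in t if i in frequent]
        # Most frequent first; ties broken by item value for determinism.
        kept.sort(key=lambda i: (-frequent[i], i))
        ordered.append(kept)
    return frequent, ordered


sample = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b", "d"]]
frequent, ordered = order_transactions(sample, minimum_support=2)
print(frequent)
print(ordered)
```

Here "d" occurs only once and is pruned, and every remaining transaction starts with "b", the most frequent item, which is what lets the tree share a single branch for that prefix.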

Features

  • Efficiently find frequent itemsets without candidate generation.
  • Return itemsets along with their support counts if desired.
  • Handle any iterable of iterables as input for transactions.

Usage

Basic Usage

To use the find_frequent_itemsets function, pass it the transactions and a minimum support threshold, that is, the minimum number of transactions in which an itemset must appear. Here is a simple example:

from og import find_frequent_itemsets

# Sample transactions
transactions = [
    ['milk', 'bread', 'butter'],
    ['beer', 'bread'],
    ['milk', 'bread'],
    ['butter', 'beer'],
    ['bread', 'butter'],
    ['milk', 'butter'],
    ['milk', 'bread', 'butter', 'beer']
]

# Finding itemsets with a minimum support of 3
result = list(find_frequent_itemsets(transactions, minimum_support=3))
print(result)

Including Support Counts

If you want to include the support counts of the itemsets in the results, set include_support=True:

# Finding itemsets with support counts
result_with_support = list(find_frequent_itemsets(transactions, minimum_support=3, include_support=True))
print(result_with_support)
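The source does not show the output of these calls, so the following is an independent cross-check rather than og's actual output. It is a brute-force sketch that counts the support of every itemset in the sample transactions by enumeration. The function brute_force_frequent_itemsets is hypothetical, is not part of og, and is exponential in transaction length, so it is only practical for tiny inputs like this one.

```python
from collections import Counter
from itertools import combinations


def brute_force_frequent_itemsets(transactions, minimum_support):
    """Count every sub-itemset of every transaction (illustration only)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= minimum_support}


transactions = [
    ['milk', 'bread', 'butter'],
    ['beer', 'bread'],
    ['milk', 'bread'],
    ['butter', 'beer'],
    ['bread', 'butter'],
    ['milk', 'butter'],
    ['milk', 'bread', 'butter', 'beer'],
]

expected = brute_force_frequent_itemsets(transactions, minimum_support=3)
for itemset, support in sorted(expected.items()):
    print(itemset, support)
```

With a minimum support of 3, this yields seven itemsets: the four single items (bread 5, butter 5, milk 4, beer 3) and the three pairs drawn from bread, butter, and milk, each with support 3. An FP-growth implementation should report the same itemsets and counts, though possibly in a different order or container type.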

Classes and Functions

FPTree

This class represents an FP-tree that stores transaction items. It supports adding transactions and provides methods to access its nodes and items.

FPNode

Represents a node in an FP-tree. Each node contains a count of occurrences, links to child nodes, and a reference to its parent node.
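As a rough illustration of the structure described above, the sketch below builds a small prefix tree whose nodes hold a count, a dictionary of children, and a parent reference. The class SketchNode and its add method are hypothetical names chosen for this example; they are not og's FPNode or FPTree API. The sketch also omits the header table and node links that a full FP-tree uses to find all nodes for a given item.

```python
class SketchNode:
    """Minimal prefix-tree node (illustration only, not og's FPNode)."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

    def add(self, transaction):
        """Insert one ordered transaction below this node."""
        node = self
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = SketchNode(item, parent=node)
                node.children[item] = child
            # Every transaction passing through a node increments its count.
            child.count += 1
            node = child


root = SketchNode()
for transaction in [["b", "a"], ["b", "c"], ["b", "a", "c"]]:
    root.add(transaction)

b = root.children["b"]
print(b.count, b.children["a"].count, b.children["c"].count)
```

All three transactions share the "b" prefix, so they are stored under one "b" node with a count of 3 instead of three separate branches. This sharing is where the compression comes from.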

find_frequent_itemsets

A function to find frequent itemsets in the given transactions using the FP-growth algorithm. It can optionally return the support count for each itemset.

Parameters:

  • transactions: An iterable of iterables. Each inner iterable represents one transaction.
  • minimum_support: An integer specifying the minimum number of occurrences for an itemset to be considered frequent.
  • include_support: A boolean flag that, when set to True, includes the support count of each itemset in the results.
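Because minimum_support is an absolute count, a threshold given as a fraction of transactions has to be converted first. The helper below is a sketch of one way to do that; absolute_support is a hypothetical name and is not provided by og. It goes through Fraction to avoid floating-point rounding pushing the result up by one.

```python
import math
from fractions import Fraction


def absolute_support(fraction, transaction_count):
    """Convert a relative support (0..1) to an integer count, at least 1.

    Hypothetical helper for illustration; not part of the og package.
    """
    # Fraction(str(0.3)) is exactly 3/10, whereas 0.3 * 10 in floating
    # point is slightly above 3 and would round up to 4.
    exact = Fraction(str(fraction)) * transaction_count
    return max(1, math.ceil(exact))


print(absolute_support(0.4, 7))
print(absolute_support(0.3, 10))
```

The resulting integer can then be passed as minimum_support. Note that this needs the number of transactions up front, so a one-shot iterator of transactions would have to be materialized into a list first.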

Development and Contributions

Contributions to the og package are welcome. You can contribute in several ways, including providing feedback, reporting bugs, and submitting feature requests or pull requests.

For more detailed contribution guidelines, please refer to the official repository documentation.

License

This project is licensed under the MIT License; see the LICENSE file for details.
