Find Python objects by exact match on their attributes.

Project description

HashBox

Container for finding Python objects by matching attributes.

HashBox uses hash-based storage and retrieval, so find() is fast.

pip install hashbox

Requires Python 3.7+.

Usage:

from hashbox import HashBox

objects = [
    {'a': 1, 'b': 2}, 
    {'a': 1, 'b': 3}
]

hb = HashBox(
    objects,
    on=['a', 'b']
)

hb.find(
    match={'a': 1}, 
    exclude={'b': 3}
)  
# result: [{'a': 1, 'b': 2}]

The objects can be any type: class instances, namedtuples, dicts, strings, floats, ints, etc.

There are two classes available.

HashBox: mutable; you can add() and remove() objects.
FrozenHashBox: immutable, with faster finds and lower memory usage.

Examples

Pass a list of values to match or exclude any value in the list

from hashbox import HashBox

objects = [
    {'order': 1, 'size': 'regular', 'topping': 'smothered'}, 
    {'order': 2, 'size': 'regular', 'topping': 'diced'}, 
    {'order': 3, 'size': 'large', 'topping': 'covered'},
    {'order': 4, 'size': 'triple', 'topping': 'chunked'}
]

hb = HashBox(objects, on=['size', 'topping'])

hb.find(
    match={'size': ['regular', 'large']},  # match anything with size in ['regular', 'large'] 
    exclude={'topping': 'diced'}           # exclude where topping is 'diced'
)  # result: orders 1 and 3

hb.find(
    match={},                               # match all objects
    exclude={'size': ['regular', 'large']}  # where size is not in ['regular', 'large']
)  # result: order 4
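To pin down the list semantics above, here is a brute-force reference in plain Python, independent of hashbox. It assumes what the comments describe: a list value means "any of these values", an empty match matches every object, and exclude drops objects whose value is in the given set. The helper `brute_find` is hypothetical; it scans every object, so it is only useful for checking results, not for speed.

```python
_MISSING = object()  # sentinel: the object lacks this key


def _as_set(value):
    """A list means 'any of these values'; a scalar means exactly that value."""
    return set(value) if isinstance(value, list) else {value}


def brute_find(objects, match=None, exclude=None):
    """Linear-scan reference for match/exclude on dict objects (hypothetical helper)."""
    match = match or {}
    exclude = exclude or {}
    results = []
    for obj in objects:
        # every match attribute must hold one of the wanted values
        if any(obj.get(attr, _MISSING) not in _as_set(wanted)
               for attr, wanted in match.items()):
            continue
        # no exclude attribute may hold one of the excluded values
        if any(obj.get(attr, _MISSING) in _as_set(unwanted)
               for attr, unwanted in exclude.items()):
            continue
        results.append(obj)
    return results


objects = [
    {'order': 1, 'size': 'regular', 'topping': 'smothered'},
    {'order': 2, 'size': 'regular', 'topping': 'diced'},
    {'order': 3, 'size': 'large', 'topping': 'covered'},
    {'order': 4, 'size': 'triple', 'topping': 'chunked'},
]

first = brute_find(objects, match={'size': ['regular', 'large']}, exclude={'topping': 'diced'})
second = brute_find(objects, exclude={'size': ['regular', 'large']})
print([o['order'] for o in first])   # [1, 3]
print([o['order'] for o in second])  # [4]
```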

Define a function to access nested attributes

from hashbox import HashBox

class Order:
    def __init__(self, num, size, toppings):
        self.num = num
        self.size = size
        self.toppings = toppings
        
    def __repr__(self):
        return f"order: {self.num}, size: '{self.size}', toppings: {self.toppings}"
    
objects = [
    Order(1, 'regular', ['scattered', 'smothered', 'covered']),
    Order(2, 'large', ['scattered', 'covered', 'peppered']),
    Order(3, 'large', ['scattered', 'diced', 'chunked']),
    Order(4, 'triple', ['all the way']),
]

def has_cheese(obj):
    return 'covered' in obj.toppings or 'all the way' in obj.toppings

hb = HashBox(objects, ['size', has_cheese])

# returns orders 1, 2 and 4
hb.find({has_cheese: True})
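Because a function is hashable, it can be a key in the find() dict just like an attribute name. The sketch below shows one plausible way an index could resolve either kind of key; `get_value` is a hypothetical helper, not hashbox's code, and it assumes string keys are looked up as dict keys on dicts and as attributes on other objects.

```python
class Order:
    def __init__(self, num, size, toppings):
        self.num = num
        self.size = size
        self.toppings = toppings


def has_cheese(obj):
    return 'covered' in obj.toppings or 'all the way' in obj.toppings


def get_value(obj, attr):
    """Resolve one attribute spec (hypothetical helper, not hashbox's code)."""
    if callable(attr):           # functions (including builtins like len) are called
        return attr(obj)
    if isinstance(obj, dict):    # strings name a key on dicts...
        return obj[attr]
    return getattr(obj, attr)    # ...or an attribute on other objects


objects = [
    Order(1, 'regular', ['scattered', 'smothered', 'covered']),
    Order(2, 'large', ['scattered', 'covered', 'peppered']),
    Order(3, 'large', ['scattered', 'diced', 'chunked']),
    Order(4, 'triple', ['all the way']),
]

cheesy = [o.num for o in objects if get_value(o, has_cheese)]
print(cheesy)  # [1, 2, 4]

query = {'size': 'large', has_cheese: True}  # a function is a valid, hashable dict key
matches = [o.num for o in objects
           if all(get_value(o, attr) == value for attr, value in query.items())]
print(matches)  # [2]
```

Deriving a bool here also avoids a hashing problem: `toppings` holds a list, which is unhashable and so cannot be a key in any dict-based index, whereas `has_cheese` returns a hashable True or False.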

Derived attributes

Finding by function is powerful: any value you can compute from an object can serve as an attribute. Here we find strings by their length and by how many times they contain the letter 'o'.

from hashbox import FrozenHashBox

objects = ['mushrooms', 'peppers', 'onions']

def o_count(obj):
    return obj.count('o')

f = FrozenHashBox(objects, [o_count, len])
f.find({len: 6})       # returns ['onions']
f.find({o_count: 2})   # returns ['mushrooms', 'onions']
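Conceptually, a derived attribute groups objects by the value a function returns. This plain-Python sketch (the helper `group_by` is hypothetical, not part of hashbox) reproduces the results above; because each derived value becomes a dict key, attribute functions should return hashable values.

```python
from collections import defaultdict


def o_count(obj):
    return obj.count('o')


def group_by(objects, fn):
    """Map each derived value fn(obj) to the list of objects producing it."""
    groups = defaultdict(list)
    for obj in objects:
        groups[fn(obj)].append(obj)  # fn(obj) is a dict key, so it must be hashable
    return dict(groups)


objects = ['mushrooms', 'peppers', 'onions']
by_len = group_by(objects, len)
by_o_count = group_by(objects, o_count)

print(by_len[6])      # ['onions']
print(by_o_count[2])  # ['mushrooms', 'onions']
```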

Handling missing attributes

Objects that are missing an attribute are not stored under that attribute, which saves memory.
To find all objects that have an attribute, match the special value ANY.
To find objects that lack the attribute, exclude ANY.
In attribute functions, raise MissingAttribute to tell HashBox the attribute is missing.

from hashbox import HashBox, ANY
from hashbox.exceptions import MissingAttribute

def get_a(obj):
    try:
        return obj['a']
    except KeyError:
        raise MissingAttribute  # tell HashBox this attribute is missing

objs = [{'a': 1}, {'a': 2}, {}]
hb = HashBox(objs, ['a', get_a])

hb.find({'a': ANY})          # result: [{'a': 1}, {'a': 2}]
hb.find({get_a: ANY})        # result: [{'a': 1}, {'a': 2}]
hb.find(exclude={'a': ANY})  # result: [{}]
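The sketch below models this behavior in plain Python: an object whose getter raises is left out of that attribute's index, matching ANY takes the union of every stored bucket, and excluding ANY keeps the objects that were never stored. `MissingAttribute`, `ANY`, `build_index`, and `lookup` here are local stand-ins for illustration, not hashbox's own objects.

```python
class MissingAttribute(Exception):
    """Local stand-in for hashbox.exceptions.MissingAttribute."""


ANY = object()  # local stand-in for hashbox's ANY sentinel


def get_a(obj):
    try:
        return obj['a']
    except KeyError:
        raise MissingAttribute


def build_index(objects, getter):
    """Map value -> set of object IDs, skipping objects that lack the attribute."""
    index = {}
    for obj in objects:
        try:
            value = getter(obj)
        except MissingAttribute:
            continue  # not stored at all, so it costs no memory in this index
        index.setdefault(value, set()).add(id(obj))
    return index


def lookup(index, value):
    """IDs stored under value; ANY means every stored ID."""
    if value is ANY:
        return set().union(*index.values())
    return index.get(value, set())


objs = [{'a': 1}, {'a': 2}, {}]
index = build_index(objs, get_a)

has_a = lookup(index, ANY)                  # like match={'a': ANY}
lacks_a = {id(o) for o in objs} - has_a     # like exclude={'a': ANY}
print([o for o in objs if id(o) in has_a])    # [{'a': 1}, {'a': 2}]
print([o for o in objs if id(o) in lacks_a])  # [{}]
```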

Recipes

Auto-updating - Keep HashBox updated when attribute values change
Wordle solver - Demonstrates using functools.partial to make attribute functions
Collision detection - Find objects based on type and proximity (grid-based)
Percentiles - Find by percentile (median, p99, etc.)
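The idea behind the Wordle recipe, building attribute functions with functools.partial, can be sketched without hashbox: one generic function is specialized per letter, and each resulting partial object is hashable, so it can be used as a key. A partial compares and hashes by identity, so keep a reference to the exact partial you indexed on and reuse it when querying. The `has_letter` function and word list are illustrative assumptions, not taken from the recipe.

```python
from functools import partial


def has_letter(word, letter):
    return letter in word


# one attribute function per vowel, all built from the same generic function
letter_fns = {ch: partial(has_letter, letter=ch) for ch in 'aeiou'}

words = ['crane', 'slate', 'pious', 'lymph']

# tiny index: attribute function -> {value: [words]}
index = {}
for fn in letter_fns.values():
    for w in words:
        index.setdefault(fn, {}).setdefault(fn(w), []).append(w)

has_e = letter_fns['e']
print(index[has_e][True])                        # ['crane', 'slate']
print(partial(has_letter, letter='e') in index)  # False: a new partial is a different key
```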

Performance

Demo: HashBox running 5x to 10x faster than SQLite

How it works

In HashBox, each attribute is a dict of sets: {attribute value: set(object IDs)}. On find(), the object IDs for each requested attribute value are retrieved. Set operations then combine them into the final set of object IDs: union across a list of values, intersection across matched attributes, and difference for exclusions. Finally, the object IDs are mapped back to objects, which are returned.
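A minimal sketch of this scheme in plain Python, not hashbox's actual code. The class `TinyIndex` and its methods are hypothetical; it handles string attribute names on dicts only, and it stores `id(obj)` as the object ID, which lets unhashable objects such as dicts be indexed.

```python
class TinyIndex:
    """Toy inverted index: {attr: {value: set(object IDs)}} (hypothetical)."""

    def __init__(self, objects, on):
        self.on = list(on)
        self.objs = {}                               # object ID -> object
        self.index = {attr: {} for attr in self.on}
        for obj in objects:
            self.add(obj)

    def add(self, obj):
        self.objs[id(obj)] = obj
        for attr in self.on:
            if attr in obj:                          # missing attributes are not stored
                self.index[attr].setdefault(obj[attr], set()).add(id(obj))

    def remove(self, obj):
        del self.objs[id(obj)]
        for attr in self.on:
            if attr in obj:
                bucket = self.index[attr][obj[attr]]
                bucket.discard(id(obj))
                if not bucket:                       # drop empty buckets
                    del self.index[attr][obj[attr]]

    def _ids(self, attr, value):
        values = value if isinstance(value, list) else [value]
        ids = set()
        for v in values:                             # union: "any of these values"
            ids |= self.index[attr].get(v, set())
        return ids

    def find(self, match=None, exclude=None):
        match, exclude = match or {}, exclude or {}
        if match:                                    # intersection across attributes
            ids = set.intersection(*(self._ids(a, v) for a, v in match.items()))
        else:
            ids = set(self.objs)                     # empty match: all objects
        for attr, value in exclude.items():
            ids -= self._ids(attr, value)            # difference for exclusions
        return [self.objs[i] for i in ids]


objects = [{'a': 1, 'b': 2}, {'a': 1, 'b': 3}]
hb = TinyIndex(objects, on=['a', 'b'])
print(hb.find(match={'a': 1}, exclude={'b': 3}))  # [{'a': 1, 'b': 2}]

extra = {'a': 1, 'b': 4}
hb.add(extra)
print(len(hb.find(match={'a': 1})))               # 3
hb.remove(extra)
print(len(hb.find(match={'a': 1})))               # 2
```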

FrozenHashBox uses arrays instead of sets, which its immutability makes possible. It stores a numpy array of objects, and each attribute value maps to the indices of its objects in that array. On find(), the array indices for each match are retrieved. Then sortednp's set operations on sorted arrays combine them into a final set of indices. Finally, the objects are retrieved from the object array by index and returned.
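Once the object array is fixed, each attribute value can map to a sorted list of integer positions, and two sorted lists can be intersected or subtracted in one linear merge pass without building hash sets. The pure-Python sketch below shows that merge idea; a library such as sortednp performs comparable operations on sorted numpy arrays in compiled code. The function names and the example attributes are hypothetical, not sortednp's API.

```python
def intersect_sorted(a, b):
    """Positions present in both sorted, duplicate-free lists (two-pointer merge)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def difference_sorted(a, b):
    """Positions in sorted list a that are not in sorted list b."""
    i = j = 0
    out = []
    while i < len(a):
        if j < len(b) and b[j] < a[i]:
            j += 1
        elif j < len(b) and b[j] == a[i]:
            i += 1
            j += 1
        else:
            out.append(a[i])
            i += 1
    return out


objects = ['mushrooms', 'peppers', 'onions', 'leeks', 'olives']  # fixed object array


def build(fn):
    """value -> ascending list of positions (ascending because we scan in order)."""
    index = {}
    for pos, obj in enumerate(objects):
        index.setdefault(fn(obj), []).append(pos)
    return index


has_o = build(lambda s: 'o' in s)
has_e = build(lambda s: 'e' in s)

positions = difference_sorted(has_o[True], has_e[True])  # match o, exclude e
print([objects[p] for p in positions])                   # ['mushrooms', 'onions']
print([objects[p] for p in intersect_sorted(has_o[True], has_e[True])])  # ['olives']
```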

Related projects

HashBox is a type of inverted index, optimized specifically for finding in-memory Python objects.

Other Python inverted index implementations target tasks like vector search and finding documents by words. Outside of Python, Elasticsearch is a popular inverted-index search engine. Each of these has goals outside HashBox's niche; there are no plans to expand HashBox toward them.

Project details

Release history

1.1.1 - Aug 29, 2022
0.9.9 - Aug 5, 2022
0.5.1 - Aug 5, 2022
0.5.0 - Aug 4, 2022 (this version)
0.4.0 - Aug 3, 2022
0.3.2 - Aug 3, 2022
0.3.1 - Aug 2, 2022
0.3.0 - Aug 2, 2022
0.2.3 - Aug 1, 2022
0.2.2 - Jul 31, 2022
0.2.1 - Jul 30, 2022
0.2.0 - Jul 30, 2022

Download files

Source Distribution

hashbox-0.5.0.tar.gz (17.1 kB)

Uploaded Aug 4, 2022 Source

Built Distribution

hashbox-0.5.0-py3-none-any.whl (18.3 kB)

Uploaded Aug 4, 2022 Python 3

Hashes for hashbox-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`2efbd1cb38fa6756648c591aa1045dbb85d26c9197d4e1cba54f2ce3107388ce`
MD5	`d76b7e1d3c0f287cc00dbf0e79f7cb86`
BLAKE2b-256	`48e6a64553ff6d545a37b72aa82ec87939917d202b78c3ef8c4e4feab47f0c0f`

Hashes for hashbox-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`493ee0da6d982b1045605ab856a975e6499c4907d8c41803b33f4eb92f6c02ae`
MD5	`d5c7539ecfc6b962d45722f7247d1d07`
BLAKE2b-256	`f43b39d056be0fdee78920c60ea73ce1a74e3fb54e43f5334ce25a53a8435ce8`