Container for finding Python objects by matching attributes. Stores objects by attribute value for fast lookup.
FilterBox
Container for finding Python objects by matching attributes.
Stores objects pre-filtered by attribute value, so it can find them much faster than filter(), and in many cases faster than SQLite. Speed demo
pip install filterbox
Usage:
from filterbox import FilterBox
fb = FilterBox(                               # Make a FilterBox
    [{'color': 'green', 'type': 'apple'},
     {'color': 'green', 'type': 'frog'}],     # Containing any type of objects
    on=['color', 'type'])                     # Define attributes to find by
fb.find({'color': 'green', 'type': 'frog'})   # Find by attribute match
The objects can be anything: class instances, namedtuples, dicts, strings, floats, ints, etc.
Attributes can be either strings or functions evaluated on the object.
There are two classes available.
- FilterBox: can add() and remove() objects.
- FrozenFilterBox: faster finds, lower memory usage, and immutable.
Examples
Match and exclude multiple values
from filterbox import FilterBox
objects = [
{'item': 1, 'size': 10, 'flavor': 'melon'},
{'item': 2, 'size': 10, 'flavor': 'lychee'},
{'item': 3, 'size': 20, 'flavor': 'peach'},
{'item': 4, 'size': 30, 'flavor': 'apple'}
]
fb = FilterBox(objects, on=['size', 'flavor'])
fb.find(
match={'size': [10, 20]}, # match anything with size in [10, 20]
exclude={'flavor': ['lychee', 'peach']} # where flavor is not in ['lychee', 'peach']
)
# result: [{'item': 1, 'size': 10, 'flavor': 'melon'}]
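To make the match and exclude semantics concrete, here is the same query written as a plain linear scan with no index. This is only a sketch for comparison: `linear_find` is a hypothetical helper, not part of filterbox, and it is the per-object loop that FilterBox is designed to avoid.

```python
objects = [
    {'item': 1, 'size': 10, 'flavor': 'melon'},
    {'item': 2, 'size': 10, 'flavor': 'lychee'},
    {'item': 3, 'size': 20, 'flavor': 'peach'},
    {'item': 4, 'size': 30, 'flavor': 'apple'},
]


def linear_find(objs, match=None, exclude=None):
    """Hypothetical helper: scan every object and apply the rules one by one."""
    match = match or {}
    exclude = exclude or {}
    return [
        obj for obj in objs
        # keep the object only if every matched attribute has an allowed value...
        if all(obj.get(attr) in values for attr, values in match.items())
        # ...and no excluded attribute has a forbidden value
        and not any(obj.get(attr) in values for attr, values in exclude.items())
    ]


result = linear_find(
    objects,
    match={'size': [10, 20]},
    exclude={'flavor': ['lychee', 'peach']},
)
print(result)
```

Every call walks the whole list, so the cost grows with the number of objects rather than with the number of results.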
Accessing nested attributes using functions
Function attributes are used to get values from nested data structures.
from filterbox import FilterBox
objs = [
{'a': {'b': [1, 2, 3]}},
{'a': {'b': [4, 5, 6]}}
]
def get_nested(obj):
    return obj['a']['b'][0]
fb = FilterBox(objs, [get_nested])
fb.find({get_nested: 4})
# result: [{'a': {'b': [4, 5, 6]}}]
Derived attributes using functions
Function attributes are very powerful. Here we find string objects with certain characteristics.
from filterbox import FrozenFilterBox
objects = ['mushrooms', 'peppers', 'onions']
def o_count(obj):
    return obj.count('o')
f = FrozenFilterBox(objects, [o_count, len])
f.find({len: 6}) # returns ['onions']
f.find({o_count: 2}) # returns ['mushrooms', 'onions']
Greater than, less than
FilterBox and FrozenFilterBox have a function get_values(attr) which gets the set of unique values for an attribute.
Here's how to use that to find objects having x >= 3.
from filterbox import FilterBox
data = [{'x': i} for i in [1, 1, 2, 3, 5]]
fb = FilterBox(data, ['x'])
vals = fb.get_values('x') # get the set of unique values: {1, 2, 3, 5}
big_vals = [x for x in vals if x >= 3] # big_vals is [3, 5]
fb.find({'x': big_vals}) # result: [{'x': 3}, {'x': 5}]
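The same idea extends to two-sided ranges. The sketch below assumes only that the unique values are sortable; `values_between` is a hypothetical helper, not part of filterbox, and the plain set stands in for the result of get_values('x').

```python
from bisect import bisect_left, bisect_right


def values_between(values, low=None, high=None):
    """Return the values v with low <= v <= high, sorted. None means unbounded."""
    ordered = sorted(values)
    start = 0 if low is None else bisect_left(ordered, low)
    stop = len(ordered) if high is None else bisect_right(ordered, high)
    return ordered[start:stop]


vals = {1, 2, 3, 5}                        # stand-in for fb.get_values('x')
big_vals = values_between(vals, low=3)     # x >= 3
mid_vals = values_between(vals, 2, 3)      # 2 <= x <= 3
print(big_vals, mid_vals)
```

The resulting list can be passed to find() in the same way as big_vals above. Sorting costs more than a single list comprehension, so this pays off mainly when the sorted values are kept and reused across many range queries.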
Handling missing attributes
Objects don't need to have every attribute.
- Objects that are missing an attribute will not be stored under that attribute. This saves lots of memory.
- To find all objects that have an attribute, match the special value ANY.
- To find objects missing the attribute, exclude ANY.
- In functions, raise MissingAttribute to tell FilterBox the object is missing the attribute.
Example:
from filterbox import FilterBox, ANY
from filterbox.exceptions import MissingAttribute
def get_a(obj):
    try:
        return obj['a']
    except KeyError:
        raise MissingAttribute # tell FilterBox this attribute is missing
objs = [{'a': 1}, {'a': 2}, {}]
fb = FilterBox(objs, ['a', get_a])
fb.find({'a': ANY}) # result: [{'a': 1}, {'a': 2}]
fb.find({get_a: ANY}) # result: [{'a': 1}, {'a': 2}]
fb.find(exclude={'a': ANY}) # result: [{}]
Recipes
- Auto-updating - Keep FilterBox updated when attribute values change
- Wordle solver - Demonstrates using functools.partial to make attribute functions
- Collision detection - Find objects based on type and proximity (grid-based)
- Percentiles - Find by percentile (median, p99, etc.)
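As a rough illustration of the functools.partial idea mentioned in the Wordle recipe, the sketch below builds one attribute function per letter position. The names `letter_at` and `letter_funcs` are hypothetical and this is not the recipe's actual code. It only shows that partials behave like ordinary one-argument functions and are hashable, which is what using them as attribute functions would require.

```python
from functools import partial


def letter_at(position, word):
    """Return the letter of `word` at `position`."""
    return word[position]


# One attribute function per letter position of a five-letter word.
letter_funcs = [partial(letter_at, i) for i in range(5)]

words = ['crane', 'slate', 'crony']
first, second = letter_funcs[0], letter_funcs[1]

# Each partial takes just the object, like any other attribute function.
starts_with_cr = [w for w in words if first(w) == 'c' and second(w) == 'r']
print(starts_with_cr)
```

Because each partial is a distinct hashable object, the same object would need to be used both when defining the attributes and when querying.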
API documentation:
How it works
For every attribute, FilterBox holds a dict that maps each unique value to the set of objects with that value.
FilterBox is roughly this:
FilterBox = {
'attribute1': {val1: {objs}, val2: {more_objs}},
'attribute2': {val3: {objs}, val4: {more_objs}}
}
During find(), the object sets matching the query values are retrieved, and set operations like union, intersect, and difference are applied to get the final result.
That's a simplified version; for much more detail, see the "how it works" pages for FilterBox and FrozenFilterBox.
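The dict-of-sets structure described above can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not the library's implementation: `TinyIndex` is a hypothetical class, it handles only dict objects with string attributes, and it stores object ids in the sets so that unhashable objects such as dicts can be indexed.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: attribute -> value -> set of object ids."""

    def __init__(self, objs, on):
        self.objs = {id(o): o for o in objs}            # id -> object
        self.index = {attr: defaultdict(set) for attr in on}
        for oid, obj in self.objs.items():
            for attr in on:
                if attr in obj:                         # skip objects missing the attribute
                    self.index[attr][obj[attr]].add(oid)

    def _ids_for(self, attr, values):
        # Union of the id sets for each requested value of one attribute.
        found = set()
        for value in values:
            found |= self.index[attr].get(value, set())
        return found

    def find(self, match=None, exclude=None):
        ids = set(self.objs)                            # start from every object
        for attr, values in (match or {}).items():
            ids &= self._ids_for(attr, values)          # intersect across attributes
        for attr, values in (exclude or {}).items():
            ids -= self._ids_for(attr, values)          # difference drops excluded values
        return [self.objs[i] for i in ids]


objects = [
    {'item': 1, 'size': 10, 'flavor': 'melon'},
    {'item': 2, 'size': 10, 'flavor': 'lychee'},
    {'item': 3, 'size': 20, 'flavor': 'peach'},
    {'item': 4, 'size': 30, 'flavor': 'apple'},
]
idx = TinyIndex(objects, on=['size', 'flavor'])
result = idx.find(match={'size': [10, 20]}, exclude={'flavor': ['lychee', 'peach']})
print(result)
```

The query touches only the sets for the requested values, never the full object list. When several objects match, the order of the returned list in this sketch is arbitrary, since it comes from a set.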
Related projects
FilterBox is a type of inverted index. It is optimized for its goal of finding in-memory Python objects.
Other Python inverted index implementations are aimed at things like vector search and finding documents by words. Outside of Python, ElasticSearch is a popular inverted index search tool. Each of these has goals outside of FilterBox's niche; there are no plans to expand FilterBox towards these functions.