Fast multi-keyword search engine for text strings
Project description
Author: Stefan Behnel
What is Acora?
Acora is ‘fgrep’ for Python, a fast multi-keyword text search engine.
Given a set of keywords, it generates a search automaton (DFA) and runs it over string input, either Unicode or bytes.
It is based on the Aho-Corasick algorithm and an NFA-to-DFA transformation.
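To make the idea concrete, here is a minimal pure-Python sketch of the classic Aho-Corasick construction: a keyword trie with failure links. It is not Acora's implementation. Acora additionally converts the automaton into a DFA, which this sketch skips, and the names `build_automaton` and `findall` are illustrative assumptions only. Keywords are assumed to be non-empty.

```python
from collections import deque


def build_automaton(keywords):
    """Build trie transitions, failure links and per-state outputs."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> keywords that end in this state
    for kw in keywords:
        state = 0
        for ch in kw:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        if kw not in out[state]:
            out[state].append(kw)
    # Breadth-first pass: depth-1 states keep fail == 0, deeper states
    # follow their parent's failure chain and inherit its outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def findall(keywords, text):
    """Return (keyword, start) for every occurrence, overlaps included."""
    goto, fail, out = build_automaton(keywords)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for kw in out[state]:
            matches.append((kw, i - len(kw) + 1))
    return matches
```

The search reads each input character once, so its cost grows with the length of the text plus the number of matches rather than with the number of keywords.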
Features
- works with Unicode strings and byte strings
- about 2-3x as fast as Python’s regular expression engine
- finds overlapping matches, i.e. all matches of all keywords
- support for case-insensitive search (~10x as fast as ‘re’)
- frees the GIL while searching
- additional (slow but short) pure Python implementation
- support for Python 2.5+ and 3.x
- support for searching in files
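The "overlapping matches" feature is easiest to see next to a regular expression. The sketch below uses only the standard library: a regex alternation reports non-overlapping matches, while a naive reference function (the hypothetical helper `find_overlapping`, not part of Acora) reports every occurrence of every keyword. It illustrates the semantics only and says nothing about Acora's speed.

```python
import re


def find_overlapping(keywords, text):
    """Naive reference: every (keyword, start) occurrence, overlaps included."""
    matches = []
    for kw in keywords:
        start = text.find(kw)
        while start != -1:
            matches.append((kw, start))
            # Step one character ahead so overlapping hits are not skipped.
            start = text.find(kw, start + 1)
    # Order by start position, shorter keywords first.
    return sorted(matches, key=lambda m: (m[1], len(m[0])))


keywords = ['ab', 'bc', 'de', 'a', 'b']

# A regex alternation consumes the text it matches, so 'a', 'b' and 'bc'
# are never reported once 'ab' has matched at position 0.
pattern = re.compile('|'.join(re.escape(kw) for kw in keywords))
regex_hits = [(m.group(), m.start()) for m in pattern.finditer('abc')]

all_hits = find_overlapping(keywords, 'abc')

print(regex_hits)  # [('ab', 0)]
print(all_hits)    # [('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]
```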
How do I use it?
Import the package:
>>> from acora import AcoraBuilder
Collect some keywords:
>>> builder = AcoraBuilder('ab', 'bc', 'de')
>>> builder.add('a', 'b')
Generate the Acora search engine:
>>> ac = builder.build()
Search a string for all occurrences:
>>> ac.findall('abc')
[('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]
>>> ac.findall('abde')
[('a', 0), ('ab', 0), ('b', 1), ('de', 2)]
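Because the result is a plain list of `(keyword, position)` tuples, it can be post-processed with ordinary Python. As a sketch, the hypothetical helper `longest_per_start` below (an assumption for illustration, not part of Acora's API) keeps only the longest keyword among matches that share a start position, using the result list shown above as input.

```python
from itertools import groupby
from operator import itemgetter


def longest_per_start(matches):
    """Keep the longest keyword among matches that share a start position."""
    by_start = sorted(matches, key=itemgetter(1))
    return [
        max(group, key=lambda m: len(m[0]))
        for _, group in groupby(by_start, key=itemgetter(1))
    ]


# Result list copied from the findall('abc') example.
matches = [('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]
print(longest_per_start(matches))  # [('ab', 0), ('bc', 1)]
```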
Download files
Source Distribution
File details
Details for the file acora-1.0.tar.gz.
File metadata
- Download URL: acora-1.0.tar.gz
- Upload date:
- Size: 49.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d41650a2087e3f0390fb36ea99410e717db3968b14b1d83d1438cb88f23f49a5 |
| MD5 | 009a1d4f7a73976403b0ce3d60875352 |
| BLAKE2b-256 | 821d120b17fb056c188793058f30f8497a61bdb2d0482c79fcf4e914ef07d94d |