Fast multi-keyword search engine for text strings

Author: Stefan Behnel

What is Acora?

Acora is ‘fgrep’ for Python, a fast multi-keyword text search engine.

Based on a set of keywords, it generates a search automaton (DFA) and runs it over string input, either unicode or bytes.

It is based on the Aho-Corasick algorithm and an NFA-to-DFA transformation.
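As a rough illustration of the Aho-Corasick idea, the following pure-Python sketch builds a keyword trie with failure links and scans the text in a single pass. It is not Acora's own code: it follows failure links at search time instead of precompiling a DFA, and the names `build_automaton` and `find_all` are invented for this example.

```python
from collections import deque


def build_automaton(keywords):
    """Build trie transitions, failure links and per-state outputs."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state of the longest proper suffix
    out = [[]]    # out[state] -> keywords ending at this state

    # Phase 1: insert every distinct, non-empty keyword into the trie.
    for word in dict.fromkeys(keywords):
        if not word:
            continue
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Phase 2: breadth-first pass that sets the failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the keywords that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(keywords, text):
    """Return (keyword, start position) for every match, overlaps included."""
    goto, fail, out = build_automaton(keywords)
    state = 0
    matches = []
    for pos, ch in enumerate(text):
        # Fall back to shorter suffixes until one can be extended by ch.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((word, pos - len(word) + 1))
    return matches
```

Calling `find_all(['ab', 'bc', 'de', 'a', 'b'], 'abc')` on this sketch returns `[('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]`, the same pairs that the usage section reports for Acora.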

Features

  • works with unicode strings and byte strings

  • about 2-3x as fast as Python’s regular expression engine

  • finds overlapping matches, i.e. all matches of all keywords

  • support for case-insensitive search (~10x as fast as ‘re’)

  • frees the GIL while searching

  • additional (slow but short) pure Python implementation

  • support for Python 2.5+ and 3.x

  • support for searching in files
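To make the "overlapping matches" feature concrete, the sketch below uses only the standard `re` module to show what a single alternation pattern reports, and what it takes to recover every match of every keyword with `re`. The helper names `regex_scan` and `regex_all_matches` are hypothetical, and the example says nothing about relative speed.

```python
import re


def regex_scan(keywords, text):
    """One alternation pattern: matches never overlap."""
    pattern = re.compile('|'.join(re.escape(kw) for kw in keywords))
    return [(m.group(), m.start()) for m in pattern.finditer(text)]


def regex_all_matches(keywords, text):
    """One zero-width lookahead per keyword: overlaps are reported."""
    found = []
    for kw in keywords:
        for m in re.finditer('(?=%s)' % re.escape(kw), text):
            found.append((kw, m.start()))
    # Order by position, then keyword, for a stable result.
    return sorted(found, key=lambda item: (item[1], item[0]))


keywords = ['ab', 'bc', 'de', 'a', 'b']
print(regex_scan(keywords, 'abc'))         # [('ab', 0)]
print(regex_all_matches(keywords, 'abc'))  # [('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]
```

The second helper scans the text once per keyword, which is the cost a multi-keyword automaton is designed to avoid.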

How do I use it?

Import the package:

>>> from acora import AcoraBuilder

Collect some keywords:

>>> builder = AcoraBuilder('ab', 'bc', 'de')
>>> builder.add('a', 'b')

Generate the Acora search engine:

>>> ac = builder.build()

Search a string for all occurrences:

>>> ac.findall('abc')
[('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]
>>> ac.findall('abde')
[('a', 0), ('ab', 0), ('b', 1), ('de', 2)]
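
Each result is a `(keyword, start position)` pair. As a small post-processing sketch, the pairs can be grouped into a per-keyword index; `positions_by_keyword` is a hypothetical helper, not part of Acora, and the input list is copied from the `findall('abde')` output above.

```python
from collections import defaultdict


def positions_by_keyword(matches):
    """Group (keyword, position) pairs into {keyword: [positions]}."""
    index = defaultdict(list)
    for keyword, pos in matches:
        index[keyword].append(pos)
    return dict(index)


# Result list copied from the findall('abde') call shown above.
matches = [('a', 0), ('ab', 0), ('b', 1), ('de', 2)]
print(positions_by_keyword(matches))
# {'a': [0], 'ab': [0], 'b': [1], 'de': [2]}
```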


