acora

Fast multi-keyword search engine for text strings

Project description

Author: Stefan Behnel

What is Acora?

Acora is ‘fgrep’ for Python, a fast multi-keyword text search engine.

Based on a set of keywords, it generates a search automaton (DFA) and runs it over string input, either unicode or bytes.

It is based on the Aho-Corasick algorithm and an NFA-to-DFA transformation.
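To make the idea concrete, here is a minimal pure-Python sketch of classic Aho-Corasick matching. It is not Acora's implementation: it keeps the failure links and follows them at search time (the NFA-style form) instead of compiling them away into a DFA, and the names `build_automaton` and `search_all` are made up for this illustration. Results are sorted explicitly by (position, keyword) so that they are easy to compare.

```python
from collections import deque


def build_automaton(keywords):
    """Build a keyword trie with failure links (illustrative sketch only)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> keywords that end at this state
    for kw in dict.fromkeys(keywords):      # drop duplicate keywords
        if not kw:
            continue                        # ignore empty keywords
        state = 0
        for ch in kw:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(kw)

    # Breadth-first pass: children of the root keep fail = 0,
    # deeper states derive their link from their parent's link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Keywords ending at the suffix state also end here.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search_all(automaton, text):
    """Return all (keyword, start_position) pairs, overlaps included."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for end, ch in enumerate(text, 1):
        while state and ch not in goto[state]:
            state = fail[state]             # fall back to a shorter suffix
        state = goto[state].get(ch, 0)
        for kw in out[state]:
            matches.append((kw, end - len(kw)))
    return sorted(matches, key=lambda m: (m[1], m[0]))


automaton = build_automaton(['ab', 'bc', 'de', 'a', 'b'])
print(search_all(automaton, 'abc'))
```

A DFA-based engine avoids the inner `while` loop by precomputing, for every state and input character, the state that the failure chain would eventually reach, so each input character costs exactly one table lookup.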

Features

works with unicode strings and byte strings
about 2-3x as fast as Python’s regular expression engine
finds overlapping matches, i.e. all matches of all keywords
support for case-insensitive search (~10x as fast as ‘re’)
frees the GIL while searching
additional (slow but short) pure Python implementation
support for Python 2.5+ and 3.x
support for searching in files
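The "overlapping matches" feature is easiest to appreciate next to a regular expression. The sketch below uses only the standard library and no Acora calls: `regex_matches` shows that an alternation pattern reports non-overlapping matches only, while `overlapping_matches` is a deliberately naive scan that reports every occurrence of every keyword. Both function names are hypothetical, and the `ignore_case` flag simply lower-cases both sides, which is an assumption for illustration and not a description of how Acora handles case.

```python
import re

KEYWORDS = ['ab', 'bc', 'de', 'a', 'b']


def regex_matches(keywords, text):
    """Non-overlapping matches, as an 're' alternation reports them."""
    # Longer keywords first, so 'ab' is preferred over 'a'.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in ordered))
    return [(m.group(), m.start()) for m in pattern.finditer(text)]


def overlapping_matches(keywords, text, ignore_case=False):
    """Naive reference scan: every occurrence of every keyword."""
    if ignore_case:
        # Simplification: assumes lower() does not change string length,
        # which holds for ASCII but not for all of Unicode.
        keywords = [k.lower() for k in keywords]
        text = text.lower()
    found = []
    for kw in keywords:
        pos = text.find(kw)
        while pos != -1:
            found.append((kw, pos))
            pos = text.find(kw, pos + 1)   # step by one to keep overlaps
    return sorted(found, key=lambda m: (m[1], m[0]))


print(regex_matches(KEYWORDS, 'abc'))        # only 'ab' is reported
print(overlapping_matches(KEYWORDS, 'abc'))  # 'a', 'ab', 'b' and 'bc'
```

The naive scan rereads the text once per keyword, which is exactly the cost that a single-pass automaton is meant to avoid.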

How do I use it?

Import the package:

>>> from acora import AcoraBuilder

Collect some keywords:

>>> builder = AcoraBuilder('ab', 'bc', 'de')
>>> builder.add('a', 'b')

Generate the Acora search engine:

>>> ac = builder.build()

Search a string for all occurrences:

>>> ac.findall('abc')
[('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]
>>> ac.findall('abde')
[('a', 0), ('ab', 0), ('b', 1), ('de', 2)]
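Each result is a (keyword, start offset) pair, so the list is easy to post-process with plain Python. The following sketch works on a result list copied from the example above, so it runs without Acora installed; the helper names `match_spans` and `positions_by_keyword` are hypothetical.

```python
from collections import defaultdict


def match_spans(matches):
    """Turn (keyword, start) pairs into (keyword, start, end) triples."""
    return [(kw, start, start + len(kw)) for kw, start in matches]


def positions_by_keyword(matches):
    """Group the start offsets by keyword."""
    grouped = defaultdict(list)
    for kw, start in matches:
        grouped[kw].append(start)
    return dict(grouped)


# Result list copied from the findall('abc') example above.
matches = [('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]

print(match_spans(matches))
print(positions_by_keyword(matches))
```

The end offsets follow Python's slicing convention, so `text[start:end]` gives back the matched keyword.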

Release history

Version	Release date
2.5	Sep 14, 2024
2.4	Sep 12, 2023
2.3	Mar 27, 2021
2.2	Oct 16, 2018
2.1	Dec 15, 2017
2.0	Mar 17, 2016
1.9	Oct 10, 2015
1.8	Feb 12, 2014
1.7	Aug 24, 2011
1.6	Jul 24, 2011
1.5	Jan 24, 2011
1.4	Feb 10, 2010
1.3	Jan 30, 2010
1.2	Jan 30, 2010
1.1	Jan 29, 2010
1.0 (this version)	Jan 29, 2010

Download files

Source Distribution

acora-1.0.tar.gz (49.4 kB view details)

File metadata

Upload date: Jan 29, 2010
Size: 49.4 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for acora-1.0.tar.gz
Algorithm	Hash digest
SHA256	`d41650a2087e3f0390fb36ea99410e717db3968b14b1d83d1438cb88f23f49a5`
MD5	`009a1d4f7a73976403b0ce3d60875352`
BLAKE2b-256	`821d120b17fb056c188793058f30f8497a61bdb2d0482c79fcf4e914ef07d94d`
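A downloaded archive can be checked against the published SHA256 digest with the standard `hashlib` module. The sketch below is one way to do it: the helper names are hypothetical, and the file path in the usage comment assumes the archive was saved under its original name in the current directory.

```python
import hashlib

# SHA256 digest published above for acora-1.0.tar.gz.
EXPECTED_SHA256 = (
    'd41650a2087e3f0390fb36ea99410e717db3968b14b1d83d1438cb88f23f49a5'
)


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the file exists locally under this name):
# ok = matches_published_digest('acora-1.0.tar.gz', EXPECTED_SHA256)
```

A mismatch means the file differs from the one that was uploaded, for example because of a truncated download, and it should not be installed.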
