fuzzysearch is useful for finding approximate subsequence matches

Installation

Just install using pip:

$ pip install fuzzysearch

Features

  • Fuzzy subsequence search: Find parts of a sequence which match a given subsequence up to a given maximum Levenshtein distance.

  • Set individual limits for the number of substitutions, insertions and/or deletions allowed for a near-match.

  • Includes optimized implementations for specific use-cases, e.g. only allowing substitutions in near-matches.
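
To make the substitutions-only case concrete, here is a minimal pure-Python sketch of what such a search computes. It is an illustration only, not the library's implementation: the function name `find_substitution_matches` and the tuple-based return value are assumptions made for this example.

```python
def find_substitution_matches(subsequence, sequence, max_substitutions):
    """Return (start, end, dist) for each window of `sequence` that differs
    from `subsequence` in at most `max_substitutions` positions.

    Hypothetical helper for illustration; not part of fuzzysearch.
    """
    n = len(subsequence)
    if n == 0:
        raise ValueError('subsequence must not be empty')
    matches = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        # Count the mismatched positions (the Hamming distance).
        dist = sum(1 for a, b in zip(subsequence, window) if a != b)
        if dist <= max_substitutions:
            matches.append((start, start + n, dist))
    return matches


print(find_substitution_matches('PATTERN', 'aaaPATTxRNaaa', 1))
# [(3, 10, 1)]
```

Because only substitutions are allowed, every candidate has the same length as the subsequence, so a plain sliding window is enough. That restriction is what makes a specialized implementation simpler than a general Levenshtein search.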

Simple Example

You can usually just use the find_near_matches() utility function, which chooses a suitable fuzzy search implementation according to the given parameters:

>>> from fuzzysearch import find_near_matches
>>> find_near_matches('PATTERN', 'aaaPATERNaaa', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
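
To show what a maximum Levenshtein distance means in the example above, the following sketch uses a textbook dynamic-programming approach that reports every end position where some substring is within the allowed distance. It is a simplified stand-in, not the code fuzzysearch runs; the name `find_fuzzy_ends` is hypothetical, and unlike `find_near_matches()` it returns only end positions and distances, not start positions.

```python
def find_fuzzy_ends(subsequence, sequence, max_l_dist):
    """Return (end, dist) pairs where some substring of `sequence` ending
    at index `end` (exclusive) is within `max_l_dist` edits of `subsequence`.

    Hypothetical helper for illustration; not part of fuzzysearch.
    """
    m = len(subsequence)
    # column[i] = fewest edits needed to turn subsequence[:i] into some
    # substring of `sequence` ending at the current position.
    column = list(range(m + 1))
    results = []
    for end, char in enumerate(sequence, start=1):
        previous = column
        column = [0]  # a match may start anywhere, so the empty prefix costs 0
        for i in range(1, m + 1):
            cost = 0 if subsequence[i - 1] == char else 1
            column.append(min(
                previous[i - 1] + cost,  # exact match or substitution
                previous[i] + 1,         # extra character in the sequence
                column[i - 1] + 1,       # character missing from the sequence
            ))
        if column[m] <= max_l_dist:
            results.append((end, column[m]))
    return results


print(find_fuzzy_ends('PATTERN', 'aaaPATERNaaa', 1))
# [(9, 1)]
```

The single result ends at index 9 with distance 1, which agrees with the `end=9, dist=1` reported by `find_near_matches()` above: 'PATERN' is 'PATTERN' with one 'T' missing.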

Advanced Example

If needed you can choose a specific search implementation, such as find_near_matches_with_ngrams():

>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT'  # one edit away from sequence[3:24]
>>> max_distance = 2

>>> from fuzzysearch import find_near_matches_with_ngrams
>>> find_near_matches_with_ngrams(subsequence, sequence, max_distance)
[Match(start=3, end=24, dist=1)]
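
The function name suggests an n-gram filtering strategy. A common form of that technique rests on the pigeonhole principle: if a subsequence is split into `max_l_dist + 1` non-overlapping pieces, any match with at most `max_l_dist` edits must contain at least one piece unchanged. The sketch below shows only this candidate-finding step. It is an assumption about the general approach, not a description of the library's internals, and `ngram_candidates` is a hypothetical name.

```python
def ngram_candidates(subsequence, sequence, max_l_dist):
    """Return (ngram_offset, position) pairs for every exact occurrence in
    `sequence` of one of the subsequence's non-overlapping n-grams.

    Hypothetical helper for illustration; not part of fuzzysearch.
    """
    ngram_len = len(subsequence) // (max_l_dist + 1)
    if ngram_len == 0:
        raise ValueError('subsequence is too short for this max_l_dist')
    hits = []
    # Take max_l_dist + 1 disjoint pieces; at most max_l_dist of them can be
    # touched by edits, so at least one must appear verbatim in any match.
    for offset in range(0, ngram_len * (max_l_dist + 1), ngram_len):
        ngram = subsequence[offset:offset + ngram_len]
        position = sequence.find(ngram)
        while position != -1:
            hits.append((offset, position))
            position = sequence.find(ngram, position + 1)
    return hits


print(ngram_candidates('PATTERN', 'aaaPATERNaaa', 1))
# [(0, 3), (3, 5)]
```

Each hit marks a region worth verifying with a full edit-distance check, so the expensive comparison runs only near exact n-gram occurrences instead of at every position of a long sequence.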

History

0.3.0 (2015-02-12)

  • Added C extensions for several search functions as well as internal functions

  • Use C extensions if available, or pure-Python implementations otherwise

  • setup.py attempts to build the C extensions, but installs without them if the build fails

  • Added --noexts setup.py option to avoid trying to build the C extensions

  • Greatly improved testing and coverage

0.2.2 (2014-03-27)

  • Added support for searching through BioPython Seq objects

  • Added specialized search function allowing only substitutions and insertions

  • Fixed several bugs

0.2.1 (2014-03-14)

  • Fixed major match grouping bug

0.2.0 (2014-03-13)

  • New utility function find_near_matches() for easier use

  • Additional documentation

0.1.0 (2013-11-12)

  • Two working implementations

  • Extensive test suite; all tests passing

  • Full support for Python 2.6-2.7 and 3.1-3.3

  • Bumped status from Pre-Alpha to Alpha

0.0.1 (2013-11-01)

  • First release on PyPI.
