fuzzysearch is useful for finding approximate subsequence matches
Project description
fuzzysearch is useful for finding approximate subsequence matches
Free software: MIT license
Documentation: http://fuzzysearch.rtfd.org.
Features
Fuzzy sub-sequence search: Find parts of a sequence which match a given sub-sequence up to a given maximum Levenshtein distance.
Simple Example
You can usually just use the find_near_matches() utility function, which chooses a suitable fuzzy search implementation according to the given parameters:
>>> from fuzzysearch import find_near_matches
>>> find_near_matches('PATTERN', 'aaaPATERNaaa', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
Advanced Example
If needed you can choose a specific search implementation, such as find_near_matches_with_ngrams():
>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT' #distance 1
>>> max_distance = 2
>>> from fuzzysearch import find_near_matches_with_ngrams
>>> find_near_matches_with_ngrams(subsequence, sequence, max_distance)
[Match(start=3, end=24, dist=1)]
History
0.2.1 (2014-03-14)
Fixed major match grouping bug
0.2.0 (2013-03-13)
New utility function find_near_matches() for easier use
Additional documentation
0.1.0 (2013-11-12)
Two working implementations
Extensive test suite; all tests passing
Full support for Python 2.6-2.7 and 3.1-3.3
Bumped status from Pre-Alpha to Alpha
0.0.1 (2013-11-01)
First release on PyPI.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.