Python extension for computing string edit distances and similarities.

Levenshtein

Introduction

The Levenshtein Python C extension module contains functions for fast computation of:

  • Levenshtein (edit) distance, and edit operations
  • string similarity
  • approximate median strings, and generally string averaging
  • string sequence and set similarity
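To make the first and third items concrete, here is a minimal pure-Python sketch of what edit distance and a set median compute. It is not the library's implementation (the changelog below notes that the library delegates to faster bit-parallel routines). The names `levenshtein_distance` and `set_median` are hypothetical helpers chosen for this illustration, not the package's API.

```python
from typing import Sequence


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn one string into the other."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the rows short
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # drop a character of a
                current[j - 1] + 1,      # drop a character of b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def set_median(strings: Sequence[str]) -> str:
    """Member of `strings` with the smallest total distance to all members
    (ties go to the earliest candidate)."""
    return min(
        strings,
        key=lambda s: sum(levenshtein_distance(s, t) for t in strings),
    )


print(levenshtein_distance("kitten", "sitting"))  # 3
print(set_median(["abc", "abcd", "abcde"]))       # abcd
```

The sketch keeps only two rows of the dynamic-programming table, so memory grows with the shorter string while time grows with the product of both lengths.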

This project is a fork of ztane/python-Levenshtein, created because the original project is no longer actively maintained.

Requirements

  • Python 3.6 or later

Installation

pip install levenshtein

Documentation

The documentation for the current version can be found at https://maxbachmann.github.io/Levenshtein/

License

Levenshtein is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

See the file COPYING for the full text of GNU General Public License version 2.

Changelog

v0.17.0

  • Removed support for Python 3.5

v0.16.1

  • Add support for RapidFuzz v1.9.*

v0.16.0

  • Add support for Python 3.10

v0.15.0

  • Update SequenceMatcher interface to support the autojunk parameter

v0.14.0

  • Drop Python 2 support
  • Fixed free of a non-heap object caused by a zero offset on a heap object
  • Fixed warnings about missing type conversions
  • Fix segmentation fault in subtract_edit when incorrect input types are used
  • Fixed unchecked memory allocations
  • Implement distance/ratio/hamming/jaro/jaro_winkler using rapidfuzz instead of providing its own implementation
  • Implement wrappers for inverse/editops/opcodes/matching_blocks/subtract_edit/apply_edit using Cython to simplify support for new Python versions

v0.13.0

  • Maintainership passed to Max Bachmann
  • use faster bitparallel implementations for distance and ratio
  • avoid string copies in distance, ratio and hamming
  • Fix usage of deprecated Unicode APIs in distance, ratio and hamming
  • Fixed incorrect window size inside Jaro and Jaro-Winkler implementation
  • Fixed incorrect exception messages
  • Removed unused functions and compiler specific hacks
  • Split the Python and C implementations to simplify building of the C library
  • Fixed multiple bugs which prevented the use as C library, since some functions only got defined when compiling for Python
  • Build and deliver python wheels for the library
  • Fixed incorrect allocation size in lev_editops_matching_blocks and lev_opcodes_matching_blocks

v0.12.1

  • Fixed handling of numerous possible wraparounds in calculating the size of memory allocations; incorrect handling of which could cause denial of service or even possible remote code execution in previous versions of the library.

v0.12.0

  • Fixed a bug in StringMatcher.StringMatcher.get_matching_blocks / extract_editops for Python 3; now allow only str editops on both Python 2 and Python 3, for simpler and working code.
  • Added documentation in the source distribution and in GIT
  • Fixed the package layout: renamed the .so/.dll to _levenshtein, and made it reside inside a package, along with the StringMatcher class.
  • Fixed spelling errors.

v0.11.2

  • Fixed a bug in setup.py: installation would fail on Python 3 if the locale did not specify UTF-8 charset (Felix Yan).

  • Added COPYING, StringMatcher.py, gendoc.sh and NEWS in MANIFEST.in, as they were missing from source distributions.

v0.11.1

  • Added Levenshtein.h to MANIFEST.in

v0.11.0

  • Python 3 support, maintainership passed to Antti Haapala

v0.10.2

  • Made python-Levenshtein Git compatible and use setuptools for PyPI upload
  • Created HISTORY.txt and made README reST compatible

v0.10.1

  • apply_edit() broken for Unicodes was fixed (thanks to Radovan Garabik)
  • subtract_edit() function was added

v0.10.0

  • Hamming distance, Jaro similarity metric and Jaro-Winkler similarity metric were added
  • ValueErrors raised on wrong argument types were fixed to TypeErrors

v0.9.0

  • a poor-but-fast generalized median method quickmedian() was added
  • some auxiliary functions added to the C api (lev_set_median_index, lev_editops_normalize, ...)

v0.8.2

  • fixed missing `static' in the method list

v0.8.1

  • some compilation problems with non-gcc were fixed

v0.8.0

  • median_improve(), a generalized median improving function, was added
  • an arbitrary length limitation imposed on greedy median() result was removed
  • out of memory should be handled more gracefully (on systems w/o memory overcommitting)
  • the documentation now passes doctest

v0.7.0

  • fixed greedy median() for Unicode characters > U+FFFF, it's now usable with whatever integer type wchar_t happens to be
  • added missing MANIFEST
  • renamed exported C functions, all public names now have lev_, LEV_ or Lev prefix; defined lev_byte, lev_wchar, and otherwise sanitized the (still unstable) C interface
  • added edit-ops group of functions, with two interfaces: native, useful for string averaging, and difflib-like for interoperability
  • added an example SequenceMatcher-like class StringMatcher

v0.6.0

  • a segfault in seqratio()/setratio() on invalid input has been fixed to an exception
  • optimized ratio() and distance() (about 20%)
  • Levenshtein.h header file was added to make it easier to actually use it as a C library

v0.5.0

  • a segfault in setratio() was fixed
  • median() handles all empty strings situation more gracefully

v0.4.0

  • new functions seqratio() and setratio() computing similarity between string sequences and sets
  • Levenshtein optimizations (affects all routines except median())
  • all Sequence objects are accepted, not just Lists

v0.3.0

  • setmedian() finding set median was added
  • median() initial overhead for Unicodes was reduced

v0.2.0

  • ratio() and distance() now accept both Strings and Unicodes
  • removed uratio() and udistance()
  • Levenshtein.c is now compilable as a C library (with -DNO_PYTHON)
  • a median() function finding approximate weighted median of a string set was added

v0.1.0

  • Initial release

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Levenshtein-0.17.0.tar.gz (105.0 kB view hashes)

Uploaded Source

Built Distributions

Levenshtein-0.17.0-pp38-pypy38_pp73-macosx_10_9_x86_64.whl (167.3 kB view hashes)

Uploaded PyPy macOS 10.9+ x86-64

Levenshtein-0.17.0-pp37-pypy37_pp73-macosx_10_9_x86_64.whl (89.4 kB view hashes)

Uploaded PyPy macOS 10.9+ x86-64

Levenshtein-0.17.0-cp310-cp310-win_amd64.whl (71.1 kB view hashes)

Uploaded CPython 3.10 Windows x86-64

Levenshtein-0.17.0-cp310-cp310-win32.whl (63.5 kB view hashes)

Uploaded CPython 3.10 Windows x86

Levenshtein-0.17.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (110.0 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

Levenshtein-0.17.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl (107.1 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ s390x

Levenshtein-0.17.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (126.5 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ppc64le

Levenshtein-0.17.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (102.9 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ARM64

Levenshtein-0.17.0-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (109.9 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

Levenshtein-0.17.0-cp310-cp310-macosx_10_9_x86_64.whl (92.7 kB view hashes)

Uploaded CPython 3.10 macOS 10.9+ x86-64

Levenshtein-0.17.0-cp39-cp39-win_amd64.whl (71.1 kB view hashes)

Uploaded CPython 3.9 Windows x86-64

Levenshtein-0.17.0-cp39-cp39-win32.whl (63.5 kB view hashes)

Uploaded CPython 3.9 Windows x86

Levenshtein-0.17.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (110.0 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

Levenshtein-0.17.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl (107.1 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ s390x

Levenshtein-0.17.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (126.5 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ ppc64le

Levenshtein-0.17.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (102.9 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ ARM64

Levenshtein-0.17.0-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (109.9 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

Levenshtein-0.17.0-cp39-cp39-macosx_10_9_x86_64.whl (92.8 kB view hashes)

Uploaded CPython 3.9 macOS 10.9+ x86-64

Levenshtein-0.17.0-cp38-cp38-win_amd64.whl (71.2 kB view hashes)

Uploaded CPython 3.8 Windows x86-64

Levenshtein-0.17.0-cp38-cp38-win32.whl (63.5 kB view hashes)

Uploaded CPython 3.8 Windows x86

Levenshtein-0.17.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (110.8 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

Levenshtein-0.17.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl (108.2 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ s390x

Levenshtein-0.17.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (127.2 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ ppc64le

Levenshtein-0.17.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (103.5 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ ARM64

Levenshtein-0.17.0-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (110.8 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

Levenshtein-0.17.0-cp38-cp38-macosx_10_9_x86_64.whl (91.6 kB view hashes)

Uploaded CPython 3.8 macOS 10.9+ x86-64

Levenshtein-0.17.0-cp37-cp37m-win_amd64.whl (70.8 kB view hashes)

Uploaded CPython 3.7m Windows x86-64

Levenshtein-0.17.0-cp37-cp37m-win32.whl (63.2 kB view hashes)

Uploaded CPython 3.7m Windows x86

Levenshtein-0.17.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (110.5 kB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64

Levenshtein-0.17.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl (107.6 kB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ s390x

Levenshtein-0.17.0-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (128.5 kB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ ppc64le

Levenshtein-0.17.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (103.0 kB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ ARM64

Levenshtein-0.17.0-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (111.5 kB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

Levenshtein-0.17.0-cp37-cp37m-macosx_10_9_x86_64.whl (91.4 kB view hashes)

Uploaded CPython 3.7m macOS 10.9+ x86-64

Levenshtein-0.17.0-cp36-cp36m-win_amd64.whl (69.4 kB view hashes)

Uploaded CPython 3.6m Windows x86-64

Levenshtein-0.17.0-cp36-cp36m-win32.whl (61.8 kB view hashes)

Uploaded CPython 3.6m Windows x86

Levenshtein-0.17.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (106.3 kB view hashes)

Uploaded CPython 3.6m manylinux: glibc 2.17+ x86-64

Levenshtein-0.17.0-cp36-cp36m-manylinux_2_17_s390x.manylinux2014_s390x.whl (103.7 kB view hashes)

Uploaded CPython 3.6m manylinux: glibc 2.17+ s390x

Levenshtein-0.17.0-cp36-cp36m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (123.5 kB view hashes)

Uploaded CPython 3.6m manylinux: glibc 2.17+ ppc64le

Levenshtein-0.17.0-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (100.1 kB view hashes)

Uploaded CPython 3.6m manylinux: glibc 2.17+ ARM64

Levenshtein-0.17.0-cp36-cp36m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (107.1 kB view hashes)

Uploaded CPython 3.6m manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

Levenshtein-0.17.0-cp36-cp36m-macosx_10_9_x86_64.whl (89.6 kB view hashes)

Uploaded CPython 3.6m macOS 10.9+ x86-64
