Skip to main content

# Remove duplicates 重复内容筛选

Project description

Remove duplicates 重复内容筛选

tkitSimhash zh

根据经验,一般当两个文档特征字之间的汉明距离小于 3, 就可以判定两个文档相似。《数学之美》一书中,在讲述信息指纹时对这种算法有详细的介绍。

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tkitSimhash-0.0.1.2.tar.gz (4.8 kB view hashes)

Uploaded Source

Built Distribution

tkitSimhash-0.0.1.2-py2.py3-none-any.whl (5.2 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page