# Remove duplicates (duplicate content filtering)
Project description
tkitSimhash zh
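As a rule of thumb, two documents can be judged similar when the Hamming distance between their fingerprints (feature words) is less than 3. The book 《数学之美》 (The Beauty of Mathematics) describes this algorithm in detail in its discussion of information fingerprints.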
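The similarity rule above can be illustrated with a minimal sketch. This is not the tkitSimhash API, which this page does not document; the function names `char_ngrams`, `simhash`, `hamming_distance`, and `is_near_duplicate`, the 64-bit fingerprint width, the use of MD5 as the per-token hash, and the character bigram tokenization are all assumptions made for illustration.

```python
import hashlib


def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams (hypothetical tokenizer)."""
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def simhash(tokens, bits=64):
    """Build a SimHash-style fingerprint: each token votes on every bit."""
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            # A set bit adds one vote, a clear bit subtracts one.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, threshold=3):
    """Apply the rule of thumb: similar when the distance is below 3."""
    return hamming_distance(a, b) < threshold


fp1 = simhash(char_ngrams("重复内容筛选"))
fp2 = simhash(char_ngrams("重复内容筛选"))
print(hamming_distance(fp1, fp2), is_near_duplicate(fp1, fp2))
```

Identical texts always produce identical fingerprints, so their distance is 0. How close two merely similar texts land depends on the tokenizer and hash chosen, so the threshold of 3 should be treated as a starting point to tune rather than a guarantee.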
Download files
Source Distribution
tkitSimhash-0.0.1.2.tar.gz (4.8 kB)
Built Distribution
Hashes for tkitSimhash-0.0.1.2-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 75cd4eff2f3949945bb3c2b9f1a0c51338147e7ba3691b1e6883bba2be62b23c
MD5 | 89c3f7cd22bee9f3a8e7383c7d6e116f
BLAKE2b-256 | 21132d65c0ee7348c7bb306f8b24b8be1627bb0536d5530a0625a83b31b03f9b