Terry toolkit tkitreadability
Project description
A library for extracting the main content (body text) from HTML.
from tkitreadability import tkitReadability
html = """
<div class="full-component-wrapper">
<div class="component component--text-image image-position--right" data-id="45290" data-type="c_sideimagetext_ttt">
<div class="text-image--component-wrapper twb-container">
<div class="text-image--content-wrapper row">
<div class="text-image--image col-12 col-xl-7 order-2 order-xl-3">
<div class="field field--name-field-c-image field--type-entity-reference field--label-hidden field__item">
<picture>
<source srcset="/sites/default/files/styles/ttt_image_690/public/2021-07/border-collie.webp?itok=1oyChjVg 2x" media="all and (min-width: 1140px)" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_930/public/2021-07/border-collie.webp?itok=QxWrubxE 1x" media="all and (min-width: 992px)" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_690/public/2021-07/border-collie.webp?itok=1oyChjVg 1x" media="all and (min-width: 768px)" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.webp?itok=jhilnwqZ 1x" media="all and (min-width: 576px)" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.webp?itok=jhilnwqZ 1x" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_690/public/2021-07/border-collie.jpg?itok=1oyChjVg 2x" media="all and (min-width: 1140px)" type="image/jpeg">
<source srcset="/sites/default/files/styles/ttt_image_930/public/2021-07/border-collie.jpg?itok=QxWrubxE 1x" media="all and (min-width: 992px)" type="image/jpeg">
<source srcset="/sites/default/files/styles/ttt_image_690/public/2021-07/border-collie.jpg?itok=1oyChjVg 1x" media="all and (min-width: 768px)" type="image/jpeg">
<source srcset="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.jpg?itok=jhilnwqZ 1x" media="all and (min-width: 576px)" type="image/jpeg">
<source srcset="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.jpg?itok=jhilnwqZ 1x" type="image/jpeg">
<img src="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.jpg?itok=jhilnwqZ" alt="Border Collie" typeof="foaf:Image" loading="lazy">
</picture>
</div>
</div>
<img src="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.jpg?itok=jhilnwqZ" alt="Border Collie" typeof="foaf:Image" loading="lazy">
<div class="text-image--text-wrapper col-12 col-xl-5 order-3 order-xl-2">
<div class="text-image--text">
<div class="clearfix text-formatted field field--name-field-c-sideimagetext-summary field--type-text-long field--label-hidden field__item"><h2>Pet Card</h2>
<ul>
<li><strong>Living Considerations:</strong> Not hypoallergenic, suitable for apartment living, good with older children</li>
<li><strong>Size:</strong> Medium</li>
<li><strong>Height:</strong> Males - 48 to 56 centimetres at the withers, Females - 45 to 53 centimetres at the withers</li>
<li><strong>Weight:</strong> Males -13 to 20 kilograms, Females - 12 to 19 kilograms</li>
<li><strong>Coat:</strong> Medium/Long</li>
<li><strong>Energy:</strong> High</li>
<li><strong>Colour:</strong> All colours or colour combinations</li>
<li><strong>Activities:</strong> Agility, Conformation, Herding, Obedience, Rally Obedience, Tracking</li>
<li><strong>Indoor/Outdoor:</strong> Both</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
"""
readability = tkitReadability()
# Extract the main content of the page
content = readability.html2text(html)
print(content)
# Output as HTML (markdown2Html converts Markdown to HTML)
print(readability.markdown2Html(content))
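For readers curious about what "extracting the main content" involves, here is a rough, stdlib-only sketch of one very simple approach: skip tags that rarely hold readable text and start a new line at block-level tags. It is not tkitReadability's algorithm (its html2text appears to produce Markdown and presumably does more than this); the names SKIP_TAGS, BLOCK_TAGS, TextExtractor and extract_text are hypothetical, and the tag sets are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents are never treated as readable text (an assumption for this sketch).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "noscript"}
# Tags that start a new line in the plain-text output.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "ul", "ol", "br", "tr"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping SKIP_TAGS and breaking lines at BLOCK_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.lines = [""]    # output lines being built

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and self.lines[-1].strip():
            self.lines.append("")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS and self.lines[-1].strip():
            self.lines.append("")

    def handle_data(self, data):
        if not self.skip_depth:
            self.lines[-1] += data


def extract_text(html: str) -> str:
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    cleaned = (" ".join(line.split()) for line in parser.lines)
    return "\n".join(line for line in cleaned if line)
```

For example, extract_text("<h2>Pet Card</h2><ul><li><strong>Size:</strong> Medium</li></ul>") gives "Pet Card\nSize: Medium". Readability-style extractors typically go further and score candidate blocks by text and link density so that navigation and sidebars are dropped.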
Updates
version: 0.0.0.4
Added conversion from Markdown to HTML (markdown2Html).
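As a rough illustration of what a Markdown-to-HTML conversion does, the sketch below handles only #-style headings, "- " bullet lists, **bold** and plain paragraphs. It is a toy written for this page, not the library's markdown2Html; the names inline and markdown_to_html are hypothetical. A full converter such as the markdown package on PyPI covers far more of the syntax.

```python
import html
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")
HEADING = re.compile(r"(#{1,6})\s+(.*)")


def inline(text: str) -> str:
    """Escape HTML special characters, then turn **bold** into <strong>."""
    return BOLD.sub(r"<strong>\1</strong>", html.escape(text))


def markdown_to_html(md: str) -> str:
    """Convert a tiny Markdown subset (headings, bullet lists, bold, paragraphs)."""
    out = []
    in_list = False
    for raw in md.splitlines():
        line = raw.strip()
        is_item = line.startswith(("- ", "* "))
        # Any non-item line (including a blank one) closes an open list.
        if in_list and not is_item:
            out.append("</ul>")
            in_list = False
        if not line:
            continue
        heading = HEADING.match(line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        elif is_item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{inline(line[2:])}</li>")
        else:
            out.append(f"<p>{inline(line)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)
```

Text is escaped before the bold rule runs, so stray < or & characters in the Markdown cannot inject markup.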
Documentation: https://docs.terrychan.org/tkitreadability/
Quick upload
The script automatically finds the dependencies and then uploads the package:
sh upload.sh
Download files
Source Distribution
tkitreadability-0.0.0.5.2.tar.gz (10.0 kB)
Built Distribution
tkitreadability-0.0.0.5.2-py2.py3-none-any.whl
Hashes for tkitreadability-0.0.0.5.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 0110521bafc1ec880e8345fa4ac01036747333d6f838ad1f173206cb91e359bc
MD5 | 1c561b114ce49d091de80e3e004cefca
BLAKE2b-256 | 77648c6b89dd9254fec3d106c23e05558b5b3a496c12ee40130c3fffddd47548
Hashes for tkitreadability-0.0.0.5.2-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 49f96da9fdc318d0ccb24c553e0b47aa3c4ed8ed3a8b2d21efbd5a2a784fcae4
MD5 | 0a9906382eb359e7ee72e20da92949e7
BLAKE2b-256 | 3668ed25d0160a216e6b2d013d21b7da31c90a8ad8fe93da8a66daf9ad754ee0
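To check a downloaded file against these digests, hash it locally and compare. A minimal sketch with Python's hashlib follows; BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest. The function names digests_from_chunks and file_digests are hypothetical.

```python
import hashlib
from pathlib import Path


def digests_from_chunks(chunks):
    """Feed byte chunks into the three algorithms listed above; return hex digests."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    for chunk in chunks:
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def file_digests(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    with Path(path).open("rb") as f:
        return digests_from_chunks(iter(lambda: f.read(chunk_size), b""))
```

For example, file_digests("tkitreadability-0.0.0.5.2.tar.gz")["SHA256"] should equal 0110521bafc1ec880e8345fa4ac01036747333d6f838ad1f173206cb91e359bc if the source distribution downloaded intact.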