Terry toolkit tkitreadability
Project description
A library for extracting the main text from HTML.
from tkitreadability import tkitReadability
html = """
<div class="full-component-wrapper">
<div class="component component--text-image image-position--right" data-id="45290" data-type="c_sideimagetext_ttt">
<div class="text-image--component-wrapper twb-container">
<div class="text-image--content-wrapper row">
<div class="text-image--image col-12 col-xl-7 order-2 order-xl-3">
<div class="field field--name-field-c-image field--type-entity-reference field--label-hidden field__item">
<picture>
<source srcset="/sites/default/files/styles/ttt_image_690/public/2021-07/border-collie.webp?itok=1oyChjVg 2x" media="all and (min-width: 1140px)" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_930/public/2021-07/border-collie.webp?itok=QxWrubxE 1x" media="all and (min-width: 992px)" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_690/public/2021-07/border-collie.webp?itok=1oyChjVg 1x" media="all and (min-width: 768px)" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.webp?itok=jhilnwqZ 1x" media="all and (min-width: 576px)" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.webp?itok=jhilnwqZ 1x" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_690/public/2021-07/border-collie.jpg?itok=1oyChjVg 2x" media="all and (min-width: 1140px)" type="image/jpeg">
<source srcset="/sites/default/files/styles/ttt_image_930/public/2021-07/border-collie.jpg?itok=QxWrubxE 1x" media="all and (min-width: 992px)" type="image/jpeg">
<source srcset="/sites/default/files/styles/ttt_image_690/public/2021-07/border-collie.jpg?itok=1oyChjVg 1x" media="all and (min-width: 768px)" type="image/jpeg">
<source srcset="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.jpg?itok=jhilnwqZ 1x" media="all and (min-width: 576px)" type="image/jpeg">
<source srcset="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.jpg?itok=jhilnwqZ 1x" type="image/jpeg">
<img src="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.jpg?itok=jhilnwqZ" alt="Border Collie" typeof="foaf:Image" loading="lazy">
</picture>
</div>
</div>
<img src="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.jpg?itok=jhilnwqZ" alt="Border Collie" typeof="foaf:Image" loading="lazy">
<div class="text-image--text-wrapper col-12 col-xl-5 order-3 order-xl-2">
<div class="text-image--text">
<div class="clearfix text-formatted field field--name-field-c-sideimagetext-summary field--type-text-long field--label-hidden field__item"><h2>Pet Card</h2>
<ul>
<li><strong>Living Considerations:</strong> Not hypoallergenic, suitable for apartment living, good with older children</li>
<li><strong>Size:</strong> Medium</li>
<li><strong>Height:</strong> Males - 48 to 56 centimetres at the withers, Females - 45 to 53 centimetres at the withers</li>
<li><strong>Weight:</strong> Males -13 to 20 kilograms, Females - 12 to 19 kilograms</li>
<li><strong>Coat:</strong> Medium/Long</li>
<li><strong>Energy:</strong> High</li>
<li><strong>Colour:</strong> All colours or colour combinations</li>
<li><strong>Activities:</strong> Agility, Conformation, Herding, Obedience, Rally Obedience, Tracking</li>
<li><strong>Indoor/Outdoor:</strong> Both</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
"""
# Extract the main text from the HTML sample above
readability = tkitReadability()
content = readability.html2text(html)
print(content)
# Output as HTML
print(readability.markdown2Html(content))
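For readers who want a feel for what extracting text from HTML involves, here is a minimal sketch that uses only Python's standard library. It is not how tkitreadability works internally (this page does not describe that); the names `TextExtractor` and `extract_text` are hypothetical and exist only for this illustration. The sketch simply collects visible text nodes and skips the contents of `<script>` and `<style>` elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, skipping script/style contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []     # stripped, non-empty text nodes in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        # One text node per line; a real extractor would do smarter joining.
        return "\n".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML fragment, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<div><h2>Pet Card</h2><ul><li><strong>Size:</strong> Medium</li></ul>"
    "<script>var x = 1;</script></div>"
)
print(extract_text(sample))
```

A dedicated library such as tkitreadability is still the better choice for real pages, since this sketch makes no attempt to tell the main content apart from navigation, ads, or other page chrome.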
Updates
version: '0.0.0.4'
Added conversion from Markdown to HTML.
Documentation: https://docs.terrychan.org/tkitreadability/
Quick upload
Automatically finds the dependencies, then uploads:
sh upload.sh
Hashes for tkitreadability-0.0.0.5.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 260a15ac92337b1601e0037d261d9c328eaa208eac58fb3df3135bb5c8ada195
MD5 | 8b71cd6723efb6a86fbf2438ed16876c
BLAKE2b-256 | f33be546051665a372ed14aa613e9086d03fb82914745a4f7cd53e2dcbf859a4
Hashes for tkitreadability-0.0.0.5.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e88a0c682ae0691ee1a6e3a6d58d02c5572cd42f28331f9083795fa9d65848d0
MD5 | 9958348760906ab126825289aa55a72a
BLAKE2b-256 | 8d3751e2c043d241ff8046ff96afd444ad9b0dd74094485e74501b51f9ab938c
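The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed for the source distribution.
EXPECTED_SHA256 = "260a15ac92337b1601e0037d261d9c328eaa208eac58fb3df3135bb5c8ada195"

# Assumed location: the archive sits in the current working directory.
archive = Path("tkitreadability-0.0.0.5.1.tar.gz")

if archive.exists():
    print("OK" if matches_expected(archive, EXPECTED_SHA256) else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of file size. When installing from a requirements file, pip's `--require-hashes` mode performs an equivalent check automatically.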