Yet another manga scraper and downloader
tankobon
What?
tankobon is a website scraper for comics and manga. tankobon relies on stores, which define how to parse a website for chapters and how to parse chapters for links to the pages themselves (somewhat like youtube-dl extractors). Currently, the following websites are supported:
komi-san.com
m.mangabat.com
mangadex.org
mangakakalot.com
Creating a Store
A store is a regular Python module in the stores/ folder. It should provide a Parser class, which is a subclass of tankobon.manga.Parser.
The following methods must be implemented:
chapters(self) -> Generator[Tuple[str, Dict[str, str]], None, None]
Yields chapter_info, which looks like this:

```python
{
    "id": ...,      # chapter number
    "title": ...,   # chapter title
    "url": ...,     # chapter url
    "volume": ...,  # volume, e.g. '0'
}
```
Volume is optional and may be undefined. Example:
```python
def chapters(self):
    # use self.soup to access the title page
    for href in self.soup.find_all("a", href=True):
        # validate href here and parse the chapter id
        ...
        yield {"id": ..., "title": href.text, "url": href["href"]}
```
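The same link-scanning pattern can be sketched without tankobon or BeautifulSoup, using only the standard library's html.parser (the ChapterLinks class and the "/chapter-" check are illustrative assumptions, not part of tankobon):

```python
from html.parser import HTMLParser

class ChapterLinks(HTMLParser):
    """Collect hrefs that look like chapter links from a title page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if "/chapter-" in href:  # crude validation; a real store does more
                self.links.append(href)

parser = ChapterLinks()
parser.feed('<a href="/manga/chapter-1">Ch. 1</a><a href="/about">About</a>')
print(parser.links)  # -> ['/manga/chapter-1']
```

In an actual store, self.soup already gives you the parsed title page, so only the validation logic differs per site.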
pages(self, chapter_data: Dict[str, str]) -> List[str]
Returns a list of URLs to a chapter's pages, given the chapter data yielded from chapters().
The pages must be in reading order (page 1 is [0], page 2 is [1], etc.). Example:
```python
def pages(self, chapter_data):
    pages = []
    # to get the chapter's html, use self.session.get (a requests session)
    # or self.soup (html already parsed by BeautifulSoup).
    chapter_page = self.soup_from_url(chapter_data["url"])
    for href in chapter_page.find_all("a", href=True):
        # validate href here
        ...
        pages.append(href["href"])
    return pages
```
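Because plain string sorting puts "page10" before "page2", a store that collects page URLs out of order may want a natural sort before returning them. A minimal sketch (natural_key is a hypothetical helper, not part of tankobon):

```python
import re

def natural_key(url: str):
    # Split the URL into digit and non-digit runs so numeric parts
    # compare as integers: "page2" sorts before "page10".
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", url)]

urls = ["/ch1/page10.png", "/ch1/page2.png", "/ch1/page1.png"]
print(sorted(urls, key=natural_key))
# -> ['/ch1/page1.png', '/ch1/page2.png', '/ch1/page10.png']
```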
The following methods are optional: generic implementations are provided.
title(self) -> str
Returns the title of the manga. Example:

```python
def title(self):
    # .text extracts the string from the <title> tag
    return self.soup.title.text
```
Index Compatibility
Between versions v3.1.0a1 and v3.2.0a0, the index file moved from site-packages to ~/.tankobon/index.json, which is specific to each install of tankobon.
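A script that wants to inspect the index can read it from the new location directly. A sketch (load_index is a hypothetical helper, assuming the ~/.tankobon/index.json path noted above):

```python
import json
from pathlib import Path

# Path per the note above; tankobon itself manages this file.
INDEX_PATH = Path.home() / ".tankobon" / "index.json"

def load_index() -> dict:
    """Return the parsed index, or an empty dict if none exists yet."""
    if INDEX_PATH.exists():
        return json.loads(INDEX_PATH.read_text())
    return {}
```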
Todo
- download pre-parsed indexes from a special GitHub repo (tankobon-index?)
- create a GUI to make downloading easier (like youtube-DLG)
Usage
```shell
tankobon download 'https://komi-san.com'             # download all chapters
tankobon store info 'komi_san/https://komi-san.com'  # and then get info on the chapters
```
Install
```shell
python(3) -m pip install tankobon
```
Build
All my Python projects now use flit to build and publish.
To build, run flit build.
License
MIT.