Download images from Baidu Images
BaiduImagesDownload
BaiduImagesDownload is a fast, simple tool for crawling images from Baidu Images.
from BaiduImagesDownload.crawler import Crawler
net, num, urls = Crawler.get_images_url('二次元', 20)
Crawler.download_images(urls)
Installation
pip install BaiduImagesDownload
Usage
Basic usage
from BaiduImagesDownload.crawler import Crawler
net, num, urls = Crawler.get_images_url('二次元', 20)
Crawler.download_images(urls)
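The basic example ignores the status flags that both calls return. A minimal sketch of checking them follows. The helper name `crawl` and its error handling are illustrative assumptions, not part of the library; the crawler is passed in as an argument so the sketch runs without a network connection.

```python
def crawl(crawler, word, count, path='download'):
    """Fetch image URLs for `word` and download them, checking each status flag.

    `crawler` is any object that exposes get_images_url and download_images
    with the signatures documented in this README (for example, Crawler).
    This wrapper is a hypothetical helper, not part of the library.
    """
    net, enough, urls = crawler.get_images_url(word, count)
    if not net:
        # The first flag reports whether the network request succeeded.
        raise ConnectionError('could not reach the image search')
    if not enough:
        # The second flag reports whether `count` images were found.
        print(f'only {len(urls)} of {count} requested images were found')
    # download_images reports how many downloads succeeded and failed.
    success, failed = crawler.download_images(urls, path=path)
    return success, failed
```

With the library installed, this would presumably be called as `crawl(Crawler, '二次元', 20)`.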
Setting the image formats
from BaiduImagesDownload.crawler import Crawler
# rule defaults to ('.png', '.jpg')
net, num, urls = Crawler.get_images_url('二次元', 20)
Crawler.download_images(urls, rule=('.png', '.jpg'))
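The README does not say how `rule` is matched against each image. One plausible reading is a suffix check on the URL path, sketched here. The function `matches_rule`, the case-insensitive comparison, and the sample URLs are assumptions made for illustration, not the library's documented behavior.

```python
from urllib.parse import urlparse


def matches_rule(obj_url, rule=('.png', '.jpg')):
    """Return True if the URL's path ends with one of the allowed extensions.

    Hypothetical helper: the query string is ignored and the comparison is
    case-insensitive, which may differ from what the library actually does.
    """
    path = urlparse(obj_url).path.lower()
    return path.endswith(tuple(rule))


# Sample entries in the documented format (the URLs are made up).
urls = [
    {'obj_url': 'https://example.com/a.png', 'from_url': 'https://example.com/'},
    {'obj_url': 'https://example.com/b.gif', 'from_url': 'https://example.com/'},
]
allowed = [item for item in urls if matches_rule(item['obj_url'])]
```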
Setting the timeout
from BaiduImagesDownload.crawler import Crawler
# timeout defaults to 60 seconds
net, num, urls = Crawler.get_images_url('二次元', 20, timeout=60)
Crawler.download_images(urls, rule=('.png', '.jpg'), timeout=60)
Documentation
get_images_url
class Crawler:
    @staticmethod
    def get_images_url(word: str, num: int, timeout: int = __CONCURRENT_TIMEOUT) -> (bool, bool, list):
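Parameters
word: str
: The search keyword.
num: int
: The number of images to search for.
timeout: int
: The request timeout; defaults to 60 seconds.
Returns
net: bool
: Whether the network connection succeeded: True on success, False on failure.
num: bool
: Whether enough images were found: True if the requested number was met, False if not.
urls: list
: The URLs found. Each item is a dict with two keys, obj_url and from_url: obj_url is the URL of the image, and from_url is the value to send as the Referer.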
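To make the role of the two keys concrete, here is a minimal sketch of turning one entry of urls into an HTTP request that sends from_url as the Referer header. It uses the standard library's `urllib.request`. The helper `build_request` and the example URLs are hypothetical, and the library itself may use a different HTTP client.

```python
import urllib.request


def build_request(item):
    """Build (but do not send) a request for one entry of `urls`."""
    return urllib.request.Request(
        item['obj_url'],
        headers={'Referer': item['from_url']},
    )


# An entry in the documented format (the URLs are made up).
item = {
    'obj_url': 'https://example.com/images/cat.jpg',
    'from_url': 'https://example.com/gallery',
}
request = build_request(item)
```

Passing the result to `urllib.request.urlopen(request, timeout=60)` would then fetch the image.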
download_images
class Crawler:
    @staticmethod
    def download_images(urls: list, rule: tuple = ('.png', '.jpg'),
                        path: str = 'download', timeout: int = __CONCURRENT_TIMEOUT,
                        concurrent: int = __CONCURRENT_NUM) -> (int, int):
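Parameters
urls: list
: The list of images to download, in the same format that get_images_url returns.
rule: tuple, optional
: The file formats allowed for download; defaults to ('.png', '.jpg').
path: str, optional
: The directory to download images into; defaults to 'download'.
timeout: int, optional
: The request timeout; defaults to 60 seconds.
concurrent: int, optional
: The number of parallel downloads; defaults to 100.
Returns
success: int
: The number of images downloaded successfully.
failed: int
: The number of images that failed to download.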
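The README does not describe how the concurrent limit is enforced. One common way to cap parallel downloads and tally a (success, failed) pair is an `asyncio.Semaphore`, sketched here. Everything in this block (`download_all`, `fake_fetch`, the use of asyncio at all) is an assumption for illustration, and the stand-in fetcher makes no network calls.

```python
import asyncio


async def download_all(urls, fetch, concurrent=100):
    """Run `fetch(item)` for every item, at most `concurrent` at a time.

    Returns (success, failed), counting a raised exception as a failure.
    """
    semaphore = asyncio.Semaphore(concurrent)

    async def guarded(item):
        # Only `concurrent` coroutines can hold the semaphore at once.
        async with semaphore:
            return await fetch(item)

    results = await asyncio.gather(
        *(guarded(item) for item in urls), return_exceptions=True)
    failed = sum(isinstance(result, Exception) for result in results)
    return len(results) - failed, failed


async def fake_fetch(item):
    """Stand-in for a real download; fails for URLs ending in '.gif'."""
    await asyncio.sleep(0)
    if item['obj_url'].endswith('.gif'):
        raise ValueError('format not allowed')
    return item['obj_url']


# Sample entries in the documented format (the URLs are made up).
sample = [
    {'obj_url': 'https://example.com/a.png', 'from_url': 'https://example.com/'},
    {'obj_url': 'https://example.com/b.jpg', 'from_url': 'https://example.com/'},
    {'obj_url': 'https://example.com/c.gif', 'from_url': 'https://example.com/'},
]
success, failed = asyncio.run(download_all(sample, fake_fetch, concurrent=2))
```

A semaphore keeps the number of in-flight requests bounded without splitting the work into fixed batches, so a slow download does not hold up the items queued behind it.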
Logging
You can set the logger's level and output handlers; see the documentation for Python's logging module for details.
import logging
import os
from BaiduImagesDownload.crawler import logger
# Set the log level to DEBUG
# (the default is INFO)
logger.setLevel(logging.DEBUG)
# Also write log output to a file; expand '~' explicitly because FileHandler does not
file_handler = logging.FileHandler(os.path.expanduser('~/BaiduImagesDownload.log'))
file_handler.setFormatter(logging.Formatter(
    '[%(asctime)s] [%(levelname)s] %(message)s'))  # set the output format
logger.addHandler(file_handler)
License