Download images from Baidu Images
BaiduImagesDownload
BaiduImagesDownload is a fast, simple crawler for Baidu Images.
from BaiduImagesDownload.crawler import Crawler
net, num, urls = Crawler.get_images_url('二次元', 20)
Crawler.download_images(urls)
Installation
pip install BaiduImagesDownload
Usage
Basic
from BaiduImagesDownload.crawler import Crawler
net, num, urls = Crawler.get_images_url('二次元', 20)
Crawler.download_images(urls)
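The documentation on this page describes the three return values of get_images_url as a network flag, an "enough images" flag, and a list of URL dicts. A minimal sketch of acting on them before downloading might look like the following. `describe_result` is a hypothetical helper, not part of BaiduImagesDownload, and the sample data is made up so the block runs offline.

```python
def describe_result(net: bool, num: bool, urls: list) -> str:
    """Summarise the (net, num, urls) tuple described on this page.

    Hypothetical helper, not part of BaiduImagesDownload.
    """
    if not net:
        # The network request failed, so there is nothing to download.
        return "network error"
    if not num:
        # Fewer images were found than requested; urls may still be usable.
        return f"only {len(urls)} image(s) found"
    return f"{len(urls)} image(s) ready"


# Made-up sample shaped like the documented return value.
sample_urls = [
    {"obj_url": "https://example.com/a.jpg", "from_url": "https://example.com/page"},
]
print(describe_result(True, False, sample_urls))
```

In real use the three arguments would come straight from Crawler.get_images_url, and download_images would be called only when the summary is acceptable.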
Setting the image formats
from BaiduImagesDownload.crawler import Crawler
# rule defaults to ('.png', '.jpg')
net, num, urls = Crawler.get_images_url('二次元', 20)
Crawler.download_images(urls, rule=('.png', '.jpg'))
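One plausible reading of rule is a suffix filter applied to each obj_url. The sketch below illustrates that idea with the standard library only. How the library itself matches extensions (case handling, query strings) is not shown on this page, so that behavior is an assumption, and `filter_by_rule` is a hypothetical name.

```python
from urllib.parse import urlparse


def filter_by_rule(urls: list, rule: tuple = (".png", ".jpg")) -> list:
    """Keep items whose obj_url path ends with one of the suffixes in rule.

    Hypothetical helper; the matching rules here are an assumption.
    """
    suffixes = tuple(s.lower() for s in rule)
    kept = []
    for item in urls:
        # Compare against the URL path only, so a query string
        # does not hide the file extension.
        path = urlparse(item["obj_url"]).path.lower()
        if path.endswith(suffixes):
            kept.append(item)
    return kept
```

Pre-filtering like this can be useful for estimating how many of the returned URLs a given rule would accept before any download starts.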
Setting the timeout
from BaiduImagesDownload.crawler import Crawler
# timeout defaults to 60 (s)
net, num, urls = Crawler.get_images_url('二次元', 20, timeout=60)
Crawler.download_images(urls, rule=('.png', '.jpg'), timeout=60)
Documentation
get_images_url
class Crawler:
@staticmethod
def get_images_url(word: str, num: int, timeout: int = __CONCURRENT_TIMEOUT) -> (bool, bool, list):
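Parameters
word: str
: Search keyword.
num: int
: Number of images to search for.
timeout: int
: Request timeout; defaults to 60 (s).
Returns
net: bool
: Whether the network connection succeeded: True on success, False on failure.
num: bool
: Whether enough images were found: True if the requested number was met, False if there were fewer.
urls: list
: The URLs retrieved. Each item is a dict with two keys, obj_url and from_url. obj_url is the URL of the image, and from_url is the Referer.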
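Each item pairs the image URL with the page it came from, presumably because some image hosts reject requests that lack a matching Referer. As a rough illustration of how such an item could be turned into a request, the sketch below uses only the standard library's urllib.request.Request and sends nothing over the network. `build_image_request` is a hypothetical helper, and how the library itself issues requests is not shown on this page.

```python
from urllib.request import Request


def build_image_request(item: dict) -> Request:
    """Build a GET request for obj_url that sends from_url as the Referer.

    Hypothetical helper, not part of BaiduImagesDownload.
    """
    return Request(item["obj_url"], headers={"Referer": item["from_url"]})


# Made-up item shaped like one entry of the documented urls list.
item = {
    "obj_url": "https://example.com/images/a.jpg",
    "from_url": "https://example.com/gallery",
}
req = build_image_request(item)
print(req.full_url, req.get_header("Referer"))
```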
download_images
class Crawler:
@staticmethod
def download_images(urls: list, rule: tuple = ('.png', '.jpg'),
path: str = 'download', timeout: int = __CONCURRENT_TIMEOUT,
concurrent: int = __CONCURRENT_NUM) -> (int, int):
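Parameters
urls: list
: The list of images to download, in the same format as returned by get_images_url.
rule: tuple, optional
: Allowed file formats; defaults to ('.png', '.jpg').
path: str, optional
: Directory to download images into; defaults to 'download'.
timeout: int, optional
: Request timeout; defaults to 60 (s).
concurrent: int, optional
: Number of parallel downloads; defaults to 100.
Returns
success: int
: Number of successful downloads.
failed: int
: Number of failed downloads.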
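To make the relationship between concurrent and the (success, failed) return value concrete, here is a minimal sketch of bounded parallel downloading that tallies outcomes. It uses a thread pool from the standard library, which is a swapped-in technique for illustration; the concurrency mechanism the library actually uses is not stated on this page. `download_all` and the `fetch` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple


def download_all(urls: list, fetch: Callable[[dict], None],
                 concurrent: int = 100) -> Tuple[int, int]:
    """Run fetch on every item with at most `concurrent` workers.

    fetch should raise an exception on failure.
    Returns (success, failed). Hypothetical helper for illustration.
    """
    def attempt(item: dict) -> bool:
        # Convert any exception into a failure flag so that one bad
        # URL does not abort the rest of the batch.
        try:
            fetch(item)
            return True
        except Exception:
            return False

    if not urls:
        return 0, 0
    with ThreadPoolExecutor(max_workers=concurrent) as pool:
        results = list(pool.map(attempt, urls))
    success = sum(results)
    return success, len(results) - success
```

Passing the download step in as a callback keeps the counting logic testable without network access; a real fetch would request obj_url with the chosen timeout and write the bytes under path.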
Logging
You can set the log level and the output destination; see Python's logging module for details.
import logging
import os
from BaiduImagesDownload.crawler import logger
# Set the log level to DEBUG
# The default is INFO
logger.setLevel(logging.DEBUG)
# Write the log to a file; expand '~' explicitly, since FileHandler does not
log_path = os.path.expanduser('~/BaiduImagesDownload.log')
file_handler = logging.FileHandler(log_path)
file_handler.setFormatter(logging.Formatter(
    '[%(asctime)s] [%(levelname)s] %(message)s'))  # Set the output format
logger.addHandler(file_handler)
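If file logging is set up in more than one place, the steps above can be wrapped in a small helper. The sketch below works with any standard logging.Logger, expands `~` in the path, and creates the target folder if it is missing. `add_file_log` is a hypothetical name, not something the package provides.

```python
import logging
import os


def add_file_log(logger: logging.Logger, path: str,
                 level: int = logging.DEBUG) -> logging.FileHandler:
    """Attach a file handler to logger and return it.

    Expands '~' in path and creates the parent folder when needed.
    Hypothetical helper, not part of BaiduImagesDownload.
    """
    full_path = os.path.expanduser(path)
    os.makedirs(os.path.dirname(full_path) or ".", exist_ok=True)
    handler = logging.FileHandler(full_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter(
        "[%(asctime)s] [%(levelname)s] %(message)s"))
    logger.setLevel(level)
    logger.addHandler(handler)
    return handler
```

Returning the handler lets the caller remove or close it later, for example at the end of a test run.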
License
Hashes for BaiduImagesDownload-1.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 0b568462ce0ccac6f9199b0cdaf96eb2410242cc12a3a74603106bc8b0d0ce7b
MD5 | db85c48e33e26e59b222e1ab21fa1d6f
BLAKE2b-256 | 96d34dc99511464f7408a0ff51db2908d1daecdb402fb3a728d39181f72c482d
Hashes for BaiduImagesDownload-1.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9bf7155f285014b92f668cd4b791b5d1a85ffee59a6c982694cdf70cf003c16c
MD5 | 6cba118440b9f1ae33271b024508c3f9
BLAKE2b-256 | 23bc825c54f51792f44bf829241785dfb5f8b33cb0584b999cc20f9e16af3362