Download images from Baidu Images

BaiduImagesDownload

BaiduImagesDownload is a fast, simple tool for crawling images from Baidu Images.

from BaiduImagesDownload.crawler import Crawler

net, num, urls = Crawler.get_images_url('二次元', 20)
Crawler.download_images(urls)

Installation

pip install BaiduImagesDownload

Usage

Basic

from BaiduImagesDownload.crawler import Crawler

net, num, urls = Crawler.get_images_url('二次元', 20)
Crawler.download_images(urls)

Setting the image formats

from BaiduImagesDownload.crawler import Crawler

# rule defaults to ('.png', '.jpg')
net, num, urls = Crawler.get_images_url('二次元', 20)
Crawler.download_images(urls, rule=('.png', '.jpg'))
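
As a rough illustration of what an extension rule like this means, the sketch below filters a list of URL items by file suffix. It is an assumption about the behavior, not the package's actual implementation: `filter_by_rule` is a hypothetical helper and the sample URLs are made up.

```python
from urllib.parse import urlparse


def filter_by_rule(urls, rule=('.png', '.jpg')):
    """Keep only items whose obj_url path ends with one of the suffixes in rule.

    Hypothetical helper for illustration; not part of BaiduImagesDownload.
    """
    kept = []
    for item in urls:
        # Compare against the URL path only, so query strings are ignored.
        path = urlparse(item['obj_url']).path.lower()
        if path.endswith(tuple(rule)):
            kept.append(item)
    return kept


# Made-up items in the shape of the dicts returned by get_images_url.
sample = [
    {'obj_url': 'https://example.com/a.png', 'from_url': 'https://example.com/'},
    {'obj_url': 'https://example.com/b.gif', 'from_url': 'https://example.com/'},
    {'obj_url': 'https://example.com/c.JPG?size=large', 'from_url': 'https://example.com/'},
]
print([item['obj_url'] for item in filter_by_rule(sample)])
```

Matching on the parsed path rather than the raw string is a design choice of this sketch, so that a URL with a trailing query string is still recognized by its suffix.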

Setting the timeout

from BaiduImagesDownload.crawler import Crawler

# timeout defaults to 60 (seconds)
net, num, urls = Crawler.get_images_url('二次元', 20, timeout=60)
Crawler.download_images(urls, rule=('.png', '.jpg'), timeout=60)

Documentation

get_images_url

class Crawler:

    @staticmethod
    def get_images_url(word: str, num: int, timeout: int = __CONCURRENT_TIMEOUT) -> (bool, bool, list):

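Parameters

  • word: str: the search keyword
  • num: int: the number of images to search for
  • timeout: int: request timeout, defaults to 60 (seconds)

Returns

  • net: bool: whether the network connection succeeded; True on success, False on failure
  • num: bool: whether enough images were found; True if the requested number was met, False if not
  • urls: list: the retrieved URLs; each item is a dict with two keys, obj_url and from_url, where obj_url is the URL of the image and from_url is the corresponding Referer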
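
A minimal sketch of how the three return values and a single URL item might be consumed. `check_result` and `build_request` are hypothetical helpers, not part of the package; the only things taken from the description above are the meaning of `net` and `num` and the `obj_url`/`from_url` keys. The request is built with the standard library's `urllib.request` and is never sent here.

```python
from urllib.request import Request


def check_result(net, num, urls):
    """Turn the (net, num, urls) tuple into a short status string."""
    if not net:
        return 'network error'
    if not num:
        return 'only %d images found' % len(urls)
    return 'ok: %d images' % len(urls)


def build_request(item):
    """Build a request for one image, sending from_url as the Referer header."""
    return Request(item['obj_url'], headers={'Referer': item['from_url']})


# Made-up item in the documented shape.
item = {'obj_url': 'https://example.com/a.jpg', 'from_url': 'https://example.com/page'}
print(check_result(True, False, [item]))
print(build_request(item).get_header('Referer'))
```

Checking `net` before `num` matters in this sketch: when the connection fails, the count flag carries no useful information.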

download_images

class Crawler:

    @staticmethod
    def download_images(urls: list, rule: tuple = ('.png', '.jpg'),
                        path: str = 'download', timeout: int = __CONCURRENT_TIMEOUT,
                        concurrent: int = __CONCURRENT_NUM) -> (int, int):

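Parameters

  • urls: list: the list of images to download, in the same format as returned by get_images_url
  • rule: tuple, optional: the file formats allowed for download, defaults to ('.png', '.jpg')
  • path: str, optional: the directory images are downloaded to, defaults to 'download'
  • timeout: int, optional: request timeout, defaults to 60 (seconds)
  • concurrent: int, optional: the number of parallel downloads, defaults to 100

Returns

  • success: int: the number of successful downloads
  • failed: int: the number of failed downloads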
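
To make the `concurrent` limit and the `(success, failed)` counts concrete, here is a sketch of a bounded-concurrency download loop. It is an assumption for illustration only and says nothing about how BaiduImagesDownload is implemented internally: `download_all` and `fake_fetch` are hypothetical names, and the fetch step is a stand-in that never touches the network.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, concurrent=100):
    """Run fetch(item) for every item with at most `concurrent` workers.

    Returns (success, failed). A call counts as failed if fetch raises.
    Hypothetical sketch; not how BaiduImagesDownload is implemented.
    """
    def attempt(item):
        try:
            fetch(item)
            return True
        except Exception:
            return False

    if not urls:
        return 0, 0
    with ThreadPoolExecutor(max_workers=concurrent) as pool:
        results = list(pool.map(attempt, urls))
    success = sum(results)
    return success, len(results) - success


def fake_fetch(item):
    # Stand-in for a real HTTP download: fail on URLs marked "bad".
    if 'bad' in item['obj_url']:
        raise IOError('download failed')


items = [{'obj_url': 'https://example.com/%s.jpg' % name, 'from_url': 'https://example.com/'}
         for name in ('a', 'bad', 'c')]
success, failed = download_all(items, fake_fetch, concurrent=2)
print(success, failed)
```

Passing the fetch function in as an argument keeps the counting logic testable without network access.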

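Logging

You can set the log level and the log output; see the logging module documentation for details.

import logging
import os

from BaiduImagesDownload.crawler import logger

# Set the log level to DEBUG
# The default is INFO
logger.setLevel(logging.DEBUG)

# Write log output to a file
# FileHandler does not expand '~', so expand it explicitly
file_handler = logging.FileHandler(os.path.expanduser('~/BaiduImagesDownload.log'))
file_handler.setFormatter(logging.Formatter(
    '[%(asctime)s] [%(levelname)s] %(message)s'))  # set the output format
logger.addHandler(file_handler)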
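
The same file-logging setup can be wrapped in a small reusable helper. This is a sketch using only the standard `logging` module; `add_file_handler` is a hypothetical name, and it is demonstrated on a stand-in logger rather than the package's own, on the assumption that the package's logger is an ordinary `logging.Logger`.

```python
import logging
import os
import tempfile


def add_file_handler(logger, path, level=logging.DEBUG):
    """Attach a formatted FileHandler to logger and return the handler."""
    # expanduser lets callers pass paths such as '~/app.log'.
    handler = logging.FileHandler(os.path.expanduser(path), encoding='utf-8')
    handler.setFormatter(logging.Formatter('[%(asctime)s] [%(levelname)s] %(message)s'))
    logger.setLevel(level)
    logger.addHandler(handler)
    return handler


# Stand-in logger writing to a temporary file.
demo_logger = logging.getLogger('demo')
log_path = os.path.join(tempfile.mkdtemp(), 'demo.log')
handler = add_file_handler(demo_logger, log_path)
demo_logger.debug('crawler started')

# Detach and close the handler so the file is fully written and released.
demo_logger.removeHandler(handler)
handler.close()

with open(log_path, encoding='utf-8') as f:
    print(f.read().strip())
```

Returning the handler lets the caller remove and close it later, which avoids duplicate log lines if the setup runs more than once.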

License

License: MIT
