Pixelripper

Package and CLI for downloading media from a webpage.
Install with:

pip install pixelripper

Pixelripper provides a class called PixelRipper and a subclass called PixelRipperSelenium.
PixelRipper uses the requests library to fetch webpages and PixelRipperSelenium uses a selenium-based engine to do the same.
The selenium engine is slower and requires more resources, but it is useful for webpages that don't render their media content without a JavaScript engine.
It can use either the Firefox or Chrome browser.
Note: You must have the appropriate webdriver for your machine and browser version installed in order to use PixelRipperSelenium (see the sketch after the programmatic example below).
Pixelripper can be used programmatically or from the command line.

Programmatic usage:

from pixelripper import PixelRipper
from pathlib import Path
ripper = PixelRipper()
# Scrape the page for image, video, and audio urls.
ripper.rip("https://somewebsite.com")
# Any content urls found will now be accessible as members of ripper.
print(ripper.image_urls)
print(ripper.video_urls)
print(ripper.audio_urls)
# All the urls found on a page can be accessed through the ripper.scraper member.
all_urls = ripper.scraper.get_links("all")
# The urls can also be filtered according to a list of extensions 
# with the filter_by_extensions function.
# The following will return only .jpg and .mp3 file urls.
urls = ripper.filter_by_extensions([".jpg", ".mp3"])
# The content can then be downloaded.
ripper.download_files(urls, Path.cwd() / "somewebsite")
# Alternatively, everything in ripper.image_urls, ripper.video_urls, and ripper.audio_urls
# can be downloaded with just a call to ripper.download_all()
ripper.download_all(Path.cwd() / "somewebsite")
# Separate subfolders named "images", "videos", and "audio"
# will be created inside the "somewebsite" folder when using this function.
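
To use the selenium engine instead, swap in PixelRipperSelenium. The following is a minimal sketch; the browser and headless constructor arguments are assumptions inferred from the CLI's -b/--browser and -nh/--no_headless flags, so check the class's actual signature before relying on them.

from pixelripper import PixelRipperSelenium
from pathlib import Path
# Assumed constructor parameters; the real API may differ.
ripper = PixelRipperSelenium(browser="firefox", headless=True)
# The rest of the interface matches PixelRipper.
ripper.rip("https://somewebsite.com")
ripper.download_all(Path.cwd() / "somewebsite")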

Command line usage:

>pixelripper -h
usage: pixelripper [-h] [-s] [-nh] [-b BROWSER] [-o OUTPUT_PATH] [-eh [EXTRA_HEADERS ...]] url

positional arguments:
  url                   The url to scrape for media.

options:
  -h, --help            show this help message and exit
  -s, --selenium        Use selenium to get page content instead of requests.
  -nh, --no_headless    Don't use headless mode when using -s/--selenium.
  -b BROWSER, --browser BROWSER
                        The browser to use when using -s/--selenium. Can be 'firefox' or 'chrome'. You must have the appropriate webdriver installed for your machine and browser version in order to use the selenium engine.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Output directory to save results to. If not specified, a folder with the name of the webpage will be created in the current working directory.
  -eh [EXTRA_HEADERS ...], --extra_headers [EXTRA_HEADERS ...]
                        Extra headers to use when requesting files as key, value pairs. Keys and values should be colon separated and pairs should be space separated. e.g. -eh Referer:website.com/page Host:website.com
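
Some example invocations using only the flags shown above (https://somewebsite.com is a placeholder url):

>pixelripper https://somewebsite.com
>pixelripper https://somewebsite.com -o downloads
>pixelripper https://somewebsite.com -s -b firefox -nh
>pixelripper https://somewebsite.com -eh Referer:somewebsite.com/page Host:somewebsite.com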
