
Clark University package for crawling YouTube and cleaning the data

Project description

clarku-youtube-crawler

Clark University YouTube crawler and JSON decoder for YouTube JSON data. Please read the documentation in DOCS.

Installing

To install,

pip install clarku-youtube-crawler

The crawler depends on several other packages. I already include all dependencies, so missing requirements shouldn't happen. If they do, download requirements.txt, navigate to the folder that contains it, and run:

pip install -r requirements.txt
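To confirm which version pip actually installed (the files listed further down this page are for 1.1.0), the standard library's importlib.metadata, available since Python 3.8, can be queried. This is only a sketch: installed_version is a hypothetical helper and is not part of clarku-youtube-crawler.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; not part of clarku-youtube-crawler.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("clarku-youtube-crawler")
    if found is None:
        print("clarku-youtube-crawler is not installed; run: pip install clarku-youtube-crawler")
    else:
        print(f"clarku-youtube-crawler {found} is installed")
```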

Example usage

To initialize,

# your_script.py
import clarku_youtube_crawler as cu

# RawCrawler: crawl videos found by a keyword search;
# the start_* and day_count arguments set the search window
test = cu.RawCrawler()
test.__build__()
test.crawl("searchkey", start_date=14, start_month=12, start_year=2020, day_count=2)
test.crawl_videos_in_list(comment_page_count=1)
test.merge_all()

# ChannelCrawler: crawl channels, filtered with a subscriber-count cutoff
channel = cu.ChannelCrawler()
channel.__build__()
channel.setup_channel(subscriber_cutoff=1000, keyword="")
channel.crawl()
channel.crawl_videos_in_list(comment_page_count=1)
channel.merge_all()

# JSONDecoder: load the merged raw JSON file written by a crawl
jsonn = cu.JSONDecoder()
jsonn.load_json("YouTube_RAW_20201221/FINAL_raw_merged.json")
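The example path YouTube_RAW_20201221/FINAL_raw_merged.json suggests that each crawl writes into a date-stamped folder. Assuming that naming pattern (YouTube_RAW_ followed by YYYYMMDD) holds, which is inferred from the example rather than documented, a small helper can locate the newest merged file instead of hard-coding the date. latest_merged_file is hypothetical and not part of the package.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: output folders are named YouTube_RAW_YYYYMMDD, as in the example
# path; this is inferred from that path, not documented package behavior.
RAW_DIR_PATTERN = re.compile(r"^YouTube_RAW_(\d{8})$")
MERGED_NAME = "FINAL_raw_merged.json"


def latest_merged_file(base: Path = Path(".")) -> Optional[Path]:
    """Return FINAL_raw_merged.json from the newest YouTube_RAW_* folder, or None."""
    candidates = []
    for entry in base.iterdir():
        match = RAW_DIR_PATTERN.match(entry.name)
        # Only consider date-stamped folders that actually contain a merged file.
        if match and entry.is_dir() and (entry / MERGED_NAME).is_file():
            # YYYYMMDD strings sort chronologically, so the stamp works as a key.
            candidates.append((match.group(1), entry / MERGED_NAME))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


if __name__ == "__main__":
    print(latest_merged_file())
```

The returned path could then be passed to JSONDecoder, for example jsonn.load_json(str(path)), in place of the hard-coded path in the example.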

Changelog

Versions 0.0.1 to 0.0.3

These are untested beta releases, since Python packaging is a pain. Please don't install these versions.

Version 0.0.5

Finally figured out testing. It works okay. More documentation to come.

Version 0.0.6

Stable release for the RawCrawler feature only.

Versions 1.0.0 and 1.0.1

I think this might be our first full stable release.

Version 1.0.1.dev Pre-release

Added support for different file types in ChannelCrawler. Added documentation.



Download files

Download the file for your platform.

Source Distribution

clarku_youtube_crawler-1.1.0.tar.gz (11.9 kB)

Uploaded Source

Built Distribution


clarku_youtube_crawler-1.1.0-py3-none-any.whl (17.0 kB)

Uploaded Python 3

File details

Details for the file clarku_youtube_crawler-1.1.0.tar.gz.

File metadata

  • Download URL: clarku_youtube_crawler-1.1.0.tar.gz
  • Upload date:
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.9.1

File hashes

Hashes for clarku_youtube_crawler-1.1.0.tar.gz
Algorithm Hash digest
SHA256 91bf299d7ffbf16c80fa85acca9fa63a9a165ccf94eca0d1d2700202757fb888
MD5 bc2e9827c9cfbeadd537fe567adbe965
BLAKE2b-256 9937ccf57809b119ff04703d84b6026715a0abdd1e923c56df9be06294d2397a
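The SHA256 digest above can be used to check that a downloaded archive is intact. A minimal sketch using only hashlib from the standard library; the function names are illustrative, and the default expected value is the SHA256 listed for clarku_youtube_crawler-1.1.0.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for clarku_youtube_crawler-1.1.0.tar.gz
EXPECTED_SDIST_SHA256 = "91bf299d7ffbf16c80fa85acca9fa63a9a165ccf94eca0d1d2700202757fb888"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("clarku_youtube_crawler-1.1.0.tar.gz")
    if archive.exists():
        print("hash OK" if matches_published_hash(archive) else "hash MISMATCH")
    else:
        print(f"{archive} not found in the current directory")
```

To check the wheel instead, pass the SHA256 listed for clarku_youtube_crawler-1.1.0-py3-none-any.whl as expected. pip can also enforce hashes at install time through its hash-checking mode (--hash entries with --require-hashes).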


File details

Details for the file clarku_youtube_crawler-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: clarku_youtube_crawler-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 17.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.9.1

File hashes

Hashes for clarku_youtube_crawler-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 17c2d10340d4bbf047455d8ba2265f4768f3fbf0a9b0dc8c34a6f3f139f292dd
MD5 bdd7aca30fb4a327f60d0a7525571ae7
BLAKE2b-256 0460b7a25d27987bd8549c995c3e0a46907d88789e67f6a9a963899d6aed3ab0

