Clark University package for crawling YouTube and cleaning the resulting data

Project description

clarku-youtube-crawler

Clark University YouTube crawler and decoder for YouTube JSON. Please read the documentation in DOCS.

Installing

To install, run:

pip install clarku-youtube-crawler

If any requirements are missing (this should not happen, since the package already includes all of its dependencies), download requirements.txt from this repo and run:

pip install -r requirements.txt
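
To confirm that the install succeeded, one option is to ask Python's packaging metadata for the installed version. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or newer); the helper name installed_version is hypothetical and is not part of clarku-youtube-crawler.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The distribution name is the same one passed to pip above.
    found = installed_version("clarku-youtube-crawler")
    print(found if found else "clarku-youtube-crawler is not installed")
```

If this prints a version number, the package is visible to the interpreter you ran it with.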

Example usage

To initialize,

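# your_script.py
import clarku_youtube_crawler as cu

# Search-based crawl: build the crawler, crawl by search key over a date
# window, fetch the listed videos (one page of comments each), then merge.
test = cu.RawCrawler()
test.__build__()
test.crawl("searchkey", start_date=14, start_month=12, start_year=2020, day_count=2)
test.crawl_videos_in_list(comment_page_count=1)
test.merge_all()

# Channel-based crawl: configure a subscriber cutoff and keyword, then
# follow the same crawl, fetch, and merge steps.
channel = cu.ChannelCrawler()
channel.__build__()
channel.setup_channel(subscriber_cutoff=1000, keyword="")
channel.crawl()
channel.crawl_videos_in_list(comment_page_count=1)
channel.merge_all()

# Decode a merged raw JSON file.
jsonn = cu.JSONDecoder()
jsonn.load_json("YouTube_RAW_20201221/FINAL_raw_merged.json")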
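
The layout of the merged file is not described here, so before relying on JSONDecoder it can help to peek at the raw JSON with the standard library. This is a minimal sketch, assuming the file is a single JSON document; the helper name summarize_json is hypothetical and is not part of the package, and the path is the one used in the example above.

```python
import json
import os


def summarize_json(path):
    """Return (top-level type name, number of top-level entries or None)."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    # Only lists and dicts have a meaningful entry count at the top level.
    size = len(data) if isinstance(data, (list, dict)) else None
    return type(data).__name__, size


if __name__ == "__main__":
    path = "YouTube_RAW_20201221/FINAL_raw_merged.json"
    if os.path.exists(path):
        print(summarize_json(path))
    else:
        print("No merged file found at", path)
```

Knowing whether the top level is a list or a dict, and how many entries it holds, is a quick sanity check that a crawl produced output.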

Changelog

Version 0.0.1 to 0.0.3

These are untested beta releases, since Python packaging is a pain. Please don't install these versions.

Version 0.0.5

Finally figured out testing. It works okay. More documentation to come.

Version 0.0.6

Stable release for the RawCrawler feature only.

Version 1.0.0, 1.0.1

I think this might be our first full stable release.

Version 1.0.1.dev Pre-release

Added different file types for ChannelCrawler. Added documentation.

