A sink for Open edX events to send them to ClickHouse

Event Sink ClickHouse

Purpose

This project is a plugin for edx-platform that listens for configured Open edX events and sends them to a ClickHouse database for analytics or other processing. It is maintained as part of the Open Analytics Reference System (OARS) project.

OARS uses the data this plugin sends to ClickHouse to enrich reporting and to capture data that does not otherwise fit in xAPI.

Sinks

Currently the only sink is in the CMS. It listens for the COURSE_PUBLISHED signal and serializes a subset of the published course blocks into one table and the relationships between blocks into another table. With those we are able to recreate the “graph” of the course and get relevant data, such as block names, for reporting.
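The two-table layout described above can be illustrated with a minimal sketch. This is not the plugin's actual serializer: the nested-dict course structure, the field names (location, display_name, order), and the helper functions are all hypothetical, chosen only to show how block rows plus relationship rows are enough to rebuild the course graph.

```python
def flatten_course(root):
    """Split a nested course tree into block rows and relationship rows.

    `root` is a hypothetical dict with "location", "display_name" and
    "children" keys; it is not the real edx-platform block object.
    """
    block_rows = []
    relationship_rows = []
    stack = [root]
    while stack:
        block = stack.pop()
        children = block.get("children", [])
        block_rows.append(
            {"location": block["location"], "display_name": block["display_name"]}
        )
        for order, child in enumerate(children):
            relationship_rows.append(
                {
                    "parent_location": block["location"],
                    "child_location": child["location"],
                    "order": order,
                }
            )
        # Push children reversed so they are popped in their original order.
        stack.extend(reversed(children))
    return block_rows, relationship_rows


def rebuild_children(relationship_rows):
    """Recreate the parent -> ordered children mapping from relationship rows."""
    children = {}
    ordered = sorted(relationship_rows, key=lambda r: (r["parent_location"], r["order"]))
    for row in ordered:
        children.setdefault(row["parent_location"], []).append(row["child_location"])
    return children


# Hypothetical course: one chapter containing two sequentials.
course = {
    "location": "course",
    "display_name": "Demo Course",
    "children": [
        {
            "location": "chapter1",
            "display_name": "Week 1",
            "children": [
                {"location": "seq1", "display_name": "Lesson 1", "children": []},
                {"location": "seq2", "display_name": "Lesson 2", "children": []},
            ],
        }
    ],
}

block_rows, relationship_rows = flatten_course(course)
graph = rebuild_children(relationship_rows)
names = {row["location"]: row["display_name"] for row in block_rows}
```

Joining graph with names gives the block names for any part of the course, which is the kind of lookup reporting needs.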

Commands

In addition to being an event listener, this package provides commands for exporting the same data in bulk. This allows bootstrapping a new data platform or backfilling lost or missing data. Currently the only command is the Django command for the COURSE_PUBLISHED data:

python manage.py cms dump_courses_to_clickhouse

This command can export all courses in bulk, or a subset selected by various limiting options. See the command help for details:

python manage.py cms dump_courses_to_clickhouse -h

Getting Started

Developing

One Time Setup

# Clone the repository
git clone git@github.com:openedx/openedx-event-sink-clickhouse.git
cd openedx-event-sink-clickhouse

# Set up a virtualenv using virtualenvwrapper with the same name as the repo and activate it
mkvirtualenv -p python3.8 openedx-event-sink-clickhouse

Every time you develop something in this repo

# Activate the virtualenv
workon openedx-event-sink-clickhouse

# Grab the latest code
git checkout main
git pull

# Install/update the dev requirements
make requirements

# Run the tests and quality checks (to verify the status before you make any changes)
make validate

# Make a new branch for your changes
git checkout -b <your_github_username>/<short_description>

# Using your favorite editor, edit the code to make your change.
vim ...

# Run your new tests
pytest ./path/to/new/tests

# Run all the tests and quality checks
make validate

# Commit all your changes
git commit ...
git push

# Open a PR and ask for review.

Deploying

The Open edX Event Sink ClickHouse component is a Django plugin that does not need independent deployment, so setup is straightforward: add it to your service's requirements and it will be installed alongside the service's other requirements.

This plugin is deployed by default in an OARS Tutor environment. For other deployments, install the library or add it to the private requirements of your virtual environment (requirements/private.txt).

  1. Run pip install openedx-event-sink-clickhouse.

  2. Run migrations:

  • python manage.py lms migrate

  • python manage.py cms migrate

  3. Restart the LMS service and Celery workers of edx-platform.

Configuration

Currently all events are listened to by default (there is only one), so the only necessary configuration is a ClickHouse connection:

EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG = {
    # URL to a running ClickHouse server's HTTP interface. ex: https://foo.openedx.org:8443/ or
    # http://foo.openedx.org:8123/ . Note that we only support the ClickHouse HTTP interface
    # to avoid pulling in more dependencies to the platform than necessary.
    "url": "http://clickhouse:8123",
    "username": "changeme",
    "password": "changeme",
    "database": "event_sink",
    "timeout_secs": 3,
}
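To show how these settings map onto the ClickHouse HTTP interface, here is a minimal sketch that only builds the pieces of an insert request and sends nothing. It is an illustration under stated assumptions, not the plugin's own client code: the build_insert_request helper and the course_blocks table name are hypothetical. The database and query URL parameters and the X-ClickHouse-User and X-ClickHouse-Key headers are standard ClickHouse HTTP interface features.

```python
import csv
import io
from urllib.parse import urlencode

# Same shape as the setting above; the values are placeholders.
EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG = {
    "url": "http://clickhouse:8123",
    "username": "changeme",
    "password": "changeme",
    "database": "event_sink",
    "timeout_secs": 3,
}


def build_insert_request(config, table, rows):
    """Build the URL, headers and CSV body for a ClickHouse HTTP insert.

    Hypothetical helper: `table` is assumed to be a trusted identifier,
    and `rows` is an iterable of tuples matching the table's columns.
    """
    query = f"INSERT INTO {table} FORMAT CSV"
    params = urlencode({"database": config["database"], "query": query})
    url = f"{config['url'].rstrip('/')}/?{params}"

    headers = {
        "X-ClickHouse-User": config["username"],
        "X-ClickHouse-Key": config["password"],
    }

    # Serialize the rows as CSV; the csv module handles quoting.
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)

    return {
        "url": url,
        "headers": headers,
        "data": buffer.getvalue().encode("utf-8"),
        "timeout": config["timeout_secs"],
    }


request = build_insert_request(
    EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG,
    "course_blocks",  # hypothetical table name
    [("block-v1:a", "Intro")],
)
```

The returned dictionary uses the keyword names accepted by requests.post, so with the requests library installed it could be sent as requests.post(**request). Keeping the build step separate from the network call makes it easy to inspect or test without a running ClickHouse server.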

Getting Help

Documentation

See documentation on Read the Docs.

More Help

If you’re having trouble, we have discussion forums at https://discuss.openedx.org where you can connect with others in the community.

Our real-time conversations are on Slack. You can request a Slack invitation, then join our community Slack workspace.

For anything non-trivial, the best path is to open an issue in this repository with as many details about the issue you are facing as you can provide.

https://github.com/openedx/openedx-event-sink-clickhouse/issues

For more information about these options, see the Getting Help page.

License

The code in this repository is licensed under the AGPL 3.0 unless otherwise noted.

Please see LICENSE.txt for details.

Contributing

Contributions are very welcome. Please read How To Contribute for details.

This project is currently accepting all types of contributions: bug fixes, security fixes, maintenance work, and new features. However, please discuss any new feature idea with the maintainers before beginning development to maximize the chances of your change being accepted. You can start a conversation by creating a new issue on this repo summarizing your idea.

The Open edX Code of Conduct

All community members are expected to follow the Open edX Code of Conduct.

People

The assigned maintainers for this component and other project details may be found in Backstage. Backstage pulls this data from the catalog-info.yaml file in this repo.

Reporting Security Issues

Please do not report security issues in public. Please email security@openedx.org.

Change Log

Unreleased

0.1.0 – 2023-05-11

Added

  • First release on PyPI

  • CMS listener for COURSE_PUBLISHED

  • Management command to bulk push course data to ClickHouse
