
Quality control for meteorological data in a pandas.DataFrame


meteo-qc

meteo_qc is a customizable framework for applying quality checks to meteorological data. The framework can be easily extended by registering custom functions/plugins.

Installation

To install meteo-qc, open an interactive shell and run:

pip install meteo-qc

Getting started

Check out the Documentation for detailed information.

The following example applies the quality control to CSV data stored in a file called test_data.csv:

date,temp,pressure_reduced
2022-01-01 10:00:00,1,600
2022-01-01 10:10:00,2,1024
2022-01-01 10:20:00,3,1024
2022-01-01 10:30:00,4,1090
2022-01-01 10:50:00,4,
2022-01-01 11:00:00,,1024
2022-01-01 11:10:00,2,1024
2022-01-01 11:20:00,3,1024
2022-01-01 11:30:00,4,1090
2022-01-01 11:40:00,4,1090
  1. Read in the data as a pd.DataFrame.
  2. Create a meteo_qc.ColumnMapping object. Using each column name as a key, call the add_group method to assign that column to a group (here temperature or pressure). The group can be an existing one or a new one.
  3. Call meteo_qc.apply_qc to apply the quality control to the DataFrame data, using column_mapping to define which checks are applied.
import pandas as pd
import meteo_qc

# read in the data
data = pd.read_csv('test_data.csv', index_col=0, parse_dates=True)

# map the columns to groups
column_mapping = meteo_qc.ColumnMapping()
column_mapping['temp'].add_group('temperature')
column_mapping['pressure_reduced'].add_group('pressure')

# apply the quality control
result = meteo_qc.apply_qc(df=data, column_mapping=column_mapping)
print(result)

This returns the following object, which can be used to present the results, e.g. by rendering it with an HTML template.

{
    'columns': defaultdict(<function apply_qc.<locals>.<lambda> at 0x7f9b0edd5480>, {
        'temp': {
            'results': {
                'missing_timestamps': Result(
                    function='missing_timestamps',
                    passed=False,
                    msg='missing 1 timestamps (assumed frequency: 10min)',
                    data=None,
                ),
                'null_values': Result(
                    function='null_values',
                    passed=False,
                    msg='found 1 values that are null',
                    data=[[1641034800000, None, True]],
                ),
                'range_check': Result(
                    function='range_check',
                    passed=True,
                    msg=None,
                    data=None,
                ),
                'spike_dip_check': Result(
                    function='spike_dip_check',
                    passed=True,
                    msg=None,
                    data=None,
                ),
                'persistence_check': Result(
                    function='persistence_check',
                    passed=True,
                    msg=None,
                    data=None,
                )
            },
            'passed': False,
        },
        'pressure_reduced': {
            'results': {
                'missing_timestamps': Result(
                    function='missing_timestamps',
                    passed=False,
                    msg='missing 1 timestamps (assumed frequency: 10min)',
                    data=None,
                ),
                'null_values': Result(
                    function='null_values',
                    passed=False,
                    msg='found 1 values that are null',
                    data=[[1641034200000, None, True]],
                ),
                'range_check': Result(
                    function='range_check',
                    passed=False,
                    msg='out of allowed range of [860 - 1055]',
                    data=[[1641031200000, 600.0, True], [1641033000000, 1090.0, True], [1641036600000, 1090.0, True], [1641037200000, 1090.0, True]],
                ),
                'spike_dip_check': Result(
                    function='spike_dip_check',
                    passed=False,
                    msg='spikes or dips detected. Exceeded allowed delta of 0.3 / min',
                    data=[[1641031800000, 1024.0, True], [1641033000000, 1090.0, True], [1641034200000, None, True], [1641036600000, 1090.0, True]],
                ),
                'persistence_check': Result(
                    function='persistence_check',
                    passed=True,
                    msg=None,
                    data=None,
                )
            },
            'passed': False
        }
    }),
    'passed': False,
    'data_start_date': 1641031200000,
    'data_end_date': 1641037200000,
}
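
The nested structure above can be walked with plain dictionary access. The following is a minimal sketch, assuming the result has the shape printed above. `Result` is a local stand-in `namedtuple` that mirrors the fields shown in the output (it is not imported from meteo_qc), and `failed_checks` and `ms_to_utc` are hypothetical helpers. The timestamps in the output appear to be milliseconds since the Unix epoch (1641031200000 corresponds to 2022-01-01 10:00:00 UTC, the first row of the sample data), so the sketch converts them back on that assumption.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Stand-in mirroring the fields printed above (assumption: not the library's class)
Result = namedtuple('Result', ['function', 'passed', 'msg', 'data'])


def ms_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def failed_checks(result):
    """Return (column, check name, message) for every check that did not pass."""
    failures = []
    for column, info in result['columns'].items():
        for name, res in info['results'].items():
            if not res.passed:
                failures.append((column, name, res.msg))
    return failures


# A trimmed copy of the output above, reduced to two checks for one column
result = {
    'columns': {
        'temp': {
            'results': {
                'null_values': Result(
                    function='null_values',
                    passed=False,
                    msg='found 1 values that are null',
                    data=[[1641034800000, None, True]],
                ),
                'range_check': Result('range_check', True, None, None),
            },
            'passed': False,
        },
    },
    'passed': False,
    'data_start_date': 1641031200000,
    'data_end_date': 1641037200000,
}

for column, name, msg in failed_checks(result):
    print(f'{column}: {name}: {msg}')

print(ms_to_utc(result['data_start_date']).isoformat())
```

A flat list of failures like this is straightforward to hand to a template or a log message, while the per-check `data` lists remain available for marking the offending rows.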

You can also write and register your own functions if the check you need is not already part of the predefined groups. See the Docs for more information.
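
As a rough illustration of what such a function might compute, the sketch below implements a stand-alone upper-limit check that returns the same four fields as the Result objects shown above. Everything in it is an assumption made for illustration: `Result` is a local stand-in, `max_value_check` and its `limit` parameter are hypothetical, and the step that registers a function with meteo_qc is deliberately omitted. Refer to the Docs for the actual registration API and the signature it expects.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    # Stand-in with the fields printed in the output above (assumption)
    function: str
    passed: bool
    msg: Optional[str] = None
    data: Optional[list] = None


def max_value_check(rows, limit):
    """Flag every (timestamp_ms, value) pair whose value exceeds ``limit``.

    Missing values (None) are skipped; they are not treated as violations here.
    """
    flagged = [
        [ts, value, True]
        for ts, value in rows
        if value is not None and value > limit
    ]
    if flagged:
        msg = f'found {len(flagged)} values above {limit}'
        return Result('max_value_check', False, msg, flagged)
    return Result('max_value_check', True)


# First four pressure_reduced rows of test_data.csv as (epoch ms, value) pairs
rows = [
    (1641031200000, 600.0),
    (1641031800000, 1024.0),
    (1641032400000, 1024.0),
    (1641033000000, 1090.0),
]
print(max_value_check(rows, limit=1055))
```

Returning the flagged rows together with a short message keeps the custom check consistent with the shape of the built-in results, so the same reporting code can handle both.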

