Select, weight and analyze complex sample data

samplics is a Python package for selecting, weighting and analyzing samples obtained from complex sampling designs.

Sample Analytics

In large-scale surveys, complex random mechanisms are often used to select samples. Estimates obtained from such samples must account for the selection mechanism to be accurate. samplics implements a set of sampling techniques for complex survey designs.

Selection

Since the full population cannot be observed, a sample is selected to estimate population parameters of interest. The assumption is that the sample is representative of the population for the characteristics of interest. The selection methods in samplics are:

  • Simple random sampling (SRS)

  • Systematic selection (SYS)

  • Probability proportional to size (PPS)
    • Systematic

    • Brewer’s method

    • Hanurav-Vijayan method

    • Murphy’s method

    • Sampford’s method

  • Unequal sample selection
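
As an illustration of one of the methods listed above, the following is a minimal sketch of systematic PPS selection written in plain Python. It is not the samplics implementation; the function name `pps_systematic` and its parameters are hypothetical and the size values are made up. Units whose size exceeds the sampling interval can be hit more than once, which a production implementation would need to handle.

```python
import random
from typing import List, Sequence


def pps_systematic(sizes: Sequence[float], n: int, seed: int = 0) -> List[int]:
    """Return the indices of n units selected by systematic PPS (hypothetical helper)."""
    total = sum(sizes)
    step = total / n                  # sampling interval
    rng = random.Random(seed)
    start = rng.uniform(0, step)      # random start within the first interval
    # Selection points: start, start + step, ..., start + (n - 1) * step
    points = [start + k * step for k in range(n)]

    selected = []
    cumulative = 0.0                  # total size of the units already passed
    idx = 0
    for point in points:
        # Move forward until the selection point falls inside the current unit
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= point:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


# Made-up size measures for four units; select two of them
print(pps_systematic([10, 20, 30, 40], n=2, seed=1))
```

Larger units cover a longer stretch of the cumulative size scale, so they are more likely to contain a selection point.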

Weighting

Sample weighting is the main mechanism used in surveys to formalize the representativeness of the sample. The base or design weights are usually adjusted to compensate for distortions due to nonresponse and other shortcomings in the implementation of the sampling design.

  • Weight adjustment due to nonresponse

  • Weight poststratification, calibration and normalization

  • Weight replication, i.e. Bootstrap, BRR, and Jackknife
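
To make the idea of a nonresponse adjustment concrete, here is a minimal sketch in plain Python. It assumes the simplest case, where every unit is either a respondent or a nonrespondent and the weight of nonrespondents is redistributed to respondents in the same adjustment class. The function name `adjust_for_nonresponse` is hypothetical and this is not how samplics is implemented.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def adjust_for_nonresponse(
    weights: Sequence[float],
    classes: Sequence[Hashable],
    responded: Sequence[bool],
) -> List[float]:
    """Redistribute nonrespondent weight to respondents within each class."""
    total_by_class: Dict[Hashable, float] = defaultdict(float)
    resp_by_class: Dict[Hashable, float] = defaultdict(float)
    for w, c, r in zip(weights, classes, responded):
        total_by_class[c] += w
        if r:
            resp_by_class[c] += w

    adjusted = []
    for w, c, r in zip(weights, classes, responded):
        if r:
            # Adjustment factor = total class weight / respondent class weight
            adjusted.append(w * total_by_class[c] / resp_by_class[c])
        else:
            adjusted.append(0.0)  # nonrespondents carry no weight afterwards
    return adjusted


# Made-up data: one nonrespondent in class "A"
new_weights = adjust_for_nonresponse(
    weights=[10, 10, 20, 20],
    classes=["A", "A", "B", "B"],
    responded=[True, False, True, True],
)
print(new_weights)
```

The sum of the weights within each class is unchanged by the adjustment, so the respondents stand in for the nonrespondents of their own class.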

Estimation

The estimation of the parameters of interest must reflect the sampling mechanism and the weight adjustments.

  • Taylor linearization procedures

  • Replicate-based estimation, i.e. Bootstrap, BRR, and Jackknife

  • Regression-based

Parameters of interest

  • Linear parameters e.g. total, mean, proportion

  • Non-linear (complex) parameters e.g. ratio, regression coefficient
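
The point estimates behind these parameters can be sketched in a few lines of plain Python. The helper names below are hypothetical and the data are made up; the sketch covers point estimates only and leaves out variance estimation, which is where the sampling design matters most.

```python
from typing import Sequence


def weighted_total(y: Sequence[float], w: Sequence[float]) -> float:
    """Estimated population total: sum of weight * value."""
    return sum(wi * yi for wi, yi in zip(w, y))


def weighted_mean(y: Sequence[float], w: Sequence[float]) -> float:
    """Estimated population mean: weighted total divided by the sum of weights."""
    return weighted_total(y, w) / sum(w)


def weighted_ratio(y: Sequence[float], x: Sequence[float], w: Sequence[float]) -> float:
    """Estimated ratio of two totals, a non-linear function of the data."""
    return weighted_total(y, w) / weighted_total(x, w)


# Made-up values and weights for three sampled units
y = [1.0, 2.0, 3.0]
x = [2.0, 4.0, 6.0]
w = [10.0, 20.0, 30.0]
print(weighted_total(y, w), weighted_mean(y, w), weighted_ratio(y, x, w))
```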

Installation

pip install samplics

If both Python 2.x and Python 3.x are installed on your computer, you may have to use: pip3 install samplics

Dependencies

Python versions 3.6.x or newer and the following packages:

Usage

To select a sample of primary sampling units using the PPS method, we can use code similar to:

import pandas as pd
import samplics
from samplics.sampling import Sample

# Sampling frame of primary sampling units (PSUs), one row per cluster
psu_frame = pd.read_csv("psu_frame.csv")
# Number of PSUs to select in each stratum (region)
psu_sample_size = {"East": 3, "West": 2, "North": 2, "South": 3}
pps_design = Sample(method="pps-sys", stratification=True, with_replacement=False)
# Inclusion probabilities, using the census household count as the size measure
psu_frame["psu_prob"] = pps_design.inclusion_probs(
    psu_frame["cluster"],
    psu_sample_size,
    psu_frame["region"],
    psu_frame["number_households_census"],
)
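
For intuition about the quantity computed above: under stratified PPS sampling without replacement, the target inclusion probability of a unit is commonly taken to be the stratum sample size times the unit's share of the stratum's total size. The sketch below computes that formula in plain Python. It illustrates the formula under that assumption and is not the samplics implementation; the function name `pps_inclusion_probs` is hypothetical and the frame values are made up. Units for which the formula exceeds 1 would need separate treatment, which is not shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def pps_inclusion_probs(
    strata: Sequence[Hashable],
    sizes: Sequence[float],
    sample_sizes: Dict[Hashable, int],
) -> List[float]:
    """Inclusion probability = n_h * size_i / (total size of stratum h)."""
    stratum_totals: Dict[Hashable, float] = defaultdict(float)
    for s, m in zip(strata, sizes):
        stratum_totals[s] += m
    return [sample_sizes[s] * m / stratum_totals[s] for s, m in zip(strata, sizes)]


# Made-up frame: three PSUs in "East" (select 2) and two in "West" (select 1)
probs = pps_inclusion_probs(
    strata=["East", "East", "East", "West", "West"],
    sizes=[200, 300, 500, 50, 150],
    sample_sizes={"East": 2, "West": 1},
)
print(probs)
```

Within each stratum the probabilities add up to the stratum sample size, which is a quick sanity check on any set of inclusion probabilities.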

To adjust the design sample weight for nonresponse, we can use code similar to:

import samplics
from samplics.weighting import SampleWeight

# Map the response codes found in the data to the statuses used for the adjustment
status_mapping = {
    "in": "ineligible", "rr": "respondent", "nr": "non-respondent", "uk": "unknown"
}

# full_sample is assumed to be a pandas DataFrame holding the selected sample
full_sample["nr_weight"] = SampleWeight().adjust(
    samp_weight=full_sample["design_weight"],
    adjust_class=full_sample["region"],
    resp_status=full_sample["response_status"],
    resp_dict=status_mapping,
)
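
To estimate population parameters, we can use code similar to:

import samplics
from samplics.estimation import TaylorEstimator, ReplicateEstimator

# nhanes2f and nhanes2brr are assumed to be pandas DataFrames loaded beforehand

# Mean zinc level, with the variance estimated by Taylor linearization
zinc_mean_str = TaylorEstimator("mean").estimate(
    y=nhanes2f["zinc"],
    samp_weight=nhanes2f["finalwgt"],
    stratum=nhanes2f["stratid"],
    psu=nhanes2f["psuid"],
    exclude_nan=True
)

# Ratio of weight to height, with the variance estimated from BRR replicate weights
ratio_wgt_hgt = ReplicateEstimator("brr", "ratio").estimate(
    y=nhanes2brr["weight"],
    samp_weight=nhanes2brr["finalwgt"],
    x=nhanes2brr["height"],
    rep_weights=nhanes2brr.loc[:, "brr_1":"brr_32"],
    exclude_nan=True
)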
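
The idea behind replicate-based variance estimation can be sketched without the library. The example below assumes the standard BRR formula, in which the variance is the average squared deviation of the replicate estimates from the full-sample estimate. It is a plain-Python illustration, not the samplics implementation; the helper names are hypothetical and the data and replicate weights are made up.

```python
from typing import Sequence


def weighted_ratio(y: Sequence[float], x: Sequence[float], w: Sequence[float]) -> float:
    """Ratio of the weighted total of y to the weighted total of x."""
    num = sum(wi * yi for wi, yi in zip(w, y))
    den = sum(wi * xi for wi, xi in zip(w, x))
    return num / den


def brr_variance(
    y: Sequence[float],
    x: Sequence[float],
    full_weights: Sequence[float],
    rep_weights: Sequence[Sequence[float]],
) -> float:
    """Average squared deviation of the replicate ratios from the full-sample ratio."""
    theta = weighted_ratio(y, x, full_weights)
    replicates = [weighted_ratio(y, x, rw) for rw in rep_weights]
    return sum((t - theta) ** 2 for t in replicates) / len(replicates)


# Made-up data: two units, two replicates that each keep one unit with doubled weight
y = [1.0, 3.0]
x = [1.0, 1.0]
full_w = [1.0, 1.0]
reps = [[2.0, 0.0], [0.0, 2.0]]
print(weighted_ratio(y, x, full_w), brr_variance(y, x, full_w, reps))
```

The same pattern applies to any estimator: recompute it once per set of replicate weights and summarize how much the results vary.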

Contributing

TBD

License

MIT

Project status

This is an alpha version. At this stage, the project is not recommended for production use or for any project that users depend on.
