Cape manages secure access to all of your data.

Cape Privacy

Cape Privacy offers data scientists and data engineers a policy-based interface for applying privacy-enhancing techniques across several popular libraries and frameworks to protect sensitive data throughout the data science life cycle.

Cape Python brings Cape's policy language to Pandas and Apache Spark, enabling you to collaborate on privacy-preserving policy at a non-technical level. The supported techniques include tokenization with linkability as well as perturbation and rounding. You can experiment with these techniques programmatically in Python or declaratively in human-readable policy files. Stay tuned for more privacy-enhancing techniques in the future!

See below for instructions on how to get started or visit the documentation.

Getting Started

Cape Python is available via PyPI.

pip install cape-privacy

Support for Apache Spark is optional. If you plan on using the library together with Apache Spark, we suggest the following instead:

pip install "cape-privacy[spark]"

We recommend running it in a virtual environment, such as venv.

Installing from source

It is also possible to install the library from source.

git clone https://github.com/capeprivacy/cape-python.git
cd cape-python
make bootstrap

This will also install all dependencies, including Apache Spark. Make sure you have make installed before running the above.

Example

(this example is an abridged version of the tutorial found here)

To discover what the different transformations do and how you might use them, it is best to explore them through the transformations API:

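# Import paths assumed from the Cape Python package layout; see the tutorial for the full example.
import pandas as pd

from cape_privacy.pandas import dtypes
from cape_privacy.pandas.transformations import NumericPerturbation
from cape_privacy.pandas.transformations import Tokenizer

df = pd.DataFrame({
    "name": ["alice", "bob"],
    "age": [34, 55],
    "birthdate": [pd.Timestamp(1985, 2, 23), pd.Timestamp(1963, 5, 10)],
})

# Replace each name with a keyed token of at most 10 characters.
tokenize = Tokenizer(
    max_token_len=10,
    key=b"my secret",
)

# Add random integer noise between -10 and 10 to each age.
perturb_numeric = NumericPerturbation(
    dtype=dtypes.Integer,
    min=-10,
    max=10,
)

df["name"] = tokenize(df["name"])
df["age"] = perturb_numeric(df["age"])

print(df.head())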

# >>
#          name  age  birthdate
# 0  f42c2f1964   34 1985-02-23
# 1  2e586494b2   63 1963-05-10
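
To make "tokenization with linkability" concrete, here is a minimal standard-library sketch. It is an illustration only and not Cape's implementation: the helper `toy_tokenize` is hypothetical and assumes a keyed hash (HMAC-SHA256) truncated to `max_token_len` hexadecimal characters. The point is that the same input and key always give the same token, so tokenized columns can still be joined or grouped, while a different key gives unrelated tokens.

```python
import hashlib
import hmac


def toy_tokenize(value: str, key: bytes, max_token_len: int = 10) -> str:
    """Hypothetical helper: keyed hash of `value`, truncated to `max_token_len` hex chars."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:max_token_len]


names = ["alice", "bob", "alice"]
tokens = [toy_tokenize(n, key=b"my secret") for n in names]

# Linkability: both occurrences of "alice" map to the same token.
print(tokens[0] == tokens[2])

# A different key gives a different token for the same input
# (barring an unlikely collision in the truncated hash).
print(toy_tokenize("alice", key=b"other secret") != tokens[0])
```

Truncating the token trades collision resistance for shorter values, so a real deployment would choose the length with the size of the data set in mind.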

These steps can be saved in policy files so you can share them and collaborate with your team:

# my-policy.yaml
label: my-policy
version: 1
rules:
  - match:
      name: age
    actions:
      - transform:
          type: numeric-perturbation
          dtype: Integer
          min: -10
          max: 10
          seed: 4984
  - match:
      name: name
    actions:
      - transform:
          type: tokenizer
          max_token_len: 10
          key: my secret

You can then load this policy and apply it to your data frame:

import cape_privacy as cape

# df can be a Pandas or Spark data frame
policy = cape.parse_policy("my-policy.yaml")
df = cape.apply_policy(policy, df)

print(df.head())
# >>
#          name  age  birthdate
# 0  f42c2f1964   34 1985-02-23
# 1  2e586494b2   63 1963-05-10
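
The `seed` field in the policy is what makes a perturbation repeatable. As a rough sketch of the idea (using Python's `random` module rather than Cape's own implementation; `toy_perturb` is a hypothetical helper), the same seed yields the same noise on every run, while omitting it draws fresh noise each time.

```python
import random


def toy_perturb(values, low, high, seed=None):
    """Hypothetical helper: add integer noise drawn uniformly from [low, high]."""
    rng = random.Random(seed)  # seed=None falls back to system entropy
    return [v + rng.randint(low, high) for v in values]


ages = [34, 55]
first = toy_perturb(ages, -10, 10, seed=4984)
second = toy_perturb(ages, -10, 10, seed=4984)

print(first == second)  # True: identical seed, identical noise
print(all(abs(p - a) <= 10 for p, a in zip(first, ages)))  # True: noise stays within bounds
```

A fixed seed is convenient for tests and reproducible pipelines, but anyone who knows the seed and the method can regenerate the same noise, so it should be treated as sensitive in the same way as a tokenizer key.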

You can see more examples and usage here or by visiting our documentation.

Contributing and Bug Reports

Please file feature requests and bug reports as GitHub issues.

License

Licensed under Apache License, Version 2.0 (see LICENSE or http://www.apache.org/licenses/LICENSE-2.0). Copyright as specified in NOTICE.

About Cape

Cape Privacy helps teams share data and make decisions for safer and more powerful data science. Learn more at capeprivacy.com.
