Anonymize CSV datasets

Project description

vendetta

Anonymize CSV file(s) by replacing sensitive values with fakes.

Installation

pip install vendetta

Example

Suppose you have an orders.csv dataset with real customer names and order IDs.

CustomerName,CustomerLastName,OrderID
Darth,Vader,1254
Darth,Vader,1255
,Yoda,1256
Luke,Skywalker,1257
Leia,Skywalker,1258
,Yoda,1259

This list contains 4 unique customers. Let's create a configuration file, say, orders.yaml:

columns:
  CustomerName: first_name
  CustomerLastName: last_name

and run:

vendetta anonymize orders.yaml < orders.csv > anon.csv

which gives something like this in anon.csv:

CustomerName,CustomerLastName,OrderID
Elizabeth,Oliver,1254
Elizabeth,Oliver,1255
Karen,Rodriguez,1256
Jonathan,Joseph,1257
Katelyn,Joseph,1258
Karen,Rodriguez,1259

  • The OrderID column is not mentioned in the config, so it is left as is.
  • Using faker, the program replaces the first and last names with random but believable ones.
  • If two cells in the same column of the source file hold the same value (for example, Vader), the corresponding cells in the output file also hold identical values.
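
The consistency described in the last point can be pictured as one cache per column that maps each original value to its fake. The sketch below only illustrates that idea and is not vendetta's actual implementation: the anonymize and numbered helpers are hypothetical names, and numbered is a stdlib stand-in for a faker provider so that the example runs without extra dependencies.

```python
import csv
import io
import itertools
from collections import defaultdict


def numbered(prefix):
    """Hypothetical stand-in for a faker provider: yields prefix1, prefix2, ..."""
    counter = itertools.count(1)
    return lambda: f"{prefix}{next(counter)}"


def anonymize(rows, generators):
    """Replace values in the configured columns, one fake per distinct original."""
    # One cache per column: original value -> fake value.
    caches = defaultdict(dict)
    for row in rows:
        anonymized = dict(row)  # columns without a generator are copied as is
        for column, make_fake in generators.items():
            cache = caches[column]
            original = row[column]
            if original not in cache:
                cache[original] = make_fake()
            anonymized[column] = cache[original]
        yield anonymized


# A few rows from the orders.csv example above.
source = io.StringIO(
    "CustomerName,CustomerLastName,OrderID\n"
    ",Yoda,1256\n"
    "Luke,Skywalker,1257\n"
    "Leia,Skywalker,1258\n"
    ",Yoda,1259\n"
)

result = list(
    anonymize(
        csv.DictReader(source),
        {
            "CustomerName": numbered("First"),
            "CustomerLastName": numbered("Last"),
        },
    )
)

for row in result:
    print(row)
```

Both Yoda rows receive the same fake first and last name, Luke and Leia share a fake last name, and OrderID passes through unchanged. With faker installed, the stand-in generators could presumably be swapped for Faker().first_name and Faker().last_name.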

Enjoy!

License

MIT

Credits

This project was generated with wemake-python-package. Current template version is: b80221aaae4ac702bea7e66b77b9389d527c1e3c. See what is updated since then.
