Predict religion and caste based on name

pranaam: predict religion from name

Pranaam uses the Bihar Land Records data, a set of plot-level land records (N = 41.87 million plots, or 12.13 million individuals/accounts, across 35,626 villages), to build machine learning models that predict religion and caste from names. Our final dataset has around 4M unique records. To learn how the data were transformed and how the models underlying the package were built, see the notebooks.

The first function we are releasing with the package is pred_rel, which predicts religion based on the name (currently only muslim or not-muslim). (For context, nearly 95% of India's population is Hindu or Muslim, with Sikhs, Buddhists, Christians, and other groups making up the rest.) The out-of-sample (OOS) accuracy, assessed on unseen names, is nearly 98% for both the Hindi and English models.
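
Out-of-sample accuracy here means the share of held-out names whose predicted label matches the true label. A small sketch of that calculation follows; the helper name accuracy and the labels are made-up placeholders for illustration, not data or code from the package.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of positions where the two label lists agree."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one label")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Placeholder labels for four held-out names (illustrative only).
true_labels = ["muslim", "not-muslim", "not-muslim", "muslim"]
predicted_labels = ["muslim", "not-muslim", "muslim", "muslim"]

print(accuracy(true_labels, predicted_labels))  # 3 of 4 match -> 0.75
```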

Our training data is in Hindi. To build models that classify names provided in English, we used the indicate package to transliterate our training data to English.

We are releasing this software in the hope that it enables activists and researchers to:

  1. Highlight biases
  2. Fight biases
  3. Prevent biases (regress out some of these biases in models built on natural language corpus with person names).

Install

We strongly recommend installing pranaam inside a Python virtual environment (see the venv documentation).

Standard Installation

pip install pranaam

This installs TensorFlow 2.14.1, which is known to work correctly with the models.

Requirements

  • Python 3.10 or 3.11 (TensorFlow 2.14.1 compatibility requirement)
  • TensorFlow 2.14.1 (automatically installed)

Note: This package requires TensorFlow 2.14.1 with Keras 2.14.0 for model compatibility. Python 3.12+ is not currently supported due to TensorFlow availability constraints.
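
The Python version constraint can be checked before installing. The following is a minimal sketch using only the standard library; the helper name is_supported_python is hypothetical and not part of pranaam.

```python
import sys

# Python versions listed in the requirements above (3.10 and 3.11).
SUPPORTED_MINORS = {(3, 10), (3, 11)}


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is 3.10 or 3.11."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_MINORS


if __name__ == "__main__":
    if is_supported_python():
        print("This Python version matches the stated requirements.")
    else:
        print("pranaam needs Python 3.10 or 3.11; found %d.%d" % sys.version_info[:2])
```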

General API

  1. pranaam.pred_rel takes a list of Hindi/English names and predicts whether the person is Muslim or not.
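
Because Hindi and English names are handled by different models, a mixed list may need to be split by script first. The sketch below does this with a Unicode range check for the Devanagari block (U+0900 to U+097F); the helpers contains_devanagari and split_by_script are hypothetical and not part of pranaam.

```python
def contains_devanagari(name):
    """Return True if any character falls in the Devanagari Unicode block."""
    return any("\u0900" <= ch <= "\u097f" for ch in name)


def split_by_script(names):
    """Split names into (devanagari_names, other_names), preserving order."""
    devanagari, other = [], []
    for name in names:
        (devanagari if contains_devanagari(name) else other).append(name)
    return devanagari, other


hindi_names, english_names = split_by_script(
    ["Shah Rukh Khan", "शाहरुख खान", "Amitabh Bachchan", "अमिताभ बच्चन"]
)
print(hindi_names)    # candidates for the lang="hin" call
print(english_names)  # candidates for the default call
```

Each batch can then be passed to pranaam.pred_rel separately, with lang="hin" for the Devanagari batch as in the Hindi example.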

Examples

Using names in English:

from pranaam import pranaam
names = ["Shah Rukh Khan", "Amitabh Bachchan"]
result = pranaam.pred_rel(names)
print(result)

Output:

               name  pred_label  pred_prob_muslim
0    Shah Rukh Khan      muslim              73.0
1  Amitabh Bachchan  not-muslim              27.0

Using names in Hindi:

from pranaam import pranaam
names = ["शाहरुख खान", "अमिताभ बच्चन"]
result = pranaam.pred_rel(names, lang="hin")
print(result)

Output:

           name  pred_label  pred_prob_muslim
0    शाहरुख खान      muslim              73.0
1  अमिताभ बच्चन  not-muslim              27.0
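
The returned table can be post-processed like any other tabular data. The sketch below flags borderline predictions. It assumes that pred_prob_muslim is a percentage between 0 and 100 (as the sample output suggests) and that the rows have been converted to plain dictionaries, for example with pandas' to_dict("records"). The helper flag_uncertain and the 40 to 60 band are illustrative choices, not part of pranaam.

```python
# Rows copied from the sample English output above, as plain dictionaries.
rows = [
    {"name": "Shah Rukh Khan", "pred_label": "muslim", "pred_prob_muslim": 73.0},
    {"name": "Amitabh Bachchan", "pred_label": "not-muslim", "pred_prob_muslim": 27.0},
]


def flag_uncertain(rows, low=40.0, high=60.0):
    """Return copies of the rows with an 'uncertain' flag added.

    A row is flagged when its probability lies in [low, high], i.e. close
    to the midpoint of the assumed 0-100 scale.
    """
    return [
        {**row, "uncertain": low <= row["pred_prob_muslim"] <= high}
        for row in rows
    ]


for row in flag_uncertain(rows):
    print(row["name"], row["pred_label"], row["uncertain"])
```

Neither sample row falls inside the band, so both are reported as not uncertain; widening or narrowing the band is a judgment call that depends on the application.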

Functions

We expose one function, which takes Hindi/English text (name) and predicts religion (currently Muslim or not).

  • pranaam.pred_rel(input)
    • What it does:
      • predicts religion based on hindi/english text (name)
    • Output
      • Returns a pandas DataFrame with the name, the predicted label (muslim/not-muslim), and pred_prob_muslim

Authors

Rajashekar Chintalapati, Aaditya Dar, and Gaurav Sood

🔗 Adjacent Repositories

  • appeler/naampy — Infer Sociodemographic Characteristics from Names Using Indian Electoral Rolls
  • appeler/parsernaam — AI name parsing. Predict first or last name using a DL model.
  • appeler/namesexdata — Data on international first names and sex of people with that name
  • appeler/graphic_names — Infer the gender of person with a particular first name using Google image search and Clarifai
  • appeler/ethnicolr2 — Ethnicolr implementation with new models in pytorch

Contributor Code of Conduct

The project welcomes contributions from everyone; in fact, it depends on them. To maintain this welcoming atmosphere and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.

License

The package is released under the MIT License.
