Segregation Index

Reardon's Segregation Index for Continuous Variables

  • Many kinds of segregation indices are used for purposes ranging from policy-making to research. Most measure segregation across categorical variables such as racial group. Continuous variables such as income are also important, but categorical segregation indices do not handle them well.

  • Reardon (2011) and Reardon and Bischoff (2011) proposed a rank-order segregation index based on the Theil index, which builds on the concept of entropy. It is widely used in both applied and academic work to measure segregation over continuous variables, for example in Chetty et al. (2014).

  • The method is somewhat intricate, however, and there seems to be no good online library or code that implements it. Therefore, here is one. The Python code in src/segregation_index.py implements the Rank-Order Information Theory Index of Reardon (2011).

  • Essentials of the Index

    • The index H is a weighted average, over all K sectors, of the gap between the whole region's entropy (E) and each sector's entropy (E_k), expressed relative to E. Each sector is weighted by its relative population size (t_k/T). Here, entropy measures how evenly the values of the variable (e.g. income) are mixed within an area.

    • Entropy in the two-group case is a function of p, the proportion of the population in one group. Because the variable here is continuous rather than categorical, the two groups are formed by splitting the population at a threshold (those at or below it versus those above it), and the result is integrated over p for 0 ≤ p ≤ 1. This is why the raw values must first be transformed into rank-ordered values.

    • Combining the two gives the Rank-Order Information Theory Index, which is the segregation index for continuous variables. A value of 0 means perfect evenness (no segregation); a value of 1 means perfect segregation.
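
The three quantities described in the bullets above can be written out as follows. This is a reconstruction following Reardon (2011), not text taken from this project. The notation E(p), E_k(p), H(p) and H^R is assumed here for illustration, with p read as the share of the population at or below a given threshold.

```latex
% Entropy of a two-group split, where p is the share of the population in one group
E(p) = p \log_2 \frac{1}{p} + (1 - p) \log_2 \frac{1}{1 - p}

% Index for a single split: population-weighted gap between regional and sector entropy
H(p) = \sum_{k=1}^{K} \frac{t_k}{T} \cdot \frac{E(p) - E_k(p)}{E(p)}
     = 1 - \sum_{k=1}^{K} \frac{t_k \, E_k(p)}{T \, E(p)}

% Rank-order index: entropy-weighted average of H(p) over all splits
H^R = 2 \ln 2 \int_0^1 E(p) \, H(p) \, dp
```

The factor 2 ln 2 is the reciprocal of the integral of E(p) over [0, 1], so H^R is a weighted average of H(p) in which splits near the median (where E(p) is largest) count the most.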


  • Use

    from src.segregation_index import estimate_Hp
    
    # Suppose each value is an income.
    # Each inner list represents one sector.
    areas1 = [[80, 80, 70, 70], [50, 45, 40], [20, 20, 20, 10]]
    areas2 = [[80, 70, 50], [80, 70, 45, 20, 20], [40, 20, 10]]
    
    print(estimate_Hp(areas1))  # 0.7182
    print(estimate_Hp(areas2))  # 0.3191
    
    # areas1 is more segregated by income than areas2. In other words, areas2 is more mixed.
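
To get a feel for what the index summarizes, the two-group index at a single income threshold can be computed by hand. The sketch below is not this package's code: `entropy` and `threshold_h` are hypothetical helper names, sectors are assumed to be non-empty, and it evaluates only one split rather than integrating over all of them, so its values are not expected to match those of estimate_Hp.

```python
import math


def entropy(p):
    """Two-group entropy in bits; taken to be 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def threshold_h(areas, threshold):
    """Index for the split 'value <= threshold' versus 'value > threshold'.

    Returns None when everyone falls on one side of the threshold,
    because the regional entropy is then 0 and the index is undefined.
    """
    total = sum(len(sector) for sector in areas)
    below = sum(1 for sector in areas for value in sector if value <= threshold)
    regional_entropy = entropy(below / total)
    if regional_entropy == 0.0:
        return None
    # Population-weighted average of each sector's own two-group entropy.
    weighted_sector_entropy = 0.0
    for sector in areas:
        share_below = sum(1 for value in sector if value <= threshold) / len(sector)
        weighted_sector_entropy += len(sector) / total * entropy(share_below)
    return 1.0 - weighted_sector_entropy / regional_entropy


areas1 = [[80, 80, 70, 70], [50, 45, 40], [20, 20, 20, 10]]
areas2 = [[80, 70, 50], [80, 70, 45, 20, 20], [40, 20, 10]]

# Compare the two layouts at the same income threshold.
print(threshold_h(areas1, 45))
print(threshold_h(areas2, 45))
```

At this threshold areas1 scores higher than areas2, which is in line with the ordering reported by estimate_Hp above. The full index repeats this calculation across thresholds and averages the results with entropy weights.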
    

Download files

Source Distribution

segdex-0.1.1.tar.gz (4.9 kB)

Built Distribution

segdex-0.1.1-py3-none-any.whl (5.3 kB)
