Skip to main content

Python implementation of Gowers distance, pairwise between records in two data sets

Project description

Build Status PyPI version

Introduction

Gower's distance calculation in Python. Gower Distance is a distance measure that can be used to calculate distance between two entity whose attribute has a mixed of categorical and numerical values. Gower (1971) A general coefficient of similarity and some of its properties. Biometrics 27 857–874.

Examples

Installation

pip install gower

Generate some data

import numpy as np
import pandas as pd
import gower

Xd=pd.DataFrame({'age':[21,21,19, 30,21,21,19,30,None],
'gender':['M','M','N','M','F','F','F','F',None],
'civil_status':['MARRIED','SINGLE','SINGLE','SINGLE','MARRIED','SINGLE','WIDOW','DIVORCED',None],
'salary':[3000.0,1200.0 ,32000.0,1800.0 ,2900.0 ,1100.0 ,10000.0,1500.0,None],
'has_children':[1,0,1,1,1,0,0,1,None],
'available_credit':[2200,100,22000,1100,2000,100,6000,2200,None]})
Yd = Xd.iloc[1:3,:]
X = np.asarray(Xd)
Y = np.asarray(Yd)

Find the distance matrix

gower.gower_matrix(X)
array([[0.        , 0.3590238 , 0.6707398 , 0.31787416, 0.16872811,
        0.52622986, 0.59697855, 0.47778758,        nan],
       [0.3590238 , 0.        , 0.6964303 , 0.3138769 , 0.523629  ,
        0.16720603, 0.45600235, 0.6539635 ,        nan],
       [0.6707398 , 0.6964303 , 0.        , 0.6552807 , 0.6728013 ,
        0.6969697 , 0.740428  , 0.8151941 ,        nan],
       [0.31787416, 0.3138769 , 0.6552807 , 0.        , 0.4824794 ,
        0.48108295, 0.74818605, 0.34332284,        nan],
       [0.16872811, 0.523629  , 0.6728013 , 0.4824794 , 0.        ,
        0.35750175, 0.43237334, 0.3121036 ,        nan],
       [0.52622986, 0.16720603, 0.6969697 , 0.48108295, 0.35750175,
        0.        , 0.2898751 , 0.4878362 ,        nan],
       [0.59697855, 0.45600235, 0.740428  , 0.74818605, 0.43237334,
        0.2898751 , 0.        , 0.57476616,        nan],
       [0.47778758, 0.6539635 , 0.8151941 , 0.34332284, 0.3121036 ,
        0.4878362 , 0.57476616, 0.        ,        nan],
       [       nan,        nan,        nan,        nan,        nan,
               nan,        nan,        nan,        nan]], dtype=float32)

Find Top n results

gower.gower_topn(Xd.iloc[0:2,:], Xd.iloc[:,], n = 5)
{'index': array([4, 3, 1, 7, 5]),
 'values': array([0.16872811, 0.31787416, 0.3590238 , 0.47778758, 0.52622986],
       dtype=float32)}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gower-0.0.4.tar.gz (4.4 kB view details)

Uploaded Source

File details

Details for the file gower-0.0.4.tar.gz.

File metadata

  • Download URL: gower-0.0.4.tar.gz
  • Upload date:
  • Size: 4.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.1.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for gower-0.0.4.tar.gz
Algorithm Hash digest
SHA256 b2076088d86c1380ca3cffc9b806f6e900b6e93b1a6079fd97fc082599fa0728
MD5 1730153a0f5293c986a1fa7a6d1fb186
BLAKE2b-256 c3e52c849b6d63970d7ba2fb628f28caa0b4817ab1dbd9a812673d10fc0214e9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page