A collection of code for data analysis that I have either made or found and that helps streamline things.

===================================================================
----------------------- larkinlab 0.0.13 ------------------------

This library contains the functions I have created or come across that I find myself using often.

I will be adding functions as I create and find them, so be sure to update to the latest version.

Check the CHANGELOG for release info.

======== In The Future ========

  • long description for pypi
  • a new plot_ex(df, type='') function for larkinlab.explore that returns various graphs for quick analysis. Plans include scatterplot, regplot, bubble plot, bar chart, pie chart, histogram, and more. It will work by returning plots for each column in the dataframe.
  • a set of colex (column explore) functions to do some of the stuff over columns rather than the entire dataframe.
  • a set of functions to perform basic machine learning algorithms over dataframes and return evaluation metrics, with a colex function to come later.

========================================================================================
------------------------- Code Descriptions ------------------------------------------

----- to install/update ------

pip3 install larkinlab
pip3 install --upgrade larkinlab

-------- to import -----------

import larkinlab as ll

--------- Subpackages --------

larkinlab.explore
larkinlab.machinelearning


========================= ll.explore =============================

This subpackage is built for exploring data. It contains functions that help you quickly get an understanding of the data at hand.

to import: from larkinlab import explore as llex

Dependencies: pandas, numpy, matplotlib.pyplot, seaborn


functions

  • llex.dframe_ex(df, head_val) *

The dframe_ex function takes a dataframe and returns a few things:

  • The number of rows, columns, and total data points
  • The names of the columns, limited to the first 60 if more than 60 exist
  • A preview of up to the first n rows of the dataframe via the df.head method, with n set by the head_val parameter.

// Parameter Default Values //
df :: pandas DataFrame
head_val =5 :: sets the number of rows to display in the dataframe preview. Works via the pandas .head method. Set to 'all' for all rows
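The summary described above can be approximated in plain pandas. This is only a minimal sketch, not the library's actual code: the function name frame_summary and the max_names argument are hypothetical, and it returns values rather than printing them.

```python
import pandas as pd

def frame_summary(df, head_val=5, max_names=60):
    """Hypothetical stand-in for the kind of summary dframe_ex describes."""
    n_rows, n_cols = df.shape
    summary = {
        "rows": n_rows,
        "columns": n_cols,
        "total_data_points": df.size,                  # rows * columns
        "column_names": list(df.columns[:max_names]),  # cap the name list
    }
    # 'all' keeps every row; otherwise fall back to DataFrame.head
    preview = df if head_val == "all" else df.head(head_val)
    return summary, preview

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
summary, preview = frame_summary(df, head_val=2)
print(summary)
print(preview)
```

With the package installed, the documented call would be llex.dframe_ex(df, head_val).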


  • llex.vcount_ex(df, print_count) *

The vcount_ex function returns the value counts and normalized value counts for all of the columns in the dataframe passed through it.

// Parameter Default Values //
df :: pandas DataFrame
print_count =5 :: sets the number of value counts to print for each column. Set to 'all' for all of them, for example - (df, print_count='all')
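For readers who want to see what "value counts and normalized value counts" means in pandas terms, here is a rough sketch. It is an assumption about the general approach, not the library's implementation; value_count_tables is a hypothetical name, and the tables are returned instead of printed.

```python
import pandas as pd

def value_count_tables(df, print_count=5):
    """Hypothetical helper: counts and shares per column, most frequent first."""
    tables = {}
    for col in df.columns:
        counts = df[col].value_counts()  # NaN is excluded by default
        table = pd.DataFrame({"count": counts, "share": counts / counts.sum()})
        # 'all' keeps every distinct value; otherwise keep the top print_count
        tables[col] = table if print_count == "all" else table.head(print_count)
    return tables

df = pd.DataFrame({"color": ["red", "red", "blue", "green", "red"]})
tables = value_count_tables(df, print_count="all")
print(tables["color"])
```

The share column is the same number that value_counts(normalize=True) would give.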


  • llex.missing_ex(df) *

The missing_ex function prints the number of missing values in each column of the dataframe passed through it.

// Parameter Default Values //
df :: pandas DataFrame
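Counting missing values per column is a one-liner in pandas. The sketch below shows that idea with a hypothetical missing_counts helper; how missing_ex itself is written is not shown in this README.

```python
import pandas as pd

def missing_counts(df):
    """Hypothetical helper: number of missing (NaN/None) values per column."""
    return df.isna().sum()

df = pd.DataFrame({
    "a": [1.0, None, 3.0],   # one missing value
    "b": ["x", None, None],  # two missing values
})
missing = missing_counts(df)
print(missing)
```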


  • llex.scat_ex(df) *

The scat_ex function returns a scatterplot representing the value counts and their respective occurrences for each column in the dataframe passed through it.

// Parameter Default Values //
df :: pandas DataFrame
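One possible reading of the description above is that each distinct value becomes a point whose height is the number of times it occurs. The sketch below only builds those points, under that assumption; value_count_points is a hypothetical name and no plot is drawn here.

```python
import pandas as pd

def value_count_points(df):
    """Hypothetical helper: (values, occurrences) pairs for each column."""
    points = {}
    for col in df.columns:
        counts = df[col].value_counts()  # sorted by count, descending
        points[col] = (counts.index.tolist(), counts.tolist())
    return points

df = pd.DataFrame({"grade": ["A", "B", "A", "C", "A", "B"]})
points = value_count_points(df)
print(points["grade"])
```

Each pair of lists could then be handed to a plotting call such as matplotlib's scatter to draw one chart per column.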


  • llex.corr_ex(df, min_corr, min_count, fig_size, colors) *

The corr_ex function returns either a table of Pearson correlation values together with a heatmap of those values, or only the heatmap, for all of the columns in the dataframe passed through it.

// Parameter Default Values //
df :: pandas DataFrame
min_corr =0.2 :: minimum correlation value to appear on the heatmap
min_count =1 :: minimum number of observations required per pair of columns to have a valid result (the min_periods argument of pandas DataFrame.corr)
fig_size =(8, 10) :: heatmap size, 2 numbers
colors ='Reds' :: color of the heatmap. The heatmap comes from seaborn, so it uses their color codes
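The data side of these parameters can be sketched with pandas alone. This is an illustration under stated assumptions, not the library's code: filtered_corr is a hypothetical name, and treating min_corr as a threshold on the absolute correlation is a guess.

```python
import pandas as pd

def filtered_corr(df, min_corr=0.2, min_count=1):
    """Hypothetical helper: Pearson correlations, with weak ones blanked out."""
    numeric = df.select_dtypes("number")  # correlation needs numeric columns
    corr = numeric.corr(method="pearson", min_periods=min_count)
    # Assumption: values whose magnitude is below min_corr become NaN
    return corr.where(corr.abs() >= min_corr)

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # perfectly correlated with x
    "z": [1, -1, 1, -1, 1],  # uncorrelated with x
})
result = filtered_corr(df, min_corr=0.2)
print(result)
```

A frame like this could be passed to seaborn's heatmap function with cmap='Reds' to get a chart in the spirit of the colors parameter.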




========================= ll.machinelearning =============================

This subpackage contains streamlined machine learning models and evaluation tools.

to import: from larkinlab import machinelearning as llml

Dependencies: pandas, numpy, matplotlib.pyplot


functions




=========================================================================================================================

=========================================================================================================================

Created By: Conor Larkin

email: conor.larkin16@gmail.com
GitHub: github.com/clarkin16
LinkedIn: linkedin.com/in/clarkin16

Thanks for checking this out!

====================================

----------- CHANGE LOG -----------

====================================
------ Latest Release: 0.0.13 -----

0.0.13 (11/2/2020)

  • readme updates
  • added print_count param to .explore's vcount_ex function
  • added head_val and max_col params to .explore's dframe_ex function. Default max columns printed is now 50

0.0.12 (10/29/2020)

  • fixed an error in the code of .explore's missing_ex() function
  • updated .explore corr_ex() function to include min_count arg
  • renamed .explore.corr_ex() arg hm_only to map_only, which takes True or False

0.0.11 (10/29/2020)

  • changed "install_requires" values in setup.py

0.0.10 (10/29/2020)

  • fixed an error in corr_ex() function's code

0.0.9 (10/29/2020)

  • readme improvements
  • added function missing_ex() to .explore
  • added function corr_ex() to .explore
  • .explore added seaborn dependency
  • description change

0.0.8 (10/29/2020)

  • readme improved
  • changed description

0.0.7 (10/29/2020)

  • updated name to larkinlab from clarklib
  • added 2 subpackages: explore, machinelearning
  • changed explore.frame_ex to explore.dframe_ex
  • deleted clarklib (v0.0.0 - v0.0.6) from pypi, v0.0.7 and onward will be known as larkinlab

0.0.6 (10/29/2020)

  • Changed README to larkinlab format, with subpackages.
  • added In The Future section
  • commented out long_description in setup.py
  • changed check_df() to frame_ex()
  • changed vcount_examine() to vcount_ex()
  • changed scat_examine() to scat_ex()

0.0.5 (10/28/2020)

  • Changed the changelog to be in descending chronological order
  • Changed description in setup.py
  • updated the readme to contain details on using the functions and contact info

0.0.4 (10/28/2020)

  • Changed check_df() function to only display up to 60 column names.
  • Changed check_df() to print "Rows:", "Columns:", and "Total Data Points:" instead of just print(df.shape, df.size)

0.0.3 (10/28/2020)

  • Added the 'import' section to code in clarklib init file. Works now!

0.0.2 (10/27/2020)

  • Moved init file into folder

0.0.1 (10/27/2020)

  • First release
  • Added 3 functions: check_df(), vcount_examine(), scat_examine()
