dorchester

A tool for making dot-density maps in Python.

Caveat emptor

This is very alpha right now. Use at your own risk, and evaluate any editorial usage of this library before publishing.

Installation

Install this tool using pip:

$ pip install dorchester

Usage

The main command is dorchester plot. It takes an input file, an output file, and one or more property keys from which to extract population counts.

$ dorchester plot --help
Usage: dorchester plot [OPTIONS] SOURCE DEST

  Generate data for a dot-density map. Input may be any GIS format readable
  by Fiona (Shapefile, GeoJSON, etc).

Options:
  -k, --key TEXT                  Property name for a population. Use multiple
                                  to map different population classes.

  -f, --format [csv|geojson|null]
                                  Output format. If not given, will guess
                                  based on output file extension.

  -m, --mode [w|a|x]              File mode for destination  [default: w]
  --fid TEXT                      Use a property key (instead of feature.id)
                                  to uniquely identify each feature

  --coerce                        Coerce properties passed in --key to
                                  integers. BE CAREFUL. This could cause
                                  incorrect results if misused.

  --progress                      Show a progress bar  [default: False]
  -m, --multiprocessing           Use multiprocessing
  --help                          Show this message and exit.

Input can be in any format readable by Fiona, such as Shapefiles and GeoJSON. The input file needs to contain both population data and boundaries. You may need to join different files together before plotting with dorchester.
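
As a rough sketch of such a join, assume a boundaries GeoJSON whose features carry a GEOID property and a population table keyed on the same identifier. Both the GEOID name and the join_population helper are hypothetical, chosen for illustration; dorchester does not require or provide them. For small files the standard library is enough:

```python
import json


def join_population(feature_collection, population_by_id, id_key, pop_key):
    """Copy population counts onto matching features, returning a new collection."""
    joined = []
    for feature in feature_collection["features"]:
        # Copy the properties so the input collection is left untouched.
        props = dict(feature.get("properties") or {})
        feature_id = props.get(id_key)
        if feature_id in population_by_id:
            props[pop_key] = population_by_id[feature_id]
        joined.append({**feature, "properties": props})
    return {"type": "FeatureCollection", "features": joined}


# Hypothetical inputs: two boundaries and a population lookup table.
boundaries = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"GEOID": "A"}},
        {"type": "Feature", "geometry": None, "properties": {"GEOID": "B"}},
    ],
}
population = {"A": 12, "B": 7}

merged = join_population(boundaries, population, id_key="GEOID", pop_key="POP10")
print(json.dumps(merged["features"][0]["properties"]))
```

The merged collection can then be written out with json.dump and passed to dorchester plot with --key POP10. Features with no match in the table are kept without a population property, so it is worth checking for those before plotting.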

Output format (--format) can be CSV or GeoJSON (more formats coming soon). For GeoJSON, the output will be a stream of newline-delimited Point features, like this:

{"type": "Feature", "geometry": {"type": "Point", "coordinates": [76, 38]}, "properties": {"group": "population", "fid": 1}}
{"type": "Feature", "geometry": {"type": "Point", "coordinates": [77, 39]}, "properties": {"group": "population", "fid": 1}}
{"type": "Feature", "geometry": {"type": "Point", "coordinates": [78, 37]}, "properties": {"group": "population", "fid": 1}}
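
Because each line is a complete JSON document, the output can be processed one line at a time without loading the whole file. A minimal sketch that tallies points per group, using the sample lines above (the count_points_by_group name is made up for this example):

```python
import json
from collections import Counter


def count_points_by_group(lines):
    """Tally Point features per group from newline-delimited GeoJSON."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        feature = json.loads(line)
        counts[feature["properties"]["group"]] += 1
    return counts


sample = [
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [76, 38]}, "properties": {"group": "population", "fid": 1}}',
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [77, 39]}, "properties": {"group": "population", "fid": 1}}',
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [78, 37]}, "properties": {"group": "population", "fid": 1}}',
]

counts = count_points_by_group(sample)
print(counts)
```

The function accepts any iterable of strings, so an open file object can be passed in place of the sample list.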

These files will be large, because dorchester creates a point for every individual. Massachusetts, for example, had a population of more than 6 million in 2010, and a dot-density CSV file for the state runs to 6,336,107 lines and 305 MB.
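
For intuition about where one point per person comes from, here is a minimal sketch of a common way to scatter dots inside a polygon: rejection sampling within the bounding box. It illustrates the general technique and is not necessarily how dorchester generates points. The function names are invented for the example, and it handles only a single ring with no holes.

```python
import random


def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the ring of (x, y) vertices?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def random_points(ring, count, rng=None):
    """Draw `count` random points inside `ring` by rejection sampling.

    Assumes the ring encloses a non-zero area; otherwise this never finishes.
    """
    if rng is None:
        rng = random.Random()
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    points = []
    while len(points) < count:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_ring(x, y, ring):
            points.append((x, y))
    return points


# A triangle standing in for a feature with a population of 12.
triangle = [(0, 0), (4, 0), (0, 4), (0, 0)]
dots = random_points(triangle, 12, random.Random(42))
print(len(dots))
```

Every accepted point costs at least one random draw, and one output row, which is why output size grows in step with population.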

Each key (--key) should correspond to a property on each feature whose value is a whole number. In a block like this, use --key POP10 to extract population:

{
  "geometry": {
    "coordinates": [...],
    "type": "Polygon"
  },
  "id": "0",
  "properties": {
    "BLOCKCE": "4023",
    "BLOCKID10": "250010112004023",
    "COUNTYFP10": "001",
    "HOUSING10": 16,
    "PARTFLG": "N",
    "POP10": 12,
    "STATEFP10": "25",
    "TRACTCE10": "011200"
  },
  "type": "Feature"
}
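
Before plotting, it can help to confirm that the chosen key really holds whole numbers on every feature. The check below is a deliberately strict sketch: it flags anything that is not a Python int, including missing values and numeric strings. The non_integer_values helper is hypothetical and not part of dorchester.

```python
def non_integer_values(features, key):
    """Return (feature id, value) pairs whose `key` is not a whole number."""
    problems = []
    for feature in features:
        value = (feature.get("properties") or {}).get(key)
        # bool is a subclass of int in Python, so rule it out explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append((feature.get("id"), value))
    return problems


features = [
    {"id": "0", "properties": {"POP10": 12}},   # fine
    {"id": "1", "properties": {"POP10": "7"}},  # string, would need --coerce
    {"id": "2", "properties": {}},              # key missing entirely
]

bad = non_integer_values(features, "POP10")
print(bad)
```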

You can pass multiple --key options to create different groups that will be layered together. This is how you would create a map showing different racial groups, for example.

The --mode option controls how the output file is opened:

  • w will create or overwrite the output file
  • a will append to an existing file
  • x will try to create a new file and fail if that file already exists
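
These letters match the mode strings accepted by Python's built-in open(). That dorchester passes them straight through is an assumption, not something stated here, but the three behaviours are easy to see with open() itself:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "points.csv")

    with open(path, "w") as f:  # "w" creates the file
        f.write("first\n")
    with open(path, "w") as f:  # "w" again discards the old contents
        f.write("second\n")
    with open(path, "a") as f:  # "a" adds to the end
        f.write("third\n")

    with open(path) as f:
        contents = f.read()

    # "x" refuses to touch a file that already exists.
    try:
        with open(path, "x"):
            pass
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

print(repr(contents), exclusive_failed)
```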

Setting --fid will use a property key to identify each feature, instead of the feature's id field (which is often missing, or will be an index number in shapefiles). In the Census block example above, BLOCKID10 will uniquely identify this block, while id: 0 only identifies it as the first feature in its source shapefile.
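
The choice can be sketched in a few lines. The feature_identifier function below is a hypothetical illustration of the behaviour described, not dorchester's own code:

```python
def feature_identifier(feature, fid=None):
    """Pick an identifier: a named property if `fid` is given, else feature id."""
    if fid is not None:
        return feature["properties"][fid]
    return feature.get("id")


block = {
    "type": "Feature",
    "id": "0",
    "properties": {"BLOCKID10": "250010112004023", "POP10": 12},
}

print(feature_identifier(block))               # positional id from the shapefile
print(feature_identifier(block, "BLOCKID10"))  # stable Census block identifier
```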

For data sources where properties are encoded as strings, the --coerce option will recast anything passed via --key to integers. Be careful with this option, as it involves changing data. It will fail (and stop plotting) if it encounters something that can't be coerced into an integer.
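
To see why care is needed, consider what integer coercion does in plain Python. The sketch below assumes coercion behaves like the built-in int(), which is a guess at the mechanism rather than a description of dorchester's code; coerce_population is a made-up helper.

```python
def coerce_population(properties, keys):
    """Return a copy of `properties` with each of `keys` cast to int.

    Raises ValueError if a value cannot be interpreted as an integer.
    """
    coerced = dict(properties)
    for key in keys:
        coerced[key] = int(properties[key])
    return coerced


clean = coerce_population({"POP10": "12", "NAME": "Block 4023"}, ["POP10"])
print(clean)

# A decimal string cannot be coerced and raises an error.
try:
    coerce_population({"POP10": "12.5"}, ["POP10"])
    failed = False
except ValueError:
    failed = True

# A float is silently truncated, which is the kind of quiet change to watch for.
truncated = coerce_population({"POP10": 12.9}, ["POP10"])
print(failed, truncated)
```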

Use the --progress flag to show a progress bar. This is off by default.

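Use --multiprocessing to generate points with Python's multiprocessing module, which can significantly speed up point generation. This will try to use every processor on your machine instead of just one. The help text lists -m as a short form for both --mode and --multiprocessing, so the long option names are the unambiguous choice.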

Putting points on a map

For small-ish areas, QGIS will render lots of points just fine. Generate points, and load the output as a delimited text or GeoJSON file.

To build an interactive dot density map, you can use tippecanoe to generate an MBTiles file, which can be uploaded to Mapbox (or possibly other hosting providers). This has worked for me:

tippecanoe -zg -o points.mbtiles --drop-densest-as-needed --extend-zooms-if-still-dropping points.csv

About the name

Dorchester is the largest and most diverse neighborhood in Boston, Massachusetts, and is often referred to as Dot.

The name is also a nod to Englewood, built by the Chicago Tribune News Apps team. This is, hopefully, a worthy successor.

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd dorchester
python -m venv .venv
source .venv/bin/activate

Or if you are using pipenv:

pipenv shell

Now install the dependencies, including the test dependencies:

pip install -e '.[test]'

To run the tests:

pytest
