Skip to main content

Add features to json: en/decoding of numpy arrays, preservation of ordering and ignoring of comments in input

Project description

JSON tricks (python)

At this time, the pyjson-tricks package brings three pieces of functionality to python handling of json files:

  1. Store and load numpy arrays in human-readable format.

  2. Preserve map order {} using OrderedDict.

  3. Allow for comments in json files by starting lines with #.

It also allows for gzip compression using the compress=True argument (off by default).

Installation and use

You can install using

pip install json-tricks

If you want to use numpy features, you should install numpy as well. If you don’t, then numpy is not required.

You can import the usual json functions dump(s) and load(s), as well as a separate comment removal function, as follows:

from json_tricks.np import dump, dumps, load, loads, strip_hash_comments

If you do not have numpy and want to use only order preservation and commented json reading, you should import from json_tricks.nonp`` instead.

The exact signatures of these functions are in the documentation. In many cases, keeping the arguments of the standard json functions but changing the import will be enough to use the extra features.

Features

Numpy arrays

This implementation is mostly based on an answer by tlausch on stackoverflow that you could read for details.

The array is encoded in sort-of-readableformat, like so:

arr = arange(0, 10, 1, dtype=uint8).reshape((2, 5))
print dumps({'mydata': arr})

after indering this yields:

{
        "mydata": {
                "dtype": "uint8",
                "shape": [2, 5],
                "__ndarray__": [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
        }
}

which will be converted back to a numpy array when using json_tricks.loads.

As you’ve seen, this uses the magic key __ndarray__. Don’t use __ndarray__ as a dictionary key unless you’re trying to make a numpy array (and know what you’re doing).

Order

Given an ordered dictionary like this (see the tests for a longer one):

ordered = OrderedDict((
        ('elephant', None),
        ('chicken', None),
        ('tortoise', None),
))

Converting to json and back will preserve the order:

from json_tricks import dumps, loads
json = dumps(ordered, preserve_order=True)
ordered = loads(json, preserve_order=True)

where preserve_order=True is added for emphasis; it can be left out since it’s the default.

As a note on performance, both dicts and OrderedDicts have the same scaling for getting and setting items (O(1)). In Python versions before 3.5, OrderedDicts were implemented in Python rather than C, so were somewhat slower; since Python 3.5 both are implemented in C. In summary, you should have no scaling problems and probably no performance problems at all, especially for 3.5 and later.

Comments

This package uses # for comments, which seems to be the most common convention. For example, you could call loads on the following string:

{ # "comment 1
        "hello": "Wor#d", "Bye": "\"M#rk\"", "yes\\\"": 5,# comment" 2
        "quote": "\"th#t's\" what she said", # comment "3"
        "list": [1, 1, "#", "\"", "\\", 8], "dict": {"q": 7} #" comment 4 with quotes
}
# comment 5

And it would return the de-commented version:

{
        "hello": "Wor#d", "Bye": "\"M#rk\"", "yes\\\"": 5,
        "quote": "\"th#t's\" what she said",
        "list": [1, 1, "#", "\"", "\\", 8], "dict": {"q": 7}
}

Since comments aren’t stored in the Python representation of the data, loading and then saving a json file will remove the comments (it also likely changes the indentation).

There is already a commentjson package for Python. However, as of November 2015, it doesn’t support Python 3.x, and a pull request to add support has been left pending for five months.

The implementation of comments is not particularly efficient, but it does handle all the special cases I tested. For a few files you shouldn’t notice any performance problems, but if you’re reading hundreds of files, then they are presumably computer-generated, and you could consider turning comments off (strip_comments=False).

License

Revised BSD License; at your own risk, you can mostly do whatever you want with this code, just don’t use my name for promotion and do keep the license file.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

json_tricks-1.1.tar.gz (7.4 kB view details)

Uploaded Source

File details

Details for the file json_tricks-1.1.tar.gz.

File metadata

  • Download URL: json_tricks-1.1.tar.gz
  • Upload date:
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for json_tricks-1.1.tar.gz
Algorithm Hash digest
SHA256 b32fdfcab5103f7d0cd63dd4fa01f8fea463765cf3d12d784d260a6f7876b6bd
MD5 091a0d4848b362156ca1141c7c170e57
BLAKE2b-256 739904874ef8c6357d2f38f49d63a778e09df8b822b88039642de5e872f65b5c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page