Traverses a Python object's subtree and calculates the total size of the subtree in bytes (deep size).

objsize

This module traverses all child objects using Python's internal GC implementation. It attempts to ignore shared objects (i.e., None, types, modules, classes, functions, lambdas), as they are common among all objects. It is implemented without recursive calls for high performance.
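
The idea described above can be sketched with the standard library alone. This is only an illustration of a non-recursive traversal built on gc.get_referents and sys.getsizeof, not objsize's actual implementation; the name deep_size_sketch and the SHARED_TYPES tuple are assumptions made for this example, and the numbers it returns need not match objsize's.

```python
import gc
import sys
from collections import deque
from types import FunctionType, ModuleType

# Assumed stand-in for the "shared objects" mentioned above; not objsize's actual list.
SHARED_TYPES = (type, ModuleType, FunctionType)


def deep_size_sketch(*roots):
    """Sum sys.getsizeof() over every object reachable from roots, counting each once."""
    seen = set()           # ids of objects already counted
    queue = deque(roots)   # explicit FIFO queue instead of recursive calls
    total = 0
    while queue:
        obj = queue.popleft()
        if obj is None or isinstance(obj, SHARED_TYPES) or id(obj) in seen:
            continue
        seen.add(id(obj))
        total += sys.getsizeof(obj)
        # gc.get_referents() returns the objects that obj refers to directly.
        queue.extend(gc.get_referents(obj))
    return total
```

Because visited objects are tracked by id, an object referenced twice inside the same subtree is counted only once.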

Features

  • Traverse objects' subtree
  • Calculate objects' (deep) size in bytes
  • Exclude non-exclusive objects
  • Exclude specified objects subtree
  • Allow the user to specify unique handlers for:
    • Object's size calculation
    • Object's referents (i.e., its children)
    • Object filter (skip specific objects)

Pympler also supports determining an object's deep size via pympler.asizeof.asizeof(). There are two main differences between objsize and Pympler.

  1. objsize has additional features:
    • Traversing the object subtree: iterating all the object's descendants one by one.
    • Excluding non-exclusive objects. That is, objects that are also referenced from somewhere else in the program. This is true for calculating the object's deep size and for traversing its descendants.
  2. objsize has a simple and robust implementation with significantly fewer lines of code than Pympler. The Pympler implementation uses recursion, and thus has to use a maximal depth argument to avoid reaching Python's maximum recursion depth. objsize, however, uses BFS, which is more efficient and simpler to follow. Moreover, the Pympler implementation takes special care of each object type, whereas objsize achieves the same goal with a simple and generic implementation.
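
The recursion-depth point in item 2 can be illustrated generically. The sketch below is not taken from either library; count_lists_recursive, count_lists_iterative and build_nested are hypothetical helpers written for this example. It contrasts a recursive walk, which needs one Python call per nesting level, with a queue-based walk that does not.

```python
import gc
from collections import deque


def count_lists_recursive(obj):
    """Count nested lists with one Python call per nesting level."""
    if not isinstance(obj, list):
        return 0
    total = 1
    for child in obj:
        total += count_lists_recursive(child)
    return total


def count_lists_iterative(root):
    """Count nested lists breadth-first using an explicit queue."""
    seen, queue, count = set(), deque([root]), 0
    while queue:
        obj = queue.popleft()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if isinstance(obj, list):
            count += 1
        queue.extend(gc.get_referents(obj))
    return count


def build_nested(depth):
    """Return an empty list wrapped in `depth` enclosing lists."""
    nested = []
    for _ in range(depth):
        nested = [nested]
    return nested
```

With Python's default recursion limit, count_lists_recursive(build_nested(10_000)) raises RecursionError, while count_lists_iterative handles the same input.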

Install

pip install objsize==0.6.0

Basic Usage

Calculate the size of the object including all its members in bytes.

>>> import objsize
>>> objsize.get_deep_size(dict(arg1='hello', arg2='world'))
340

It is possible to calculate the deep size of multiple objects by passing multiple arguments:

>>> objsize.get_deep_size(['hello', 'world'], dict(arg1='hello', arg2='world'), {'hello', 'world'})
628

Complex Data

objsize can calculate the size of an object's entire subtree in bytes regardless of the type of objects in it, and its depth.

Here is a complex data structure, for example, that includes a self-reference:

my_data = (list(range(3)), list(range(3, 6)))


class MyClass:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.d = {'x': x, 'y': y, 'self': self}

    def __repr__(self):
        return "MyClass"


my_obj = MyClass(*my_data)

We can calculate my_obj's deep size, including its stored data.

>>> objsize.get_deep_size(my_obj)
708

We might want to ignore non-exclusive objects such as the ones stored in my_data.

>>> objsize.get_deep_size(my_obj, exclude=[my_data])
384

Or simply let objsize detect that automatically:

>>> objsize.get_exclusive_deep_size(my_obj)
384
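
One way to think about the explicit exclude argument shown above is as a two-pass walk: first collect everything reachable from the excluded roots, then measure the object while skipping those. The sketch below uses only the standard library; walk and deep_size_excluding are hypothetical names, it omits the shared-object filtering that objsize applies, and it is not how objsize is necessarily implemented.

```python
import gc
import sys
from collections import deque


def walk(roots, skip_ids=frozenset()):
    """Yield each object reachable from roots once, breadth-first, skipping skip_ids."""
    seen = set(skip_ids)
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        yield obj
        queue.extend(gc.get_referents(obj))


def deep_size_excluding(obj, exclude=()):
    """Deep size of obj, ignoring everything reachable from the exclude roots."""
    excluded_ids = {id(o) for o in walk(exclude)}
    return sum(sys.getsizeof(o) for o in walk([obj], excluded_ids))
```

Skipping an excluded object also skips its children, unless they are reachable through some other, non-excluded path.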

Non Shared Functions or Classes

objsize filters functions, lambdas, and classes by default since they are usually shared among many objects. For example:

>>> method_dict = {"identity": lambda x: x, "double": lambda x: x*2}
>>> objsize.get_deep_size(method_dict)
232

Some objects, however, as illustrated in the above example, have unique functions not shared by other objects. Due to this, it may be useful to count their sizes. You can achieve this by providing an alternative filter function.

>>> objsize.get_deep_size(method_dict, filter_func=objsize.shared_object_filter)
986
  • Note that using this filter function (objsize.shared_object_filter) will also count shared functions and lambdas.
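
A filter function is a predicate that returns True for objects that should be kept. The two predicates below are hypothetical stand-ins written for illustration, not objsize's own filters; they only show how a stricter and a more relaxed filter treat the lambdas in method_dict.

```python
from types import FunctionType, ModuleType


def keep_unless_shared_or_function(obj):
    """Hypothetical strict filter: drop None, types, modules and functions."""
    return obj is not None and not isinstance(obj, (type, ModuleType, FunctionType))


def keep_unless_shared(obj):
    """Hypothetical relaxed filter: functions and lambdas are kept, so they get counted."""
    return obj is not None and not isinstance(obj, (type, ModuleType))


method_dict = {"identity": lambda x: x, "double": lambda x: x * 2}
candidates = [method_dict, *method_dict.values(), dict, None]

# The strict filter keeps only the dict; the relaxed one also keeps both lambdas.
strict = [o for o in candidates if keep_unless_shared_or_function(o)]
relaxed = [o for o in candidates if keep_unless_shared(o)]
```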

Special Cases

Some objects handle their data in a way that prevents Python's GC from detecting it. The user can supply a special way to calculate the actual size of these objects.

Case 1: torch

Using a simple calculation of the object size won't work for torch.Tensor.

>>> import torch
>>> objsize.get_deep_size(torch.rand(200))
72

So the user can define their own size calculation handler for such cases:

import objsize
import sys
import torch


def get_size_of_torch(o):
    # `objsize.safe_is_instance` catches `ReferenceError` caused by `weakref` objects
    if objsize.safe_is_instance(o, torch.Tensor):
        return sys.getsizeof(o.storage())
    else:
        return sys.getsizeof(o)

Then use it as follows:

>>> import torch
>>> objsize.get_deep_size(
...   torch.rand(200),
...   get_size_func=get_size_of_torch
... )
848

However, this neglects the object's internal structure. The user can help objsize find the object's hidden storage by supplying their own referent and filter functions:

import objsize
import gc
import torch


def get_referents_torch(*objs):
    # Yield all native referents
    yield from gc.get_referents(*objs)

    for o in objs:
        # If the object is a torch tensor, then also yield its storage
        if objsize.safe_is_instance(o, torch.Tensor):
            yield o.storage()


def filter_func(o):
    # Torch storage points to another meta storage that is
    # already included in the outer storage calculation, 
    # so we need to filter it.
    # Also, `torch.dtype` is a common object like Python's types.
    return not objsize.safe_is_instance(o, (type, torch.storage._UntypedStorage, torch.dtype))

Then use these as follows:

>>> import torch
>>> objsize.get_deep_size(
...   torch.rand(200),
...   get_referents_func=get_referents_torch, 
...   filter_func=filter_func
... )
1024

Case 2: weakref

Using a simple calculation of the object size won't work for weakref.proxy.

>>> import weakref
>>> class Foo(list):
...     pass
... 
>>> o = Foo([0]*100)
>>> objsize.get_deep_size(o)
896
>>> oref = weakref.proxy(o)
>>> objsize.get_deep_size(oref)
72

To mitigate this, you can provide a method that attempts to fetch the proxy's referents:

import weakref
import gc


def get_weakref_referents(*objs):
    yield from gc.get_referents(*objs)

    for o in objs:
        if type(o) in weakref.ProxyTypes:
            try:
                yield o.__repr__.__self__
            except ReferenceError:
                pass

Then use it as follows:

>>> objsize.get_deep_size(oref, get_referents_func=get_weakref_referents)
968

Once the referenced object has been collected, the size of the proxy object is reduced.

>>> del o
>>> gc.collect()
>>> # Wait for the object to be collected 
>>> objsize.get_deep_size(oref, get_referents_func=get_weakref_referents)
72
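
The proxy behaviour described in this case can be checked with the standard library alone. In the sketch below, proxy_referent is a hypothetical helper that isolates the lookup trick used in get_weakref_referents; it assumes CPython, where a proxy forwards attribute access to its referent.

```python
import gc
import weakref


class Foo(list):
    """A list subclass; plain lists cannot be weakly referenced."""


def proxy_referent(proxy):
    """Return the object behind a weakref proxy, or None if it is gone or not a proxy."""
    if type(proxy) not in weakref.ProxyTypes:
        return None
    try:
        # Attribute access is forwarded to the referent, so the bound
        # method's __self__ is the real object.
        return proxy.__repr__.__self__
    except ReferenceError:
        # The referent has already been collected.
        return None
```

For a live proxy p of an object o, gc.get_referents(p) does not include o, while proxy_referent(p) returns o itself. After o is deleted and collected, proxy_referent(p) returns None.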

Traversal

A user can implement their own function over the entire subtree using the traversal method, which traverses all the objects in the subtree.

>>> for o in objsize.traverse_bfs(my_obj):
...     print(o)
... 
MyClass
{'x': [0, 1, 2], 'y': [3, 4, 5], 'd': {'x': [0, 1, 2], 'y': [3, 4, 5], 'self': MyClass}}
[0, 1, 2]
[3, 4, 5]
{'x': [0, 1, 2], 'y': [3, 4, 5], 'self': MyClass}
2
1
0
5
4
3

Similar to before, non-exclusive objects can be ignored.

>>> for o in objsize.traverse_exclusive_bfs(my_obj):
...     print(o)
... 
MyClass
{'x': [0, 1, 2], 'y': [3, 4, 5], 'd': {'x': [0, 1, 2], 'y': [3, 4, 5], 'self': MyClass}}
{'x': [0, 1, 2], 'y': [3, 4, 5], 'self': MyClass}
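
As a small example of a user-defined function over a traversal, the sketch below tallies objects by type name. count_types is a hypothetical helper; it accepts any iterable of objects, so the result of objsize.traverse_bfs(my_obj) could be passed to it directly.

```python
from collections import Counter


def count_types(objects):
    """Count how many objects of each type an iterable yields."""
    return Counter(type(o).__name__ for o in objects)


# Example with a plain list; with objsize installed one might instead call
# count_types(objsize.traverse_bfs(my_obj)).
histogram = count_types([1, 2, "a", [3]])
```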

License

BSD-3
