
Converts MS Excel formulas to Python and evaluates them.


================
Excel Calculator
================

.. image:: https://travis-ci.org/bradbase/xlcalculator.png?branch=master
   :target: https://travis-ci.org/bradbase/xlcalculator

.. image:: https://coveralls.io/repos/github/bradbase/xlcalculator/badge.svg?branch=master
   :target: https://coveralls.io/github/bradbase/xlcalculator?branch=master

.. image:: https://img.shields.io/pypi/v/xlcalculator.svg
   :target: https://pypi.python.org/pypi/xlcalculator

.. image:: https://img.shields.io/pypi/pyversions/xlcalculator.svg
   :target: https://pypi.python.org/pypi/xlcalculator/

.. image:: https://img.shields.io/pypi/status/xlcalculator.svg
   :target: https://pypi.org/project/xlcalculator/

xlcalculator is a Python library that reads MS Excel files and, to the extent of supported functions, translates the Excel functions into Python code and subsequently evaluates the generated Python code, essentially doing the Excel calculations without the need for Excel.

xlcalculator is a modernization of the `koala2 <https://github.com/vallettea/koala>`_ library.

xlcalculator currently supports:

  • Loading an Excel file into a Python compatible state

  • Saving Python compatible state

  • Loading Python compatible state

  • Ignoring worksheets

  • Extracting sub-portions of a model by "focussing" on provided cell addresses or defined names

  • Evaluating

    • Individual cells
    • Defined Names (a "named cell" or range)
    • Ranges
    • Shared formulas `not an Array Formula <https://stackoverflow.com/questions/1256359/what-is-the-difference-between-a-shared-formula-and-an-array-formula>`_
      • Operands (+, -, /, *, ==, <>, <=, >=)
      • on cells only
    • Set cell value
    • Get cell value
    • `Parsing a dict into the Model object <https://stackoverflow.com/questions/31260686/excel-formula-evaluation-in-pandas/61586912#61586912>`_
      • Code is in examples/third_party_datastructure
    • Functions;
      • ABS
      • AND
      • AVERAGE
      • CHOOSE
      • CONCAT
      • COUNT
      • COUNTA
      • DATE
      • IF
      • IRR
      • LN
        • Python math.log() differs from Excel LN; the function currently returns the math.log() result
      • MAX
      • MID
      • MIN
      • MOD
      • NPV
      • OR
      • PI
      • PMT
      • POWER
      • RIGHT
      • ROUND
      • ROUNDDOWN
      • ROUNDUP
      • SLN
      • SQRT
      • SUM
      • SUMIF
      • SUMPRODUCT
      • TODAY
      • TRUNC
      • VDB
      • VLOOKUP
        • Exact match only
      • XNPV
      • YEARFRAC
        • Basis 1 (Actual/actual) is only accurate to within 3 decimal places

Not currently supported:

Run tests
---------

Set up your environment::

    virtualenv -p 3.7 ve
    ve/bin/pip install -e .[test]

From the root xlcalculator directory::

    ve/bin/py.test -rw -s --tb=native

Or simply use tox::

    tox

Run Example
-----------

From the examples/common_use_case directory::

    python use_case_01.py

Adding/Registering Excel Functions
----------------------------------

Excel function support can be easily added.

Fundamental function support is found in the xlfunctions directory. The functions are thematically organised in modules.

Excel functions can be added by any code using the xlfunctions.xl.register() decorator. Here is a simple example:

.. code-block:: Python

    from xlcalculator.xlfunctions import xl

    @xl.register()
    @xl.validate_args
    def ADDONE(num: xl.Number):
        return num + 1

The @xl.validate_args decorator will ensure that the annotated arguments are converted and validated. For example, even if you pass in a string, it is converted to a number (in typical Excel fashion):

.. code-block:: Python

    ADDONE(1)    # returns 2
    ADDONE('1')  # returns 2
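To picture how a registering decorator and an argument-validating decorator can work together, here is a minimal, standard-library-only sketch. It is not xlcalculator's implementation: ``FUNCTIONS``, ``Number``, ``register`` and ``validate_args`` below are hypothetical stand-ins written for illustration, and the coercion rule (plain ``float()``) is an assumption that is much simpler than Excel's real conversion rules.

```python
import inspect
from functools import wraps

# Hypothetical stand-ins for illustration; NOT xlcalculator's real objects.
FUNCTIONS = {}  # maps an Excel function name to a Python callable


class Number:
    """Marker annotation: the argument should be coerced to a number."""


def register(name=None):
    """Store the decorated function in FUNCTIONS under its Excel name."""
    def decorator(func):
        FUNCTIONS[name or func.__name__] = func
        return func
    return decorator


def validate_args(func):
    """Coerce every argument annotated with Number to a float before the call."""
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for arg_name, value in bound.arguments.items():
            if signature.parameters[arg_name].annotation is Number:
                # Assumed coercion rule: '1' becomes 1.0, 1 becomes 1.0.
                bound.arguments[arg_name] = float(value)
        return func(*bound.args, **bound.kwargs)
    return wrapper


@register()
@validate_args
def ADDONE(num: Number):
    return num + 1


print(FUNCTIONS["ADDONE"](1))    # 2.0
print(FUNCTIONS["ADDONE"]("1"))  # 2.0
```

In this sketch the decorators are applied bottom-up, so the registry stores the validating wrapper rather than the raw function, which is why a lookup by name still converts string input.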

If you would like to contribute functions, please create a pull request. All new functions should be accompanied by sufficient tests to cover the functionality. Tests need to be written for both the Python implementation of the function (tests/xlfunctions) and a comparison with Excel (tests/xlfunctions_vs_excel).

Excel number precision
----------------------

Excel number precision is a complex discussion.

It has been discussed in a `Wikipedia page <https://en.wikipedia.org/wiki/Numeric_precision_in_Microsoft_Excel>`_.

The fundamentals come down to floating point numbers and a tension between how they are represented in memory, how they are stored on disk and how they are presented on screen. A `Microsoft article <https://www.microsoft.com/en-us/microsoft-365/blog/2008/04/10/understanding-floating-point-precision-aka-why-does-excel-give-me-seemingly-wrong-answers/>`_ explains the issue.

This project attempts to take care when reading numbers from the Excel file, in order to remove a variety of representation errors.
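As an illustration of the kind of representation error involved, the sketch below shows a classic binary floating point artefact and one possible clean-up: rounding to 15 significant digits, which is the precision Excel displays. The helper ``to_excel_precision`` is a hypothetical name used only here; this is an assumption about one workable approach, not a description of what xlcalculator does internally.

```python
def to_excel_precision(value: float, digits: int = 15) -> float:
    """Round a float to `digits` significant digits (Excel displays at most 15)."""
    return float(f"{value:.{digits}g}")


# Neither 0.1 nor 0.2 has an exact binary representation, so the sum drifts.
raw = 0.1 + 0.2

print(repr(raw))                        # 0.30000000000000004
print(raw == 0.3)                       # False
print(to_excel_precision(raw) == 0.3)   # True
```

Rounding on read only removes noise that is already below the displayed precision; it does not keep later arithmetic in step with Excel.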

Further work will be required to keep numbers in line with Excel throughout different transformations.

From what I can determine, this requires a low-level implementation of a numeric datatype (in C, C++ or perhaps Cython) to replicate Excel's behaviour. Python's built-in numeric types don't replicate that behaviour appropriately.

Unit testing Excel formulas directly from the workbook.
-------------------------------------------------------

If you are interested in unit testing formulas in your workbook, you can use `FlyingKoala <https://github.com/bradbase/flyingkoala>`_. An example of how to do so can be found `here <https://github.com/bradbase/flyingkoala/tree/master/flyingkoala/unit_testing_formulas>`_.

TODO
----

  • Do not treat ranges as a granular AST node, but instead as an operation ":" of two cell references that creates the range. That will make features like A1:OFFSET(...) easy to implement.

  • Support for alternative range evaluation: by ref (pointer), by expr (lazy eval) and current eval mode.

    • Pointers would allow easy implementations of functions like OFFSET().

    • Lazy evals will allow efficient implementation of IF() since execution of true and false expressions can be delayed until it is decided which expression is needed.

  • Implement array functions. It is really not that hard once a proper RangeData class has been implemented on which one can easily act with scalar functions.

  • Improve testing

  • Refactor model and evaluator to use pass-by-object-reference for values of cells which then get "used"/referenced by ranges, defined names and formulas

  • Handle multi-file addresses

  • Improve integration with openpyxl for reading and writing files (`example of problem space <https://stackoverflow.com/questions/40248564/pre-calculate-excel-formulas-when-exporting-data-with-python>`_)
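The lazy-evaluation item above can be pictured with a small standalone sketch. It does not use xlcalculator; ``eager_if``, ``lazy_if`` and ``branch`` are hypothetical helpers, and the zero-argument callables stand in for whatever deferred expression objects an evaluator might use.

```python
def eager_if(condition, when_true, when_false):
    """Both branch values were already computed by the caller."""
    return when_true if condition else when_false


def lazy_if(condition, when_true, when_false):
    """Branches are zero-argument callables; only the chosen one runs."""
    return when_true() if condition else when_false()


evaluated = []  # records which branches were actually computed


def branch(label, value):
    """Stand-in for an expensive sub-expression."""
    evaluated.append(label)
    return value


# Eager: both branches are computed even though only one result is used.
eager_result = eager_if(True, branch("true", 1), branch("false", 2))
eager_calls = list(evaluated)

evaluated.clear()

# Lazy: wrapping each branch in a lambda defers the work until it is needed.
lazy_result = lazy_if(True, lambda: branch("true", 1), lambda: branch("false", 2))
lazy_calls = list(evaluated)

print(eager_result, eager_calls)  # 1 ['true', 'false']
print(lazy_result, lazy_calls)    # 1 ['true']
```

Both calls return the same value; the difference is only in how much work is done, which matters when the unused branch is expensive or would raise an error.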

=======
CHANGES
=======

0.2.4 (un-released)
-------------------

  • Updated README with supported functions.

  • Fix bug in ModelCompiler extract method where a defined name cell was being overwritten with the cell from one of the terms contained within the formula. Added a test for this.

  • Move version of yearfrac to 0.4.4. That project has removed a dependency on the package six.

0.2.3 (2020-08-18)
------------------

  • In-boarded xlfunctions.

  • Bugfix COUNTA.

    • Now supports 256 arguments.
  • Updated README to include a description of xlfunctions.

  • Changed licence from GPL-3 style to MIT Style.

0.2.2 (2020-05-28)
------------------

  • Make dependency resolution part of the execution.

    • AST eval'ing takes care of dependency resolution.

    • Provide cycle detection with reporting.

    • Implemented a specific evaluation context. That makes cache control, namespace customization and data encapsulation much easier.

  • Add more tokenizer tests to increase coverage.

0.2.1 (2020-05-28)
------------------

  • Use a less intrusive way to patch openpyxl. Instead of permanently patching the reader to support cached formula values, mock is used to only patch the reader while reading the workbook.

    This way the patches do not interfere with other packages not expecting these new classes.

0.2.0 (2020-05-28)
------------------

  • Support for delayed node evaluation by wrapping them into expressions. The function will eval the expression when needed.

  • Support for native Excel data types.

  • Enable and update Excel file based function tests that are now working properly.

  • Flake8 source code.

0.1.0 (2020-05-25)
------------------

  • Refactored xlcalculator types to be more compact.

  • Reimplemented evaluation engine to not generate Python code anymore, but build a proper AST from the AST nodes. Each AST node supports an eval() function that knows how to compute a result.

    This removes a lot of complexities around trying to determine the evaluation context at code creation time and encoding the context as part of the generated code.

  • Removal of all special function handling.

  • Use of new xlfunctions implementation.

  • Use Openpyxl to load the Excel files. This provides shared formula support for free.

0.0.1b (2020-05-03)
-------------------

  • Initial release.
