PostgreSQL Languages AST and statements prettifier
Project description
- Contact:
- lele@metapensiero.it
- License:
- Status:
- Version:
- 2
This is a Python 3 module that exposes the parse tree of a PostgreSQL statement (extracted by the almost standard PG parser repackaged as a standalone static library by libpg_query) as set of interconnected nodes, usually called an abstract syntax tree.
Foreword
I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.
When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:
target only Python 3.4+
target PostgreSQL 10+
use a more dynamic approach to represent the parse tree, with a twofold advantage:
it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag
the representation is version agnostic, it can be adapted to newer/older Elephants in a snap
allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation
avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources is available without the hassle of guessing how a symbol has been mapped
use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree
Version 2
In late 2019, Ronan Dunklau opened PR #62 against libpg_query, that reimplemented the build machinery of the library to make it easier (read, semi-automatic) to support PostgreSQL v12, and PR #36 to bring pglast in line.
Since that version of PostgreSQL inevitably introduced some backward incompatibilities, I bumped the major version of pglast to better reflect the fact.
As I’m writing this, the fate of PR #62 is still unclear, so for the time being I switched the libpg_query submodule to Ronan’s fork.
I’m going to keep version 1 aligned to the original Lukas’ PG 10 branch. In all likelihood version 3 will target PostgreSQL 13 once his work on PG 13 branch reaches stability.
Introduction
At the lower level the module exposes two libpg_query functions, parse_sql() and parse_plpgsql(), that take respectively an SQL statement and a PLpgSQL statement and return a parse tree as a hierarchy of Python dictionaries, lists and scalar values. In some cases these scalars correspond to some C typedef enums, that are automatically extracted from the PostgreSQL headers mentioned above and are available as pglast.enums.
At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.
Every node is identified by a tag, a string label that characterizes its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.
A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.
Finally, a Scalar carries a single value of some type, accessible through its value attribute.
On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the pgpp CLI tool, see below for an example.
Installation
As usual, the easiest way is with pip:
$ pip install pglast
Alternatively you can clone the repository:
$ git clone https://github.com/lelit/pglast.git --recursive
and install from there:
$ pip install ./pglast
Development
There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.
Examples of usage
Parse an SQL statement and get its AST root node:
>>> from pglast import Node, parse_sql >>> root = Node(parse_sql('SELECT foo FROM bar')) >>> print(root) None=[1*{RawStmt}]
Recursively traverse the parse tree:
>>> for node in root.traverse(): ... print(node) ... None[0]={RawStmt} stmt={SelectStmt} fromClause[0]={RangeVar} inh=<True> location=<16> relname=<'bar'> relpersistence=<'p'> op=<0> targetList[0]={ResTarget} location=<7> val={ColumnRef} fields[0]={String} str=<'foo'> location=<7>
As you can see, the representation of each value is mnemonic: {some_tag} means a Node with tag some_tag, [X*{some_tag}] is a List containing X nodes of that particular kind[*] and <value> is a Scalar.
Get a particular node:
>>> from_clause = root[0].stmt.fromClause >>> print(from_clause) fromClause=[1*{RangeVar}]
Obtain some information about a node:
>>> range_var = from_clause[0] >>> print(range_var.node_tag) RangeVar >>> print(range_var.attribute_names) dict_keys(['relname', 'inh', 'relpersistence', 'location']) >>> print(range_var.parent_node) stmt={SelectStmt}
Iterate over nodes:
>>> for a in from_clause: ... print(a) ... for b in a: ... print(b) ... fromClause[0]={RangeVar} inh=<True> location=<16> relname=<'bar'> relpersistence=<'p'>
Reformat a SQL statement[†] from the command line:
$ echo "select a,b,c from sometable" | pgpp SELECT a , b , c FROM sometable $ echo "select a,b,c from sometable" | pgpp -c SELECT a, b, c FROM sometable $ echo "select a, case when a=1 then 'singular' else 'plural' end from test" > /tmp/q.sql $ pgpp /tmp/q.sql SELECT a , CASE WHEN (a = 1) THEN 'singular' ELSE 'plural' END FROM test $ echo 'update "table" set value=123 where value is null' | pgpp UPDATE "table" SET value = 123 WHERE value IS NULL $ echo " insert into t (id, description) values (1, 'this is short enough'), (2, 'this is too long, and will be splitted')" | pgpp -s 20 INSERT INTO t (id, description) VALUES (1, 'this is short enough') , (2, 'this is too long, an' 'd will be splitted')
Programmatically reformat a SQL statement:
>>> from pglast import prettify >>> print(prettify('delete from sometable where value is null')) DELETE FROM sometable WHERE value IS NULL
Documentation
Latest documentation is hosted by Read the Docs at http://pglast.readthedocs.io/en/latest/
Changes
2.0.dev3 (2021-02-20)
Handle INCLUDE clause in IndexStmt (PR #67), thanks to Ronan Dunklau
2.0.dev2 (2020-10-24)
Merge new fingerprint functionality from v1 (i.e. master) branch
2.0.dev1 (2020-09-26)
Require Python 3.6 or greater
Handle ALTER TYPE .. RENAME VALUE in AlterEnumStmt (PR #52), thanks to Ronan Dunklau
Add support for Create / Alter / Drop PROCEDURE (PR #48), thanks to Ronan Dunklau
Use Ronan’s fork of libpg_query, targeting PostgreSQL 12.1 (PR #36)
Change get_postgresql_version() to return a (major, minor) tuple (issue #38)
Handle ALTER TABLE ... ALTER COLUMN ... SET STORAGE ...
Handle PG12 materialized CTEs (issue #57)
Support column numbers in ALTER INDEX (PR #58), thanks to Ronan Dunklau
Handle SET LOGGED and SET UNLOGGED in ALTER TABLE (PR #59), thanks to Ronan Dunklau
Handle ALTER TYPE ... RENAME (PR #62), , thanks to Ronan Dunklau
1.17 (2021-02-20)
Fix the generic case in the RenameStmt printer
1.16 (2021-02-20)
Promote to the stable state
Move the job of building and uploading binary wheels from TravisCI to GitHub Actions
1.15 (2021-02-19)
Fix IF EXISTS variant of RenameStmt printer (PR #70), thanks to Jonathan Mortensen
Update libpg_query to 10-1.0.5
1.14 (2020-10-24)
Produce Python 3.9 wheels, thanks to cibuildwheel 1.6.3
Expose the libpg_query’s fingerprint functionality (PR #64), thanks to Yiming Wang
1.13 (2020-09-26)
Handle SELECT FROM foo
1.12 (2020-06-08)
1.11 (2020-05-08)
1.10 (2020-01-25)
Fix collation name printer (PR #44), thanks to Ronan Dunklau
Implement CreatePLangStmt printer (PR #42), thanks to Bennie Swart
Fix privileges printer (PR #41), thanks to Bennie Swart
Handle TRUNCATE event in CreateTrigStmt printer (PR #40), thanks to Bennie Swart
Fix function body dollar quoting (PR #39), thanks to Bennie Swart
1.9 (2019-12-20)
Prettier INSERT representation
1.8 (2019-12-07)
Prettier CASE representation
New option to emit a semicolon after the last statement (issue #24)
1.7 (2019-12-01)
Implement NotifyStmt printer
Implement RuleStmt printer, thanks to Gavin M. Roy for his PR #28
Fix RenameStmt, properly handling object name
Produce Python 3.8 wheels, thanks to cibuildwheel 1.0.0
Support ALTER TABLE RENAME CONSTRAINT (PR #35), thanks to Ronan Dunklau
1.6 (2019-09-04)
1.5 (2019-06-04)
1.4 (2019-04-06)
1.3 (2019-03-28)
1.2 (2019-02-13)
1.1 (2018-07-20)
No visible changes, but now PyPI carries binary wheels for Python 3.7.
1.0 (2018-06-16)
Important
The name of the package has been changed from pg_query to pglast, to satisfy the request made by the author of libpg_query in issue #9.
This affects both the main repository on GitHub, that from now on is https://github.com/lelit/pglast, and the ReadTheDocs project that hosts the documentation, http://pglast.readthedocs.io/en/latest/.
I’m sorry for any inconvenience this may cause.
0.28 (2018-06-06)
Update libpg_query to 10-1.0.2
Support the ‘?’-style parameter placeholder variant allowed by libpg_query (details)
0.27 (2018-04-15)
Prettier JOINs representation, aligning them with the starting relation
0.26 (2018-04-03)
Fix cosmetic issue with ANY() and ALL()
0.25 (2018-03-31)
Fix issue in the safety belt check performed by pgpp (issue #4)
0.24 (2018-03-02)
Implement Null printer
0.23 (2017-12-28)
Implement some other DDL statements printers
New alternative style to print comma-separated-values lists, activated by a new --comma-at-eoln option on pgpp
0.22 (2017-12-03)
Implement TransactionStmt and almost all DROP xxx printers
0.21 (2017-11-22)
Implement NamedArgExpr printer
New alternative printers for a set of special functions, activated by a new --special-functions option on pgpp (issue #2)
0.20 (2017-11-21)
Handle special de-reference (A_Indirection) cases
0.19 (2017-11-16)
Fix serialization of column labels containing double quotes
Fix corner issues surfaced implementing some more DDL statement printers
0.18 (2017-11-14)
Fix endless loop due to sloppy conversion of command line option
Install the command line tool as pgpp
0.17 (2017-11-12)
Rename printers.sql to printers.dml (backward incompatibility)
List printer functions in the documentation, referencing the definition of related node type
Fix inconsistent spacing in JOIN condition inside a nested expression
Fix representation of unbound arrays
Fix representation of interval data type
Initial support for DDL statements
Fix representation of string literals containing single quotes
0.16 (2017-10-31)
Update libpg_query to 10-1.0.0
0.15 (2017-10-12)
Fix indentation of boolean expressions in SELECT’s targets (issue #3)
0.14 (2017-10-09)
Update to latest libpg_query’s 10-latest branch, targeting PostgreSQL 10.0 final
0.13 (2017-09-17)
Fix representation of subselects requiring surrounding parens
0.12 (2017-08-22)
New option --version on the command line tool
Better enums documentation
Release the GIL while calling libpg_query functions
0.11 (2017-08-11)
Nicer indentation for JOINs, making OUTER JOINs stand out
Minor tweaks to lists rendering, with less spurious whitespaces
New option --no-location on the command line tool
0.10 (2017-08-11)
Support Python 3.4 and Python 3.5 as well as Python 3.6
0.9 (2017-08-10)
Fix spacing before the $ character
Handle type modifiers
New option --plpgsql on the command line tool, just for fun
0.8 (2017-08-10)
Add enums subpackages to the documentation with references to their related headers
New compact_lists_margin option to produce a more compact representation when possible (see issue #1)
0.7 (2017-08-10)
Fix sdist including the Sphinx documentation
0.6 (2017-08-10)
New option --parse-tree on the command line tool to show just the parse tree
Sphinx documentation, available online
0.5 (2017-08-09)
Handle some more cases when a name must be double-quoted
Complete the serialization of the WindowDef node, handling its frame options
0.4 (2017-08-09)
Expose the actual PostgreSQL version the underlying libpg_query libray is built on thru a new get_postgresql_version() function
New option safety_belt for the prettify() function, to protect the innocents
Handle serialization of CoalesceExpr and MinMaxExpr
0.3 (2017-08-07)
Handle serialization of ParamRef nodes
Expose a prettify() helper function
0.2 (2017-08-07)
Test coverage at 99%
First attempt at automatic wheel upload to PyPI, let’s see…
0.1 (2017-08-07)
First release (“Hi daddy!”, as my soul would tag it)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Hashes for pglast-2.0.dev3-cp39-cp39-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5ea30224d56f0d0a9d30ceba388897b41883e09ee7b4dc7f94ca1bd4e9c08f02 |
|
MD5 | c28d5de4878c7fb0e016c040cb1f8243 |
|
BLAKE2b-256 | 74acdf57bba7a9cf5f165a5e220383568d8e7dc33a1c4aa099c8f0af472748a5 |
Hashes for pglast-2.0.dev3-cp39-cp39-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 370e116444f8c4214560735359004bbd7728add90a6b67cf7f7254593b535ac4 |
|
MD5 | fe360bd12a670c5ab873ca0290150d22 |
|
BLAKE2b-256 | 0f243ac8cb7a18789fec941256b46f3794a3019bfbbe8ec4aaa038e3132d4ade |
Hashes for pglast-2.0.dev3-cp39-cp39-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | fa2b86ac7a76d6c4c8b414bee318e9e12fc459e12f77b0cf7ff43699b68922be |
|
MD5 | ca9c6b713d9445661b6de715534ea657 |
|
BLAKE2b-256 | b19776fe307831993d8011db7b238cdf5dda351518f87a44c67dbfb660442f76 |
Hashes for pglast-2.0.dev3-cp39-cp39-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4ed8f47c22fb081a0c340935e999839110fa73ce2b961e48f19a4e81ab87caa7 |
|
MD5 | 33fed6454cb7f9f1710c51a7d3337dd4 |
|
BLAKE2b-256 | ccc0088c84a33df358c640dc5236f328af2e523afd945a4a9c06bf8d1ef0feb3 |
Hashes for pglast-2.0.dev3-cp38-cp38-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 611ef7aa67d7488a40b3e1559abf3a3ccbcee1ad26f3cb72898b693fcf0cc5b9 |
|
MD5 | 217a8a184b75ec9cb78ff919f53211ad |
|
BLAKE2b-256 | ecdc7c2fe9b99b136c40164e6b878a46d93bb377d9f7ff5e4d4652c682bd9ee1 |
Hashes for pglast-2.0.dev3-cp38-cp38-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9795bba72df6515943a8a6c4ff659d157172dbda6a5febea2afd0f1f0f193e97 |
|
MD5 | b20260b8db78d40ee90e5f8d604b47a9 |
|
BLAKE2b-256 | e09f6e90655252da202d2dd2bad280c2e7f35bb47a3c321a3f19d5317a404211 |
Hashes for pglast-2.0.dev3-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 445fdf4fd2b55a52513585953b5cbbf01f553f4a321ecc92127cf6ba57739e37 |
|
MD5 | c40077bae15e900a9a9a9fbf5fcd3d92 |
|
BLAKE2b-256 | 5389372ecdef86d035f9e44c504f951d650e3ea9abe2a3a53d541ffda52fb910 |
Hashes for pglast-2.0.dev3-cp38-cp38-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 93c4140fdb5858765e915d706e36bc386f69aa2f633da7ed3a1490c42d26f0d8 |
|
MD5 | cef1ed36b0c71bec713ec5b81f4da31b |
|
BLAKE2b-256 | bc37eb338a7197ed5b3cac7aa03d6a405c986f9f1df97b5ddb66c976d36375d8 |
Hashes for pglast-2.0.dev3-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4c6e9bb4d4ac53c54fa1808fdab9c238a24cfd4fd10c0fe2a7c3bc586f9c3a5f |
|
MD5 | 16b587caeaecf5b5cc7f02ffde6368ed |
|
BLAKE2b-256 | 37f6d0cb87b8ab13e1c865895a0e21dad85350b8c25714bb9dc5021005a47903 |
Hashes for pglast-2.0.dev3-cp37-cp37m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | fd29f62540bb8eb03c18fb43eaf441631ceef28715f4eff389e5ef3b8e2e8182 |
|
MD5 | 1014c56abf67ac7062bc30c25b2b96cf |
|
BLAKE2b-256 | 97a3f47883951984c522e2c73ae230fff570179846b3a4ab88e6130878b0acf5 |
Hashes for pglast-2.0.dev3-cp37-cp37m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | e6d659ff7a2059b86d8b15cc73629efec9943052c9acb539d173c33805c557d4 |
|
MD5 | 1aedce19f2b4b706bf5baf1993e49a61 |
|
BLAKE2b-256 | 86155b6ca3142a04ef24c7205e7b3d87e11dd15b7908a21cfdbb5e1b31ca9a76 |
Hashes for pglast-2.0.dev3-cp37-cp37m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | af30440a864fe9d487bfe6be6c20a44aeea3ed074a01af3e83a2cee387095527 |
|
MD5 | f3ace1f8afc013886a735b38d4436e5d |
|
BLAKE2b-256 | 776fe74c035ff665154198757f8c65579ab15d8c4a06c654a2a98966548f62b0 |
Hashes for pglast-2.0.dev3-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | da1e2bbacd9b34dd4f65d69a62912708f6318d88ae58efb19098278399c95ac9 |
|
MD5 | 87b2fb0bba68a0529527400b733fe6b7 |
|
BLAKE2b-256 | aed743a709fdfeb7bd376be67be0a65d12986e2c032e283115bce5ba59220bfb |
Hashes for pglast-2.0.dev3-cp36-cp36m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | b41171697cf104882fa604739d62418da4608df57e3f44262384f876c5ca6cf9 |
|
MD5 | f4a3511e6f0eb047f79e4c2d6519370c |
|
BLAKE2b-256 | c46b819fb929b2c0bad8e7a01c454eb3b9c1f60ba1e8abd74e2f103923c6ff6e |
Hashes for pglast-2.0.dev3-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6ee3ae164db3f953db037372c8dfff9abebc37b115fc62142b804dd53ede671d |
|
MD5 | acfdae483d5540ed28ecd2f60af2690d |
|
BLAKE2b-256 | 45492d7478b49fae204b17c87bf49c664eec9ef0eb8297df52d13ae67fa0fb0d |
Hashes for pglast-2.0.dev3-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 19e19d221079307213633c3421cb60b5f40131512451682cf428c223dbd8913d |
|
MD5 | c6e5aa83043a3aa155259e1c01f88d66 |
|
BLAKE2b-256 | b6c26b8bc79ee15f1d62cf0f38c122ed5244c21ce2f999af38b4f88fa9df2c81 |