PostgreSQL Languages AST and statements prettifier
Project description
- Contact:
- lele@metapensiero.it
- License:
- Status:
- Version:
- 2
This is a Python 3 module that exposes the parse tree of a PostgreSQL statement (extracted by the almost standard PG parser repackaged as a standalone static library by libpg_query) as set of interconnected nodes, usually called an abstract syntax tree.
Foreword
I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.
When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it’s been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:
target only Python 3.4+
target PostgreSQL 10+
use a more dynamic approach to represent the parse tree, with a twofold advantage:
it is much less boring to code, because there’s no need to write one Python class for each PostgreSQL node tag
the representation is version agnostic, it can be adapted to newer/older Elephants in a snap
allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation
avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources is available without the hassle of guessing how a symbol has been mapped
use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree
Version 2
In late 2019, Ronan Dunklau opened PR #62 against libpg_query, that reimplemented the build machinery of the library to make it easier (read, semi-automatic) to support PostgreSQL v12, and PR #36 to bring pglast in line.
Since latest version of PostgreSQL inevitably introduced some backward incompatibilities, I bumped the major version of pglast to better reflect the fact.
As I’m writing this, the fate of PR #62 is still unclear, so for the time being I switched the libpg_query submodule to Ronan’s fork.
Introduction
At the lower level the module exposes two libpg_query functions, parse_sql() and parse_plpgsql(), that take respectively an SQL statement and a PLpgSQL statement and return a parse tree as a hierarchy of Python dictionaries, lists and scalar values. In some cases these scalars correspond to some C typedef enums, that are automatically extracted from the PostgreSQL headers mentioned above and are available as pglast.enums.
At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.
Every node is identified by a tag, a string label that characterizes its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.
A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.
Finally, a Scalar carries a single value of some type, accessible through its value attribute.
On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the pgpp CLI tool, see below for an example.
Installation
As usual, the easiest way is with pip:
$ pip install pglast
Alternatively you can clone the repository:
$ git clone https://github.com/lelit/pglast.git --recursive
and install from there:
$ pip install ./pglast
Development
There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.
Examples of usage
Parse an SQL statement and get its AST root node:
>>> from pglast import Node, parse_sql >>> root = Node(parse_sql('SELECT foo FROM bar')) >>> print(root) None=[1*{RawStmt}]
Recursively traverse the parse tree:
>>> for node in root.traverse(): ... print(node) ... None[0]={RawStmt} stmt={SelectStmt} fromClause[0]={RangeVar} inh=<True> location=<16> relname=<'bar'> relpersistence=<'p'> op=<0> targetList[0]={ResTarget} location=<7> val={ColumnRef} fields[0]={String} str=<'foo'> location=<7>
As you can see, the representation of each value is mnemonic: {some_tag} means a Node with tag some_tag, [X*{some_tag}] is a List containing X nodes of that particular kind[*] and <value> is a Scalar.
Get a particular node:
>>> from_clause = root[0].stmt.fromClause >>> print(from_clause) fromClause=[1*{RangeVar}]
Obtain some information about a node:
>>> range_var = from_clause[0] >>> print(range_var.node_tag) RangeVar >>> print(range_var.attribute_names) dict_keys(['relname', 'inh', 'relpersistence', 'location']) >>> print(range_var.parent_node) stmt={SelectStmt}
Iterate over nodes:
>>> for a in from_clause: ... print(a) ... for b in a: ... print(b) ... fromClause[0]={RangeVar} inh=<True> location=<16> relname=<'bar'> relpersistence=<'p'>
Reformat a SQL statement[†] from the command line:
$ echo "select a,b,c from sometable" | pgpp SELECT a , b , c FROM sometable $ echo "select a,b,c from sometable" | pgpp -c SELECT a, b, c FROM sometable $ echo "select a, case when a=1 then 'singular' else 'plural' end from test" > /tmp/q.sql $ pgpp /tmp/q.sql SELECT a , CASE WHEN (a = 1) THEN 'singular' ELSE 'plural' END FROM test $ echo 'update "table" set value=123 where value is null' | pgpp UPDATE "table" SET value = 123 WHERE value IS NULL $ echo " insert into t (id, description) values (1, 'this is short enough'), (2, 'this is too long, and will be splitted')" | pgpp -s 20 INSERT INTO t (id, description) VALUES (1, 'this is short enough') , (2, 'this is too long, an' 'd will be splitted')
Programmatically reformat a SQL statement:
>>> from pglast import prettify >>> print(prettify('delete from sometable where value is null')) DELETE FROM sometable WHERE value IS NULL
Documentation
Latest documentation is hosted by Read the Docs at http://pglast.readthedocs.io/en/latest/
Changes
2.0.dev2 (2020-10-24)
Merge new fingerprint functionality from v1 (i.e. master) branch
2.0.dev1 (2020-09-26)
Require Python 3.6 or greater
Handle ALTER TYPE .. RENAME VALUE in AlterEnumStmt (PR #52), thanks to Ronan Dunklau
Add support for Create / Alter / Drop PROCEDURE (PR #48), thanks to Ronan Dunklau
Use Ronan’s fork of libpg_query, targeting PostgreSQL 12.1 (PR #36)
Change get_postgresql_version() to return a (major, minor) tuple (issue #38)
Handle ALTER TABLE ... ALTER COLUMN ... SET STORAGE ...
Handle PG12 materialized CTEs (issue #57)
Support column numbers in ALTER INDEX (PR #58), thanks to Ronan Dunklau
Handle SET LOGGED and SET UNLOGGED in ALTER TABLE (PR #59), thanks to Ronan Dunklau
Handle ALTER TYPE ... RENAME (PR #62), , thanks to Ronan Dunklau
1.14 (unreleased)
Produce Python 3.9 wheels, thanks to cibuildwheel 1.6.3
Expose the libpg_query’s fingerprint functionality (PR #64), thanks to Yiming Wang
1.13 (2020-09-26)
Handle SELECT FROM foo
1.12 (2020-06-08)
1.11 (2020-05-08)
1.10 (2020-01-25)
Fix collation name printer (PR #44), thanks to Ronan Dunklau
Implement CreatePLangStmt printer (PR #42), thanks to Bennie Swart
Fix privileges printer (PR #41), thanks to Bennie Swart
Handle TRUNCATE event in CreateTrigStmt printer (PR #40), thanks to Bennie Swart
Fix function body dollar quoting (PR #39), thanks to Bennie Swart
1.9 (2019-12-20)
Prettier INSERT representation
1.8 (2019-12-07)
Prettier CASE representation
New option to emit a semicolon after the last statement (issue #24)
1.7 (2019-12-01)
Implement NotifyStmt printer
Implement RuleStmt printer, thanks to Gavin M. Roy for his PR #28
Fix RenameStmt, properly handling object name
Produce Python 3.8 wheels, thanks to cibuildwheel 1.0.0
Support ALTER TABLE RENAME CONSTRAINT (PR #35), thanks to Ronan Dunklau
1.6 (2019-09-04)
1.5 (2019-06-04)
1.4 (2019-04-06)
1.3 (2019-03-28)
1.2 (2019-02-13)
1.1 (2018-07-20)
No visible changes, but now PyPI carries binary wheels for Python 3.7.
1.0 (2018-06-16)
Important
The name of the package has been changed from pg_query to pglast, to satisfy the request made by the author of libpg_query in issue #9.
This affects both the main repository on GitHub, that from now on is https://github.com/lelit/pglast, and the ReadTheDocs project that hosts the documentation, http://pglast.readthedocs.io/en/latest/.
I’m sorry for any inconvenience this may cause.
0.28 (2018-06-06)
Update libpg_query to 10-1.0.2
Support the ‘?’-style parameter placeholder variant allowed by libpg_query (details)
0.27 (2018-04-15)
Prettier JOINs representation, aligning them with the starting relation
0.26 (2018-04-03)
Fix cosmetic issue with ANY() and ALL()
0.25 (2018-03-31)
Fix issue in the safety belt check performed by pgpp (issue #4)
0.24 (2018-03-02)
Implement Null printer
0.23 (2017-12-28)
Implement some other DDL statements printers
New alternative style to print comma-separated-values lists, activated by a new --comma-at-eoln option on pgpp
0.22 (2017-12-03)
Implement TransactionStmt and almost all DROP xxx printers
0.21 (2017-11-22)
Implement NamedArgExpr printer
New alternative printers for a set of special functions, activated by a new --special-functions option on pgpp (issue #2)
0.20 (2017-11-21)
Handle special de-reference (A_Indirection) cases
0.19 (2017-11-16)
Fix serialization of column labels containing double quotes
Fix corner issues surfaced implementing some more DDL statement printers
0.18 (2017-11-14)
Fix endless loop due to sloppy conversion of command line option
Install the command line tool as pgpp
0.17 (2017-11-12)
Rename printers.sql to printers.dml (backward incompatibility)
List printer functions in the documentation, referencing the definition of related node type
Fix inconsistent spacing in JOIN condition inside a nested expression
Fix representation of unbound arrays
Fix representation of interval data type
Initial support for DDL statements
Fix representation of string literals containing single quotes
0.16 (2017-10-31)
Update libpg_query to 10-1.0.0
0.15 (2017-10-12)
Fix indentation of boolean expressions in SELECT’s targets (issue #3)
0.14 (2017-10-09)
Update to latest libpg_query’s 10-latest branch, targeting PostgreSQL 10.0 final
0.13 (2017-09-17)
Fix representation of subselects requiring surrounding parens
0.12 (2017-08-22)
New option --version on the command line tool
Better enums documentation
Release the GIL while calling libpg_query functions
0.11 (2017-08-11)
Nicer indentation for JOINs, making OUTER JOINs stand out
Minor tweaks to lists rendering, with less spurious whitespaces
New option --no-location on the command line tool
0.10 (2017-08-11)
Support Python 3.4 and Python 3.5 as well as Python 3.6
0.9 (2017-08-10)
Fix spacing before the $ character
Handle type modifiers
New option --plpgsql on the command line tool, just for fun
0.8 (2017-08-10)
Add enums subpackages to the documentation with references to their related headers
New compact_lists_margin option to produce a more compact representation when possible (see issue #1)
0.7 (2017-08-10)
Fix sdist including the Sphinx documentation
0.6 (2017-08-10)
New option --parse-tree on the command line tool to show just the parse tree
Sphinx documentation, available online
0.5 (2017-08-09)
Handle some more cases when a name must be double-quoted
Complete the serialization of the WindowDef node, handling its frame options
0.4 (2017-08-09)
Expose the actual PostgreSQL version the underlying libpg_query libray is built on thru a new get_postgresql_version() function
New option safety_belt for the prettify() function, to protect the innocents
Handle serialization of CoalesceExpr and MinMaxExpr
0.3 (2017-08-07)
Handle serialization of ParamRef nodes
Expose a prettify() helper function
0.2 (2017-08-07)
Test coverage at 99%
First attempt at automatic wheel upload to PyPI, let’s see…
0.1 (2017-08-07)
First release (“Hi daddy!”, as my soul would tag it)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Hashes for pglast-2.0.dev2-cp39-cp39-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9baf8a47c7a62e6aeb54c628f12a59602740631c0bb66afe36051a33598e1d78 |
|
MD5 | 5ae48df1b5d36035c58c9995b608f8fe |
|
BLAKE2b-256 | 3e6736ab19d60267803dda593cfe2c60ec73e7eac2fc1401a98eda37215b2fcf |
Hashes for pglast-2.0.dev2-cp39-cp39-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | b19b7589f63f69497a0100da6b4e92565eecdb5f005126d05cbbc03b1c8f0809 |
|
MD5 | 405574acef2eba92e488ffc470d9b528 |
|
BLAKE2b-256 | 58289f420b21264dbf7edd80660d34a84674b6b29aaae7fcdad563a8c178f80d |
Hashes for pglast-2.0.dev2-cp39-cp39-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 653c7a71c6b158e96facee8dd50122679a88a8471cf52b86127a4b2e6ab09b83 |
|
MD5 | c3a60cfb867cc6c0b8a9b23549e4d724 |
|
BLAKE2b-256 | 8e227daa1014f5885b2575a31d7b2f8c23f8d0a05be6153a670cd4463a7b6105 |
Hashes for pglast-2.0.dev2-cp39-cp39-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1a95c63eecbdf431f698e4854600192f524565c6d9d65a4cf96bc976ec77d600 |
|
MD5 | 0418f825a0301c6553e52fa2129ec536 |
|
BLAKE2b-256 | 576a30a75dc358bdc35e6fbc616b65444b03c3ec76e84a70e12557e557e7a322 |
Hashes for pglast-2.0.dev2-cp38-cp38-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | d8dbef79012d9378841686871f2e71bbf3bbc27cb24aa067599de20859b6a8cc |
|
MD5 | 266ed45b737c436cd68e762311d8bd29 |
|
BLAKE2b-256 | fa70795d0517fcb0712e5b00dea927856cd89cec68658c535ce537609161efb5 |
Hashes for pglast-2.0.dev2-cp38-cp38-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | cdae622f3f9ff7913c1a26c04672aa49b4743b3521a079c97583fb87dbc7a0b9 |
|
MD5 | b3a3e26e123a37e8d5652664f6f83377 |
|
BLAKE2b-256 | 5e713eb5b3585528d52ed276b11d93487f097806e46996057783ac644147d1f8 |
Hashes for pglast-2.0.dev2-cp38-cp38-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3e3c2e14761ed276d5b3c6bdadb44e8ddc441aac629b8bb9c285b8085581c8ec |
|
MD5 | 3303432632a30db5d201738469eb33de |
|
BLAKE2b-256 | ea68a9f97b4ac9abd6dbe5ce7cd4e25555edffd76c7734d3dd92a6476d5272ce |
Hashes for pglast-2.0.dev2-cp38-cp38-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 593d86b5b6ecda37c011db6faab93be830e55cc2b8b330a1ff89ea692c79a9f4 |
|
MD5 | db9992d8ea74226b8fe3bd6b66095d1d |
|
BLAKE2b-256 | fe0e6129e7965ad4bc1d86ca9a277cb45c09af2f3dd0c7c6ae3ead1e722f02c7 |
Hashes for pglast-2.0.dev2-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | afce97fa3621f433e3db36419d98f43f88f20c6a7c46c4a0e612a5ea238d4322 |
|
MD5 | dd52fa9f548db2b7fd5b9c7bbad610f4 |
|
BLAKE2b-256 | e81b6cf1c6f00f700bbe531afd042edb3d3464d3ec4ab67db6da6d8df6c2ae74 |
Hashes for pglast-2.0.dev2-cp37-cp37m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 63804620ef6eea32ae6e26561c04a0e93c66addae117814c22eac209743d9f69 |
|
MD5 | 8c7728c2a0f88541d5a6f30c83a8fe49 |
|
BLAKE2b-256 | f66cbb77edc7d2328be748c0c9b8a99f99e4ea649c320794a23fdced7b8a8c53 |
Hashes for pglast-2.0.dev2-cp37-cp37m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | a0fb4ece0a389368d799e950867197f530aed9e8426012bc692e98d94460f9ee |
|
MD5 | f0cc0205357cf64bec1e96dcc93f1461 |
|
BLAKE2b-256 | e110550780e4b9b01848e8b297e52e973c95d0edb7b9a3dcc544ba8f485ef343 |
Hashes for pglast-2.0.dev2-cp37-cp37m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | a3599802bbd939d14b728b0f3f24f4c92becf39003068ebe85b9c04d1ccc0623 |
|
MD5 | 53bb965be38b9991bce29b5b655b6988 |
|
BLAKE2b-256 | 71f4a36166d8a69a28d4e7f93299fa52a083b30dc19ef0f446ae7fcb60ec5413 |
Hashes for pglast-2.0.dev2-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 255bfc6a92f864aac775ad0c7447b1af69b7746b72319ccbeef8392d4c889a54 |
|
MD5 | 12b244e12f398969bd9749d297b4ec54 |
|
BLAKE2b-256 | 77af8ca9ef2bf7c70b86438b63996f9ca1377a1b9a43ae70be4389170b75d5cf |
Hashes for pglast-2.0.dev2-cp36-cp36m-manylinux2010_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9652b3929a1f03c412cc39c5bca74d6220b5a13ecbf1adc3a888f780d75b3d5c |
|
MD5 | b6078eb3a54a2fc2275f2667ae721c33 |
|
BLAKE2b-256 | 9510417585dd2b009327ca744b49a8502c330aad7e876f4eae5e98350ce7730b |
Hashes for pglast-2.0.dev2-cp36-cp36m-manylinux1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 16b41a5e851f6bafe7c1becfe7c6878a9fc35286cb93e959db7075915881ba21 |
|
MD5 | 547cc373f080190a4fd874d9c7885bde |
|
BLAKE2b-256 | b757541a35510fda7c01c91c056c4d5ebe31e9dc43b4b7bca1f98bc93ed1a5bb |
Hashes for pglast-2.0.dev2-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | b90da5f01f19683fb84897a566b0d019f56726e3d95b4c6ba84db543b563edae |
|
MD5 | d4ff3dd9209a7ea3be9145dacc394417 |
|
BLAKE2b-256 | 4672a9905639f2cdf58e6e6a67b5a3d69999119b3d7a80395a48398517a9d539 |