Simple DDL Parser parses SQL & HQL DDL files into JSON/Python dicts with full information about columns: types, defaults, primary keys, etc.
Simple DDL Parser
The parser is tested on DDL for PostgreSQL & Hive.
If you have samples that cause an error, please open an issue; I will be glad to fix it.
The parser takes SQL DDL statements or files as input, for example:
create table prod.super_table
(
data_sync_id bigint not null,
sync_count bigint not null,
sync_mark timestamp not null,
sync_start timestamp not null,
sync_end timestamp not null,
message varchar(2000) null,
primary key (data_sync_id, sync_start)
);
It produces output like this (information about table name, schema, columns, types and properties):
[{
'columns': [
{'name': 'data_sync_id', 'type': 'bigint', 'nullable': False, 'size': None, 'default': None},
{'name': 'sync_count', 'type': 'bigint', 'nullable': False, 'size': None, 'default': None},
{'name': 'sync_mark', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
{'name': 'sync_start', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
{'name': 'sync_end', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
{'name': 'message', 'type': 'varchar', 'nullable': False, 'size': 2000, 'default': None}],
'table_name': 'super_table', 'schema': 'prod',
'primary_key': ['data_sync_id', 'sync_start']
}]
Or one more example:
CREATE TABLE "paths" (
"id" int PRIMARY KEY,
"title" varchar NOT NULL,
"description" varchar(160),
"created_at" timestamp,
"updated_at" timestamp
);
and the result:
[{
'columns': [
{'name': 'id', 'type': 'int', 'nullable': False, 'size': None, 'default': None},
{'name': 'title', 'type': 'varchar', 'nullable': False, 'size': None, 'default': None},
{'name': 'description', 'type': 'varchar', 'nullable': False, 'size': 160, 'default': None},
{'name': 'created_at', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
{'name': 'updated_at', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None}],
'primary_key': ['id'],
'table_name': 'paths', 'schema': ''
}]
If you pass a file or text block with more than one CREATE TABLE statement, the result will be a list of such dicts. For example:
Input:
CREATE TABLE "countries" (
"id" int PRIMARY KEY,
"code" varchar(4) NOT NULL,
"name" varchar NOT NULL
);
CREATE TABLE "path_owners" (
"user_id" int,
"path_id" int,
"type" int DEFAULT 1
);
Output:
[
{'columns': [
{'name': 'id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
{'name': 'code', 'type': 'varchar', 'size': 4, 'nullable': False, 'default': None},
{'name': 'name', 'type': 'varchar', 'size': None, 'nullable': False, 'default': None}],
'primary_key': ['id'],
'table_name': 'countries',
'schema': None},
{'columns': [
{'name': 'user_id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
{'name': 'path_id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
{'name': 'type', 'type': 'int', 'size': None, 'nullable': False, 'default': 1}],
'primary_key': [],
'table_name': 'path_owners',
'schema': None}
]
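Because the parser returns plain Python dicts, post-processing the result is straightforward. The sketch below consumes the multi-statement output shown above (the `tables` literal mirrors the documented result, so it runs even without the library installed; with the library you would obtain it via `DDLParser(ddl).run()`). The `column_ddl` helper is a hypothetical name introduced here for illustration:

```python
# Sketch: consuming the parser output documented above.
# `tables` is a literal copy of one table from the result shown in the docs.
tables = [
    {"columns": [
        {"name": "id", "type": "int", "size": None, "nullable": False, "default": None},
        {"name": "code", "type": "varchar", "size": 4, "nullable": False, "default": None},
        {"name": "name", "type": "varchar", "size": None, "nullable": False, "default": None}],
     "primary_key": ["id"],
     "table_name": "countries",
     "schema": None},
]

def column_ddl(col: dict) -> str:
    """Render one column dict back into a DDL-like fragment."""
    type_ = col["type"] + (f"({col['size']})" if col["size"] is not None else "")
    parts = [col["name"], type_]
    if not col["nullable"]:
        parts.append("not null")
    if col["default"] is not None:
        parts.append(f"default {col['default']}")
    return " ".join(parts)

for table in tables:
    print(table["table_name"])
    for col in table["columns"]:
        print("  " + column_ddl(col))
```

The same pattern works for any downstream target, such as generating ORM models from the parsed columns.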
How to use
From Python code
from simple_ddl_parser import DDLParser
parse_results = DDLParser("""create table dev.data_sync_history(
data_sync_id bigint not null,
sync_count bigint not null,
sync_mark timestamp not null,
sync_start timestamp not null,
sync_end timestamp not null,
message varchar(2000) null,
primary key (data_sync_id, sync_start)
); """).run()
print(parse_results)
To parse from a file:
from simple_ddl_parser import parse_from_file
result = parse_from_file('tests/test_one_statement.sql')
print(result)
More examples & tests
You can find them in the tests/functional folder.
Dump result in JSON
To dump the result in JSON, use the argument .run(dump=True).
You can also provide a path where you want the schema dumps to be written via a separate argument.
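Since the returned dicts are plain JSON-serializable Python structures, you can also handle serialization yourself with only the standard library. A minimal sketch (the `parse_results` literal mirrors the documented output rather than calling the library):

```python
import json

# Result structure as documented above; with the library installed this
# would come from DDLParser(ddl).run().
parse_results = [
    {"columns": [
        {"name": "id", "type": "int", "size": None, "nullable": False, "default": None}],
     "primary_key": ["id"],
     "table_name": "paths",
     "schema": ""},
]

# Serialize to a JSON string, or write it to any path you choose.
dumped = json.dumps(parse_results, indent=2)
print(dumped)
```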
TODO in next releases
Support for references (foreign keys) in column definitions
Support for separate ALTER TABLE statements for foreign keys, like:
ALTER TABLE "material_attachments" ADD FOREIGN KEY ("material_id") REFERENCES "materials" ("id");
Support for parsing CREATE INDEX statements
Add a command-line interface
Support ARRAYs
Historical context
This library is parser code extracted from https://github.com/xnuinside/fakeme (a library for fake relational data generation that I used in several work projects but did not have time to turn into a proper open-source library).
For one work project I needed to convert SQL DDL to Python ORM models automatically, and I tried https://github.com/andialbrecht/sqlparse, but it did not work well enough with DDL for my case (for example, nothing works if the DDL uses lower case, and primary keys inside the DDL are mapped as column names rather than reserved words). So I remembered the parser in Fakeme and simply extracted & improved it.
How to contribute
Please describe the issue that you want to solve and open a PR; I will review it as soon as possible.
Any questions? Ping me in Telegram: https://t.me/xnuinside