Simple DDL Parser to parse SQL & HQL ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc.
Simple DDL Parser
Parser tested on DDL for PostgreSQL & Hive.
If you have samples that cause an error, please open an issue and I will be glad to fix it.
This parser takes SQL DDL statements or files as input, for example:
create table prod.super_table
(
data_sync_id bigint not null,
sync_count bigint not null,
sync_mark timestamp not null,
sync_start timestamp not null,
sync_end timestamp not null,
message varchar(2000) null,
primary key (data_sync_id, sync_start)
);
and produces output like this (with information about the table name, schema, columns, types, and properties):
[{
'columns': [
{'name': 'data_sync_id', 'type': 'bigint', 'nullable': False, 'size': None, 'default': None},
{'name': 'sync_count', 'type': 'bigint', 'nullable': False, 'size': None, 'default': None},
{'name': 'sync_mark', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
{'name': 'sync_start', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
{'name': 'sync_end', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
{'name': 'message', 'type': 'varchar', 'nullable': False, 'size': 2000, 'default': None}],
'table_name': 'super_table', 'schema': 'prod',
'primary_key': ['data_sync_id', 'sync_start']
}]
Or one more example:
CREATE TABLE "paths" (
"id" int PRIMARY KEY,
"title" varchar NOT NULL,
"description" varchar(160),
"created_at" timestamp,
"updated_at" timestamp
);
and the result:
[{
'columns': [
{'name': 'id', 'type': 'int', 'nullable': False, 'size': None, 'default': None},
{'name': 'title', 'type': 'varchar', 'nullable': False, 'size': None, 'default': None},
{'name': 'description', 'type': 'varchar', 'nullable': False, 'size': 160, 'default': None},
{'name': 'created_at', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None},
{'name': 'updated_at', 'type': 'timestamp', 'nullable': False, 'size': None, 'default': None}],
'primary_key': ['id'],
'table_name': 'paths', 'schema': ''
}]
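A result like the one above is ordinary Python data, so you can work with it directly. A minimal sketch, using the documented dict shape (hardcoded here instead of calling the parser):

```python
# Parse result in the documented shape, hardcoded for illustration
# (normally you would get this from DDLParser(...).run()).
result = [{
    'columns': [
        {'name': 'id', 'type': 'int', 'nullable': False, 'size': None, 'default': None},
        {'name': 'title', 'type': 'varchar', 'nullable': False, 'size': None, 'default': None},
        {'name': 'description', 'type': 'varchar', 'nullable': False, 'size': 160, 'default': None},
    ],
    'primary_key': ['id'],
    'table_name': 'paths', 'schema': ''
}]

table = result[0]
# Columns that are part of the primary key
pk_columns = [c for c in table['columns'] if c['name'] in table['primary_key']]
# Render a short "name type(size)" summary per column
summary = [
    f"{c['name']} {c['type']}({c['size']})" if c['size'] else f"{c['name']} {c['type']}"
    for c in table['columns']
]
print(pk_columns)
print(summary)
```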
If you pass a file or text block with more than one CREATE TABLE statement, the result will be a list of such dicts. For example:
Input:
CREATE TABLE "countries" (
"id" int PRIMARY KEY,
"code" varchar(4) NOT NULL,
"name" varchar NOT NULL
);
CREATE TABLE "path_owners" (
"user_id" int,
"path_id" int,
"type" int DEFAULT 1
);
Output:
[
{'columns': [
{'name': 'id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
{'name': 'code', 'type': 'varchar', 'size': 4, 'nullable': False, 'default': None},
{'name': 'name', 'type': 'varchar', 'size': None, 'nullable': False, 'default': None}],
'primary_key': ['id'],
'table_name': 'countries',
'schema': None},
{'columns': [
{'name': 'user_id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
{'name': 'path_id', 'type': 'int', 'size': None, 'nullable': False, 'default': None},
{'name': 'type', 'type': 'int', 'size': None, 'nullable': False, 'default': 1}],
'primary_key': [],
'table_name': 'path_owners',
'schema': None}
]
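Since every statement becomes one dict, a multi-statement result can be indexed by table name. A small sketch over the output shape shown above (again hardcoded rather than produced by the parser):

```python
# Parse results as documented above (abridged), hardcoded for illustration.
results = [
    {'columns': [
        {'name': 'id', 'type': 'int', 'size': None, 'nullable': False, 'default': None}],
     'primary_key': ['id'], 'table_name': 'countries', 'schema': None},
    {'columns': [
        {'name': 'type', 'type': 'int', 'size': None, 'nullable': False, 'default': 1}],
     'primary_key': [], 'table_name': 'path_owners', 'schema': None},
]

# Index tables by name for convenient lookup
tables = {t['table_name']: t for t in results}
# Collect columns that carry a DEFAULT clause in path_owners
defaults = {c['name']: c['default']
            for c in tables['path_owners']['columns']
            if c['default'] is not None}
print(defaults)
```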
How to use
From Python code
from simple_ddl_parser import DDLParser
parse_results = DDLParser("""create table dev.data_sync_history(
data_sync_id bigint not null,
sync_count bigint not null,
sync_mark timestamp not null,
sync_start timestamp not null,
sync_end timestamp not null,
message varchar(2000) null,
primary key (data_sync_id, sync_start)
); """).run()
print(parse_results)
To parse from a file:
from simple_ddl_parser import parse_from_file
result = parse_from_file('tests/test_one_statement.sql')
print(result)
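Because each parsed column is a plain dict, you can also go the other way and render a column definition back to SQL. The helper below is hypothetical (not part of simple_ddl_parser); it only relies on the documented column keys:

```python
# Hypothetical helper (NOT part of simple_ddl_parser): rebuild a column
# definition string from one parsed column dict of the documented shape.
def column_to_sql(col):
    type_part = col['type'] + (f"({col['size']})" if col['size'] else '')
    parts = [col['name'], type_part]
    if not col['nullable']:
        parts.append('not null')
    if col['default'] is not None:
        parts.append(f"default {col['default']}")
    return ' '.join(parts)

col = {'name': 'message', 'type': 'varchar', 'nullable': True, 'size': 2000, 'default': None}
print(column_to_sql(col))
```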
More examples & tests
You can find them in the tests/functional folder.
Dump result in json
To dump the result to json, use the argument .run(dump=True).
You can also provide a path where the dump files with the schema should be written, via a separate argument.
TODO in next releases
Support for references (foreign keys) in column definitions
Support for separate ALTER TABLE statements for foreign keys, like:
ALTER TABLE "material_attachments" ADD FOREIGN KEY ("material_id") REFERENCES "materials" ("id");
Support for parsing CREATE INDEX statements
Add a command-line interface
Support for ARRAY types
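As an illustration of the foreign-key TODO item above (this is a sketch, not library code), a separate ALTER TABLE statement like the one shown could be picked apart with a small regex:

```python
import re

# Sketch only (not implemented in simple_ddl_parser yet): extract the pieces
# of a separate ALTER TABLE ... ADD FOREIGN KEY ... REFERENCES ... statement.
FK_RE = re.compile(
    r'ALTER TABLE\s+"?(?P<table>\w+)"?\s+ADD FOREIGN KEY\s*'
    r'\("?(?P<column>\w+)"?\)\s+REFERENCES\s+"?(?P<ref_table>\w+)"?\s*'
    r'\("?(?P<ref_column>\w+)"?\)',
    re.IGNORECASE,
)

stmt = ('ALTER TABLE "material_attachments" ADD FOREIGN KEY ("material_id") '
        'REFERENCES "materials" ("id");')
m = FK_RE.search(stmt)
print(m.groupdict())
```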
Historical context
The parser code in this library was extracted from https://github.com/xnuinside/fakeme (a library for fake relational data generation that I used in several work projects, but never had time to turn into a proper open source library).
For one work project I needed to convert SQL DDL to Python ORM models automatically, and I tried https://github.com/andialbrecht/sqlparse, but it did not work well enough with DDL for my case (for example, nothing works if the DDL uses lower case, primary keys inside the DDL are mapped as column names rather than reserved words, and so on). So I remembered the parser in Fakeme and just extracted and improved it.
How to contribute
Please describe the issue you want to solve and open a PR; I will review it as soon as possible.
Any questions? Ping me in Telegram: https://t.me/xnuinside