Python parse SQL and find relationships -
i've got large list of sql queries in strings, they've been written presto, kinda formatted mysql.
i want able tease out table relationships written in queries.
let's start simple:
select e.object_id, count(*) schema_name.elements e join schema_name2.quotes q on q.id = e.object_id e.object_type = 'something' group e.object_id, q.query order 2 desc;
can see things join together, though there's aliases - need scan through , find aliases - that's fine keyword "as" used.
so i'd want have returned list of relationships query, each relationship dict:
dict = {'sourceschema': 'schema_name', 'sourcetable': "elements", 'sourcecolumn': "object_id", 'targetschema': "schema_name2", 'targettable': "quotes", 'targetcolumn': "id"}
i can imagine doing pretty easy, stuff gets more complicated:
select e.object_id, count(*) schema_name.elements e left join schema_name2.quotes q on q.id = cast(coalesce(nullif(e.object_id,''),'0') bigint) e.object_type = 'something' group e.object_id, q.query order 2 desc;
3 things note
- missing "as" reserved word - make harder get
- when joining, there's lot of stuff needed parse 2 tables together
- this isn't simple "join" it's left join
i'm wondering if there's form of sql parsing library python allow me tease out relationships in 4000 queries? , if not, how efficiently? i'm guessing might need scan through queries, find joins, find alias, @ how they're joined whilst taking account bunch of stop words need discarded.
with minor changes select_parser.py (https://sourceforge.net/p/pyparsing/code/head/tree/trunk/src/examples/select_parser.py) part of pyparsing examples, after parsing first example:
select e.object_id, count(*) schema_name.elements e join schema_name2.quotes q on q.id = e.object_id e.object_type = 'something' group e.object_id, q.query order 2 desc; ['select', [['e.object_id'], ['count', '*']], 'from', [['schema_name', '.', 'elements'], 'as', 'e', ['join'], ['schema_name2', '.', 'quotes'], 'as', 'q', ['on', ['q.id', '=', 'e.object_id']]], 'where', ['e.object_type', '=', 'something'], 'group', 'by', [['e.object_id'], ['q.query']], 'order', 'by', [['2', 'desc']], ';'] - columns: [['e.object_id'], ['count', '*']] [0]: ['e.object_id'] [1]: ['count', '*'] - from: [[['schema_name', '.', 'elements'], 'as', 'e', ['join'], ['schema_name2', '.', 'quotes'], 'as', 'q', ['on', ['q.id', '=', 'e.object_id']]]] [0]: [['schema_name', '.', 'elements'], 'as', 'e', ['join'], ['schema_name2', '.', 'quotes'], 'as', 'q', ['on', ['q.id', '=', 'e.object_id']]] - table_alias: [['e'], ['q']] [0]: ['e'] [1]: ['q'] - order_by_terms: [['2', 'desc']] [0]: ['2', 'desc'] - direction: desc - order_key: 2 - where_expr: ['e.object_type', '=', 'something']
so looks example might started. written select format sqlite, you'll need expand of syntax.
Comments
Post a Comment