Python parse SQL and find relationships -


i've got large list of sql queries in strings, they've been written presto, kinda formatted mysql.

i want able tease out table relationships written in queries.

let's start simple:

select e.object_id, count(*) schema_name.elements e        join schema_name2.quotes q on q.id = e.object_id e.object_type = 'something'  group e.object_id, q.query order 2 desc; 

can see things join together, though there's aliases - need scan through , find aliases - that's fine keyword "as" used.

so i'd want have returned list of relationships query, each relationship dict:

dict = {'sourceschema': 'schema_name', 'sourcetable': "elements", 'sourcecolumn': "object_id", 'targetschema': "schema_name2", 'targettable': "quotes", 'targetcolumn': "id"} 

i can imagine doing pretty easy, stuff gets more complicated:

select e.object_id, count(*) schema_name.elements e         left join schema_name2.quotes q on q.id = cast(coalesce(nullif(e.object_id,''),'0') bigint) e.object_type = 'something'  group e.object_id, q.query order 2 desc; 

3 things note

  • missing "as" reserved word - make harder get
  • when joining, there's lot of stuff needed parse 2 tables together
  • this isn't simple "join" it's left join

i'm wondering if there's form of sql parsing library python allow me tease out relationships in 4000 queries? , if not, how efficiently? i'm guessing might need scan through queries, find joins, find alias, @ how they're joined whilst taking account bunch of stop words need discarded.

with minor changes select_parser.py (https://sourceforge.net/p/pyparsing/code/head/tree/trunk/src/examples/select_parser.py) part of pyparsing examples, after parsing first example:

select e.object_id, count(*) schema_name.elements e        join schema_name2.quotes q on q.id = e.object_id e.object_type = 'something' group e.object_id, q.query order 2 desc; ['select', [['e.object_id'], ['count', '*']], 'from', [['schema_name', '.', 'elements'], 'as', 'e', ['join'], ['schema_name2', '.', 'quotes'], 'as', 'q', ['on', ['q.id', '=', 'e.object_id']]], 'where', ['e.object_type', '=', 'something'], 'group', 'by', [['e.object_id'], ['q.query']], 'order', 'by', [['2', 'desc']], ';'] - columns: [['e.object_id'], ['count', '*']]   [0]:     ['e.object_id']   [1]:     ['count', '*'] - from: [[['schema_name', '.', 'elements'], 'as', 'e', ['join'], ['schema_name2', '.', 'quotes'], 'as', 'q', ['on', ['q.id', '=', 'e.object_id']]]]   [0]:     [['schema_name', '.', 'elements'], 'as', 'e', ['join'], ['schema_name2', '.', 'quotes'], 'as', 'q', ['on', ['q.id', '=', 'e.object_id']]]     - table_alias: [['e'], ['q']]       [0]:         ['e']       [1]:         ['q'] - order_by_terms: [['2', 'desc']]   [0]:     ['2', 'desc']     - direction: desc     - order_key: 2 - where_expr: ['e.object_type', '=', 'something'] 

so looks example might started. written select format sqlite, you'll need expand of syntax.


Comments

Popular posts from this blog

javascript - Thinglink image not visible until browser resize -

firebird - Error "invalid transaction handle (expecting explicit transaction start)" executing script from Delphi -

mongodb - How to keep track of users making Stripe Payments -