How to write an SQL beautifier/indenter

  • Thread starter Thomas Koester via AccessMonster.com
  • Start date
T

Thomas Koester via AccessMonster.com

I have written a tool that analyzes the object dependencies in an ACCESS DB
and reports details about the database objects. It includes a simple SQL
beautifer but it is incomplete and I am unhappy with the code structure. I
was wondering if there are open source SQL beautifiers/indenters available
some place to learn from the example. Theory descriptions would be even
better, but I doubt that these are available.
A Google search has not shown any good results. Can somebody of you point me
into the right direction?
VBA code for Jet-SQL would be perfect but more or less any other combination
of language and SQL dialect (e.g. PHP/MySQL) should also help.

Any hints would be greatly appreciated.

Thomas
 
T

Tom van Stiphout

On Sat, 22 Mar 2008 15:43:12 GMT, "Thomas Koester via

You want a hint? Here is one: don't waste your breath.
My guess is 90+ percent of the users look at SQL in the designer so
indentation does not matter, and the other 10% will find that your
beautifications have been re-interpreted by Access and be made moot.

-Tom.
 
J

John W. Vinson

It includes a simple SQL
beautifer

As Tom says... within Access, this is wasted effort. Access will scramble the
SQL, remove all the hard returns, even resequence the parts if the user opens
it in the design window. Beautification simply doesn't stick, whether you do
it programmatically or manually.
 
T

Thomas Koester via AccessMonster.com

Tom, John, thank you for your replies. You are right about SQL _within_
Access queries. But the tool I need this for _reports_ (among other stuff)
query definitions in a textbox (have a look at the screenshot at the bottom
of http://mdb-analyzer.thomko.de/index.php?page=screenshots for an example).
In addition there is an export feature to plain text to easily determine
differences in two versions of an Access DB. For both purposes nicely
formatted SQL is much simpler to read than the string as generated (fuzzified)
by Access.

The code I have (with a state-machine architecture) works well on simple
queries but does not work with sub-queries, paramenter queries etc. And it
certainly has too many if-then-else statements. Instead of tweaking this code
further I would prefer to re-start from scratch but without re-inventing the
wheel how to write an SQL indenter. As my tool is open source I should be
able to integrate other open source solutions but I have not been able to
find some.

So if anybody knows where to find something like this or how to write it,
please let me know.

Thomas
 
T

Tom van Stiphout

On Sun, 23 Mar 2008 22:02:21 GMT, "Thomas Koester via

Ah, got you. A lofty goal.
Writing a good SQL parser is decidedly non-trivial. I have never got
further than a boolean expression parser which worked nicely. I used
ideas from the UNIX utilities LEX for tokenizing, and especially YACC.
The strategy was to first define the offical language (see the Syntax
section of many SQL Server Books Online pages, e.g. on 'SELECT
clause'), and then have YACC spit out a compiler for that language.
SQL to me seems more complicated. If I REALLY had to do this, I would
see if a Information Technologies professor at a nearby college would
be so kind as to give me some pointers.
You may think "but I don't have to evaluate the statement, just
beautify it", but I think without the above process there will always
be statements your code can't handle. And perhaps that's OK - it's a
tradeoff of your time vs return on investment.

-Tom.
 
T

Thomas Koester via AccessMonster.com

Tom, thank you for your feedback. I read some documentation about LEX/YACC
and found some more stuff in wikipedia:
http://en.wikipedia.org/wiki/Prettyprint (has a link to a well documented
PASCAL/PASCAL example)
http://en.wikipedia.org/wiki/LR_parser (parsing theory)

I think that this pointed me into the right direction. I had already a
tokenizer for JET-SQL as I needed it anyhow to analyze the query/table
dependencies. Based on that and the info and examples, I am already pretty
far into some prototype code. Key element is to define sequences of events
(indent, write token, write space, new line etc.) based on a combination of
the current token and the state/history implemented as a stack.
I may post more details and/or a link to the resulting code when I am done.
 
M

mcescher

Tom, thank you for your feedback. I read some documentation about LEX/YACC
and found some more stuff in wikipedia:http://en.wikipedia.org/wiki/Prettyprint(has a link to a well documented
PASCAL/PASCAL example)http://en.wikipedia.org/wiki/LR_parser(parsing theory)

I think that this pointed me into the right direction. I had already a
tokenizer for JET-SQL as I needed it anyhow to analyze the query/table
dependencies. Based on that and the info and examples, I am already pretty
far into some prototype code. Key element is to define sequences of events
(indent, write token, write space, new line etc.) based on a combination of
the current token and the state/history implemented as a stack.
I may post more details and/or a link to the resulting code when I am done..

This site has one online, and a downloadable tool and it's ready for
use with .NET , COM and VCL components

http://www.wangz.net/gsqlparser/sqlpp/sqlformat.htm?ref=sqlzoo

HTH,
Chris M.
 
Top