
sybperl-l Archive


From: "Mark Sutfin" <MSutfin at affinitygroup dot com>
Subject: RE: Parsing stored proc/sql code
Date: Feb 25 2005 4:11PM

Steven Wechsler (Friday, February 25, 2005 8:12 AM) wrote:
I've been asked to write a script to examine new stored procedures to see if they do any writing (i.e. delete, insert, update) to the database, or, recursively, whether they call any stored procs that write to the database. I can't just do a simple grep; I also have to ignore comments and quoted strings.
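One minimal way to attack the comment-and-string problem is a single left-to-right substitution pass before scanning for keywords, so a quote inside a comment (or a `--` inside a string) can't confuse the scan. This is only a hedged sketch; the sub name and keyword list are illustrative, not from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Blank out T-SQL comments and quoted strings in one left-to-right pass,
# then count write keywords in what remains. Nested edge cases (e.g.
# nested /* */ comments) are not handled; this is a sketch, not a parser.
sub find_writes {
    my ($sql) = @_;
    # Leftmost alternative wins, so whichever construct starts first
    # (comment or string) swallows the other's delimiters inside it.
    $sql =~ s{ /\*.*?\*/ | --[^\n]* | ('(?:[^']|'')*') }
             { defined $1 ? q('') : ' ' }gsex;
    my %writes;
    $writes{ lc $1 }++ while $sql =~ /\b(insert|update|delete|truncate)\b/gi;
    return \%writes;
}
```

For example, `find_writes(q{select 1 where c = 'insert me' -- delete?})` reports no writes at all, because both the string and the comment are blanked before the keyword scan.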

Steve, I'm about halfway (?) through parsing sprocs/embedded SQL from 7 PowerBuilder apps. These 2700 objects contain many variations on db access, which has pushed my regex/parsing skills to the max, as I'm relatively new to Perl. I'm over the hump now, and found Jeffrey Friedl's book Mastering Regular Expressions invaluable, along with the right text parser, several of which are listed/explained in the parsing section of Chapter 9 of O'Reilly's Mastering Algorithms with Perl. I've got a long way to go (due date 3/31/2005) and I may not have much code that would plug in to your task, but if you'd like to chat offline, let me know.
These PB apps are being migrated to .NET. The boss thought it might be nice to slice and dice the existing applications for the contractors who won the re-write bid. Aside from identifying all objects that call sprocs, and their optional parms, I've been asked to identify the embedded SQL (datawindows in PB), as well as roll up counts of tables used, the combinations of tables (to be used to build more reusable objects in the new apps), and the types of calls (CRUD matrix).
All of these apps access Sybase 11.9.2 dbs that have very little RI. The next task is to identify the data (or lack thereof) that will cause load problems in the new db (it will have all of the necessary RI applied), then clean up/optimize the data as the new .NET apps progress through development. Outside of using SQL for removing our missing data from the side of milk cartons, I've done the majority of it in Perl. (Again, no guru, but it's working..!)
For me, the gist of this kind of parsing is knowing whether you can work line by line or need to work at the file level (sproc DML). Do you need to break things down into smaller pieces to get at the meat? Then there's deciding whether you need the ability to define tokens as provided by Parse::Lex; recursive descent as in Parse::RecDescent; abbreviations in Text::Abbrev; or other variations in Text::ParseWords, Text::DelimMatch, Text::Balanced... These are questions that may need to be asked, but up till now I've found them a bit overkill. I'm not counting them out, as the hardest part (embedded SQL) is yet to come.
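Of the modules mentioned, Text::Balanced is probably the lightest-weight fit for the quoted-string part of the problem. A toy demo (my example text, not from the thread), assuming Text::Balanced's documented extract_delimited interface:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);

# Pull a single-quoted string off the front of some SQL-ish text.
# In list context, extract_delimited returns the extracted delimited
# text (quotes included) and the remainder of the string.
my $sql = q{'hello world' and 1=1};
my ($str, $rest) = extract_delimited($sql, q{'});

print "string:    $str\n";    # the quoted chunk, quotes included
print "remainder: $rest\n";   # everything after the closing quote
```

Because it hands back the remainder, you can loop: extract a string, scan the skipped prefix for keywords, repeat on the remainder. Note the default escape character is a backslash, so SQL's doubled-quote escaping (`''`) needs extra handling.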
The majority of mine is by file (undef $/ and slurp the file into a scalar - $text). Matching against a whole file seems to offer a bit more flexibility for me. With $text populated, just apply the syntax mask you're looking for. Mine happens to be defined in the ORCA API standard. Yours sounds perhaps like ANSI SQL. An example of my syntax:
declare procedure for @var1 = '' [,...]. I'm at home currently, so I don't have access to the scripts, but from memory, it's something like this (untested):
my $found_any = 0;
while ($text =~ m/\s*\w+\s+procedure\s+for\s+([\w.]+)/gi) {
    # the "g" modifier is what lets the "while" resume at the next match each pass
    # do something with $1, which is the sproc name. In your case, it might be
    # delete, update (for update in the case of a cursor), insert...
    $found_any = 1;
}
unless ($found_any) {
    # sproc doesn't contain anything but reads..
}
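The slurp idiom mentioned above (undef $/ and read the whole file into one scalar, so patterns can match across line boundaries) can be wrapped up as a small sub. A minimal sketch; the sub name is illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read an entire file into a single scalar. Localizing $/ (the input
# record separator) to undef makes the readline pull the whole file
# in one shot, and restores $/ automatically when the sub exits.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    local $/;                 # undef: no record separator, read it all
    return scalar <$fh>;
}
```

Then `my $text = slurp('my_sproc.sql');` and the multi-line match above works as written.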