Wednesday, April 19, 2006

Checking duplicate Sql statements in Excel

In my current project we are doing database Continuous Integration (CI). This means that every developer will have their own database instance for development, and at every check-in the build server kicks off the CI cycle, which starts getting the latest source code, compile, rebuild the database from scratch through running SQL scripts, re-insert data (reference data) these tables should have, compile, test, etc. When done right, this allows the much-required freedom for developers to develop against the database without interfering other developers' databases, and significantly reduces the time it takes to ask the DBA's to modify the database table for you.

When you acquire from the customer/business analysts a new set of reference data that they would like you to insert into these database tables, I have seen in many cases where these new data conflicts with the data that already exists due to duplication (mostly human error). Since most of these reference data are scripted in text files and there are usually hundreds if not thousands of them, sometimes finding that one duplicating line is like trying to find a needle in a sea.

Fortunately in Excel there is a function to rescue: =COUNTIF(set, to_find).

Usage: Increment count if "to_find" exists in "set".

So to quickly find out which is the offending PK constraint INSERT line, you do this:
=COUNTIF(E3:E11, E3)

And as you see you will quickly find out where that awful INSERT statement is... Time to bug the customer =)

1 comment:

Anonymous said...

<plug>Hi...this is a bit of a plug but have you tried Simian ( It works on SQL files as well.</plug>

To be honest I haven't tried it in the specific scenario you're talking about but it might work and it can be used as part of an CI build process in ant.