[This is on an iSeries/DB2 database if that makes any difference]
I want to write a procedure to identify columns that are left as blank or zero (given a list of tables).
Assuming I can pull out table and column definitions from the central system tables, how should I check the above condition? My first guess is for each column generate a statement dynamically such as:
select count(*) from my_table where my_column != 0
and to check if this returns zero rows, but is there a better/faster/standard way to do this?
NB This just needs to handle simple character, integer/decimal fields, nothing fancy!
-
I'm assuming you mean you want to know if there are any values in all the rows of a given column. If your column can have "blanks" you're probably going to need to add an OR NOT NULL to your WHERE clause to get the correct answer.
-
Yes, typically, I would do something like this in SQL Server:
SELECT REPLACE(REPLACE(REPLACE( ' SELECT COUNT(*) AS [COUNT NON-EMPTY IN {TABLE_NAME}.{COLUMN_NAME}] FROM [{TABLE_SCHEMA}].[{TABLE_NAME}] WHERE [{COLUMN_NAME}] IS NOT NULL OR [{COLUMN_NAME}] <> 0 ' , '{TABLE_SCHEMA}', c.TABLE_SCHEMA) , '{TABLE_NAME}', c.TABLE_NAME) , '{COLUMN_NAME}', c.COLUMN_NAME) AS [SQL] FROM INFORMATION_SCHEMA.COLUMNS c INNER JOIN INFORMATION_SCHEMA.TABLES t ON t.TABLE_TYPE = 'BASE TABLE' AND c.TABLE_CATALOG = t.TABLE_CATALOG AND c.TABLE_SCHEMA = t.TABLE_SCHEMA AND c.TABLE_NAME = t.TABLE_NAME AND c.DATA_TYPE = 'int'You can get a lot fancier by doing UNIONs of the entire query and checking the IS_NULLABLE on each column and obviously you might have different requirements for different data types, and skipping identity columns, etc.
-
To check for columns that contain only NULLs on DB2:
- Execute RUNSTATS on your database (http://www.ibm.com/developerworks/data/library/techarticle/dm-0412pay/)
- Check the database statistics by quering SYSSTAT.TABLES and SYSSTAT.COLUMNS . Comparing SYSSTAT.TABLES.CARD and SYSSTAT.COLUMNS.NUMNULLS will tell you what you need. An example could be:
select t.tabschema, t.tabname, c.colname from sysstat.tables t, sysstat.columns c where ((t.tabschema = 'MYSCHEMA1' and t.tabname='MYTABLE1') or (t.tabschema = 'MYSCHEMA2' and t.tabname='MYTABLE2') or (...)) and t.tabschema = c.tabschema and t.tabname = c.tabname and t.card = c.numnullsMore on system stats e.g. here: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0001070.htm and http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0001073.htm
Similarly, you can use SYSSTAT.COLUMNS.AVGCOLLEN to check for empty columns (just it doesn't seem to work for LOBs).
EDIT: And, to check for columns that contain only zeros, use try comparing HIGH2KEY and LOW2KEY in SYSSTAT.COLUMNS.
0 comments:
Post a Comment