So you want to add compression to your house you need to get a tradesman in to give you an estimate, then carry out the work, DB2 can do all of this. Just like building an extension you need to make sure that you need all the appropriate permissions from the “council” (IBM) in place, you either need to buy the Storage Optimisation as a “feature” or as part of Advanced Enterprise Edition of DB2. Please be careful when trying to use compression because as soon as you include “COMPRESSION YES” it will set the features used to YES for compression and if you get audited you could face a hefty bill.
Benefit’s to extending to compression
At a high level the there are three ways of looking at this.
No compression
Benefits
Not having to pay the licensing fee to IBM for compression.
Costs
Large amounts of disk space used for the data, minimal amounts of data in your bufferpools as the page sizes are not made any smaller
Classic Compression
Benefits
Data is compressed on disk and saves you here, data is also compressed in the bufferpools so more pages in them; less I/0 quicker queries. Data is also compressed in the backup images.
Costs
Licensing fee to IBM. Slight increase in CPU usage for the compression dictionary usage. You need to reset the dictionary with a REORG from time to time to make sure that you get the most out of the compression.
Adaptive Compression
Benefits
Data is compressed on disk, data is also compressed in the bufferpools so more pages in them; less I/0 quicker queries. Data is also compressed in the backup images. Data is continually compressed, no need for the RESETDICTIONARY REORG in the same way as the Classic compression.
Costs
Licensing fee to IBM. Increase in CPU usage for the compression dictionary usage. Only available in the latest DB2 V10.1
Here’s what you could be saving – SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO
Handley IBM have included a very useful table function SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO. The full information for this can be found in the information centre here. This table function will estimate the savings that you will get with no compression, “standard” compression and adaptive compression, GOTCHA’s for this are below:
SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO – GOTCHA’s
- Tables that are partitioned will come through the stored procedure as multiple rows. You do get a partition ID which you will be able to either join out too or look up in the table SYSCAT.DATAPARTITIONS.
- If the table has an (or more) XML column then you will get an additional row in the results returned, a “DATA” and an “XML” compression estimation row. Together with the other gotcha you could end up a lot of a rows returned for a partitioned table with XML columns.
Getting an estimate – SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO
This procedure can be used to get information on either a table or an entire schema, obviously the later can take some time to run from what I have found especially when the tables are large. The most simple form of the stored procedure is:
SELECT *
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO({SCHEMA NAME}, {TABLE NAME}))
This will get you a result a little like this (sorry for the formatting):
TABSCHEMA TABNAME DBPARTITIONNUM DATAPARTITIONID OBJECT_TYPE ROWCOMPMODE PCTPAGESSAVED_CURRENT AVGROWSIZE_CURRENT PCTPAGESSAVED_STATIC AVGROWSIZE_STATIC PCTPAGESSAVED_ADAPTIVE AVGROWSIZE_ADAPTIVE
------------ ---------- ----------------- ------------------ -------------- -------------- ------------------------ --------------------- ----------------------- -------------------- ------------------------- ----------------------
SCHEMA TABLE 0 0 DATA S 0 495 65 173 65 170
The example above shows that this table currently is using “Classic” compression, represented by the S, a blank would mean no row compression and an A would be the new adaptive compression in DB2. As you can see it gives you an estimate on the average row size in the different compression modes, this is in bytes and you will then need to work out what the full Gb / Mb size might be based on the cardinality of the table.
The table function is telling us though that there are potentially 65% savings to be made in both adaptive and classic compression, but there is a 3 byte difference and adaptive compression in my opinion is far better so I would ALTER TABLE to COMPRESS YES ADAPTIVE.
If you want to run the table function against a schema leave the table part a blank string
SELECT *
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO({SCHEMA NAME}, ''))
This will get you a row per table in the schema (plus any extra for XML / partitioned tables)
The future
In a future post I will look at using this table function to record the values for all tables, you can then look at a before and after and therefore prove that the change in compression and the associated REORG’s have worked.