Monday, November 14, 2011

Sizing UNDO tablespace

When sizing the UNDO tablespace we must cover the following aspects:
- it must be large enough to hold all the undo generated by concurrent transactions, i.e. the active undo;
- it must retain enough undo to satisfy the longest running query, i.e. the unexpired undo.

On some systems we must also reserve undo space to allow flashback queries. This is not treated in this note.
The job is to calculate the rate at which undo is generated at peak workload
and multiply it by the length of your longest running query.
All the information we need is in V$UNDOSTAT.
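The arithmetic behind the sizing is straightforward. A minimal sketch in Python, with purely hypothetical sample figures (the rate, block size and query duration below are assumptions for illustration, not measurements):

```python
def recommended_undo_mb(undo_blocks_per_sec, block_size_bytes, longest_query_sec):
    # Recommended UNDO size = peak undo generation rate
    #                         x block size x longest running query duration
    return undo_blocks_per_sec * block_size_bytes * longest_query_sec / (1024 * 1024)

# Hypothetical figures: 6 undo blocks/s at peak, 8 KB blocks,
# longest query running for one hour
print(round(recommended_undo_mb(6, 8192, 3600)))  # -> 169 (MB)
```

The SQL shown later in this note computes exactly these three factors from V$UNDOSTAT, DBA_TABLESPACES and V$PARAMETER.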

For accuracy, the test is done in an isolated environment,
with a controlled number of users and active transactions: only the sessions shown below.

SQL> select banner from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for 64-bit Windows: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

We first create two tables that we will use for our demonstration:

SQL> create table t1 as select * from dba_objects;
Table created.
SQL> create table t2 as select * from dba_objects;

We generate some undo records.

Session 1:
SQL> update t1 set owner='TEST1',object_name='OBJECT_T1';
72462 rows updated.
Elapsed: 00:00:06.99
Session 2:
SQL> update t2 set owner='TEST2',object_name='OBJECT_T2';
72463 rows updated.
Elapsed: 00:00:09.43

Check the undo (in MB) generated by the two open transactions:

SQL> select
  2        -- s.OSUSER,
  3         round(t.used_ublk * bs.block_size / 1024 / 1024, 2) mb_used,
  4         (select sa.SQL_TEXT
  5            from v$sqlarea sa
  6           where sa.SQL_ID = s.SQL_ID
  7             and rownum = 1) as sql_text
  8    from v$session s,
  9         v$transaction t,
 10         (select block_size
 11            from dba_tablespaces
 12           where upper(tablespace_name) like 'UNDO%') bs
 13   where t.addr = s.taddr
 14   order by mb_used desc;
   MB_USED
----------
SQL_TEXT
---------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
     13.11
update t1 set owner='TEST1',object_name='OBJECT_T1'
      13.1
 update t2 set owner='TEST2',object_name='OBJECT_T2'

So we have ~26 MB of active undo data.

We do not commit the work above, so that the long running query below has to read this undo data.
The longest running query in the system will not necessarily need undo data every time it is launched,
but we should be prepared for the worst scenario, when the two overlap.
In session 3, we run a query that makes use of the undo data, and we intentionally force a Cartesian product to make it run for a longer period of time.

Session 3:
SQL> @temp.sql;

With temp.sql having the following content (we call it from a script because we want to suppress the result sent to the SQL*Plus output, and 'set termout off' does not suppress output when the statement is typed interactively):

set termout off
select t1.owner,t2.owner from t1,t2;
exit;

Check the longest running query:
SQL> select * from
  2  (select max(maxquerylen),maxqueryid from v$undostat group by maxqueryid order by max(maxquerylen) desc)
  3  where rownum <=1;
MAX(MAXQUERYLEN) MAXQUERYID
---------------- -------------
            3472 gt9vc8t3aqtm1

Our query has been running for almost an hour now.

SQL> select sql_text from v$sqlarea where sql_id = 'gt9vc8t3aqtm1';
SQL_TEXT
--------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
select t1.owner,t2.owner from t1,t2

Now we run the query that computes the recommended UNDO size for the test case we created.
This is the SQL we are actually interested in:

SQL> select round(
  2          max_undoblks/(round((end_time-begin_time)*1440,2)*60)*        --determine the maximum size of undo blocks generated each second
  3          (select block_size/1024
  4              from dba_tablespaces
  5            where 1=1
  6                and contents='UNDO'
  7                and  tablespace_name = (select value
  8                                        from v$parameter
  9                                        where name='undo_tablespace'))*  --determine the blocksize of the active UNDO tablespace
 10           (select max(maxquerylen) from v$undostat)/                     --longest running query in seconds
 11           1024) as "Recommended size of UNDO in MB"
 12  from (select * from
 13                                   (select /*+ FIRST_ROWS(1)*/
 14                                              max(undoblks) as max_undoblks ,begin_time,end_time
 15                                    from v$undostat
 16                                    group by begin_time,end_time
 17                                    order by max(undoblks) desc)
 18            where rownum=1);
Recommended size of UNDO in MB
------------------------------
                           164
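As a rough sanity check of that figure, we can back out the implied peak undo rate from the numbers reported above. The 8 KB undo block size is an assumption here (the common default), since the outputs above do not print it:

```python
# Values taken from the outputs above; 8 KB block size is assumed (default)
block_size_bytes = 8192
longest_query_sec = 3472     # MAX(MAXQUERYLEN) from v$undostat
recommended_mb = 164         # result of the sizing query

implied_rate = recommended_mb * 1024 * 1024 / (block_size_bytes * longest_query_sec)
print(round(implied_rate, 1))  # -> 6.0 undo blocks per second at peak
```

A peak rate of about 6 undo blocks per second is consistent with the ~26 MB of undo generated by the two updates, so the recommendation hangs together.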
                          
The conclusions here can be long debated:
- yes, we can set undo_retention to 60 minutes or more, but without a retention guarantee this does not help much;
- if we do guarantee the retention period without properly sizing UNDO, we may face the situation where transactions fail
so that the retention period can be enforced. Users won't be very happy about this.
The test case here is not intended to be general; it is made for demonstration purposes only.
The script which computes the recommended UNDO size should be used wisely: it can be misleading without a proper understanding of the business side.
(Yes, in some of the tests I made, it indicated sizing the UNDO to 9 TB in order to cover the worst scenario.)
At the end of the day, all you have to do is monitor, monitor, monitor.