Monday, August 16, 2010

Performance Improvement – Index scan with NULL condition

Performance Improvement – Index scan with NULL condition

What will be our first step when we start tuning the queries?


The first step is creating the index. But simply creating the index does not improve the performance in all scenarios.

The below scripts are proves that.


SQL> create table t
2 as
3 select object_name name, a.*
4 from all_objects a;

Table created.

SQL> alter table t modify name null;

Table altered.

SQL> create index t_idx on t(name);

Index created.

SQL> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

SQL> variable b1 varchar2(30)
SQL> exec :b1 := 'T'

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> delete from t
2 where name = :b1 or name is null;

2 rows deleted.


Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE (Cost=49 Card=2 Bytes=48)
1 0 DELETE OF 'T'
2 1 TABLE ACCESS (FULL) OF 'T' (Cost=49 Card=2 Bytes=48)




The NAME field is having the index but still it is going for the FULL table scan. the index (t_idx) is not getting used here.

The reason is because NAME is nullable, the index on only on name and entirely null keys are NOT entered into b*tree indexes.


SQL> rollback;

Rollback complete.

SQL> drop index t_idx;

Index dropped.

SQL> create index t_idx on t(name,0);

Index created.

SQL> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

SQL> delete from t
2 where name = :b1 or name is null
3 ;

2 rows deleted.


Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE (Cost=26 Card=2 Bytes=48)
1 0 DELETE OF 'T'
2 1 INDEX (FULL SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=26 Card=
2 Bytes=48)




The above method is going for the index scan and avoids the table scan.

Wednesday, August 11, 2010

Output clause

Output clause:

OUTPUT clause has accesses to inserted and deleted tables just like triggers. OUTPUT clause can be used to return values to client clause. OUTPUT clause can be used with INSERT, UPDATE, or DELETE to identify the actual rows affected by these statements.


Let us understand OUTPUT clause using example.

Initial setup:


create table #t
(id int,
p_name char(10)
)

insert into #t values ( 1,'AAAAA')
insert into #t values ( 2,'BBBBB')
insert into #t values ( 3,'CCCCC')
insert into #t values ( 4,'DDDDD')
insert into #t values ( 5,'EEEEE')

select * from #t




Scneario1:

Archive the records into #t_arc table when records get deleted in #t table.

Normal solution is that, insert into #t_arc table before delete the records in the #t table based on the conditions.

create table #t_arc
(id int,
p_name char(10)
)

insert into #t_arc
select * from
where id =3

delete from #t
where id =3

The output clause avoids the execution of the two statements.

delete from #t
output deleted.* into #t_arc
where id =3



One of the very good advantages is eliminating the data access from table for each time. The DELETE … OUTPUT statement access the data once and do the operations.

Scneario2:
We see another example when we insert the data into table.

create table #t_audit
(id int,
load_time datetime
)


The below statement loads the audit table when loads the data into #t table.

insert into #t
output inserted.id,getdate() into #t_audit
values ( 6,'FFFFF')


Thursday, August 5, 2010

Performance Improvement - Part 3: Correlated Update tuning

Performance Improvement - Part 3: Correlated Update tuning:

We have seen the small difference in correlated update statement execution in the previous Blog. Please refer below link if need to check it.


http://karthikeyanbaskaran.blogspot.com/2010/06/difference-in-correlated-update.html

The scenario is, need to update the “Address” field in table”Main” from the “Sub” table using the “id” field. The -1 needs to update in all other columns. The below update method solve it in single statement.

update main mset address = nvl((select address from sub s where m.id=s.id),-1);
Correlated update statement process for all the rows in the table “MAIN”. We see how it works when we load the lot records in the table “MAIN” and performance improvement method.


SQL> insert into main
2 select * from main;
............
.......
....
SQL> insert into main
2 select * from main;

Now the table “Main” is having the 1048576 records and table “Sub” is having the 3 records.

SQL> select count(*) from main;

COUNT(*)
---------

1048576

SQL> select count(*) from sub;

COUNT(*)
----------

3
The number of distinct address data is same now and we start our analysis process.

SQL> select address,count(*) from main
2 group by address;

ADDRESS COUNT(*)
---------- ----------
NNNNNNN 1048576

The below update statement works very well for our scenario. But every time it updates for the 1048576 records.

All the time it updates the 1048576 records, even if there are less difference needs to update it in the table “Main” from “Sub”.

SQL> set timing on
SQL> set autotrace traceonly
SQL> update main m
2 set address = nvl((select address from sub s where m.id=s.id),-1);

1048576 rows updated.

Elapsed: 00:01:44.04
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=CHOOSE
1 0 UPDATE OF 'MAIN'
2 1 TABLE ACCESS (FULL) OF 'MAIN'
3 1 TABLE ACCESS (FULL) OF 'SUB'

Statistics
----------------------------------------------------------
805 recursive calls
1073721 db block gets
2659 consistent gets
2124 physical reads
252412780 redo size
618 bytes sent via SQL*Net to client
582 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1048576 rows processed


The statement took 01:44 minutes to complete and it generated the 252412780 bytes. We roll back the results.


SQL> rollback;

Rollback complete.

Elapsed: 00:02:26.09

Performance improved script:

The WHERE condition has added to update statement and it updates the records which is difference comparing with Table “Main” and “Sub”.

SQL> update main m
2 set address = nvl((select address from sub s where m.id=s.id),-1)
3 where address is null
4 or address <> nvl((select address from sub s where m.id=s.id),-1);

1048576 rows updated.

Elapsed: 00:01:42.05

Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=CHOOSE
1 0 UPDATE OF 'MAIN'
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'MAIN'
4 2 TABLE ACCESS (FULL) OF 'SUB'
5 1 TABLE ACCESS (FULL) OF 'SUB'

Statistics
----------------------------------------------------------
536 recursive calls
1073630 db block gets
2611 consistent gets
1963 physical reads
252411468 redo size
627 bytes sent via SQL*Net to client
672 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1048576 rows processed


The statement took 01:42 minutes to complete and it generated the 252411468 bytes. There is no big difference when it updates all the records.

SQL> select address,count(*) from main
2 group by address;

ADDRESS COUNT(*)
---------- ----------
-1 524288
A 262144
C 262144

Elapsed: 00:00:01.03
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00

We change the some set of records to check the execution time.

SQL> update sub set address='M' where id =4;

1 row updated.

Elapsed: 00:00:00.02
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
SQL> set autotrace traceonly
SQL>
SQL> update main m
2 set address = nvl((select address from sub s where m.id=s.id),-1);
1048576 rows updated.

Elapsed: 00:01:51.00

Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=CHOOSE
1 0 UPDATE OF 'MAIN'
2 1 TABLE ACCESS (FULL) OF 'MAIN'
3 1 TABLE ACCESS (FULL) OF 'SUB'




Statistics
----------------------------------------------------------
624 recursive calls
1072662 db block gets
4953 consistent gets
2313 physical reads
248380252 redo size
629 bytes sent via SQL*Net to client
584 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1048576 rows processed



The statement took 01:51minutes to complete and it generated the 248380252 bytes.


SQL> set autotrace off;
SQL> select address,count(*) from main
2 group by address;

ADDRESS COUNT(*)
---------- ----------
-1 524288
A 262144
M 262144

Elapsed: 00:00:00.09

Rollback the result.

SQL> rollback;

Rollback complete.

Elapsed: 00:04:40.08
SQL> select address,count(*) from main
2 group by address;

ADDRESS COUNT(*)
---------- ----------
-1 524288
A 262144
C 262144


Performance improved script:

SQL> set autotrace traceonly

SQL> update main m
2 set address = nvl((select address from sub s where m.id=s.id),-1)
3 where address is null
4 or address <> nvl((select address from sub s where m.id=s.id),-1);

262144 rows updated.

Elapsed: 00:00:45.04

Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=CHOOSE
1 0 UPDATE OF 'MAIN'
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'MAIN'
4 2 TABLE ACCESS (FULL) OF 'SUB'
5 1 TABLE ACCESS (FULL) OF 'SUB'




Statistics
----------------------------------------------------------
176 recursive calls
268198 db block gets
2421 consistent gets
2054 physical reads
62104336 redo size
628 bytes sent via SQL*Net to client
672 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
262144 rows processed


The statement took 00:45.04 seconds to complete and it generated the 62104336 bytes.

SQL> set autotrace off;
SQL> select 248380252-62104336 redo_diff from dual;

REDO_DIFF
----------
186275916

The second update statement is avoiding the 186275916 bytes of the redo. So it is saving space in the UNDO and space in archive folders.

This method gives the good performance if it updates some part records in table.