Monday, August 16, 2010

Performance Improvement – Index scan with NULL condition

Performance Improvement – Index scan with NULL condition

What will be our first step when we start tuning the queries?


The first step is creating the index. But simply creating the index does not improve the performance in all scenarios.

The below scripts are proves that.


SQL> create table t
2 as
3 select object_name name, a.*
4 from all_objects a;

Table created.

SQL> alter table t modify name null;

Table altered.

SQL> create index t_idx on t(name);

Index created.

SQL> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

SQL> variable b1 varchar2(30)
SQL> exec :b1 := 'T'

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> delete from t
2 where name = :b1 or name is null;

2 rows deleted.


Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE (Cost=49 Card=2 Bytes=48)
1 0 DELETE OF 'T'
2 1 TABLE ACCESS (FULL) OF 'T' (Cost=49 Card=2 Bytes=48)




The NAME field is having the index but still it is going for the FULL table scan. the index (t_idx) is not getting used here.

The reason is because NAME is nullable, the index on only on name and entirely null keys are NOT entered into b*tree indexes.


SQL> rollback;

Rollback complete.

SQL> drop index t_idx;

Index dropped.

SQL> create index t_idx on t(name,0);

Index created.

SQL> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

SQL> delete from t
2 where name = :b1 or name is null
3 ;

2 rows deleted.


Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE (Cost=26 Card=2 Bytes=48)
1 0 DELETE OF 'T'
2 1 INDEX (FULL SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=26 Card=
2 Bytes=48)




The above method is going for the index scan and avoids the table scan.

Wednesday, August 11, 2010

Output clause

Output clause:

OUTPUT clause has accesses to inserted and deleted tables just like triggers. OUTPUT clause can be used to return values to client clause. OUTPUT clause can be used with INSERT, UPDATE, or DELETE to identify the actual rows affected by these statements.


Let us understand OUTPUT clause using example.

Initial setup:


create table #t
(id int,
p_name char(10)
)

insert into #t values ( 1,'AAAAA')
insert into #t values ( 2,'BBBBB')
insert into #t values ( 3,'CCCCC')
insert into #t values ( 4,'DDDDD')
insert into #t values ( 5,'EEEEE')

select * from #t




Scneario1:

Archive the records into #t_arc table when records get deleted in #t table.

Normal solution is that, insert into #t_arc table before delete the records in the #t table based on the conditions.

create table #t_arc
(id int,
p_name char(10)
)

insert into #t_arc
select * from
where id =3

delete from #t
where id =3

The output clause avoids the execution of the two statements.

delete from #t
output deleted.* into #t_arc
where id =3



One of the very good advantages is eliminating the data access from table for each time. The DELETE … OUTPUT statement access the data once and do the operations.

Scneario2:
We see another example when we insert the data into table.

create table #t_audit
(id int,
load_time datetime
)


The below statement loads the audit table when loads the data into #t table.

insert into #t
output inserted.id,getdate() into #t_audit
values ( 6,'FFFFF')


Thursday, August 5, 2010

Performance Improvement - Part 3: Correlated Update tuning

Performance Improvement - Part 3: Correlated Update tuning:

We have seen the small difference in correlated update statement execution in the previous Blog. Please refer below link if need to check it.


http://karthikeyanbaskaran.blogspot.com/2010/06/difference-in-correlated-update.html

The scenario is, need to update the “Address” field in table”Main” from the “Sub” table using the “id” field. The -1 needs to update in all other columns. The below update method solve it in single statement.

update main mset address = nvl((select address from sub s where m.id=s.id),-1);
Correlated update statement process for all the rows in the table “MAIN”. We see how it works when we load the lot records in the table “MAIN” and performance improvement method.


SQL> insert into main
2 select * from main;
............
.......
....
SQL> insert into main
2 select * from main;

Now the table “Main” is having the 1048576 records and table “Sub” is having the 3 records.

SQL> select count(*) from main;

COUNT(*)
---------

1048576

SQL> select count(*) from sub;

COUNT(*)
----------

3
The number of distinct address data is same now and we start our analysis process.

SQL> select address,count(*) from main
2 group by address;

ADDRESS COUNT(*)
---------- ----------
NNNNNNN 1048576

The below update statement works very well for our scenario. But every time it updates for the 1048576 records.

All the time it updates the 1048576 records, even if there are less difference needs to update it in the table “Main” from “Sub”.

SQL> set timing on
SQL> set autotrace traceonly
SQL> update main m
2 set address = nvl((select address from sub s where m.id=s.id),-1);

1048576 rows updated.

Elapsed: 00:01:44.04
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=CHOOSE
1 0 UPDATE OF 'MAIN'
2 1 TABLE ACCESS (FULL) OF 'MAIN'
3 1 TABLE ACCESS (FULL) OF 'SUB'

Statistics
----------------------------------------------------------
805 recursive calls
1073721 db block gets
2659 consistent gets
2124 physical reads
252412780 redo size
618 bytes sent via SQL*Net to client
582 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1048576 rows processed


The statement took 01:44 minutes to complete and it generated the 252412780 bytes. We roll back the results.


SQL> rollback;

Rollback complete.

Elapsed: 00:02:26.09

Performance improved script:

The WHERE condition has added to update statement and it updates the records which is difference comparing with Table “Main” and “Sub”.

SQL> update main m
2 set address = nvl((select address from sub s where m.id=s.id),-1)
3 where address is null
4 or address <> nvl((select address from sub s where m.id=s.id),-1);

1048576 rows updated.

Elapsed: 00:01:42.05

Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=CHOOSE
1 0 UPDATE OF 'MAIN'
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'MAIN'
4 2 TABLE ACCESS (FULL) OF 'SUB'
5 1 TABLE ACCESS (FULL) OF 'SUB'

Statistics
----------------------------------------------------------
536 recursive calls
1073630 db block gets
2611 consistent gets
1963 physical reads
252411468 redo size
627 bytes sent via SQL*Net to client
672 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1048576 rows processed


The statement took 01:42 minutes to complete and it generated the 252411468 bytes. There is no big difference when it updates all the records.

SQL> select address,count(*) from main
2 group by address;

ADDRESS COUNT(*)
---------- ----------
-1 524288
A 262144
C 262144

Elapsed: 00:00:01.03
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00

We change the some set of records to check the execution time.

SQL> update sub set address='M' where id =4;

1 row updated.

Elapsed: 00:00:00.02
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
SQL> set autotrace traceonly
SQL>
SQL> update main m
2 set address = nvl((select address from sub s where m.id=s.id),-1);
1048576 rows updated.

Elapsed: 00:01:51.00

Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=CHOOSE
1 0 UPDATE OF 'MAIN'
2 1 TABLE ACCESS (FULL) OF 'MAIN'
3 1 TABLE ACCESS (FULL) OF 'SUB'




Statistics
----------------------------------------------------------
624 recursive calls
1072662 db block gets
4953 consistent gets
2313 physical reads
248380252 redo size
629 bytes sent via SQL*Net to client
584 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1048576 rows processed



The statement took 01:51minutes to complete and it generated the 248380252 bytes.


SQL> set autotrace off;
SQL> select address,count(*) from main
2 group by address;

ADDRESS COUNT(*)
---------- ----------
-1 524288
A 262144
M 262144

Elapsed: 00:00:00.09

Rollback the result.

SQL> rollback;

Rollback complete.

Elapsed: 00:04:40.08
SQL> select address,count(*) from main
2 group by address;

ADDRESS COUNT(*)
---------- ----------
-1 524288
A 262144
C 262144


Performance improved script:

SQL> set autotrace traceonly

SQL> update main m
2 set address = nvl((select address from sub s where m.id=s.id),-1)
3 where address is null
4 or address <> nvl((select address from sub s where m.id=s.id),-1);

262144 rows updated.

Elapsed: 00:00:45.04

Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=CHOOSE
1 0 UPDATE OF 'MAIN'
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'MAIN'
4 2 TABLE ACCESS (FULL) OF 'SUB'
5 1 TABLE ACCESS (FULL) OF 'SUB'




Statistics
----------------------------------------------------------
176 recursive calls
268198 db block gets
2421 consistent gets
2054 physical reads
62104336 redo size
628 bytes sent via SQL*Net to client
672 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
262144 rows processed


The statement took 00:45.04 seconds to complete and it generated the 62104336 bytes.

SQL> set autotrace off;
SQL> select 248380252-62104336 redo_diff from dual;

REDO_DIFF
----------
186275916

The second update statement is avoiding the 186275916 bytes of the redo. So it is saving space in the UNDO and space in archive folders.

This method gives the good performance if it updates some part records in table.


Monday, July 19, 2010

Performance Improvement - Part 2: Hard parse is not bad

Performance Improvement - Part 2: Hard parse is not bad

In previous post, we have seen the soft parsing is doing less operation comparing with hard parsing.

Please have a look on below post before read this one.

http://karthikeyanbaskaran.blogspot.com/2010/07/performance-improvement-part-1-parsing.html

The Hard parse includes the parse, optimize, generation of the plan for the query and soft parse skips optimize and generation of the plan. So we decided that hard parse is not good for application performance.

Some times soft parsing is having performance issue. The below scripts proves that.
SQL@ORA10G> create table t
2 as
3 select case when rownum = 1
4 then 1 else 99 end id, a.*
5 from all_objects a;

Table created.

SQL@ORA10G> create index t_idx on t(id);

Index created.


SQL@ORA10G> analyze table t compute statistics;

Table analyzed.



The WHERE ID=1 condition will return one record and WHERE ID=99 will return all of the rest (about 40,000 records). Also, the optimizer is very aware of this fact, we can definitely see different plans for different inputs, as we are passing the literals in the WHERE clause.


SQL@ORA10G> set autotrace traceonly explain
SQL@ORA10G> select * from t where id = 1;

Execution Plan
----------------------------------------------------------


0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1 Bytes=141)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=2 Card=1 Bytes=141)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=1 Card=1)

SQL@ORA10G> select * from t where id = 99;

Execution Plan
----------------------------------------------------------


0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=137 Card=34323 Bytes=4839543)
1 0 TABLE ACCESS (FULL) OF 'T' (TABLE) (Cost=137 Card=34323 Bytes=4839543)






From Oracle9i Database Release 1 through Oracle Database 10g Release 2, Oracle Database will wait until the cursor is opened to do the actual optimization of the query—it will wait for the bind variable value to be supplied by the application before figuring out the right way to optimize the query. This is called bind variable peeking, when the optimizer first looks at the bind values and then optimizes the query.



In this case, however, depending on which inputs are used to first run the query, the database will either choose a full scan or an index range scan plus table access by index rowid. And in Oracle9i Database Release 1 through Oracle Database 10g Release 2, that is the plan that will be used to execute the SELECT * FROM t WHERE ID = :ID query, regardless of the subsequent bind values, until the query is hard-parsed and optimized again.

SQL@ORA10G> exec :id := 99

PL/SQL procedure successfully completed.

SQL@ORA10G> select * from t where id = :id;

Execution Plan
----------------------------------------------------------

0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=137 Card=20391 Bytes=1814799)

1 0 TABLE ACCESS (FULL) OF 'T' (TABLE) (Cost=137 Card=20391 Bytes=1814799)





So we started off with ID=99 as the bind, and the optimizer chose a full scan. Therefore, regardless of the bind value, the database will execute a full scan from now on. For example:
SQL@ORA10G> exec :id :=1

PL/SQL procedure successfully completed.

SQL@ORA10G> select * from t where id = :id;

Execution Plan
----------------------------------------------------------

0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=137 Card=20391 Bytes=1814799)

1 0 TABLE ACCESS (FULL) OF 'T' (TABLE) (Cost=137 Card=20391 Bytes=1814799)




The above result demonstrates that it is not going for the index range scan or table access by index rowid. In this case, the logical I/Os (Consistent gets) are more because of the FULL table scan instead of the index scan.

The result proves the “poorly performing query”.

Tuesday, July 13, 2010

Performance Improvement - Part 1: Parsing

Performance Improvement - Part 1: Parsing

In this article, we see how to improve the performance of the SQL queries. Before that we should know about the parsing and types of that.

Parsing:

This is the first step in the processing of any statement in Oracle. Parsing is the process of breaking the submitted statement down into its component parts. Determining what type of statement it is (whether Query, DML, DDL) and performing various checks on it.
The parsing process performs two main functions:

· Syntax Check
· Semantic Analysis

Syntax Check:

The syntax check performs the below operation.

Is the statement a valid one?
Does it make sense given the SQL grammar documented in the SQL Reference Manual? Does it follow all of the rules for SQL?

Semantic Analysis:
The semantic going beyond the syntax and it is performing the below operation.

Is the statement valid in light of the objects in the database (do the tables and columns referenced exist)?
Do you have access to the objects?
Are the proper privileges in place?
Are there ambiguities in the statement?
Soft parsing and hard parsing:
The next step in the parse operation is to see if the statement we are currently parsing has already in fact been processed by some other session. We may be in luck here If it has, we can skip the next two steps in the process, that is

1 .Optimization
2. Row source generation.

If we can skip these next two steps in the process, we have done what is known as a Soft Parse.


If we cannot, if we must do all of the steps, we are performing what is known as a Hard Parse. Hard parse includes the parse, optimize, generation of the plan for the query.
Example for Hard Parse:
The below scripts demonstrate the hard parsing.

SQL> ALTER SYSTEM FLUSH SHARED_POOL;

System altered.

SQL> create table t
(
id number(2),
Des varchar2(20)
);

Table created.

insert into t values(1,'AA');
insert into t values(2,'BB');
insert into t values(3,'CC');
insert into t values(4,'DD');
insert into t values(5,'EE');
insert into t values(6,'FF');


SQL> select * from t;

ID DES
--- --------
1 AA
2 BB
3 CC
4 DD
5 EE
6 FF

6 rows selected.


SQL> select SQl_TEXT,executions,hash_value,child_latch,child_address from v$sql where sql_text like 'select * from t where id %';

no rows selected

SQL> select * from t where id = 2;

ID DES
--- --------
2 BB

SQL> select * from t where id = 2;

ID DES
--- --------
2 BB

SQL> select * from t where id = 3;

ID DES
--- --------
3 CC

SQL> select SQl_TEXT,executions,hash_value,child_latch,child_address from v$sql where sql_text like 'select * from t where id %';





The above result is showing that, it is creating the different hash value for “where id = 2” and “where id =3 “. That means it is going for the hard parse for the each time when we change the condition.

Example for Soft parse:

The Execution column is showing 2 for the “where id =2 “. First time it has gone for the hard parse and second time it is gone for the soft parse.

How to make soft parse for “where id =2 “and “where id =3 “. We use the bind variable and it solves the issue.



SQL> variable id number
SQL> exec :id := 1;

PL/SQL procedure successfully completed.

SQL> select * from t where id = :id;

ID DES
--- --------
1 AA

SQL> exec :id :=2 ;

PL/SQL procedure successfully completed.

SQL> select * from t where id = :id;

ID DES
--- ------
2 BB

SQL> exec :id :=2 ;

PL/SQL procedure successfully completed.

SQL> select * from t where id = :id;

ID DES
--- -------
2 BB


SQL> select SQl_TEXT,executions,hash_value,child_latch,child_address from v$sql where sql_text like 'select * from t where id %';


It is not creating the different plan for each time when we use the bind variable. The hash value vale has created for the script “where id = :id “ and the value of the variable (:id) is changing each time. Refer the “Execution field, It is showing 3 for the different values.
Hard parsing is very CPU intensive. If our applications we want a very high performance then our queries to be Soft Parsed (to be able to skip the optimize/generate phases)

If we have to Hard Parse a large percentage of our queries, our system will function slowly and in some cases not at all.

Now we have a strong proof for the soft parsing and that improve the performance. But soft parsing is not always good and hard parsing is not always bad. We see that in the next Blog.

Monday, July 5, 2010

Reusable scripts - Part1

Find the letter occurrences in the statement:

The below query is finding the number of occurrence letter “a” in the string.

select length('Find a letter in a string') - length(replace('Find a letter in a string', 'a','')) as "Number of occurrence" from dual ;

Number of occurrence
--------------------
2


Remove the alphabetic letters in the given string:

declare
l_start number;
len number;
i number;
p_str varchar2(1000) := 'aa123yksjds45';
p_str1 varchar2(1000);
res varchar2(1000);
begin
i:=0;
len :=length(p_str);
WHILE i <= len

loop
len := ASCII(substr(p_str,i,1));
IF len > 47 AND len < 58 then
i:= i+1;
else
p_str := replace(p_str,substr(p_str,i,1),'');
END IF;
end loop;
dbms_output.put_line(p_str);
end;

Result:

12345

Divide the given comma separated words:

declare v_comma_position number;
v_len number;
p_comma_val varchar2(1000) := 'value1,value2,value3,value4,value5'; p_comma_tmp varchar2(1000);
v_result varchar2(1000);
begin p_comma_val := p_comma_val ',';
v_len :=length(p_comma_val);
WHILE v_len >= 0
loop
v_comma_position := instr(p_comma_val,',')+1;
p_comma_tmp := '';v_result := '';
p_comma_tmp := p_comma_val;
p_comma_val:='';
v_result := substr(p_comma_tmp,1,v_comma_position-2);
p_comma_val := substr(p_comma_tmp,v_comma_position,v_len);
v_len := v_len - v_comma_position;
dbms_output.put_line( v_result);
end loop;
end;

Result:

value1
value2
value3
value4
value5


Monday, June 28, 2010

creating the composite index with more than 900 bytes



Creating the composite index with more than 900 bytes:
Sql server is having some constraints while creating the index. The index entry will not allow if the data size is more than 900 bytes. It gives the error while inserting the data.

There is a way to insert the more than 900 bytes into composite index. we see that in this article.

Intial setup:

Creating the table with two columns and total size is more than 900 bytes.

use tempdb
create table tmp
(
c1 varchar(10),
c2 varchar(1000)
)


Index creation:

The create index statement gives the warnings. Because total size is havng more than 900 bytes.

create index idx_tmp on tmp(c1,c2)

Warning! The maximum key length is 900 bytes. The index 'idx_tmp' has maximum length of 1010 bytes. For some combination of large values, the insert/update operation will fail.

Test data Length:
select len('

test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
')





Insert statement with more than 900 bytes:

The toltal size of the insert is 1010 bytes here. It is not allowing to insert and giving the error.


insert into tmp values( 'hai',
'test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data'
)

INCLUDE clause:

There is another way to avoid this error and create a index on both the columns.

drop index idx_tmp on tmp

By using the INCLUDE clause of the CREATE INDEX statement, the index key could be defined as (c1) and c2 defined as a nonkey column. In this way, the index key size would be 10 bytes.


CREATE INDEX idx_tmp2
ON tmp (c1)
INCLUDE (c2);

Insert statement with more than 900 bytes:

The toltal size of the insert is 1010 bytes here. This time it is allowing to insert

insert into tmp values( 'hai',
'test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data
test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data test_data'
)



The index has included both the columns. The below query proves that.

select b.name index_name,a.name column_name from sys.columns a
inner join sys.indexes b
on a.object_id=b.object_id
where b.name ='idx_tmp2'





Addition to that we verify the explain plan. It goes for the index scan.
select * from tmp where c1='10'


select * from tmp where c2='10'