Tuesday, February 23, 2010

Array Size Effects


The array size is the number of rows fetched (or sent, in the case of inserts, updates, and deletes) by the server at a time. It can have a dramatic effect on performance.
SQL> drop table t;

Table dropped.

SQL> create table t as select * from all_objects;

Table created.

SQL> select count(*) from t;
COUNT (*)

----------
29120
SQL> set autotrace traceonly statistics;
SQL> set arraysize 2
SQL> select * from t;

29120 rows selected.

Statistics
----------------------------------------------------------

14779 consistent gets

Note how one half of 29120 (rows fetched) is very close to 14779 , the number of consistent gets. Every row we fetched from the server actually caused it to send two rows back. So, for every two rows of data, we needed to do a logical I/O to get the data. Oracle got a block, took two rows from it, and sent it to SQL*Plus. Then SQL*Plus asked for the next two rows, and Oracle got thatblock again or got the next block, if we had already fetched the data, and returned the next two rows, and so on.
Next, let’s increase the array size:
SQL> set arraysize 5
SQL> select * from t;

29120 rows selected.

Statistics
----------------------

6152 consistent gets

Now, 29120 divided by 5 is about 5824, and that would be the least amount of consistent gets we would be able to achieve (the actual observed number of consistent gets is slightly higher).

All that means is sometimes in order to get two rows, we needed to get two blocks: we got the last row from one block and the first row from the next block.
Let’s increase the array size again:

SQL> set arraysize 10
SQL> select * from t;

29120 rows selected.

Statistics
----------------------

3271 consistent gets

SQL> set arraysize 25
SQL> select * from t;

29120 rows selected.

Statistics
----------------------

1551 consistent gets

SQL> set arraysize 100
SQL> select * from t;

29120 rows selected.


Statistics
----------------------

688 consistent gets

SQL> set arraysize 500
SQL> select * from t;

29120 rows selected.


Statistics
------------------

460 consistent gets
............
As you can see, as the array size goes up, the number of consistent gets goes down. So, does that mean you should set your array size to 5,000, as in this last test? Absolutely not. If you notice, the overall number of consistent gets has not dropped dramatically between array sizes of 100 and 5,000.

It would be better to have more of a stream of information flowing: Ask for 100 rows, get 100 rows, ask for 100, process 100, and so on. That way, both the client and server are more or less continuously processing data, rather than the processing occurring in small bursts.

Monday, February 15, 2010

Can we create table name and column name with case sensitive in oracle?

The table name and Column names CAN BE case sensitive in oracle, but are not by default. Bydefault, any way you create a table, upper case, lower case, mixed case, doesn't matter; everything will be forced to UPPER CASE. If you want it not to be UPPER CASE, you need to create the column with quotes around the name of the table or column and the case for the column that you specify will be kept.

The below script explains that very well.

The first one we see the table name with the case sensitive.

SQL> create table "Test" ( num number);

Table created.

We select the records from the table test with out any quotes. It is telling us that the table is not available.
SQL> select * from test;
select * from test
*
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> select * from "Test";
no rows selected

The above query with the quotes is showing result. It proves that we can create a table with case sensitive.

Now we see the column name with the case sensitive.

SQL> create table t
2 as
3 select decode(mod(rownum,2),0,'M','F') as "gender",
4 all_objects.*
5 from all_objects
6 /

Table created.


SQL> create index t_idx on t (Gender,object_id);
create index t_idx on t (Gender,object_id)
*
ERROR at line 1:
ORA-00904: "GENDER": invalid identifier


SQL> desc t;
Name Null? Type
-------------- -------- ----------------------------
gender VARCHAR2(1)
OWNER NOT NULL VARCHAR2(30)
OBJECT_NAME NOT NULL VARCHAR2(30)

...........
........
....

Index skip scan


We are using the B-Tree index and our predicate does not use the leading edge of index, in this case we might have a table T with an index on T(x,y). we query SELECT * from T WHERE x= 8.

The optimizer will allow to use the index since our predicate did not involve the column X. the optimizer would notice that it did not have to go to the table to get either X or Y. they are in the index. So it may very well option for the fast full index scan.

See below link for more information:
http://karthikeyanbaskaran.blogspot.com/2010/02/index-is-not-getting-used.html

The is another case whereby the index T(x,y) could be used by the CBO is during an index skip scan. The skip scan works well if and only if the leading edge of the index has very few distinct values and the optimizer understand that. Example, consider an idex on (GENDER, EMPNO) where GENDER has the values M and F.


SQL> create table t
2 as
3 select decode(mod(rownum,2),0,'M','F') as gender,
4 all_objects.*
5 from all_objects;

Table created.

SQL> create index t_idx on t(gender,object_id);

Index created.

SQL> begin
dbms_stats.gather_table_stats('SYSTEM','T');
end;
/

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select * from t where object_id = 41;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=94)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=4 Card=1 Bytes= 94)
2 1 INDEX (SKIP SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=3 Card=1 )

The INDEX SKIP scan step tells us that Oracle is going to skip throughout the index, looking for the points where GENDER changes values and read down the tree from there.

Now we increase the count of the distinct value.

SQL> set autotrace off;
SQL> update t set gender = chr(mod(rownum,256));

29119 rows updated.

SQL> begin
2 dbms_stats.gather_table_stats('SYSTEM','T');
3 end;
4 /

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select * from t where object_id = 41;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=1 Bytes=94)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=41 Card=1 Bytes=94)

Thursday, February 11, 2010

Index is not getting used


We are using the B-Tree index and our predicate does not use the leading edge of index, in this case we might have a table T with an index on T(x,y). we query SELECT * from T WHERE x= 8 and see how the optimizer executes.

We see this in the below example.

create table test (x number, y number);
insert into test values ( 4,5);
insert into test values (1,3);
insert into test values (7,5);
insert into test values (3,6);

SQL> select * from test;

X Y
----- ----

4 5 1 3
7 5
3 6

SQL> create index test_idx on test(x,y);

Index created.

SQL> begin
2 dbms_stats.gather_table_stats('SYSTEM','TEST');
3 end;
4 /

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select * from test where y=5;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=6)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=2 Card=1 Bytes=6)

The above query is going for the FULL table access since we have an index but the Y is not a leading edge of the index.


SQL> select * from test where x=3;

Execution Plan
----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=6)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=2 Card=1 Bytes=6)


Now we try with both the conditions. The below example is having both X and Y in the where clause.

SQL> select * from test where x=3 and y=5;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=6)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=2 Card=1 Bytes=6)


Still it is not going for the index scan. The reason is that, the table is having the small number of records (4 records). The optimizer checks the best plan and goes for the FULL table access. This is proving us the FULL table access is not always bad.

“FULL table access is not always bad and indexes are not always good.”


We increase the records in the table.
SQL> set autotrace off;
SQL> insert into test select * from test;
8 rows created.

SQL> /
16 rows created.
....
...

..
.

SQL> /
16384 rows created.

SQL> commit;
Commit complete.


SQL> begin
2 dbms_stats.gather_table_stats('SYSTEM','TEST');
3end;
/
PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select * from test where x=3;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=8192 Bytes=49152)

1 0 INDEX (FAST FULL SCAN) OF 'TEST_IDX' (NON-UNIQUE) (Cost=4 Card=8192 Bytes=49152)

The optimizer will allow to use the index since our predicate did not involve the column X. the optimizer would notice that it did not have to go to the table to get either X or Y. they are in the index. So it may very well option for the fast full index scan. The index is smaller typically smaller than the underlying table. (This access path is always available with CBO only).
SQL> select * from test where y=6;

Execution Plan
----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=10923 Bytes=65538)

1 0 INDEX (FAST FULL SCAN) OF 'TEST_IDX' (NON-UNIQUE) (Cost=4 Card=10923 Bytes=65538)


Now we try with both the conditions. The below example is having both X and Y in the where clause.
SQL> select * from test where x=3 and y=6;

Execution Plan
----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=2731 Bytes=16386)
1 0 INDEX (RANGE SCAN) OF 'TEST_IDX' (NON-UNIQUE) (Cost=3 Card=2731 Bytes=16386)

The explain plans shows that, it is going for the index scan.

Tuesday, February 9, 2010

Comparing the Contents of Two Tables

Comparing the Contents of Two Tables:
I have two tables named M and N. They have identical columns and have the same number of rows via select count(*) from M and from N. However, the content in one of the rows is different, as shown in the following query:


SQL> select * from M where C1=1;
C1 C2 C3
-- ------------ ----
1 AAAAAAAAAAAA 100

SQL> select * from N where C1=1;

C1 C2 C3
--- ------------ --------
1 AAAAAAAAAAAB 100


The only difference is the last character in column C2. It is an A in table M and a B in table N. I would like to write SQL to compare or see if tables M and N are in sync with respect to their content rather than the number of rows, but I don't know how to do it.
OK, we'll do the specific solution to this problem with columns C1, C2, and C3, and then we'll see how to generalize this to any number of columns. The first and immediate answer we go to was this:

(select 'A', a.* from M
MINUS
select 'A', b.* from N)
UNION ALL
(select 'B', b.* from b
MINUS
select 'B', a.* from a);


That is, just take M minus N (which gives us everything in M that's not in N) and add to that (UNION ALL) the result of N minus M. In fact, that is correct, but it has a couple of drawbacks:
• The query requires four full table scans.
• If a row is duplicated in M, then MINUS will "de-dup" it silently (and do the same with N).
So, this solution would be slow and also hide information from us. There is a better way, however, that uses just two full scans and GROUP BY. Consider these values in A and B:
create table a
(
c1 int,
c2 varchar(3),
c3 varchar(3)
);


create table b
(
c1 int,
c2 varchar(3),
c3 varchar(3)
);

insert into a values(1,'x','y');
insert into a values(2,'xx','y');
insert into a values(3,'x','y');

insert into b values(1,'x','y');
insert into b values(2,'x','y');
insert into b values(3,'x','yy');

select * from a;

C1 C2 C3
---- -- --
1 x y
2 xx y
3 x y

select * from b;
C1 C2 C3
--- -- --
1 x y
2 x y
3 x yy


The first rows are the same, but the second and third rows differ. This is how we can find them:

select c1, c2, c3,
count(src1) CNT1,
count(src2) CNT2
from
( select a.*,
1 src1,
to_number(null) src2
from a
union all
select b.*,
to_number(null) src1,
2 src2
from b
)
group by c1,c2,c3
having count(src1) <> count(src2);
C1 C2 C3 CNT1 CNT2
--- -- -- ---- ----
2 x y 0 1
2 xx y 1 0
3 x y 1 0
3 x yy 0 1





Now, because COUNT() returns a count of the non-null values of —we expect that after grouping by all of the columns in the table—we would have two equal counts (because COUNT(src1) counts the number of records in table A that have those values and COUNT(src2) does the same for table BCNT1 and CNT2, that would have told us that table A has this row twice but table B has it three times (which is something the MINUS and UNION ALL operators above would not be able to do

The below having the script in SQL Server:

http://karthikeyanbaskaransqlserver.blogspot.com/2010/02/comparing-contents-of-two-tables.html

Wednesday, February 3, 2010

Index scan gives the sorted record ?


Index scan gives the sorted record.
SQL> column plan_plus_exp format a100
SQL> set trimspool on
SQL> create table t nologging as select * from dba_objects;

Table created.

SQL> set time on
SQL> set timing on
SQL> create index t_idx on t(owner,object_id) nologging compute statistics;

Index created.

Elapsed: 00:00:00.03
SQL> select count(*) from t;
COUNT(*)
----------
29515
Elapsed: 00:00:00.00

SQL Query 1:
SQL> set autotrace on
18:07:31 SQL> select rownum' 'owner' 'object_id from t where owner='SYS' and rownum < 11;

optimizer="CHOOSE" cost="5" card="10" bytes="18581)" cost="5" card="1093" bytes="18581)"


SQL Query 2:
SQL> select * from (select rownum' 'owner' 'object_id from t
2 where owner='SYS'
3 order by owner,object_id ) where rownum < 11;


optimizer="CHOOSE" cost="5" card="10" bytes="63394)" cost="5" card="1093" bytes="63394)" cost="5" card="1093" bytes="18581)"
The first SQL and second SQL are also giving the same result with same statistics. First SQL may not work as expected in future if I drop index obviously it will not.
But RULES OF THE GAME:
Rule #1: if you want, expect, or need sorted data, there is exactly one way to achieve that in a relational database.

"You must use order by"

Rule #2: please re-read #1 until you believe it.

The first query says "give me a random set of 10 rows for the SYS owner"

The second says "give me 10 rows for the sys owner starting at the smallest object_id in sorted order"