Tuesday, February 9, 2010

Comparing the Contents of Two Tables

Comparing the Contents of Two Tables:
I have two tables named M and N. They have identical columns and have the same number of rows via select count(*) from M and from N. However, the content in one of the rows is different, as shown in the following query:


SQL> select * from M where C1=1;
C1 C2 C3
-- ------------ ----
1 AAAAAAAAAAAA 100

SQL> select * from N where C1=1;

C1 C2 C3
--- ------------ --------
1 AAAAAAAAAAAB 100


The only difference is the last character in column C2. It is an A in table M and a B in table N. I would like to write SQL to compare or see if tables M and N are in sync with respect to their content rather than the number of rows, but I don't know how to do it.
OK, we'll do the specific solution to this problem with columns C1, C2, and C3, and then we'll see how to generalize this to any number of columns. The first and immediate answer we go to was this:

(select 'A', a.* from M
MINUS
select 'A', b.* from N)
UNION ALL
(select 'B', b.* from b
MINUS
select 'B', a.* from a);


That is, just take M minus N (which gives us everything in M that's not in N) and add to that (UNION ALL) the result of N minus M. In fact, that is correct, but it has a couple of drawbacks:
• The query requires four full table scans.
• If a row is duplicated in M, then MINUS will "de-dup" it silently (and do the same with N).
So, this solution would be slow and also hide information from us. There is a better way, however, that uses just two full scans and GROUP BY. Consider these values in A and B:
create table a
(
c1 int,
c2 varchar(3),
c3 varchar(3)
);


create table b
(
c1 int,
c2 varchar(3),
c3 varchar(3)
);

insert into a values(1,'x','y');
insert into a values(2,'xx','y');
insert into a values(3,'x','y');

insert into b values(1,'x','y');
insert into b values(2,'x','y');
insert into b values(3,'x','yy');

select * from a;

C1 C2 C3
---- -- --
1 x y
2 xx y
3 x y

select * from b;
C1 C2 C3
--- -- --
1 x y
2 x y
3 x yy


The first rows are the same, but the second and third rows differ. This is how we can find them:

select c1, c2, c3,
count(src1) CNT1,
count(src2) CNT2
from
( select a.*,
1 src1,
to_number(null) src2
from a
union all
select b.*,
to_number(null) src1,
2 src2
from b
)
group by c1,c2,c3
having count(src1) <> count(src2);
C1 C2 C3 CNT1 CNT2
--- -- -- ---- ----
2 x y 0 1
2 xx y 1 0
3 x y 1 0
3 x yy 0 1





Now, because COUNT() returns a count of the non-null values of —we expect that after grouping by all of the columns in the table—we would have two equal counts (because COUNT(src1) counts the number of records in table A that have those values and COUNT(src2) does the same for table BCNT1 and CNT2, that would have told us that table A has this row twice but table B has it three times (which is something the MINUS and UNION ALL operators above would not be able to do

The below having the script in SQL Server:

http://karthikeyanbaskaransqlserver.blogspot.com/2010/02/comparing-contents-of-two-tables.html

0 comments:

Post a Comment