How to compare two tables column by column in Oracle Database?
Recently in my project, we had a requirement to compare two identical CSV files from a different source. Since the size of these files is more than 50MB we were not able to compare using the comparison tool that we are using. So we decided to compare both these files using SQL.
Let's take two CSV files containing the employee's information from two different upstream sources. I want to compare both these files to check data quality.
The first step is to create two temporary tables EMP1 and EMP2 and load the data into these tables.
Here, EMPNO is a primary key and I want to compare both the tables based on this column and want to display all those rows which are not exactly the same.
CREATE TABLE EMP1(
EMPNO NUMBER(4,0) PRIMARY KEY,
ENAME VARCHAR2(10),
JOB VARCHAR2(9),
MGR NUMBER(4,0),
HIREDATE DATE,
SAL NUMBER(7,2),
COMM NUMBER(7,2),
DEPTNO NUMBER(2,0));
CREATE TABLE EMP2(
EMPNO NUMBER(4,0) PRIMARY KEY,
ENAME VARCHAR2(10),
JOB VARCHAR2(9),
MGR NUMBER(4,0),
HIREDATE DATE,
SAL NUMBER(7,2),
COMM NUMBER(7,2),
DEPTNO NUMBER(2,0));
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7369,null,'CLERK',7902,to_date('17-12-80','DD-MM-RR'),800,null,20);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7499,'ALLEN','SALESMAN',7698,to_date('20-02-81','DD-MM-RR'),1600,300,30);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7521,'WARD','SALESMAN',7698,to_date('22-02-81','DD-MM-RR'),1250,500,30);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7566,'ABC','SALESMAN',7839,to_date('02-04-81','DD-MM-RR'),2975,null,20);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7654,'MARTIN','SALESMAN',7698,to_date('28-09-81','DD-MM-RR'),1250,1400,30);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7698,'BLAKE','MANAGER',7839,to_date('01-05-81','DD-MM-RR'),2850,null,30);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7782,'CLARK','MANAGER',7839,to_date('09-06-81','DD-MM-RR'),2450,null,10);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7788,'SCOTT','ANALYST',7566,to_date('09-12-82','DD-MM-RR'),3000,null,20);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7839,'KING','PRESIDENT',null,to_date('17-11-81','DD-MM-RR'),10000,null,10);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7876,'ADAMS','CLERK',7788,to_date('12-01-83','DD-MM-RR'),1100,null,20);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7900,'JAMES','CLERK',7902,to_date('03-12-81','DD-MM-RR'),950,null,30);
Insert into EMP1 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7934,'MILLER','CLERK',7782,to_date('23-01-82','DD-MM-RR'),1300,null,10);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7369,'SMITH','CLERK',7902,to_date('17-12-80','DD-MM-RR'),800,null,20);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7499,'ALLEN','SALESMAN',7698,to_date('20-02-81','DD-MM-RR'),1600,300,30);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7521,'WARD','SALESMAN',7698,to_date('22-02-81','DD-MM-RR'),1250,500,30);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7566,'JONES','MANAGER',7839,to_date('02-04-81','DD-MM-RR'),2975,null,20);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7654,'SMITH','SALESMAN',7698,to_date('28-09-81','DD-MM-RR'),1250,1400,30);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7698,'BLAKE','MANAGER',7839,to_date('01-05-81','DD-MM-RR'),2850,null,30);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7782,'CLARK','MANAGER',7839,to_date('09-06-81','DD-MM-RR'),2450,null,10);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7788,'SCOTT','ANALYST',7566,to_date('09-12-82','DD-MM-RR'),3000,null,20);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7839,'KING','PRESIDENT',null,to_date('17-11-81','DD-MM-RR'),5000,null,10);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7844,'TURNER','SALESMAN',7698,to_date('08-09-81','DD-MM-RR'),1500,0,30);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7876,'ADAMS','ANALYST',7788,to_date('12-01-83','DD-MM-RR'),1100,null,20);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7900,'JAMES','CLERK',7698,to_date('03-12-81','DD-MM-RR'),950,null,30);
Insert into EMP2 (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) values (7902,'FORD','ANALYST',7566,to_date('03-12-81','DD-MM-RR'),3000,null,20);
Using the below query we will get all the rows that are not exactly the same in both the tables.
SELECT CASE
WHEN E1.EMPNO IS NULL THEN 'Record Missing in EMP1'
WHEN E2.EMPNO IS NULL THEN 'Record Missing in EMP2'
ELSE 'Column value mismatch'
END Comments,
DECODE(E1.ENAME,E2.ENAME,'','ENAME') ENAME_FLAG,
DECODE(E1.JOB,E2.JOB,'','JOB') JOB_FLAG,
DECODE(E1.MGR,E2.MGR,'','MGR') MGR_FLAG,
DECODE(E1.HIREDATE,E2.HIREDATE,'','HIREDATE') HIREDATE_FLAG,
DECODE(E1.SAL,E2.SAL,'','SAL') SAL_FLAG,
DECODE(E1.COMM,E2.COMM,'','COMM') COMM_FLAG,
DECODE(E1.DEPTNO,E2.DEPTNO,'','DEPTNO') DEPTNO_FLAG,
COALESCE (E1.EMPNO,E2.EMPNO) EMPNO_PK,
E1.ENAME EMP1_ENAME, E2.ENAME EMP2_ENAME,
E1.JOB EMP1_JOB,E2.JOB EMP2_JOB,
E1.MGR EMP1_MGR, E2.MGR EMP2_MGR,
E1.HIREDATE EMP1_HIREDATE, E2.HIREDATE EMP2_HIREDATE,
E1.SAL EMP1_SAL, E2.SAL EMP2_SAL,
E1.COMM EMP1_COMM,E2.COMM EMP2_COMM,
E1.DEPTNO EMP1_DEPTNO,E2.DEPTNO EMP2_DEPTNO
FROM EMP1 E1
FULL OUTER JOIN EMP2 E2
ON (E1.EMPNO = E2.EMPNO)
WHERE (DECODE(E1.ENAME,E2.ENAME,'','ENAME') IS NOT NULL
OR DECODE(E1.JOB,E2.JOB,'','JOB') IS NOT NULL
OR DECODE(E1.MGR,E2.MGR,'','MGR') IS NOT NULL
OR DECODE(E1.HIREDATE,E2.HIREDATE,'','HIREDATE') IS NOT NULL
OR DECODE(E1.SAL,E2.SAL,'','SAL') IS NOT NULL
OR DECODE(E1.COMM,E2.COMM,'','COMM') IS NOT NULL
OR DECODE(E1.DEPTNO,E2.DEPTNO,'','DEPTNO') IS NOT NULL )
ORDER BY EMPNO_PK;
In this query,
- Comments column will display the message to know whether it is an extra record in one of the tables or a column-level mismatch.
- Column postfixed with 'PK' indicates that it is a primary key column
- Column postfixed with 'FLAG' indicates the value mismatch for that column for a particular record.
Refer to this original post on asktom for more details.
Comments
Post a Comment