Identify duplicate rows in a table and delete them

Identify duplicate rows in a table and delete them

I create a table and insert some rows into the table t11.

SQL> create table t11 (name varchar2(20),id number(10) unique,salary decimal(10),address varchar2(10));

Table created.

SQL> BEGIN
  2  FOR I IN 20000 .. 300000 LOOP
  3  INSERT INTO T11 VALUES('XYZ',I,I,'ABC');
  4  END LOOP;
  5  COMMIT;
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL> BEGIN
  2  FOR I IN 20000 .. 300000 LOOP
  3  INSERT /* +append */ T11 VALUES('XYZ',I,I,'ABC');
  4  END LOOP;
  5   COMMIT;
  6  END;
  7  /

BEGIN
*
ERROR at line 1:
ORA-00001: unique constraint (SYS.SYS_C009998) violated
ORA-06512: at line 3

When i create id column with unique constraint,the data dictionary is not allowing to insert duplicates to the table.

So we can either drop or disable the constraint on the id column

SQL> alter table t11 drop constraint SYS_C009998;

Table altered.

SQL> BEGIN
  2  FOR I IN 20000 .. 300000 LOOP
  3  INSERT /* +append */ into T11 VALUES('XYZ',I,I,'ABC');
  4  END LOOP;
  5  COMMIT;
  6  END;
  7  /

PL/SQL procedure successfully completed.

I disable the constraint and somehow inserted the rows by mistake without knowing what metadata is stored inside the table.

Now the end user is complaining about the duplicates values while checking the application.

First ,we have to identify the duplicate values in the table.

For that ,we need to use the unique column on the table to identify duplicates.Here,as we know that id is the unique column,we use id column to identify the table which have > 1 rows and limited the row to 5.

SQL> select id,count(id) from t11
  2  group by id
  3  having count(id) > 1
  4  fetch next 5 rows only;

        ID  COUNT(ID)
---------- ----------
     20001          2
     20003          2
     20009          2
     20010          2
     20012          2

SQL> select count(*) FROM t11
WHERE rowid not in
(SELECT MAX(rowid)
FROM t11
GROUP BY id);

  COUNT(*)
----------
    280001

I identified a total of 280001 duplicate counts from 560002 rows

Total row count in the table:

SQL> select count(*) from t11;

  COUNT(*)
----------
    560002

Now we got the duplicate rows.we can remove these rows by an efficient ctas shift method.

I create a temporary table to store the original data.

SQL> create table t11_temp as
  2  select * from t11
WHERE rowid in
(SELECT MAX(rowid)
FROM t11
GROUP BY id);  3    4    5    6

Table created.

SQL> select count(*) from t11_temp;

  COUNT(*)
----------
    280001

Drop the original table

SQL> drop table t11;

Table dropped.

Create the original table with old data from temporary buffer table

SQL> create table t11 as select * from t11_temp;

Table created.

Check the row count on the new table

SQL> select count(*) from t11_temp;

  COUNT(*)
----------
    280001

Drop the temporary table

SQL> drop table t11_temp;

Table dropped.

We can also use delete statement to remove the duplicate rows but the method is slow and takes much longer to complete.

At the end ,we can create the constraint on the id column.

SQL> alter table t11 add constraint SYS_C009998 unique(id);

Table altered.

To ensure the uniqueness of the table,cross verify the table.

SQL> select count(*) FROM t11
WHERE rowid not in
(SELECT MAX(rowid)
FROM t11
GROUP BY id);
no rows selected

Leave a Reply

%d bloggers like this: