Search This Blog

Deleting Duplicates Records in SQL

Deleting Duplicates Records in SQL

There are Many ways to delete duplicate records from table but i will tell you the most easy and efficient way.
Condition 1
When your table have a AutoIncreament or PrimaryKey Column

Condition 2 
When table doesn't have a AutoIncreament or PrimaryKey Column

we will create a table to delete duplicates records.

For Condition 1

create table Employee_Duplicate
(Empid int identity (1,1) ,EmpName nvarchar(50),EmpSalary float)
insert into Employee_Duplicate values('Amit','5000'),('Sumit','6000'),('Raj','8000'),('vijay','9000'),('suresh','10000')

--Inserting Duplicate Records
insert into Employee_Duplicate values('Raj','8000'),('vijay','9000'),('suresh','10000')

Now we have to delete duplicates records and from Duplicates records we will keep the records with maximum EmpId i.e. record which are inserted later to make records Duplicate

--query to delete duplicate records for condition 1

delete  from Employee_Duplicate where empid not in  (select MAX(empid) from Employee_Duplicate 
group by EmpName,EmpSalary )

Explanation-The inner query will take the max value of Empid group by EmpName and Salary 
and Delete those records which are not in inner query.

For Condition 2

where you dont have any AutoIncreament or PrimaryKey Column

we will create a table to duplicate records without primary key column

create table Employee_Duplicate1
(EmpName nvarchar(50),EmpSalary float)
insert into Employee_Duplicate1  values('Amit','5000'),('Sumit','6000'),('Raj','8000'),('vijay','9000'),('suresh','10000')

--Inserting Duplicate Records
insert into Employee_Duplicate1  values('Raj','8000'),('vijay','9000'),('suresh','10000')

select * from Employee_Duplicate1 


with cte as
(select empname,empsalary,ROW_NUMBER()over(partition by empname,empsalary order by empname) as rownum from Employee_Duplicate1
)
select * from cte where rownum>1 --this will give the Duplicate Records list to test

--In order to Delete duplicate

with cte as
(select empname,empsalary,ROW_NUMBER()over(partition by empname,empsalary order by empname) as rownum from Employee_Duplicate1
)
delete from cte where rownum>1

--The above query will delete the Duplicate Records

Explanation -Here we have used Row_Number function which gives rownumber and  partition by empname and salary mean if those name and salary appears second time in the records it will give those records with row num 2
The reason we have used CTE i.e comman table expression  as we cannot apply rownumber function
directly in where clause

 run this query to check

select empname,empsalary,ROW_NUMBER()over(partition by empname,empsalary order by empname) as rownum from Employee_Duplicate1
where ROW_NUMBER()over(partition by empname,empsalary order by empname)>2

Msg 4108, Level 15, State 1, Line 3
Windowed functions can only appear in the SELECT or ORDER BY clauses.

NOTE: when you want to apply Row_Number function or other Ranking function  in where clause always use CTE.








No comments:

Post a Comment