Introduction
SQL server allows you to use columns of tables referenced on a main query in a subquery. I came across a scenario where this can have a catastrophic effect.
Background
Look at the statement below:
DELETE Projects WHERE EmployeeId IN (
SELECT EmployeeId FROM Employees WHERE EmployeeType = 1)
where, [EmployeeId]
is foreign key field in Projects
table that references the primary key field [Id]
in Employees
table.
Note: There is NO field named [EmployeeId]
in Employees
table. This query will effectively clear the whole of your Projects
table. Actually, what you intended was:
DELETE Projects WHERE EmployeeId IN (
SELECT Id FROM Employees WHERE EmployeeType = 1)
Full Description
Step 1: Create Employees table
CREATE TABLE Employees (
Id INT, FullName NVARCHAR(128), EmployeeType INT)
Step 2: Create Projects table
CREATE TABLE Projects (
Id INT, EmployeeId INT, ProjectName NVARCHAR(256))
Step 3: Insert dummy data into Employees
table
INSERT INTO Employees VALUES (1, N'Jeff Hardy', 1)
INSERT INTO Employees VALUES (2, N'John Gathogo', 2)
INSERT INTO Employees VALUES (3, N'Jane Doe', 1)
Step 4: Insert dummy data into Projects
table
INSERT INTO Projects VALUES (1, 1, N'Project Alpha')
INSERT INTO Projects VALUES (2, 1, N'Project Beta')
INSERT INTO Projects VALUES (3, 3, N'Project Gamma')
INSERT INTO Projects VALUES (4, 2, N'Project Theta')
Step 5: Now, run the following queries, to reproduce the behaviour
-
SELECT * FROM Projects WHERE EmployeeId IN (
SELECT EmployeeId FROM Employees)
-
SELECT * FROM Projects WHERE EmployeeId IN (
SELECT EmployeeId FROM Employees WHERE EmployeeType = 1)
-
DELETE Projects WHERE EmployeeId IN (
SELECT EmployeeId FROM Employees WHERE EmployeeType = 1)
Results
1 | 1 | Project Alpha |
2 | 1 | Project Beta |
3 | 3 | Project Gamma |
4 | 2 | Project Theta |
1 | 1 | Project Alpha |
2 | 1 | Project Beta |
3 | 3 | Project Gamma |
4 | 2 | Project Theta |
-
(4 row(s) affected)
There a serious issue here - I believe that while giving you the flexibility, SQL Server should protect the user here. The Employees
table has no such column named EmployeeId,
and I believe it does not raise an error since the Projects
table referenced on the main query has an EmployeeId
field. If you look at the results,
SELECT * FROM Projects WHERE EmployeeId IN (
SELECT EmployeeId FROM Employees)
ends up giving you the same results as:
SELECT * FROM Projects WHERE EmployeeId IN (
SELECT EmployeeId FROM Employees WHERE EmployeeType = 1)
The issue is even more critical for the DELETE
statement since you end up clearing the whole of your Projects table.
Workaround
Obviously, if one had qualified the column names, then the parser would have helped uncover the not-so-obvious error in the query. Like in,
DELETE Projects WHERE EmployeeId IN (
SELECT Employees.EmployeeId FROM Employees WHERE Employees.EmployeeType = 1)
BSc. Degree Holder in Mathematics and Computer Science from Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya. Currently involved in development of systems running on (but not exclusively) .NET platform. Has deep interest in the field of Artifical Intelligence, particularly Genetic Algorithms and Neural Networks.