Optimizing SQL UPDATE Query with Conditional Data Retrieval from Multiple Tables
I have three table:
dup_IMCS_IMCS_MASTER_API
dup_IMCS_IMCS_MASTER
dup_IMCS_IMCS_MASTER_3PTY
My target table is table 1 i.e dup_IMCS_IMCS_MASTER_API
.
The source tables are 2. dup_IMCS_IMCS_MASTER
and 3. dup_IMCS_IMCS_MASTER_3PTY
.
The problem statment is as follows:
-
First, check for
ALIAS_PN
column indup_IMCS_IMCS_MASTER_API
. IfALIAS_PN IS NULL
, then data should come fromdup_IMCS_IMCS_MASTER
. Indup_IMCS_IMCS_MASTER
, data should be coming fromDESCRIPTION
column toSHORT_DESCRIPTION
column indup_IMCS_IMCS_MASTER_API
. -
If
ALIAS_PN IS NOT NULL
, the data should come from namely two columns from
dup_IMCS_IMCS_MASTER_3PTY
.
SHORT_DESC
and
DESC1
. But the condition as to which column that the data should be coming from is below.
- The data should be primarily coming from
SHORT_DESC
. But IfSHORT_DESC is NULL
fordup_IMCS_IMCS_MASTER_3PTY
, thenDESC1
column indup_IMCS_IMCS_MASTER_3PTY
should be assigned to target columnSHORT_DESCRIPTION
indup_IMCS_IMCS_MASTER_API
- The data should be primarily coming from
I have come up with this below SQL query:
-----v1.0
UPDATE AYaramati.dup_IMCS_IMCS_MASTER_API A
SET SHORT_DESCRIPTION = (
CASE
WHEN A.ALIAS_PN IS NULL THEN (SELECT DESCRIPTION FROM AYaramati.dup_IMCS_IMCS_MASTER WHERE dup_IMCS_IMCS_MASTER.PN = A.PN)
ELSE COALESCE((SELECT SHORT_DESC FROM AYaramati.dup_IMCS_IMCS_MASTER_3PTY WHERE dup_IMCS_IMCS_MASTER_3PTY.PN = A.PN), (SELECT DESC1 FROM AYaramati.dup_IMCS_IMCS_MASTER_3PTY WHERE dup_IMCS_IMCS_MASTER_3PTY.PN = A.PN))
END
);
I have further optimized the query to:
----v1.1
UPDATE AYaramati.dup_IMCS_IMCS_MASTER_API A
SET SHORT_DESCRIPTION =
COALESCE(
(SELECT M.DESCRIPTION FROM AYaramati.dup_IMCS_IMCS_MASTER M WHERE EXISTS (SELECT 1 FROM AYaramati.dup_IMCS_IMCS_MASTER MT WHERE MT.PN = A.PN) AND A.ALIAS_PN IS NULL),
(SELECT COALESCE(P.SHORT_DESC, P.DESC1) FROM AYaramati.dup_IMCS_IMCS_MASTER_3PTY P WHERE EXISTS (SELECT 1 FROM AYaramati.dup_IMCS_IMCS_MASTER_3PTY PT WHERE PT.PN = A.PN) AND A.ALIAS_PN IS NOT NULL)
);
When I try to run this, the UPDATE query is taking too much time to run.
- This might be because the source tables and target table both contains over half a million records each.
- Also, does duplicate PNs arise in the above subqueries?
- That might also be the reason as to why the update is taking too long. How can I optimize this query?
- If at all is the query valid?
Answers
Your approach to updating the SHORT_DESCRIPTION
column in dup_IMCS_IMCS_MASTER_API
seems valid, but it could be optimized further to improve performance, especially considering the large number of records in your tables. Here are a few suggestions:
-
Use JOINs Instead of Subqueries: Instead of using correlated subqueries, you can use JOINs to retrieve the data from the source tables. This can often be more efficient, especially for large datasets.
-
Ensure Indexes: Make sure that there are indexes on the columns you are using in the JOIN and WHERE clauses, such as
PN
in bothdup_IMCS_IMCS_MASTER
anddup_IMCS_IMCS_MASTER_3PTY
. -
Optimize COALESCE: The COALESCE function is fine, but make sure you're not unnecessarily querying both
SHORT_DESC
andDESC1
if you only need one. If one of them is preferred over the other, you can use it directly without COALESCE. -
Avoid Unnecessary EXISTS Checks: The EXISTS checks in your query might not be necessary. If you're joining based on the
PN
column, the existence of records in the source tables can be inferred from the JOIN condition itself.
Here's an optimized version of your query using JOINs and simplifying the conditions:
UPDATE AYaramati.dup_IMCS_IMCS_MASTER_API A
SET SHORT_DESCRIPTION =
COALESCE(
M.DESCRIPTION,
COALESCE(P.SHORT_DESC, P.DESC1)
)
FROM AYaramati.dup_IMCS_IMCS_MASTER M
LEFT JOIN AYaramati.dup_IMCS_IMCS_MASTER_3PTY P ON M.PN = P.PN
WHERE
(A.ALIAS_PN IS NULL AND A.PN = M.PN)
OR
(A.ALIAS_PN IS NOT NULL AND A.PN = P.PN);
In this version:
- We use LEFT JOINs to join
dup_IMCS_IMCS_MASTER
anddup_IMCS_IMCS_MASTER_3PTY
todup_IMCS_IMCS_MASTER_API
. - We use WHERE clauses to filter the records based on the conditions you provided.
- We directly reference the columns from the joined tables (
M.DESCRIPTION
,P.SHORT_DESC
,P.DESC1
) without subqueries. - Make sure to have indexes on the
PN
column in bothdup_IMCS_IMCS_MASTER
anddup_IMCS_IMCS_MASTER_3PTY
for efficient lookups.
This optimized query should perform better than the original version, especially for large datasets.