Optimizing SQL UPDATE Query with Conditional Data Retrieval from

ghz 8months ago ⋅ 108 views

Optimizing SQL UPDATE Query with Conditional Data Retrieval from Multiple Tables

I have three table:

  1. dup_IMCS_IMCS_MASTER_API
  2. dup_IMCS_IMCS_MASTER
  3. dup_IMCS_IMCS_MASTER_3PTY

My target table is table 1 i.e dup_IMCS_IMCS_MASTER_API. The source tables are 2. dup_IMCS_IMCS_MASTER and 3. dup_IMCS_IMCS_MASTER_3PTY.

The problem statment is as follows:

  1. First, check for ALIAS_PN column in dup_IMCS_IMCS_MASTER_API. If ALIAS_PN IS NULL, then data should come from dup_IMCS_IMCS_MASTER. In dup_IMCS_IMCS_MASTER, data should be coming from DESCRIPTION column to SHORT_DESCRIPTION column in dup_IMCS_IMCS_MASTER_API.

  2. If

    ALIAS_PN IS NOT NULL
    

    , the data should come from namely two columns from

    dup_IMCS_IMCS_MASTER_3PTY
    

    .

    SHORT_DESC
    

    and

    DESC1
    

    . But the condition as to which column that the data should be coming from is below.

    • The data should be primarily coming from SHORT_DESC. But If SHORT_DESC is NULL for dup_IMCS_IMCS_MASTER_3PTY, then DESC1 column in dup_IMCS_IMCS_MASTER_3PTY should be assigned to target column SHORT_DESCRIPTION in dup_IMCS_IMCS_MASTER_API

I have come up with this below SQL query:

-----v1.0
UPDATE AYaramati.dup_IMCS_IMCS_MASTER_API A
SET SHORT_DESCRIPTION = (
  CASE
    WHEN A.ALIAS_PN IS NULL THEN (SELECT DESCRIPTION FROM AYaramati.dup_IMCS_IMCS_MASTER WHERE dup_IMCS_IMCS_MASTER.PN = A.PN)
    ELSE COALESCE((SELECT SHORT_DESC FROM AYaramati.dup_IMCS_IMCS_MASTER_3PTY WHERE dup_IMCS_IMCS_MASTER_3PTY.PN = A.PN), (SELECT DESC1 FROM AYaramati.dup_IMCS_IMCS_MASTER_3PTY WHERE dup_IMCS_IMCS_MASTER_3PTY.PN = A.PN))
  END
);

I have further optimized the query to:

----v1.1
UPDATE AYaramati.dup_IMCS_IMCS_MASTER_API A
SET SHORT_DESCRIPTION = 
  COALESCE(
    (SELECT M.DESCRIPTION FROM AYaramati.dup_IMCS_IMCS_MASTER M WHERE EXISTS (SELECT 1 FROM AYaramati.dup_IMCS_IMCS_MASTER MT WHERE MT.PN = A.PN) AND A.ALIAS_PN IS NULL),
    (SELECT COALESCE(P.SHORT_DESC, P.DESC1) FROM AYaramati.dup_IMCS_IMCS_MASTER_3PTY P WHERE EXISTS (SELECT 1 FROM AYaramati.dup_IMCS_IMCS_MASTER_3PTY PT WHERE PT.PN = A.PN) AND A.ALIAS_PN IS NOT NULL)
  );

When I try to run this, the UPDATE query is taking too much time to run.

  1. This might be because the source tables and target table both contains over half a million records each.
  2. Also, does duplicate PNs arise in the above subqueries?
  3. That might also be the reason as to why the update is taking too long. How can I optimize this query?
  4. If at all is the query valid?

Answers

Your approach to updating the SHORT_DESCRIPTION column in dup_IMCS_IMCS_MASTER_API seems valid, but it could be optimized further to improve performance, especially considering the large number of records in your tables. Here are a few suggestions:

  1. Use JOINs Instead of Subqueries: Instead of using correlated subqueries, you can use JOINs to retrieve the data from the source tables. This can often be more efficient, especially for large datasets.

  2. Ensure Indexes: Make sure that there are indexes on the columns you are using in the JOIN and WHERE clauses, such as PN in both dup_IMCS_IMCS_MASTER and dup_IMCS_IMCS_MASTER_3PTY.

  3. Optimize COALESCE: The COALESCE function is fine, but make sure you're not unnecessarily querying both SHORT_DESC and DESC1 if you only need one. If one of them is preferred over the other, you can use it directly without COALESCE.

  4. Avoid Unnecessary EXISTS Checks: The EXISTS checks in your query might not be necessary. If you're joining based on the PN column, the existence of records in the source tables can be inferred from the JOIN condition itself.

Here's an optimized version of your query using JOINs and simplifying the conditions:

UPDATE AYaramati.dup_IMCS_IMCS_MASTER_API A
SET SHORT_DESCRIPTION = 
  COALESCE(
    M.DESCRIPTION,
    COALESCE(P.SHORT_DESC, P.DESC1)
  )
FROM AYaramati.dup_IMCS_IMCS_MASTER M
LEFT JOIN AYaramati.dup_IMCS_IMCS_MASTER_3PTY P ON M.PN = P.PN
WHERE 
  (A.ALIAS_PN IS NULL AND A.PN = M.PN)
  OR
  (A.ALIAS_PN IS NOT NULL AND A.PN = P.PN);

In this version:

  • We use LEFT JOINs to join dup_IMCS_IMCS_MASTER and dup_IMCS_IMCS_MASTER_3PTY to dup_IMCS_IMCS_MASTER_API.
  • We use WHERE clauses to filter the records based on the conditions you provided.
  • We directly reference the columns from the joined tables (M.DESCRIPTION, P.SHORT_DESC, P.DESC1) without subqueries.
  • Make sure to have indexes on the PN column in both dup_IMCS_IMCS_MASTER and dup_IMCS_IMCS_MASTER_3PTY for efficient lookups.

This optimized query should perform better than the original version, especially for large datasets.