SQL Comparison of Two Identical Tables for Differences

ghz 8months ago ⋅ 66 views

I have a site with over 60,000 products, and I get weekly updates from the manufacturer that I need to process so I can update my site. I have php code to load the various related tables.

I have two identical tables - one has my Old(Saved) weekly product updates from last week's app run, the other has the New weekly product updates. Both of these tables have 71 columns, so they're somewhat large. There is a column in both tables that holds the status - whether this is a new product, an existing product that has some changes (description, price, etc), or an existing product that has remained the same since last week.

I am working to find a SQL-only way to find the rows in my New table that:

  • are new and not in the Old table, and update the status column for those rows to contain "New"
  • are in both tables but the New table has updated/changed data compare to the Old table, and update the status column for those rows to "Changed"
  • are in both tables and are the same, and update the status column for those rows to "Same"

The objective is that ALL rows in the New table will have one of those three values in the status column after I run this elusive SQL, so I can load them into my site.

I already have the SQL to determine which products are in the Old table (last week's products run) that are NOT in this week's New table, and therefore need to be removed (no longer a current product) - so that last logical outcome is handled.

I currently have php code that goes and compares every product in the New table against the same product in the Old table to do this - there are 29 columns that I'm comparing for every row, so it takes several hours to complete, and is therefore a problem for our process.

Is there a SQL way to compare these two tables and update the status column? Even if that means running 3 separate SQL queries for each status result? I suspect there is and it would save hours, but can't work my way through this issue myself.

Update: Based on @Charlieface 's response below, I used the query suggested and changed it for my specific tables/columns/result info, and got the following:

UPDATE NewReport00 nr
LEFT JOIN LastReport00 lr ON lr.item_no = nr.item_no
SET nr.is_new_latest_run =
    CASE WHEN lr.item_no IS NULL THEN 'New'
         WHEN EXISTS (
             SELECT nr.is_family, nr.family_status
             EXCEPT
             SELECT lr.is_family, lr.family_status
         ) THEN 'Update'
         ELSE 'Same'
    END;

But this got me an error:

[2024-04-03 16:42:49] [42000][1064] You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECT lr.is_family, lr.family_status
[2024-04-03 16:42:49] ) THEN 'Update'
[2024-04-03 16:42:49] ELSE 'Sa' at line 8
[2024-04-03 16:42:49] [42000][1064] You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECT lr.is_family, lr.family_status
[2024-04-03 16:42:49] ) THEN 'Update'
[2024-04-03 16:42:49] ELSE 'Sa' at line 8

Line 8 is: SELECT lr.is_family, lr.family_status

Not quite sure where to go from here?? @Charlieface, do you have any guidance?

After some research, it turns out that MySQL does not support EXCEPT. Left Join with Distinct is suggested, but I have yet to logic my way through this..... As I'm using MariaDB in MySQL on a Mac M2 Silicon under MAMP, I'll have to work my way through this.

Thanks!

Answers

It seems like you're on the right track with your SQL query, but as you discovered, MySQL doesn't support the EXCEPT operator. However, you can achieve similar functionality using a LEFT JOIN with a DISTINCT clause.

Here's how you can modify your query to work in MySQL:

UPDATE NewReport00 nr
LEFT JOIN LastReport00 lr ON lr.item_no = nr.item_no
SET nr.is_new_latest_run =
    CASE 
        WHEN lr.item_no IS NULL THEN 'New'
        WHEN NOT EXISTS (
            SELECT DISTINCT nr.is_family, nr.family_status
            FROM NewReport00 nr
            WHERE nr.item_no = lr.item_no
            AND nr.is_family = lr.is_family
            AND nr.family_status = lr.family_status
        ) THEN 'Update'
        ELSE 'Same'
    END;

In this query:

  • We use a LEFT JOIN to match rows between the NewReport00 and LastReport00 tables based on the item_no column.
  • The CASE statement checks the conditions:
    • If lr.item_no is NULL, it means the item is new and not in the old report, so we set the status to 'New'.
    • If there are no matching rows in NewReport00 with different values for the specified columns (is_family, family_status), it means the item has been updated, so we set the status to 'Update'.
    • Otherwise, the item is the same as in the old report, so we set the status to 'Same'.
  • We use a DISTINCT clause in the subquery to ensure that we only compare distinct combinations of the specified columns.

This query should work in MySQL and help you update the status column based on the conditions you described.