Performance improvement for SQL for a specific case

ghz 8months ago ⋅ 118 views

I have 2 tables like below:

A

AId,BIdDate1Date2
1B11 Jan 202131 Dec 2021
1B11 Jan 202231 Dec 2022
1B11 Jan 202331 Dec 2030
2B21 Jan 202131 Dec 2030
3B31 Jan 202431 Dec 2030

B

BIdBVal
B1B1
B2B2
B3B3

I am joining these two tables A and B based on BId column. One input of Date type is coming in the query and we want to filter the data from table A based on that parameter and need the output of below data.

Input passed to the query (input_date): 1 Jan 2022

AIdBIdDate1Date2
1B11 Jan 202231 Dec 2022 (There are multiple records of this but it should only come once)
2B21 Jan 202131 Dec 2030
3B31 Jan 202431 Dec 2030 (This is starting in future still it should come in output)

I am using below query currently but its pulling duplicate records.

select AId, BId, Date1, Date2
from A,B
where A.BId = B.BId
and input_date <= A.Date2; -- We cant add check of Date1 as we want the record with AId '3' from table A.

But this is pulling all rows from A for the record with id '1'. We just want 1 row as mentioned.

One option I can think of is to add distinct or group by but wanted to check if there is any better alternate to achieve this requirement. I am thinking DISTINCT or GROUP BY would have a performance impact as the number of rows increase.

Can this be handled in a better way?

PS: These are just sample tables Actual tables are bigger with large data.

Edit: Condition is Inputdate<= date2. Datebase: Oracle

Answers

You can achieve the desired result by using the ROW_NUMBER() window function to partition the data by AId and order it by Date1. Then, you can select only the rows where the ROW_NUMBER() is 1, indicating the earliest Date1 for each AId. Here's how you can modify your query:

WITH A_B AS (
    SELECT A.AId, A.BId, A.Date1, A.Date2,
           ROW_NUMBER() OVER (PARTITION BY A.AId ORDER BY A.Date1) AS rn
    FROM A
    INNER JOIN B ON A.BId = B.BId
    WHERE input_date <= A.Date2
)
SELECT AId, BId, Date1, Date2
FROM A_B
WHERE rn = 1;

This query will first join tables A and B, filter the rows based on the input_date, and then assign a row number to each row within each AId partition, ordered by Date1. Finally, it selects only the rows where the row number is 1, ensuring that only the earliest Date1 for each AId is included in the result set.