How can I make DISTINCT and FIRST_VALUE executing faster?

ghz 8months ago ⋅ 106 views

I have the following query:

SELECT DISTINCT FIRST_VALUE(business_id)
       OVER (PARTITION BY b.sub_category_id
             ORDER BY AVG(stars) desc, COUNT(*) DESC) business_id,
       sub_category_id
FROM purchase_experience pe
JOIN businesses b ON b.id = pe.business_id
 AND b.status = 'active'
 AND b.sub_category_id IN (1010 ,1007 ,1034 ,1036)
WHERE pe.stars <> 0
GROUP BY business_id
LIMIT 4

It returns something like this:

business_id | sub_category_id
1744        | 1007
13215       | 1010
9231        | 1034
9103        | 1036

And here is the result of EXPLAIN:

enter image description here

The sad part is, it takes about 7secs to be executed. Any idea how can I make it optimize?

Answers

To optimize the query and improve its performance, you can try the following approaches:

  1. Indexes: Ensure that the columns used in joins and filtering (business_id, status, sub_category_id, stars) are indexed appropriately. This can significantly speed up the execution of the query.

  2. Optimize Window Function: The FIRST_VALUE window function can sometimes be resource-intensive. You can try alternative approaches to achieve the same result without using window functions, such as subqueries or correlated subqueries.

  3. Query Restructuring: Sometimes restructuring the query can lead to better performance. This involves breaking down the query into smaller parts, optimizing each part individually, and then combining them back together.

  4. Materialized Views or Caching: If the data is relatively static or does not need to be real-time, you can consider using materialized views or caching to store precomputed results, reducing the query execution time.

  5. Tuning Database Configuration: Ensure that the database server is properly configured for optimal performance, including appropriate memory allocation, disk I/O settings, and query optimizer settings.

Here's a revised version of your query that avoids the use of window functions and may perform better in some cases:

SELECT 
    business_id,
    sub_category_id
FROM (
    SELECT 
        b.business_id,
        b.sub_category_id,
        ROW_NUMBER() OVER (PARTITION BY b.sub_category_id ORDER BY AVG(pe.stars) DESC, COUNT(*) DESC) as rn
    FROM 
        purchase_experience pe
    JOIN 
        businesses b ON b.id = pe.business_id
    WHERE 
        b.status = 'active'
        AND b.sub_category_id IN (1010, 1007, 1034, 1036)
        AND pe.stars <> 0
    GROUP BY 
        b.business_id, b.sub_category_id
) AS subq
WHERE rn = 1
LIMIT 4;

This query avoids the use of the FIRST_VALUE window function by instead using ROW_NUMBER() within a subquery. This could potentially improve performance, especially if window functions are particularly costly in your database environment. However, the effectiveness of this approach depends on various factors, including your database engine and schema design.