I have the following query:
SELECT DISTINCT FIRST_VALUE(business_id)
OVER (PARTITION BY b.sub_category_id
ORDER BY AVG(stars) desc, COUNT(*) DESC) business_id,
sub_category_id
FROM purchase_experience pe
JOIN businesses b ON b.id = pe.business_id
AND b.status = 'active'
AND b.sub_category_id IN (1010 ,1007 ,1034 ,1036)
WHERE pe.stars <> 0
GROUP BY business_id
LIMIT 4
It returns something like this:
business_id | sub_category_id
1744 | 1007
13215 | 1010
9231 | 1034
9103 | 1036
And here is the result of EXPLAIN
:
The sad part is, it takes about 7secs to be executed. Any idea how can I make it optimize?
Answers
To optimize the query and improve its performance, you can try the following approaches:
-
Indexes: Ensure that the columns used in joins and filtering (
business_id
,status
,sub_category_id
,stars
) are indexed appropriately. This can significantly speed up the execution of the query. -
Optimize Window Function: The
FIRST_VALUE
window function can sometimes be resource-intensive. You can try alternative approaches to achieve the same result without using window functions, such as subqueries or correlated subqueries. -
Query Restructuring: Sometimes restructuring the query can lead to better performance. This involves breaking down the query into smaller parts, optimizing each part individually, and then combining them back together.
-
Materialized Views or Caching: If the data is relatively static or does not need to be real-time, you can consider using materialized views or caching to store precomputed results, reducing the query execution time.
-
Tuning Database Configuration: Ensure that the database server is properly configured for optimal performance, including appropriate memory allocation, disk I/O settings, and query optimizer settings.
Here's a revised version of your query that avoids the use of window functions and may perform better in some cases:
SELECT
business_id,
sub_category_id
FROM (
SELECT
b.business_id,
b.sub_category_id,
ROW_NUMBER() OVER (PARTITION BY b.sub_category_id ORDER BY AVG(pe.stars) DESC, COUNT(*) DESC) as rn
FROM
purchase_experience pe
JOIN
businesses b ON b.id = pe.business_id
WHERE
b.status = 'active'
AND b.sub_category_id IN (1010, 1007, 1034, 1036)
AND pe.stars <> 0
GROUP BY
b.business_id, b.sub_category_id
) AS subq
WHERE rn = 1
LIMIT 4;
This query avoids the use of the FIRST_VALUE
window function by instead using ROW_NUMBER()
within a subquery. This could potentially improve performance, especially if window functions are particularly costly in your database environment. However, the effectiveness of this approach depends on various factors, including your database engine and schema design.