MySQL 8.0.35 - I cannot get rid of a filesort caused by a JSON column

ghz ⋅ 8 months ago ⋅ 114 views

I have spent days researching and trying solutions with no success; the filesort is still happening.

The whole table is only 300 MB, and the part I am selecting is 767 rows, around 30 MB.

The query that triggers the filesort is the following:

select * 
from finance_trans_deposits 
where finance_trans_deposits.id_customer = 123 
    and finance_trans_deposits.id_customer is not null 
    and status = 2 
order by created_at desc;

And this is the EXPLAIN output:

+----+-------------+------------------------+------------+------+---------------------------------+---------------------+---------+-------------+------+----------+----------------+
| id | select_type | table                  | partitions | type | possible_keys                   | key                 | key_len | ref         | rows | filtered | Extra          |
+----+-------------+------------------------+------------+------+---------------------------------+---------------------+---------+-------------+------+----------+----------------+
|  1 | SIMPLE      | finance_trans_deposits | NULL       | ref  | idx_customer_status,id_customer | idx_customer_status | 9       | const,const |  767 |   100.00 | Using filesort |
+----+-------------+------------------------+------------+------+---------------------------------+---------------------+---------+-------------+------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

The cause is definitely the single JSON column in my table: if I change "select *" to a list of all columns excluding the JSON column, the query takes 0.01 s instead of 0.5 s. The JSON data in each row is around 30 kB, so the JSON data across the whole result set is around 20 MB.

I tried adjusting sort_buffer_size to various values, from 256 MB up to 5 GB (I have a lot of memory), but the issue remains.
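For reference, I was changing it at session scope before re-running the query, e.g.:

```sql
SET SESSION sort_buffer_size = 256 * 1024 * 1024;  -- also tried values up to 5 GB
```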

OPTIMIZER TRACE shows the following:
"filesort_summary": {
  "memory_available": 2147483648,
  "key_size": 9,
  "row_size": 4294967295,
  "max_rows_per_buffer": 0,
  "num_rows_estimate": 767,
  "num_rows_found": 767,
  "num_initial_chunks_spilled_to_disk": 0,
  "peak_memory_used": 28640470,
  "sort_algorithm": "std::stable_sort",
  "sort_mode": "<fixed_sort_key, packed_additional_fields>"
}

I am wondering what row_size is: it shows 4 GB, but this table is definitely only 300 MB. Peak memory used is only 28 MB, which I believe is the real size of the selected dataset (my calculation above was only an estimate). So why is it not being sorted in memory? I am starting to think this is a MySQL bug.
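(In case it matters: the trace above was captured the usual way, roughly like this.)

```sql
SET optimizer_trace = 'enabled=on';
-- ... run the SELECT from above ...
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';
```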

CREATE TABLE `finance_trans_deposits` (
  `id` bigint NOT NULL,
  `uuid` varchar(36) DEFAULT NULL,
  `trx_id` varchar(255) DEFAULT NULL,
  `account_id` bigint DEFAULT NULL,
  `validation_key` varchar(255) DEFAULT NULL,
  `project_id` bigint NOT NULL,
  `user_id` int DEFAULT NULL,
  `id_customer` bigint NOT NULL,
  `session_id` bigint DEFAULT NULL,
  `type` tinyint(1) NOT NULL,
  `amount` decimal(13,2) NOT NULL,
  `currency` varchar(3) NOT NULL,
  `amount_eur` decimal(13,4) NOT NULL,
  `payment_method_id` bigint NOT NULL,
  `provider_pm` varchar(255) DEFAULT NULL,
  `balance` decimal(13,2) NOT NULL,
  `status` tinyint(1) NOT NULL,
  `provider_status` varchar(255) DEFAULT NULL,
  `force_status` varchar(255) DEFAULT NULL,
  `risk_score` int DEFAULT NULL,
  `created_at` datetime(4) DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  `processed_at` timestamp NULL DEFAULT NULL,
  `bonus_want` tinyint(1) DEFAULT NULL,
  `payload` json DEFAULT NULL,
  `forwarded_at` timestamp NULL DEFAULT NULL,
  `dsw` decimal(13,4) DEFAULT NULL,
  `dsw_eur` decimal(13,4) DEFAULT NULL,
  `dsw_count` int DEFAULT NULL,
  `pre_notification_sent_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

ALTER TABLE finance_trans_deposits ADD PRIMARY KEY (id), ADD KEY id_customer_2 (id_customer,status,created_at);

I made a copy of my table to be able to alter it freely (it's a production environment). After copying, I discovered the table suddenly has 470 MB instead of the 300 it had before (same number of rows; I verified).

I cleaned everything: I removed all indexes and created a single new index on these three columns.

It has not changed the query time at all (still 0.5 s), but the EXPLAIN output has changed; it now shows Backward index scan:

+----+-------------+-------+------------+------+---------------+---------------+---------+-------------+------+----------+---------------------+
| id | select_type | table | partitions | type | possible_keys | key           | key_len | ref         | rows | filtered | Extra               |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------------+------+----------+---------------------+
|  1 | SIMPLE      | 2test | NULL       | ref  | id_customer_2 | id_customer_2 | 9       | const,const |  767 |   100.00 | Backward index scan |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------------+------+----------+---------------------+

So you would think the filesort is gone? I am not convinced, because the query still takes 0.5 s. When I drop the JSON column, the same query takes 0.01 s (just tested). Why is the JSON column still causing this? I do not believe it is the sheer amount of data in that column; as I said, it is no more than 50 kB per row, sometimes NULL.

When I select only the JSON column (not SELECT *), it takes the same 0.5 s.

I am attaching the EXPLAIN ANALYZE output below; keep in mind this is after creating the new index on all three columns (id_customer, status, created_at).
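The statement was the same query as above, wrapped in EXPLAIN ANALYZE:

```sql
EXPLAIN ANALYZE
SELECT *
FROM finance_trans_deposits
WHERE id_customer = 123
  AND id_customer IS NOT NULL
  AND status = 2
ORDER BY created_at DESC;
```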

+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on finance_trans_deposits using id_customer_2 (id_customer=123, status=2) (reverse)  (cost=844 rows=767) (actual time=0.157..7.37 rows=767 loops=1) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+

It shows that the select took 7 milliseconds, but when I actually run the query, the client reports 767 rows in set (0.53 sec).

Output of SHOW CREATE TABLE:

finance_trans_deposits | CREATE TABLE `finance_trans_deposits` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `uuid` varchar(36) DEFAULT NULL,
  `trx_id` varchar(255) DEFAULT NULL,
  `account_id` bigint DEFAULT NULL,
  `validation_key` varchar(255) DEFAULT NULL,
  `project_id` bigint NOT NULL,
  `user_id` int DEFAULT NULL,
  `id_customer` bigint NOT NULL,
  `session_id` bigint DEFAULT NULL,
  `type` tinyint(1) NOT NULL,
  `amount` decimal(13,2) NOT NULL,
  `currency` varchar(3) NOT NULL,
  `amount_eur` decimal(13,4) NOT NULL,
  `payment_method_id` bigint NOT NULL,
  `provider_pm` varchar(255) DEFAULT NULL,
  `balance` decimal(13,2) NOT NULL,
  `status` tinyint(1) NOT NULL,
  `provider_status` varchar(255) DEFAULT NULL,
  `force_status` varchar(255) DEFAULT NULL,
  `risk_score` int DEFAULT NULL,
  `created_at` datetime(4) DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  `processed_at` timestamp NULL DEFAULT NULL,
  `bonus_want` tinyint(1) DEFAULT NULL,
  `payload` json DEFAULT NULL,
  `forwarded_at` timestamp NULL DEFAULT NULL,
  `dsw` decimal(13,4) DEFAULT NULL,
  `dsw_eur` decimal(13,4) DEFAULT NULL,
  `dsw_count` int DEFAULT NULL,
  `pre_notification_sent_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `finance_trans_deposits_payment_method_id_foreign` (`payment_method_id`),
  KEY `id_customer` (`id_customer`),
  KEY `id_customer_2` (`id_customer`,`status`,`created_at`),
  CONSTRAINT `finance_trans_deposits_payment_method_id_foreign` FOREIGN KEY (`payment_method_id`) REFERENCES `finance_payment_methods` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB AUTO_INCREMENT=27103 DEFAULT CHARSET=latin1 |

This was my first time using the profiler, so I hope I used it correctly: I did SET profiling=1, then ran the query, then SHOW PROFILE. Here is the output:
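Spelled out, the session was:

```sql
SET profiling = 1;
SELECT *
FROM finance_trans_deposits
WHERE id_customer = 123
  AND id_customer IS NOT NULL
  AND status = 2
ORDER BY created_at DESC;
SHOW PROFILE;
```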

+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000272 |
| Executing hook on transaction  | 0.000018 |
| starting                       | 0.000022 |
| checking permissions           | 0.000017 |
| Opening tables                 | 0.000134 |
| init                           | 0.000021 |
| System lock                    | 0.000027 |
| optimizing                     | 0.000048 |
| statistics                     | 0.000213 |
| preparing                      | 0.000069 |
| executing                      | 0.554983 |
| end                            | 0.000014 |
| query end                      | 0.000003 |
| waiting for handler commit     | 0.000006 |
| closing tables                 | 0.000013 |
| freeing items                  | 0.000011 |
| logging slow query             | 0.000038 |
| cleaning up                    | 0.000009 |
+--------------------------------+----------+

Interestingly, it shows that executing took 0.55 s.

Answers

It seems like you're experiencing performance issues with your MySQL query, specifically around sorting and around returning a large JSON column. Let's break down the observations and potential solutions:

  1. Filesort operation: The presence of the JSON column triggers a filesort even though the dataset is small, because of how MySQL sizes sort rows for large variable-length columns. The row_size of 4294967295 in your trace is almost certainly the theoretical maximum row length (a JSON value can be up to 4 GB), not the actual size of your rows; it is a pessimistic bound, not a bug.

  2. Increased table size after copying: A size difference after copying is normal; the rebuilt table has different page fill and fragmentation, and the sizes InnoDB reports are estimates anyway. The unchanged row count is the number that matters.

  3. Indexing: Your composite index is actually working now. The second EXPLAIN shows a Backward index scan with no filesort, which means rows are read in the required order straight from the index, so the remaining 0.5 s is not sort time.

  4. Profiler output: Nearly all of the time is in the executing stage. In MySQL 8.0 this stage also covers reading rows and sending them to the client (the old "Sending data" stage was folded into it), so spending 0.55 s there is consistent with shipping roughly 20 MB of JSON per query.

Given these observations, here are some steps you can take to improve performance:

  • Check JSON column usage: If the payload is not needed by every caller, exclude it from the SELECT list; you already measured that this drops the query from 0.5 s to 0.01 s. Fetch the JSON lazily, only for the rows the application actually displays.

  • Review indexing strategy: Since you filter by id_customer and status and sort by created_at, the composite index (id_customer, status, created_at) is the right shape, and your EXPLAIN confirms it is used for both filtering and ordering.

  • Optimize the query: Drop the redundant id_customer IS NOT NULL predicate; the column is declared NOT NULL and is already pinned by id_customer = 123, so the extra condition only adds noise.

  • Consider JSON column storage: If the payload is large and rarely read, move it to a separate table joined by primary key, or compress it in the application; that keeps the hot table narrow.

  • Review server configuration: Make sure the server is sized for the workload (buffer pool, sort_buffer_size, and network buffers such as max_allowed_packet). Note that the query cache was removed in MySQL 8.0, so it is not a factor here.

  • Database schema design: Review your database schema design to ensure it's optimized for your queries and workload. Sometimes, restructuring tables or denormalizing data can improve performance.

  • Update MySQL: Ensure that you're using the latest stable version of MySQL, as newer versions may include performance improvements and bug fixes.
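As a concrete sketch of the first two points, assuming the schema you posted: a deferred join lets the server order rows using only the narrow (id_customer, status, created_at) index, fetching the wide JSON rows afterwards by primary key, so the sort never has to carry the payload column:

```sql
SELECT t.*
FROM finance_trans_deposits AS t
JOIN (
    -- Inner query touches only indexed columns; no JSON involved.
    SELECT id
    FROM finance_trans_deposits
    WHERE id_customer = 123
      AND status = 2
    ORDER BY created_at DESC
) AS ordered USING (id)
ORDER BY t.created_at DESC;
```

Keep in mind that if the 0.5 s is dominated by transferring ~20 MB of JSON to the client, this rewrite will not help; only selecting less data will.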

By carefully analyzing your query, indexing strategy, and server configuration, you should be able to pinpoint the root cause of the performance issues and take appropriate actions to address them. If you're still facing difficulties, consider consulting with a database administrator or MySQL expert for further assistance.
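If you do go the route of moving the JSON out of the hot table, a hypothetical split (table and constraint names here are illustrative, not from your schema) could look like:

```sql
-- Side table holding only the wide payload, keyed by deposit id.
CREATE TABLE finance_trans_deposit_payloads (
  deposit_id bigint NOT NULL,
  payload json DEFAULT NULL,
  PRIMARY KEY (deposit_id),
  CONSTRAINT fk_payload_deposit FOREIGN KEY (deposit_id)
    REFERENCES finance_trans_deposits (id) ON DELETE CASCADE
) ENGINE=InnoDB;

-- Listing queries stay narrow and fast:
SELECT id, amount, currency, status, created_at
FROM finance_trans_deposits
WHERE id_customer = 123 AND status = 2
ORDER BY created_at DESC;

-- The payload is fetched per deposit only when actually needed
-- (456 is a placeholder id):
SELECT payload
FROM finance_trans_deposit_payloads
WHERE deposit_id = 456;
```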