ClickHouse: How to select part of json by paths?

ghz 8months ago ⋅ 52 views

Given an arbitrary JSON in ClickHouse:

{
  "id": 123,
  "name": "John",
  "metadata": {
    "foo": "bar",
    "nums": [1, 2, 3]
  }
}

And an arbitrary list of paths: ['id', 'metadata.nums', 'not.existing'] (or any similar format)

Is there any way to select json's part based on specified paths? For paths specified above I expect this result:

{
  "id": 123,
  "metadata": {
    "nums": [1, 2, 3]
  }
}

I have considered approaches with built-in functions (like JSON_VALUE with JsonPath or JSONExtractKeysAndValuesRaw), but each of them returns either values without keys or only value by one path

Answers

To achieve the desired result in ClickHouse, you can use the JSONExtractKeysAndValuesRaw function along with some additional processing. Here's how you can do it:

WITH RECURSIVE ExtractedPaths AS (
  SELECT 
    path,
    arrayJoin(splitByChar('.', path)) AS nested_path
  FROM 
  (
    SELECT 
      arrayJoin(['id', 'metadata.nums', 'not.existing']) AS path
  )
),
ExtractedValues AS (
  SELECT 
    JSONExtractKeysAndValuesRaw('{"id": 123, "name": "John", "metadata": {"foo": "bar", "nums": [1, 2, 3]}}') AS key, 
    arrayJoin(JSONExtractKeysAndValuesRaw('{"id": 123, "name": "John", "metadata": {"foo": "bar", "nums": [1, 2, 3]}}')) AS value
),
FilteredValues AS (
  SELECT 
    key, 
    value
  FROM 
    ExtractedValues
  JOIN 
    ExtractedPaths
  WHERE 
    value.1 = nested_path
)
SELECT 
  JSONExtract(jsonStringMerge(key, value), 'id', 'metadata.nums')
FROM 
  FilteredValues
GROUP BY 
  key

Explanation:

  1. ExtractedPaths: This CTE splits the input paths into individual nested paths for each element. It uses arrayJoin and splitByChar to achieve this.

  2. ExtractedValues: This CTE extracts all keys and values from the input JSON using the JSONExtractKeysAndValuesRaw function. It then flattens the array of key-value pairs using arrayJoin.

  3. FilteredValues: This CTE filters the extracted values based on the specified paths. It joins the ExtractedValues CTE with the ExtractedPaths CTE and filters the values based on matching nested paths.

  4. Finally, in the main query, we use jsonStringMerge to merge the filtered keys and values, and then use JSONExtract to extract the specified keys from the merged JSON string.

This query will return the JSON object containing only the specified keys and their corresponding values based on the given paths. Adjust the paths and input JSON as needed.