What might be wrong with this attempt to highlight ES results?

ghz, 8 months ago ⋅ 157 views

I have a working version of an Elasticsearch 7.10.2 setup where the stemmed query results are highlighted in beautiful multiple colours. So if you have 4 words in your search query the results are delivered with the words in the results corresponding to the 4 different terms ... highlighted in 4 different colours.

ES v 7.16.8 (if memory serves...) broke this and I raised an issue with ES HQ. It turned out to be a regression that others had already spotted in connection with other problems. Later the problem was said to have been put right.

I'm now trying to get ES 8.6.2 to do this same thing (running on port 9500).

This is my mapping, applied to a new index before populating it (other fields omitted):

mappings = {
    'properties': {
        'text_content': {
            'type': 'text',
            'term_vector': 'with_positions_offsets',
            'fields': {
                'stemmed': {
                    'type': 'text',
                    'analyzer': 'english',
                    'term_vector': 'with_positions_offsets',
                }
            }
        }
    }
}
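The fvh highlighter relies on term vectors stored with positions and offsets, so it can be worth confirming that the field being highlighted really carries that setting. A minimal sketch, assuming the mapping is available as a Python dict; fvh_ready is a hypothetical helper and only handles one level of multi-field nesting, as in the mapping above:

```python
# Mapping as in the question (other fields omitted).
mappings = {
    'properties': {
        'text_content': {
            'type': 'text',
            'term_vector': 'with_positions_offsets',
            'fields': {
                'stemmed': {
                    'type': 'text',
                    'analyzer': 'english',
                    'term_vector': 'with_positions_offsets',
                }
            }
        }
    }
}


def fvh_ready(mappings, field_path):
    """Return True if the field (or its multi-field) stores term vectors
    with positions and offsets. Hypothetical helper: it understands only
    'parent' and 'parent.subfield' paths."""
    parent, _, sub = field_path.partition('.')
    node = mappings['properties'][parent]
    if sub:
        node = node['fields'][sub]
    return node.get('term_vector') == 'with_positions_offsets'


print(fvh_ready(mappings, 'text_content.stemmed'))
```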

And this is how I run the query after populating the index:

data = {
    'query': {
        'simple_query_string': {
            'query': self.query_text,
            'fields': ['text_content.stemmed'] 
        }
    },
    'highlight': {
        'fields': {
            'text_content.stemmed': {
                'type': 'fvh',
                'pre_tags': [
                    '<span style="background-color: yellow">',
                    '<span style="background-color: skyblue">', 
                    '<span style="background-color: lightgreen">', 
                    '<span style="background-color: plum">', 
                    '<span style="background-color: lightcoral">', 
                    '<span style="background-color: silver">',
                ],
                'post_tags': ['</span>', '</span>', '</span>', 
                    '</span>', '</span>', '</span>',]
            }
        },
        'number_of_fragments': 0
    }
}        
search_url = f'{ES_URL}/{ALIAS_NAME}/_search'
headers = {'Content-type': 'application/json'}
success, deliverable = utilities.process_json_request(search_url, data=json.dumps(data), headers=headers) 

This currently fails with:

request failed. URL |https://localhost:9500/dev_my_documents/_search| command get deliverable.failure_reason unacceptable status code: 400
reason: [1:501] [highlight] failed to parse field [fields]

Examining the failure response's json() I get this:

{
  "error": {
    "root_cause": [
      {
        "type": "x_content_parse_exception",
        "reason": "[1:500] [highlight_field] failed to parse field [post_tags]"
      }
    ],
    "type": "x_content_parse_exception",
    "reason": "[1:500] [highlight] failed to parse field [fields]",
    "caused_by": {
      "type": "x_content_parse_exception",
      "reason": "[1:500] [fields] failed to parse field [text_content.stemmed]",
      "caused_by": {
        "type": "x_content_parse_exception",
        "reason": "[1:500] [highlight_field] failed to parse field [post_tags]",
        "caused_by": {
          "type": "json_e_o_f_exception",
          "reason": "Unexpected end-of-input in VALUE_STRING\n at [Source: (org.elasticsearch.common.io.stream.ByteBufferStreamInput); line: 1, column: 504]"
        }
      }
    }
  },
  "status": 400
}
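The outer reasons in this response only say which field was being parsed; the innermost caused_by entry carries the actual parser complaint. A minimal sketch for unwinding the chain, assuming the response has already been parsed into a dict; cause_chain is a hypothetical helper and the sample below is abbreviated from the response above:

```python
def cause_chain(error):
    """Yield (type, reason) pairs from the outermost error down to the
    innermost 'caused_by' entry."""
    node = error
    while node is not None:
        yield node.get('type'), node.get('reason')
        node = node.get('caused_by')


# Abbreviated copy of the failure response shown above.
response_json = {
    'error': {
        'type': 'x_content_parse_exception',
        'reason': '[1:500] [highlight] failed to parse field [fields]',
        'caused_by': {
            'type': 'x_content_parse_exception',
            'reason': '[1:500] [fields] failed to parse field [text_content.stemmed]',
            'caused_by': {
                'type': 'x_content_parse_exception',
                'reason': '[1:500] [highlight_field] failed to parse field [post_tags]',
                'caused_by': {
                    'type': 'json_e_o_f_exception',
                    'reason': 'Unexpected end-of-input in VALUE_STRING',
                },
            },
        },
    },
    'status': 400,
}

chain = list(cause_chain(response_json['error']))
innermost_type, innermost_reason = chain[-1]
print(innermost_type, '|', innermost_reason)
```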

This is the identical mapping and query that work with my ES 7.10.2. Can anyone explain what I'm doing wrong now and how to implement the multi-colour highlighting correctly with 8.6.2?

For clarification: if I comment out the whole "highlight" key/value in data, the query runs fine. With no highlighting at all, obviously.
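Because the request succeeds without the highlight block, one cheap check is whether the serialized body is longer than the column at which the parser gave up (504 in the response above). A minimal sketch, assuming the same payload shape as above; describe_body and the sample query text are hypothetical and not part of the original code:

```python
import json


def describe_body(payload, column):
    """Serialize the payload as it would be sent and report its size,
    plus the 40 characters leading up to the given column."""
    body = json.dumps(payload)
    raw = body.encode('utf-8')
    return {
        'chars': len(body),
        'bytes': len(raw),
        'longer_than_column': len(raw) > column,
        'context': body[max(0, column - 40):column],
    }


colours = ['yellow', 'skyblue', 'lightgreen', 'plum', 'lightcoral', 'silver']
sample = {
    'query': {
        'simple_query_string': {
            'query': 'example search',  # stand-in for self.query_text
            'fields': ['text_content.stemmed'],
        }
    },
    'highlight': {
        'fields': {
            'text_content.stemmed': {
                'type': 'fvh',
                'pre_tags': [f'<span style="background-color: {c}">' for c in colours],
                'post_tags': ['</span>'] * len(colours),
            }
        },
        'number_of_fragments': 0,
    },
}

info = describe_body(sample, 504)
print(info)
```

If the serialized body is longer than the reported column, the body is probably being cut short somewhere between json.dumps and Elasticsearch, rather than being rejected for its content.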

Answers

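The error is reported against the highlight section, but the highlight syntax itself does not look like the cause. The innermost caused_by entry is a json_e_o_f_exception: "Unexpected end-of-input in VALUE_STRING" at line 1, column 504. That means the JSON parser ran out of input part-way through a string inside post_tags, so the body that Elasticsearch received appears to have been cut off at around 504 characters.

The highlight request did not change shape between 7.x and 8.x in a way that would explain this. highlight.fields, type: 'fvh', pre_tags, post_tags and number_of_fragments are all still valid in 8.6.2, and term_vector: 'with_positions_offsets' in your mapping is what fvh needs. (There is also a highlight_query option, but it is an optional setting for highlighting with a different query from the search query. It does not replace highlight, and type is not deprecated.)

So the query body can stay as you already have it:

data = {
    'query': {
        'simple_query_string': {
            'query': self.query_text,
            'fields': ['text_content.stemmed']
        }
    },
    'highlight': {
        'fields': {
            'text_content.stemmed': {
                'type': 'fvh',
                'pre_tags': [
                    '<span style="background-color: yellow">',
                    '<span style="background-color: skyblue">',
                    '<span style="background-color: lightgreen">',
                    '<span style="background-color: plum">',
                    '<span style="background-color: lightcoral">',
                    '<span style="background-color: silver">'
                ],
                'post_tags': ['</span>', '</span>', '</span>',
                              '</span>', '</span>', '</span>']
            }
        },
        'number_of_fragments': 0
    }
}

Things to check:

  • Compare len(json.dumps(data)) with the column in the error (504). If the serialized body is longer than that, something between json.dumps and Elasticsearch is truncating it.
  • Look at how utilities.process_json_request sends the body, for example whether a Content-Length header is set by hand, or whether the string is sliced or re-encoded on the way out.
  • Your observation that the query runs with the highlight block removed fits this reading: without the tags, the body is much shorter than 504 characters.
  • The trailing commas in the Python lists are not the problem. They are legal in Python, and json.dumps does not emit them.

If the full body reaches Elasticsearch intact and the request still fails, the error should at least change to something that points at the real cause.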
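For the multi-colour part, the tag lists can be generated from a list of colours so that pre_tags and post_tags always have the same length. A minimal sketch using the request shape from the question (highlight.fields with type fvh); build_highlight is a hypothetical helper, not part of any Elasticsearch client:

```python
import json


def build_highlight(field, colours):
    """Build a 'highlight' block with one opening tag per colour and a
    matching closing tag for each."""
    pre_tags = [f'<span style="background-color: {c}">' for c in colours]
    post_tags = ['</span>'] * len(colours)
    return {
        'fields': {
            field: {
                'type': 'fvh',
                'pre_tags': pre_tags,
                'post_tags': post_tags,
            }
        },
        # 0 asks for the whole field content rather than fragments.
        'number_of_fragments': 0,
    }


colours = ['yellow', 'skyblue', 'lightgreen', 'plum', 'lightcoral', 'silver']
highlight = build_highlight('text_content.stemmed', colours)
print(json.dumps(highlight, indent=2))
```

The result can be assigned to data['highlight'] before the body is serialized with json.dumps.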