I have a working Elasticsearch 7.10.2 setup in which stemmed query results are highlighted in multiple colours: if the search query contains 4 words, the words in the results that correspond to the 4 different terms are highlighted in 4 different colours.
ES v 7.16.8 (if memory serves...) broke this and I raised an issue with ES HQ, which in fact turned out to be a regression which others had spotted in relation to other problems. Later the problem was said to have been put right.
I'm now trying to get ES 8.6.2 to do this same thing (running on port 9500).
This is my mapping, applied to a new index before populating it (other fields omitted):
mappings = {
    'properties': {
        'text_content': {
            'type': 'text',
            'term_vector': 'with_positions_offsets',
            'fields': {
                'stemmed': {
                    'type': 'text',
                    'analyzer': 'english',
                    'term_vector': 'with_positions_offsets',
                }
            }
        }
    }
}
And this is how I run the query after populating the index:
data = {
    'query': {
        'simple_query_string': {
            'query': self.query_text,
            'fields': ['text_content.stemmed']
        }
    },
    'highlight': {
        'fields': {
            'text_content.stemmed': {
                'type': 'fvh',
                'pre_tags': [
                    '<span style="background-color: yellow">',
                    '<span style="background-color: skyblue">',
                    '<span style="background-color: lightgreen">',
                    '<span style="background-color: plum">',
                    '<span style="background-color: lightcoral">',
                    '<span style="background-color: silver">',
                ],
                'post_tags': ['</span>', '</span>', '</span>',
                              '</span>', '</span>', '</span>',]
            }
        },
        'number_of_fragments': 0
    }
}
search_url = f'{ES_URL}/{ALIAS_NAME}/_search'
headers = {'Content-type': 'application/json'}
success, deliverable = utilities.process_json_request(search_url, data=json.dumps(data), headers=headers)
This currently fails with:
request failed. URL |https://localhost:9500/dev_my_documents/_search| command get deliverable.failure_reason unacceptable status code: 400
reason: [1:501] [highlight] failed to parse field [fields]
Examining the failure response's json() I get this:
{
    "error": {
        "root_cause": [
            {
                "type": "x_content_parse_exception",
                "reason": "[1:500] [highlight_field] failed to parse field [post_tags]"
            }
        ],
        "type": "x_content_parse_exception",
        "reason": "[1:500] [highlight] failed to parse field [fields]",
        "caused_by": {
            "type": "x_content_parse_exception",
            "reason": "[1:500] [fields] failed to parse field [text_content.stemmed]",
            "caused_by": {
                "type": "x_content_parse_exception",
                "reason": "[1:500] [highlight_field] failed to parse field [post_tags]",
                "caused_by": {
                    "type": "json_e_o_f_exception",
                    "reason": "Unexpected end-of-input in VALUE_STRING\n at [Source: (org.elasticsearch.common.io.stream.ByteBufferStreamInput); line: 1, column: 504]"
                }
            }
        }
    },
    "status": 400
}
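The outer "failed to parse field" messages only wrap the real problem; the innermost caused_by entry is usually the one worth reading. A minimal sketch for flattening that chain, assuming the response has already been parsed into a dict (the function name cause_chain and the abbreviated sample are illustrative, not part of my actual code):

```python
def cause_chain(error_response):
    """Return (type, reason) pairs from the outermost error to the innermost cause."""
    chain = []
    node = error_response.get("error")
    # Each level may carry a nested "caused_by" dict; follow it until it runs out.
    while isinstance(node, dict):
        chain.append((node.get("type"), node.get("reason")))
        node = node.get("caused_by")
    return chain


# Abbreviated copy of the failure response shown above.
sample = {
    "error": {
        "type": "x_content_parse_exception",
        "reason": "[1:500] [highlight] failed to parse field [fields]",
        "caused_by": {
            "type": "x_content_parse_exception",
            "reason": "[1:500] [fields] failed to parse field [text_content.stemmed]",
            "caused_by": {
                "type": "x_content_parse_exception",
                "reason": "[1:500] [highlight_field] failed to parse field [post_tags]",
                "caused_by": {
                    "type": "json_e_o_f_exception",
                    "reason": "Unexpected end-of-input in VALUE_STRING",
                },
            },
        },
    },
    "status": 400,
}

chain = cause_chain(sample)
for error_type, reason in chain:
    print(f"{error_type}: {reason}")
```

Here the last entry is a json_e_o_f_exception, meaning the server ran out of input in the middle of a string value rather than rejecting a particular highlight setting.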
This is the identical mapping and query searching which works with my ES 7.10.2. Can anyone explain what I'm doing wrong now and how to implement the multi-colour highlighting correctly with 8.6.2?
For clarification: if I comment out the whole "highlight" key/value in data, the query runs fine. With no highlighting at all, obviously.
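Since the innermost cause is an end-of-input at column 504, one local check is whether the serialized body is complete JSON before it leaves the client, and how its length compares with that column. A minimal sketch, assuming a hypothetical helper build_highlight and a placeholder query string in place of self.query_text:

```python
import json


def build_highlight(field, colours):
    """Build a highlight clause with one pre/post tag pair per colour."""
    pre_tags = [f'<span style="background-color: {c}">' for c in colours]
    # One closing tag per opening tag, so the two lists always match in length.
    post_tags = ['</span>'] * len(pre_tags)
    return {
        'fields': {
            field: {'type': 'fvh', 'pre_tags': pre_tags, 'post_tags': post_tags}
        },
        'number_of_fragments': 0,
    }


COLOURS = ['yellow', 'skyblue', 'lightgreen', 'plum', 'lightcoral', 'silver']

data = {
    'query': {
        'simple_query_string': {
            'query': 'example search words',  # placeholder for self.query_text
            'fields': ['text_content.stemmed'],
        }
    },
    'highlight': build_highlight('text_content.stemmed', COLOURS),
}

body = json.dumps(data)
# If this round trip succeeds, the body is well-formed when it is created.
round_tripped = json.loads(body)
print('valid JSON:', round_tripped == data)
print('body length in characters:', len(body))
```

If the body is valid locally and noticeably longer than the column at which the server reports end-of-input, that would suggest the body is being cut short somewhere between json.dumps and the server, rather than that the highlight settings themselves are wrong.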
Answers
It looks like you're facing an issue with the highlighting configuration in Elasticsearch 8.6.2. The error message suggests that there's a problem parsing the fields section of the highlight configuration.
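This answer assumes that in Elasticsearch 8.x the fields to be highlighted are specified under highlight.query rather than directly under highlight, and that the per-field type parameter is deprecated in favor of highlight.type. Neither change is described in the 8.x highlighting reference, which still lists fields, type, pre_tags and post_tags as settings of highlight, so treat the rewrite below as something to test rather than a confirmed fix.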
Here's how you can modify your query to fix the highlighting configuration:
data = {
    'query': {
        'simple_query_string': {
            'query': self.query_text,
            'fields': ['text_content.stemmed']
        }
    },
    'highlight': {
        'query': {
            'number_of_fragments': 0,
            'fields': {
                'text_content.stemmed': {
                    'type': 'fvh',
                    'pre_tags': [
                        '<span style="background-color: yellow">',
                        '<span style="background-color: skyblue">',
                        '<span style="background-color: lightgreen">',
                        '<span style="background-color: plum">',
                        '<span style="background-color: lightcoral">',
                        '<span style="background-color: silver">'
                    ],
                    'post_tags': ['</span>', '</span>', '</span>',
                                  '</span>', '</span>', '</span>']
                }
            }
        }
    }
}
Changes made:
- highlight key changed to highlight.query.
- number_of_fragments moved inside query object.
- fields moved inside query object.
- type changed to highlight.type (although it's deprecated, it might still work for now).
- Removed the extra comma at the end of the pre_tags array.
With these changes, your query should work properly in Elasticsearch 8.6.2, and the multi-color highlighting should be applied as expected.
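Independently of how the highlight clause is structured, the failure log in the question reports the command as get, and the innermost cause is a body that ends early. The _search endpoint accepts POST as well as GET, so sending the same body as a POST is one way to rule out the body being dropped or shortened in transit. A minimal sketch using only the standard library; the helper name build_search_request and the placeholder query are assumptions, and nothing is actually sent here:

```python
import json
import urllib.request

# URL taken from the failure message in the question.
SEARCH_URL = 'https://localhost:9500/dev_my_documents/_search'


def build_search_request(url, data):
    """Serialize data to UTF-8 JSON and wrap it in a POST request object."""
    body = json.dumps(data).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


data = {
    'query': {
        'simple_query_string': {
            'query': 'example search words',  # placeholder query text
            'fields': ['text_content.stemmed'],
        }
    },
    'highlight': {
        'fields': {
            'text_content.stemmed': {
                'type': 'fvh',
                'pre_tags': ['<span style="background-color: yellow">'],
                'post_tags': ['</span>'],
            }
        },
        'number_of_fragments': 0,
    },
}

request = build_search_request(SEARCH_URL, data)
# Inspect what would go on the wire before involving the server at all.
print(request.get_method(), len(request.data), 'bytes')
```

Actually sending it would be a call to urllib.request.urlopen(request), plus whatever TLS and authentication settings the cluster requires, which are not shown in the question.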