Question
I have a table with ~500k rows; varchar(255) UTF8 column filename
contains a
file name;
I'm trying to strip out various strange characters out of the filename -
thought I'd use a character class: [^a-zA-Z0-9()_ .\-]
Now, is there a function in MySQL that lets you replace through a regular expression? I'm looking for a similar functionality to REPLACE() function - simplified example follows:
SELECT REPLACE('stackowerflow', 'ower', 'over');
Output: "stackoverflow"
/* does something like this exist? */
SELECT X_REG_REPLACE('Stackoverflow','/[A-Zf]/','-');
Output: "-tackover-low"
I know about REGEXP/RLIKE, but those only check if there is a match, not what the match is.
(I could do a "SELECT pkey_id,filename FROM foo WHERE filename RLIKE '[^a-zA-Z0-9()_ .\-]'
" from a PHP script, do a preg_replace
and then
"UPDATE foo ... WHERE pkey_id=...
", but that looks like a last-resort slow &
ugly hack)
Answer
With MySQL 8.0+ you could use natively REGEXP_REPLACE
function.
[12.5.2 Regular Expressions](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp- replace):
REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])
Replaces occurrences in the string expr that match the regular expression specified by the pattern pat with the replacement string repl , and returns the resulting string. If expr , pat , or repl is
NULL
, the return value isNULL
.
and Regular expression support:
Previously, MySQL used the Henry Spencer regular expression library to support regular expression operators (
REGEXP
,RLIKE
).Regular expression support has been reimplemented using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe. The
REGEXP_LIKE()
function performs regular expression matching in the manner of theREGEXP
andRLIKE
operators, which now are synonyms for that function. In addition, theREGEXP_INSTR()
,REGEXP_REPLACE()
, andREGEXP_SUBSTR()
functions are available to find match positions and perform substring substitution and extraction, respectively.
SELECT REGEXP_REPLACE('Stackoverflow','[A-Zf]','-',1,0,'c');
-- Output:
-tackover-low