I want to run a regexp find and replace on a wordpress database, specifically the in wp_posts table, in the post_content column. I have several affiliate URLs scattered in the text of the post_content column, and I need to replace them with new affiliate URLs. The problem is that the current URLs have multiple params that may vary depending on the URL, and they are not always in the same order. The only param that is consistent across all URLs id the store_id param.
Example of a current URL: https://www.affialite.com/redirect.php?tt=el&store_id=this_is_the_store_id¶m1=this_is_param_1&url=https://redirect_the_visitor_to_this_url.com&c_id=id_of_the_campaign
Example of how it should look after the find/replace operation: https://newaffiliate.net/c/1234/5678/9012?campaign=id_of_the_campaign&u=https://redirect_the_visitor_to_this_url.com
The query must find the affiliate links, replace the first part of the URL with the new affiliate url, capture the value of param url and match it to the new u param, and capture the value of the c_id param and match it to the new campaign param.
I have close do zero experience with mysql, other than the regular CRUD operations, so I was using ChatGPT to help me build a query for this. The closest I got is the one below, but the resulting URL has the params in all the wrong places.
UPDATE wp_posts
SET post_content =
REGEXP_REPLACE(
post_content,
'https:\/\/www\.affiliate\.com\/redirect\.php\?(?:.*?&|.*?&)(store_id=this_is_the_store_id)(?:&.*?|&)(.*?)',
CONCAT(
'https://newaffiliate.net/c/1234/5678/9012?',
'campaign=',
REGEXP_SUBSTR(
REGEXP_SUBSTR(CONCAT('&', '\\2', '&'), '&c_id=([^&]+)'),
'[^=]+$'
),
'&u=',
REGEXP_SUBSTR(
REGEXP_SUBSTR(CONCAT('&', '\\2', '&'), '&url=([^&]+)'),
'[^=]+$'
)
)
)
WHERE post_content REGEXP 'https:\/\/www\.affiliate\.com\/click\.php\?(?:.*?&|.*?&)(store_id=this_is_the_store_id)(?:&.*?|&)(.*?)';
Here's the example of a wrong resulting URL: https://newaffiliate.com?campaing=&u=param1=this_is_param_1&url=https://redirect_the_visitor_to_this_url.com&c_id=id_of_the_campaign
Any insight on what's wrong with the query? Any help is very much appreciated.
Thank you all.
Answers
It seems like the issue with your regular expression is causing the parameters to be misplaced in the resulting URL. Let's refine the regular expression to correctly capture the parameters and their values. Also, we need to ensure that the parameters are replaced in the correct order in the new URL format.
Here's an updated version of the query:
UPDATE wp_posts
SET post_content =
REGEXP_REPLACE(
post_content,
'https:\/\/www\.affiliate\.com\/redirect\.php\?[^&]*store_id=this_is_the_store_id([^&]*)',
CONCAT(
'https://newaffiliate.net/c/1234/5678/9012?',
'campaign=',
REGEXP_SUBSTR(
'\\1',
'&c_id=([^&]+)'
),
'&u=',
REGEXP_SUBSTR(
'\\1',
'&url=([^&]+)'
)
)
)
WHERE post_content REGEXP 'https:\/\/www\.affiliate\.com\/redirect\.php\?[^&]*store_id=this_is_the_store_id([^&]*)';
Changes made:
- Updated the regular expression to correctly capture the parameters following the
store_id
. - Changed the regex capture groups to capture the parameters following
store_id
. - Adjusted the
REGEXP_SUBSTR
functions to extract the values ofc_id
andurl
from the captured parameters. - Rearranged the parameters in the
CONCAT
function to match the desired URL format.
This should provide the correct replacement for your affiliate URLs in the post_content
column of the wp_posts
table. Make sure to run it on a backup of your database first, just to be safe.