Is it possible to use the ^ operator in wildcards for the Azure Data Factory Delete activity?
In Azure Data Factory, I want to delete all files at a particular path whose names do not contain a given substring (e.g. '2024-04-15').
I managed to do it with a loop activity, but it is too expensive when the number of files at that path is large (around 1000).
So I wanted to try wildcards, but I am stuck trying to use the regex not operator.
For example, if I have 3 files at the path: [lorem_2024-04-15.csv, lorem_2024-04-14_test.csv, lorem_2024-04-14.csv], I want to delete all files but lorem_2024-04-15.csv. I tried:
Wildcard file name: *^(?!.*2024-04-15).*
But when I try to preview data for the dataset pointing to that path, I get this error:
The escaped character ^( is not recognized for wildcard filter (*, ?).
If you want to use ^ in file name, specify ^^ to escape. Index: 2.
Answers
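In Azure Data Factory, the wildcard file name is not a regular expression. It supports only the wildcard characters * (zero or more characters) and ? (zero or one character), and ^ is reserved as the escape character. That is why your pattern is rejected: the parser reads ^( as an escape sequence it does not know, exactly as the error message says. There is no negation operator and, as far as I know, no logical OR either, so a single wildcard cannot express "every file that does not contain '2024-04-15'".
You may come across suggestions like this one:
Wildcard file name: *2024-04-15* | *
Even if | were supported, this would not help. "Contains '2024-04-15' OR anything" matches every file, including the one you want to keep.
Here's how you can approach it instead, by doing the exclusion outside the wildcard:
- Use a Get Metadata activity with the Child items field to list the files at the path.
- Use a Filter activity on that list with a condition such as @not(contains(item().name, '2024-04-15')).
- Pass the filtered items to a ForEach that runs the Delete activity for each remaining file name.
If the per-file loop is what made your first attempt slow, check that the ForEach is not set to Sequential. It can run iterations in parallel (batch count up to 50), which usually helps a lot at around 1000 files.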
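To make the matching logic concrete, here is a minimal Python sketch that runs outside Data Factory. It uses the standard `fnmatchcase` function as a rough stand-in for the wildcard matcher (an assumption; the two are not identical), and the helper `names_to_delete` is a hypothetical name introduced only for illustration. It shows that a bare `*` selects every file, while the exclusion has to be written as an explicit "does not contain" test.

```python
from fnmatch import fnmatchcase

# The three example files from the question.
files = [
    "lorem_2024-04-15.csv",
    "lorem_2024-04-14_test.csv",
    "lorem_2024-04-14.csv",
]


def names_to_delete(names, keep_substring):
    """Return the names that do NOT contain keep_substring (hypothetical helper)."""
    return [name for name in names if keep_substring not in name]


# A bare "*" wildcard matches every name, including the file we want to keep.
matched_by_star = [name for name in files if fnmatchcase(name, "*")]

# The exclusion itself is a plain substring test, not a wildcard.
to_delete = names_to_delete(files, "2024-04-15")

print(matched_by_star)
print(to_delete)
```

The substring test in `names_to_delete` is the same condition you would express in a pipeline as `not(contains(...))` on each file name.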
Keep in mind that this approach may still not be as efficient as desired, especially if you have a large number of files. In such cases, you might need to consider alternative approaches or use a custom script to perform the file deletion operation.
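If you go the custom script route, the sketch below shows one possible shape for it in Python. Everything here is an assumption rather than a Data Factory feature: `delete_except` is a hypothetical helper, and the storage is an in-memory set so the example runs anywhere. With Azure Blob Storage you could probably feed it the blob names from the `azure-storage-blob` package and pass the container client's `delete_blob` method as `delete_one`, but that wiring is not shown or tested here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def delete_except(
    names: Iterable[str],
    delete_one: Callable[[str], object],
    keep_substring: str,
    max_workers: int = 8,
) -> List[str]:
    """Delete every name that does not contain keep_substring.

    delete_one is whatever function removes a single file in your storage.
    Returns the names that were deleted, in input order.
    """
    targets = [name for name in names if keep_substring not in name]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for all deletions and re-raises the first failure, if any.
        list(pool.map(delete_one, targets))
    return targets


# In-memory stand-in for the folder, so the sketch runs without any cloud account.
names = [
    "lorem_2024-04-15.csv",
    "lorem_2024-04-14_test.csv",
    "lorem_2024-04-14.csv",
]
store = set(names)

deleted = delete_except(names, store.discard, "2024-04-15")

print(sorted(store))
print(deleted)
```

Running the deletions through a thread pool is a design choice to avoid the one-file-at-a-time cost that made the loop expensive; tune `max_workers` to whatever your storage account tolerates.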