Swift extract regex matches

ghz 1years ago ⋅ 870 views

Question

I want to extract substrings from a string that match a regex pattern.

So I'm looking for something like this:

func matchesForRegexInText(regex: String!, text: String!) -> [String] {
   ???
}

So this is what I have:

func matchesForRegexInText(regex: String!, text: String!) -> [String] {

    var regex = NSRegularExpression(pattern: regex, 
        options: nil, error: nil)

    var results = regex.matchesInString(text, 
        options: nil, range: NSMakeRange(0, countElements(text))) 
            as Array<NSTextCheckingResult>

    /// ???

    return ...
}

The problem is, that matchesInString delivers me an array of NSTextCheckingResult, where NSTextCheckingResult.range is of type NSRange.

NSRange is incompatible with Range<String.Index>, so it prevents me of using text.substringWithRange(...)

Any idea how to achieve this simple thing in swift without too many lines of code?


Answer

Even if the matchesInString() method takes a String as the first argument, it works internally with NSString, and the range parameter must be given using the NSString length and not as the Swift string length. Otherwise it will fail for "extended grapheme clusters" such as "flags".

As of Swift 4 (Xcode 9), the Swift standard library provides functions to convert between Range<String.Index> and NSRange.

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: text,
                                    range: NSRange(text.startIndex..., in: text))
        return results.map {
            String(text[Range($0.range, in: text)!])
        }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "🇩🇪€4€9"
let matched = matches(for: "[0-9]", in: string)
print(matched)
// ["4", "9"]

Note: The forced unwrap Range($0.range, in: text)! is safe because the NSRange refers to a substring of the given string text. However, if you want to avoid it then use

        return results.flatMap {
            Range($0.range, in: text).map { String(text[$0]) }
        }

instead.


(Older answer for Swift 3 and earlier:)

So you should convert the given Swift string to an NSString and then extract the ranges. The result will be converted to a Swift string array automatically.

(The code for Swift 1.2 can be found in the edit history.)

Swift 2 (Xcode 7.3.1) :

func matchesForRegexInText(regex: String, text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text,
                                            options: [], range: NSMakeRange(0, nsString.length))
        return results.map { nsString.substringWithRange($0.range)}
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "🇩🇪€4€9"
let matches = matchesForRegexInText("[0-9]", text: string)
print(matches)
// ["4", "9"]

Swift 3 (Xcode 8)

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let nsString = text as NSString
        let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range)}
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "🇩🇪€4€9"
let matched = matches(for: "[0-9]", in: string)
print(matched)
// ["4", "9"]