
Rust Regular expressions

Like anything in Rust, Rust regexes are a little weird. First, they come in a crate. That is routine for a Rust developer, but as a multilingual developer, I think some basic features should be included in the standard library. Second, the flavor of Rust regex is… well… rustic.

First thing – the crate

In your Cargo.toml file, you need to add regex under [dependencies] and then run cargo build.

[dependencies]
regex = "*"

Your Cargo.lock will record the exact version that Cargo resolved. If you want to be more precise in Cargo.toml, you can specify a version requirement. It’s up to you to use just the major version (for example "1"), major and minor ("1.10"), or major, minor, and patch ("1.10.2").

Playing with matches

The first thing to do is find a match. Does our string match the test regex? Let’s see how:

#[test]
fn test_basic_match() {
    // One or more word characters, anywhere in the string
    let reg = regex::Regex::new("[\\w]+").unwrap();
    let phrase = "hello world";
    assert!(reg.is_match(phrase));
}

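This test passes! You can try it yourself (cargo test). We don’t get any more information in this case, though. You might think that this is not a correct match, since the phrase contains a space, but it is: is_match returns true when the pattern matches anywhere in the string. If you’re looking for a full match, you need to anchor the pattern like this: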

#[test]
fn test_full_match() {
    // ^ and $ anchor the pattern to the start and end of the string
    let reg = regex::Regex::new("^[\\w]+$").unwrap();
    let phrase = "hello world";
    assert!(!reg.is_match(phrase));
}

Note that here we’re asserting that there is no match: the anchored pattern requires the whole string to consist of word characters, with no spaces, and our test phrase contains a space.

Finding repetitions

Going back to the former example, in which there are two matches: how can we get all of them? The answer is Rust iterators: find_iter yields every match, and map converts each one:

#[test]
fn test_repetitions_values_shorter_version() {
    let reg = regex::Regex::new("[\\w]+").unwrap();
    let phrase = "hello world";
    assert!(reg.is_match(phrase));
    // Turn every match into an owned String
    let str_vec: Vec<String> = reg
        .find_iter(phrase)
        .map(|item| String::from(item.as_str()))
        .collect();
    assert_eq!("hello", str_vec[0]);
    assert_eq!("world", str_vec[1]);
}

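Note the two-step conversion to String there: item is a regex::Match, so as_str() first borrows the matched text as a &str, and String::from then copies it into an owned String.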

Dealing with numbers

Collecting numbers is pretty much the same. Instead of converting to string, you need to convert them to numeric values:

#[test]
fn test_regex_to_numbers() {
    let reg = regex::Regex::new("[\\d]+").unwrap();
    // Parse every run of digits into a u64
    let results: Vec<u64> = reg
        .find_iter("1,3,4")
        .map(|item| item.as_str().parse::<u64>().unwrap())
        .collect();
    assert_eq!(vec![1, 3, 4], results);
}

The use of unwrap here is pretty safe, since the pattern only matches digits. It can still panic, though, for example when a run of digits is too large to fit in a u64. If that makes you uncomfortable, you can use a match expression within the map closure.
