Using Regular Expressions with Microsoft .NET

Microsoft .NET, which you can use with any .NET programming language such as C# (C sharp) or Visual Basic.NET, has solid support for regular expressions. .NET’s regex flavor is very feature-rich. The only noteworthy features that are lacking are possessive quantifiers and subroutine calls.

There are no differences in the regex flavor supported by the .NET Framework versions 2.0 through 4.8. There are no differences between this flavor and the flavor supported by any version of .NET Core either. That includes the original .NET Core 1.0.0 and the latest .NET 5.0.

There are a few differences between the regex flavor in the .NET Framework 1.x compared with later versions. The .NET Framework 2.0 fixes a few bugs. The Unicode categories \p{Pi} and \p{Pf} are no longer reversed. Unicode blocks with hyphens in their names are now handled correctly. One feature was added in .NET 2.0: character class subtraction. It works exactly the way it does in XML Schema regular expressions. The XML Schema standard first defined this feature and its syntax.
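As a small illustration of character class subtraction, the C# sketch below removes every lowercase ASCII consonant from a string. The class and method names (`SubtractionDemo`, `StripConsonants`) are made up for this example and are not part of .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubtractionDemo
{
    // [a-z-[aeiou]] is the class a-z minus the class aeiou:
    // any lowercase ASCII letter that is not a vowel.
    public static string StripConsonants(string input)
    {
        return Regex.Replace(input, "[a-z-[aeiou]]", "");
    }

    public static void Main()
    {
        Console.WriteLine(StripConsonants("regex tools")); // ee oo
    }
}
```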

System.Text.RegularExpressions Overview (Using VB.NET Syntax)

The regex classes are located in the namespace System.Text.RegularExpressions. To make them available, place Imports System.Text.RegularExpressions at the start of your source code.

The Regex class is the one you use to compile a regular expression. For efficiency, regular expressions are compiled into an internal format. If you plan to use the same regular expression repeatedly, construct a Regex object as follows: Dim RegexObj As Regex = New Regex("regularexpression"). You can then call RegexObj.IsMatch("subject") to check whether the regular expression matches the subject string. The Regex constructor accepts an optional second parameter of type RegexOptions. You could specify RegexOptions.IgnoreCase as the second parameter to make the regex case insensitive. Other options are RegexOptions.IgnorePatternWhitespace, which makes the regex free-spacing; RegexOptions.Singleline, which makes the dot match newlines; RegexOptions.Multiline, which makes the caret and dollar match at embedded newlines in the subject string; and RegexOptions.ExplicitCapture, which turns all unnamed groups into non-capturing groups.
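The effect of each option can be checked with a short C# sketch (C# rather than the VB.NET syntax used in the text). The helper names `Test` and `GroupCount` are hypothetical; the patterns and subjects are arbitrary examples.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Builds a Regex object with the given options and tests one subject.
    public static bool Test(string pattern, string subject, RegexOptions options)
    {
        var regexObj = new Regex(pattern, options);
        return regexObj.IsMatch(subject);
    }

    // Number of groups in the pattern, including group 0 (the overall match).
    public static int GroupCount(string pattern, RegexOptions options)
    {
        return new Regex(pattern, options).GetGroupNumbers().Length;
    }

    public static void Main()
    {
        Console.WriteLine(Test("hello", "HELLO", RegexOptions.None));       // False
        Console.WriteLine(Test("hello", "HELLO", RegexOptions.IgnoreCase)); // True

        // Singleline: the dot also matches \n.
        Console.WriteLine(Test("a.b", "a\nb", RegexOptions.None));          // False
        Console.WriteLine(Test("a.b", "a\nb", RegexOptions.Singleline));    // True

        // Multiline: ^ and $ also match at embedded newlines.
        Console.WriteLine(Test("^b$", "a\nb\nc", RegexOptions.None));       // False
        Console.WriteLine(Test("^b$", "a\nb\nc", RegexOptions.Multiline));  // True

        // Free-spacing: whitespace and # comments in the pattern are ignored.
        Console.WriteLine(Test("\\d+  # digits\n px", "12px",
            RegexOptions.IgnorePatternWhitespace));                         // True

        // ExplicitCapture: only the named group still captures.
        Console.WriteLine(GroupCount("(a)(?<n>b)", RegexOptions.None));            // 3
        Console.WriteLine(GroupCount("(a)(?<n>b)", RegexOptions.ExplicitCapture)); // 2
    }
}
```

Options are flags, so they can be combined with the `|` operator, for example `RegexOptions.IgnoreCase | RegexOptions.Multiline`.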

Call RegexObj.Replace("subject", "replacement") to perform a search-and-replace using the regex on the subject string, replacing all matches with the replacement string. In the replacement string, you can use $& to insert the entire regex match into the replacement text. You can use $1, $2, $3, etc. to insert the text matched between capturing parentheses into the replacement text. Use $$ to insert a single dollar sign into the replacement text. To replace with the first backreference immediately followed by the digit 9, use ${1}9. If you type $19, and there are fewer than 19 backreferences, then $19 will be interpreted as literal text, and appear in the result string as such. To insert the text from a named capturing group, use ${name}. Improper use of the $ sign may produce an undesirable result string, but will never cause an exception to be raised.
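The replacement tokens described above can be tried out with a C# sketch like the following. The wrapper `ReplaceDemo.Replace` is a hypothetical helper, and the patterns and subject strings are arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Constructs a Regex object and replaces all matches in the subject.
    public static string Replace(string pattern, string subject, string replacement)
    {
        var regexObj = new Regex(pattern);
        return regexObj.Replace(subject, replacement);
    }

    public static void Main()
    {
        // $& inserts the entire match.
        Console.WriteLine(Replace(@"\d+", "item 42", "<$&>"));                 // item <42>
        // $1 and $2 insert the text of the numbered groups.
        Console.WriteLine(Replace(@"(\w+) (\w+)", "hello world", "$2 $1"));    // world hello
        // ${1}9 is group 1 followed by a literal 9.
        Console.WriteLine(Replace(@"(\d)", "a1", "${1}9"));                    // a19
        // $19 stays literal because there is no group 19.
        Console.WriteLine(Replace(@"(\d)", "a1", "$19"));                      // a$19
        // $$ inserts a single dollar sign.
        Console.WriteLine(Replace(@"\d+", "cost 5", "$$$&"));                  // cost $5
        // ${name} inserts the text of a named group.
        Console.WriteLine(Replace(@"(?<year>\d{4})-(?<month>\d{2})",
            "2024-05", "${month}/${year}"));                                   // 05/2024
    }
}
```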

RegexObj.Split("subject") splits the subject string along regex matches, returning an array of strings. The array contains the text between the regex matches. If the regex contains capturing parentheses, the text matched by them is also included in the array. If you want the entire regex matches to be included in the array, simply place parentheses around the entire regular expression when instantiating RegexObj.
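A brief C# sketch of the difference capturing parentheses make when splitting. `SplitDemo.SplitJoined` is a hypothetical helper that joins the resulting array with `|` so it prints on one line.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits the subject along regex matches and joins the pieces with '|'.
    public static string SplitJoined(string pattern, string subject)
    {
        var regexObj = new Regex(pattern);
        string[] parts = regexObj.Split(subject);
        return string.Join("|", parts);
    }

    public static void Main()
    {
        // Without a capturing group, only the text between matches is returned.
        Console.WriteLine(SplitJoined(",", "a,b,c"));   // a|b|c
        // With parentheses around the whole regex, the delimiters are included.
        Console.WriteLine(SplitJoined("(,)", "a,b,c")); // a|,|b|,|c
    }
}
```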

The Regex class also contains several static methods that allow you to use regular expressions without instantiating a Regex object. This reduces the amount of code you have to write, and is appropriate if the same regular expression is used only once or is reused infrequently. Note that member overloading is used a lot in the Regex class. All the static methods have the same names (but different parameter lists) as other non-static methods.

Regex.IsMatch("subject", "regex") checks if the regular expression matches the subject string. Regex.Replace("subject", "regex", "replacement") performs a search-and-replace. Regex.Split("subject", "regex") splits the subject string into an array of strings as described above. All these methods accept an optional additional parameter of type RegexOptions, like the constructor.
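For one-off use, the static overloads might be called as in this C# sketch. The wrapper names in `StaticDemo` are hypothetical; only the `Regex.IsMatch`, `Regex.Replace`, and `Regex.Split` calls are .NET APIs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticDemo
{
    // Static IsMatch with the optional RegexOptions parameter.
    public static bool IsHello(string subject)
    {
        return Regex.IsMatch(subject, "^hello$", RegexOptions.IgnoreCase);
    }

    // Static Replace: every run of digits becomes '#'.
    public static string MaskDigits(string subject)
    {
        return Regex.Replace(subject, @"\d+", "#");
    }

    // Static Split: runs of digits act as delimiters.
    public static string[] SplitOnDigits(string subject)
    {
        return Regex.Split(subject, @"\d+");
    }

    public static void Main()
    {
        Console.WriteLine(IsHello("Hello"));                           // True
        Console.WriteLine(MaskDigits("a1b22"));                        // a#b#
        Console.WriteLine(string.Join("|", SplitOnDigits("a1b22c")));  // a|b|c
    }
}
```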

The System.Text.RegularExpressions.Match Class

If you want more information about the regex match, call Regex.Match() to construct a Match object. If you instantiated a Regex object, use Dim MatchObj As Match = RegexObj.Match("subject"). If not, use the static version: Dim MatchObj As Match = Regex.Match("subject", "regex").

Either way, you will get an object of class Match that holds the details about the first regex match in the subject string. MatchObj.Success indicates if there actually was a match. If so, use MatchObj.Value to get the contents of the match, MatchObj.Length for the length of the match, and MatchObj.Index for the start of the match in the subject string. The start of the match is zero-based, so it effectively counts the number of characters in the subject string to the left of the match.

If the regular expression contains capturing parentheses, use the MatchObj.Groups collection. MatchObj.Groups.Count indicates the number of capturing parentheses. The count includes the zeroth group, which is the entire regex match. MatchObj.Groups(3).Value gets the text matched by the third pair of parentheses. MatchObj.Groups(3).Length and MatchObj.Groups(3).Index get the length of the text matched by the group and its index in the subject string, relative to the start of the subject string. MatchObj.Groups("name") gets the details of the named group “name”.

To find the next match of the regular expression in the same subject string, call MatchObj.NextMatch(), which returns a new Match object containing the results for the second match attempt. You can keep assigning the result of MatchObj.NextMatch() back to MatchObj until MatchObj.Success is False.
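Putting Match, Groups, and NextMatch() together, a loop over all matches could look like the C# sketch below. The pattern, the group name `value`, and the helper `MatchDemo.Describe` are illustrative choices only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchDemo
{
    // Returns one description line per match of key=number in the subject.
    public static List<string> Describe(string subject)
    {
        var regexObj = new Regex(@"(\w+)=(?<value>\d+)");
        var lines = new List<string>();

        Match matchObj = regexObj.Match(subject);
        while (matchObj.Success)
        {
            lines.Add(string.Format("{0} at {1}, length {2}: key={3}, value={4}",
                matchObj.Value, matchObj.Index, matchObj.Length,
                matchObj.Groups[1].Value,          // first pair of parentheses
                matchObj.Groups["value"].Value));  // named group
            // NextMatch() returns a new Match object; reassign to continue.
            matchObj = matchObj.NextMatch();
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe("a=1, bb=22"))
        {
            Console.WriteLine(line);
        }
        // a=1 at 0, length 3: key=a, value=1
        // bb=22 at 5, length 5: key=bb, value=22
    }
}
```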

Note that after calling RegexObj.Match(), the resulting Match object is independent from RegexObj. This means you can work with several Match objects created by the same Regex object simultaneously.

Regular Expressions, Literal Strings and Backslashes

In literal C# strings, as well as in C++ and many other .NET languages, the backslash is an escape character. The literal string "\\" is a single backslash. In regular expressions, the backslash is also an escape character. The regular expression \\ matches a single backslash. This regular expression as a C# string becomes "\\\\". That’s right: 4 backslashes to match a single one.

The regex \w matches a word character. As a C# string, this is written as "\\w".

To make your code more readable, you should use C# verbatim strings. In a verbatim string, a backslash is an ordinary character. This allows you to write the regular expression in your C# code as you would write it in a tool like RegexBuddy or PowerGREP, or as the user would type it into your application. The regex to match a backslash is written as @"\\" when using C# verbatim strings. The backslash is still an escape character in the regular expression, so you still need to double it. But doubling is better than quadrupling. To match a word character, use the verbatim string @"\w".
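The equivalence of the two string styles is easy to confirm in C#. In this sketch, the class name and the sample path `C:\temp` are arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashDemo
{
    // The same two-character regex \\ written as a regular and a verbatim string.
    public const string RegularLiteral = "\\\\";
    public const string VerbatimLiteral = @"\\";

    // True if the subject contains at least one backslash.
    public static bool ContainsBackslash(string subject)
    {
        return Regex.IsMatch(subject, VerbatimLiteral);
    }

    public static void Main()
    {
        Console.WriteLine(RegularLiteral == VerbatimLiteral);  // True
        Console.WriteLine(VerbatimLiteral.Length);             // 2
        Console.WriteLine("\\w" == @"\w");                     // True
        Console.WriteLine(ContainsBackslash(@"C:\temp"));      // True
        Console.WriteLine(ContainsBackslash("C:/temp"));       // False
    }
}
```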

RegexOptions.ECMAScript

Passing RegexOptions.ECMAScript to the Regex() constructor changes the behavior of certain regex features to follow the behavior prescribed in the ECMA-262 standard. This standard defines the ECMAScript language, which is better known as JavaScript. The table below compares the differences between canonical .NET (without the ECMAScript option) and .NET in ECMAScript mode. For reference, the table also compares how JavaScript in modern browsers behaves in these areas.

Feature or Syntax	Canonical .NET	.NET in ECMAScript mode	JavaScript
RegexOptions.IgnorePatternWhitespace	Supported	Only via `(?x)`	Not supported
RegexOptions.Singleline	Supported	Only via `(?s)`	Not supported
RegexOptions.ExplicitCapture	Supported	Only via `(?n)`	Not supported
Escaped letter or underscore that does not form a regex token	Error	Literal letter or underscore	Literal letter or underscore
Escaped digit that is not a valid backreference	Error	Octal escape or literal 8 or 9	Octal escape or literal 8 or 9
Escaped double digits that do not form a valid backreference	Error	Single digit backreference and literal digit if the single digit backreference is valid; otherwise single or double digit octal escape and/or literal 8 and 9	Single digit backreference and literal digit if the single digit backreference is valid; otherwise single or double digit octal escape and/or literal 8 and 9
Backreference to non-participating group	Fails to match	Zero-length match	Zero-length match
Forward reference	Supported	Error	Zero-length match
Backreference to group 0	Fails to match	Zero-length match	Syntactically not possible
`\s`	Unicode	ASCII	Unicode
`\d`	Unicode	ASCII	ASCII
`\w`	Unicode	ASCII	ASCII
`\b`	Unicode	ASCII	ASCII
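The Unicode versus ASCII rows for `\d` and `\w` can be observed with a C# sketch such as this one. The test characters (the Arabic-Indic digit three, U+0663, and the letter é, U+00E9) and the helper names are arbitrary choices for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EcmaScriptDemo
{
    // True if the whole subject is a single \d character under the given options.
    public static bool IsDigit(string subject, RegexOptions options)
    {
        return new Regex(@"^\d$", options).IsMatch(subject);
    }

    // True if the whole subject is a single \w character under the given options.
    public static bool IsWordChar(string subject, RegexOptions options)
    {
        return new Regex(@"^\w$", options).IsMatch(subject);
    }

    public static void Main()
    {
        string arabicIndicThree = "\u0663"; // a Unicode decimal digit outside ASCII
        string eAcute = "\u00E9";           // a Unicode letter outside ASCII

        Console.WriteLine(IsDigit(arabicIndicThree, RegexOptions.None));       // True
        Console.WriteLine(IsDigit(arabicIndicThree, RegexOptions.ECMAScript)); // False
        Console.WriteLine(IsWordChar(eAcute, RegexOptions.None));              // True
        Console.WriteLine(IsWordChar(eAcute, RegexOptions.ECMAScript));        // False
        // ASCII characters match in both modes.
        Console.WriteLine(IsDigit("7", RegexOptions.ECMAScript));              // True
    }
}
```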

Though RegexOptions.ECMAScript brings the .NET regex engine a little bit closer to JavaScript’s, there are still significant differences between the .NET regex flavor and the JavaScript regex flavor. When creating web pages using ASP.NET on the server and JavaScript on the client, you cannot assume that the same regex will work in the same way on both the client side and the server side, even when setting RegexOptions.ECMAScript. The next table lists the more important differences between .NET and JavaScript. RegexOptions.ECMAScript has no impact on any of these.

The table also compares the XRegExp library for JavaScript. You can use this library to bring JavaScript’s regex flavor a little bit closer to .NET’s.

Feature or syntax	.NET	XRegExp	JavaScript
Dot	`[^\n]`	`[^\n\r\u2028\u2029]`	`[^\n\r\u2028\u2029]`
Anchors in multi-line mode	Treat only `\n` as a line break	Treat `\n`, `\r`, `\u2028`, and `\u2029` as line breaks	Treat `\n`, `\r`, `\u2028`, and `\u2029` as line breaks
`$` without multi-line mode	Matches before final line break and at very end of string	Matches at very end of string	Matches at very end of string
Permanent start and end of string anchors	Supported	Not supported	Not supported
Empty character class	Syntactically not possible	Fails to match	Fails to match
Lookbehind	Supported without restrictions	Supported (without restrictions) since ECMAScript 2018	Supported (without restrictions) since ECMAScript 2018
Mode modifiers	Anywhere	At start of regex only	Not supported
Comments	Supported	Supported	Not supported
Unicode properties	Categories and blocks	Categories and blocks	Not supported
Named capture and backreferences	Supported	Supported	Not supported
Balancing groups	Supported	Not supported	Not supported
Conditionals	Supported	Not supported	Not supported
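The .NET side of the line break rows can be checked with a C# sketch along these lines. The helper `Matches` and the subject strings are hypothetical examples; the JavaScript and XRegExp columns are not exercised here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakDemo
{
    // Thin wrapper so each case reads as pattern, subject, options.
    public static bool Matches(string pattern, string subject, RegexOptions options)
    {
        return Regex.IsMatch(subject, pattern, options);
    }

    public static void Main()
    {
        // The .NET dot excludes only \n, so it matches a carriage return.
        Console.WriteLine(Matches(".", "\r", RegexOptions.None));               // True
        Console.WriteLine(Matches(".", "\n", RegexOptions.None));               // False

        // Without Multiline, $ also matches before a final \n.
        Console.WriteLine(Matches("a$", "a\n", RegexOptions.None));             // True
        // \z is a permanent end-of-string anchor and does not.
        Console.WriteLine(Matches(@"a\z", "a\n", RegexOptions.None));           // False

        // In Multiline mode only \n counts as a line break, so the \r of a
        // Windows-style \r\n sits between the 'a' and the end of the line.
        Console.WriteLine(Matches("^a$", "a\nb", RegexOptions.Multiline));      // True
        Console.WriteLine(Matches("^a$", "a\r\nb", RegexOptions.Multiline));    // False
    }
}
```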
