Regular Expression Unicode Syntax Reference

This reference page explains what the Unicode tokens do when used outside character classes. All of these except `\X` can also be used inside character classes, where each token adds the characters it would normally match to the class.
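The sketch below illustrates the character-class behavior in Java's `java.util.regex`, one of the flavors in the table (the table marks Java as supporting both `\p{L}` and `\uFFFF`). The class name `ClassUnicodeDemo` and its helper `firstRun` are hypothetical. The class `[\p{L}\u00A9]` matches every letter plus the copyright sign, which is the union of what `\p{L}` and `\u00A9` match on their own. In Java source the regex backslashes are doubled inside string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassUnicodeDemo {
    // One character class combining a Unicode category token and a code point token.
    // Inside the brackets, each token adds the characters it would match on its own:
    // every letter (category L) plus the copyright sign (code point 00A9).
    private static final Pattern LETTERS_OR_COPYRIGHT =
            Pattern.compile("[\\p{L}\\u00A9]+");

    // Returns the first run of letters and/or copyright signs, or null if there is none.
    public static String firstRun(String input) {
        Matcher m = LETTERS_OR_COPYRIGHT.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Digits, a space, a precomposed a-grave, "b", the copyright sign, "c", then "!".
        System.out.println(firstRun("123 \u00E0b\u00A9c! xyz")); // prints àb©c
    }
}
```

The digits and the space are skipped because neither token covers them, and the match stops at `!` for the same reason.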

Feature	Syntax	Description	Example	JGsoft	.NET	Java	Perl	PCRE	PCRE2	PHP	Delphi	R	JavaScript	VBScript	XRegExp	Python	Ruby	std::regex	Boost	Tcl ARE	POSIX BRE	POSIX ERE	GNU BRE	GNU ERE	Oracle	XML	XPath
Grapheme	`\X`	Matches a single Unicode grapheme, whether encoded as a single code point or multiple code points using combining marks. A grapheme most closely resembles the everyday concept of a “character”.	`\X` matches `à` encoded as U+0061 U+0300, `à` encoded as U+00E0, `©`, etc.	YES	no	9	YES	5.0	YES	5.0.5	YES	YES	no	no	no	no	2.0	no	ECMA extended egrep awk	no	no	no	no	no	no	no	no
Code point	`\uFFFF` where FFFF are 4 hexadecimal digits	Matches a specific Unicode code point.	`\u00E0` matches `à` encoded as U+00E0 only. `\u00A9` matches `©`	YES	YES	YES	no	no	no	no	no	no	YES	YES	YES	3.3 2.4 string	1.9	ECMA	no	YES	no	no	no	no	no	no	no
Code point	`\u{FFFF}` where FFFF are 1 to 4 hexadecimal digits	Matches a specific Unicode code point.	`\u{E0}` matches `à` encoded as U+00E0 only. `\u{A9}` matches `©`	V2	no	no	no	no	no	7.0.0 string	no	no	no	no	3	no	1.9	no	no	no	no	no	no	no	no	no	no
Code point	`\xFFFF` where FFFF are 4 hexadecimal digits	Matches a specific Unicode code point.	`\x00E0` matches `à` encoded as U+00E0 only. `\x00A9` matches `©`	no	no	no	no	no	no	no	no	no	no	no	no	no	no	string	no	8.4–8.5	no	no	no	no	no	no	no
Code point	`\x{FFFF}` where FFFF are 1 to 4 hexadecimal digits	Matches a specific Unicode code point.	`\x{E0}` matches `à` encoded as U+00E0 only. `\x{A9}` matches `©`	YES	no	7	YES	YES	YES	YES	YES	YES	no	no	no	no	no	no	ECMA extended egrep awk	no	no	no	no	no	no	no	no
Unicode category	`\pL` where L is a Unicode category	Matches a single Unicode code point in the specified Unicode category.	`\pL` matches `à` encoded as U+00E0; `\pS` matches `©`	YES	no	YES	YES	5.0	YES	5.0.5	YES	YES	no	no	3	no	no	no	no	no	no	no	no	no	no	no	no
Unicode category	`\PL` where L is a Unicode category	Matches a single Unicode code point that is not in the specified Unicode category.	`\PS` matches `à` encoded as U+00E0; `\PL` matches `©`	YES	no	YES	YES	5.0	YES	5.0.5	YES	YES	no	no	3	no	no	no	no	no	no	no	no	no	no	no	no
Unicode category	`\p{L}` where L is a Unicode category	Matches a single Unicode code point in the specified Unicode category.	`\p{L}` matches `à` encoded as U+00E0; `\p{S}` matches `©`	YES	YES	YES	YES	5.0	YES	5.0.5	YES	YES	no	no	YES	no	1.9	no	no	no	no	no	no	no	no	YES	YES
Unicode category	`\p{IsL}` where L is a Unicode category	Matches a single Unicode code point in the specified Unicode category.	`\p{IsL}` matches `à` encoded as U+00E0; `\p{IsS}` matches `©`	YES	no	YES	YES	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no
Unicode category	`\p{Category}`	Matches a single Unicode code point in the specified Unicode category.	`\p{Letter}` matches `à` encoded as U+00E0; `\p{Symbol}` matches `©`	YES	no	no	YES	no	no	no	no	no	no	no	YES	no	1.9	no	no	no	no	no	no	no	no	no	no
Unicode category	`\p{IsCategory}`	Matches a single Unicode code point in the specified Unicode category.	`\p{IsLetter}` matches `à` encoded as U+00E0; `\p{IsSymbol}` matches `©`	YES	no	no	YES	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no
Unicode script	`\p{Script}`	Matches a single Unicode code point that is part of the specified Unicode script. Each Unicode code point is part of exactly one script. Scripts never contain unassigned code points.	`\p{Greek}` matches `Ω`	YES	no	no	YES	6.5	YES	5.1.3	YES	YES	no	no	YES	no	1.9	no	no	no	no	no	no	no	no	no	no
Unicode script	`\p{IsScript}`	Matches a single Unicode code point that is part of the specified Unicode script. Each Unicode code point is part of exactly one script. Scripts never contain unassigned code points.	`\p{IsGreek}` matches `Ω`	YES	no	7	YES	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no
Unicode block	`\p{Block}`	Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points.	`\p{Arrows}` matches any of the code points from U+2190 through U+21FF (`←` through `⇿`)	YES	no	no	YES	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no
Unicode block	`\p{InBlock}`	Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points.	`\p{InArrows}` matches any of the code points from U+2190 through U+21FF (`←` through `⇿`)	YES	no	YES	YES	no	no	no	no	no	no	no	2–4	no	2.0	no	no	no	no	no	no	no	no	no	no
Unicode block	`\p{IsBlock}`	Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points.	`\p{IsArrows}` matches any of the code points from U+2190 through U+21FF (`←` through `⇿`)	YES	YES	no	YES	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	YES	YES
Negated Unicode property	`\P{Property}`	Matches a single Unicode code point that does not have the specified property (category, script, or block).	`\P{L}` matches `©`	YES	YES	YES	YES	5.0	YES	5.0.5	YES	YES	no	no	YES	no	1.9	no	ECMA extended egrep awk	no	no	no	no	no	no	YES	YES
Negated Unicode property	`\p{^Property}`	Matches a single Unicode code point that does not have the specified property (category, script, or block).	`\p{^L}` matches `©`	YES	no	no	YES	5.0	YES	5.0.5	YES	YES	no	no	YES	no	1.9	no	no	no	no	no	no	no	no	no	no
Unicode property	`\P{^Property}`	Matches a single Unicode code point that does have the specified property (category, script, or block). Double negative is taken as positive.	`\P{^L}` matches `q`	V2	no	no	YES	5.0	YES	5.0.5	YES	YES	no	no	no	no	1.9	no	no	no	no	no	no	no	no	no	no
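To see several rows of the table side by side, here is a sketch in Java, one of the flavors the table covers. It assumes Java 9 or later, since the table lists `\X` as available from Java 9 (and `\x{FFFF}` and `\p{IsScript}` from Java 7). The class `UnicodeTokenDemo` and its helpers `graphemes` and `fullMatch` are hypothetical names chosen for illustration; only the regex syntax comes from the table. The main point it shows is the contrast between `\X`, which treats `à` as one grapheme whether it is encoded as U+0061 U+0300 or as U+00E0, and `\x{E0}`, which matches only the single code point U+00E0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeTokenDemo {
    // \X matches one grapheme (requires Java 9 or later).
    private static final Pattern GRAPHEME = Pattern.compile("\\X");

    // Splits text into graphemes by repeatedly finding \X.
    public static List<String> graphemes(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = GRAPHEME.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // True if the regex matches the entire input, not just part of it.
    public static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String decomposed = "a\u0300";  // U+0061 followed by U+0300 (combining grave accent)
        String precomposed = "\u00E0";  // U+00E0, the single code point for a-grave

        // Grapheme: both encodings count as one grapheme each; the copyright sign is a third.
        System.out.println(graphemes(decomposed).size());                          // 1
        System.out.println(graphemes(decomposed + precomposed + "\u00A9").size()); // 3

        // Code point: \x{E0} matches the precomposed form only.
        System.out.println(fullMatch("\\x{E0}", precomposed)); // true
        System.out.println(fullMatch("\\x{E0}", decomposed));  // false

        // Category, script, block, and negated property.
        System.out.println(fullMatch("\\p{L}", precomposed));     // true
        System.out.println(fullMatch("\\p{IsL}", precomposed));   // true
        System.out.println(fullMatch("\\p{IsGreek}", "\u03A9"));  // true (capital omega)
        System.out.println(fullMatch("\\p{InArrows}", "\u21FF")); // true (last code point of the Arrows block)
        System.out.println(fullMatch("\\P{L}", "\u00A9"));        // true (copyright sign is not a letter)
    }
}
```

Because `fullMatch` uses `Matcher.matches()`, each token must consume the whole string. Switching to `find()` would not change the `\x{E0}` result on the decomposed form, since neither of its two code points is U+00E0.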