Regular Expression Unicode Syntax Reference

This reference page explains what the Unicode tokens do when used outside character classes. All of these except \X can also be used inside character classes. Inside a character class, these tokens add the characters that they normally match to the character class.
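As a minimal sketch of the inside/outside distinction, the snippet below uses Python's standard `re` module, which according to the table accepts `\uFFFF` in patterns from version 3.3. The sample strings and variable names are illustrative assumptions, not part of the reference.

```python
import re

# Two encodings of "à": the precomposed code point U+00E0, and the letter
# "a" followed by the combining grave accent U+0300.
precomposed = "\u00E0"
decomposed = "a\u0300"

# Outside a character class, \u00E0 matches exactly that one code point.
token = re.compile(r"\u00E0")
matches_precomposed = token.fullmatch(precomposed) is not None
matches_decomposed = token.search(decomposed) is not None

# Inside a character class, each token adds the character it normally
# matches to the class, so this class accepts either à or ©.
char_class = re.compile(r"[\u00E0\u00A9]")
found = char_class.findall("caf\u00E0 \u00A9 2024")

print(matches_precomposed, matches_decomposed, found)
```

The pattern strings are raw strings, so the `\u` escapes are interpreted by the regex engine rather than by the Python string literal.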

Feature | Syntax | Description | Example | JGsoft | .NET | Java | Perl | PCRE | PCRE2 | PHP | Delphi | R | JavaScript | VBScript | XRegExp | Python | Ruby | std::regex | Boost | Tcl ARE | POSIX BRE | POSIX ERE | GNU BRE | GNU ERE | Oracle | XML | XPath
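Feature: Grapheme
    Syntax: \X
    Description: Matches a single Unicode grapheme, whether encoded as a single code point or multiple code points using combining marks. A grapheme most closely resembles the everyday concept of a “character”.
    Example: \X matches à encoded as U+0061 U+0300, à encoded as U+00E0, ©, etc.
    Support: JGsoft YES; .NET no; Java 9; Perl YES; PCRE 5.0; PCRE2 YES; PHP 5.0.5; Delphi YES; R YES; JavaScript no; VBScript no; XRegExp no; Python no; Ruby 2.0; std::regex no; Boost ECMA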
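Feature: Code point
    Syntax: \uFFFF where FFFF are 4 hexadecimal digits
    Description: Matches a specific Unicode code point.
    Example: \u00E0 matches à encoded as U+00E0 only. \u00A9 matches ©
    Support: JGsoft YES; .NET YES; Java YES; Perl no; PCRE no; PCRE2 no; PHP no; Delphi no; R no; JavaScript YES; VBScript YES; XRegExp YES; Python 3.3, 2.4 string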
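Feature: Code point
    Syntax: \u{FFFF} where FFFF are 1 to 4 hexadecimal digits
    Description: Matches a specific Unicode code point.
    Example: \u{E0} matches à encoded as U+00E0 only. \u{A9} matches ©
    Support: JGsoft V2; .NET no; Java no; Perl no; PCRE no; PCRE2 no; PHP 7.0.0 string; Delphi no; R no; JavaScript no; VBScript no; XRegExp 3; Python no; Ruby 1.9; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no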
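Feature: Code point
    Syntax: \xFFFF where FFFF are 4 hexadecimal digits
    Description: Matches a specific Unicode code point.
    Example: \x00E0 matches à encoded as U+00E0 only. \x00A9 matches ©
    Support: JGsoft no; .NET no; Java no; Perl no; PCRE no; PCRE2 no; PHP no; Delphi no; R no; JavaScript no; VBScript no; XRegExp no; Python no; Ruby no; std::regex string; Boost no; Tcl ARE 8.4–8.5; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no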
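Feature: Code point
    Syntax: \x{FFFF} where FFFF are 1 to 4 hexadecimal digits
    Description: Matches a specific Unicode code point.
    Example: \x{E0} matches à encoded as U+00E0 only. \x{A9} matches ©
    Support: JGsoft YES; .NET no; Java 7; Perl YES; PCRE YES; PCRE2 YES; PHP YES; Delphi YES; R YES; JavaScript no; VBScript no; XRegExp no; Python no; Ruby no; std::regex no; Boost ECMA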
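Feature: Unicode category
    Syntax: \pL where L is a Unicode category
    Description: Matches a single Unicode code point in the specified Unicode category.
    Example: \pL matches à encoded as U+00E0; \pS matches ©
    Support: JGsoft YES; .NET no; Java YES; Perl YES; PCRE 5.0; PCRE2 YES; PHP 5.0.5; Delphi YES; R YES; JavaScript no; VBScript no; XRegExp 3; Python no; Ruby no; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no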
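Feature: Unicode category
    Syntax: \PL where L is a Unicode category
    Description: Matches a single Unicode code point that is not in the specified Unicode category.
    Example: \PS matches à encoded as U+00E0; \PL matches ©
    Support: JGsoft YES; .NET no; Java YES; Perl YES; PCRE 5.0; PCRE2 YES; PHP 5.0.5; Delphi YES; R YES; JavaScript no; VBScript no; XRegExp 3; Python no; Ruby no; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no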
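Feature: Unicode category
    Syntax: \p{L} where L is a Unicode category
    Description: Matches a single Unicode code point in the specified Unicode category.
    Example: \p{L} matches à encoded as U+00E0; \p{S} matches ©
    Support: JGsoft YES; .NET YES; Java YES; Perl YES; PCRE 5.0; PCRE2 YES; PHP 5.0.5; Delphi YES; R YES; JavaScript no; VBScript no; XRegExp YES; Python no; Ruby 1.9; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML YES; XPath YES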
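Feature: Unicode category
    Syntax: \p{IsL} where L is a Unicode category
    Description: Matches a single Unicode code point in the specified Unicode category.
    Example: \p{IsL} matches à encoded as U+00E0; \p{IsS} matches ©
    Support: JGsoft YES; .NET no; Java YES; Perl YES; PCRE no; PCRE2 no; PHP no; Delphi no; R no; JavaScript no; VBScript no; XRegExp no; Python no; Ruby no; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no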
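Feature: Unicode category
    Syntax: \p{Category}
    Description: Matches a single Unicode code point in the specified Unicode category.
    Example: \p{Letter} matches à encoded as U+00E0; \p{Symbol} matches ©
    Support: JGsoft YES; .NET no; Java no; Perl YES; PCRE no; PCRE2 no; PHP no; Delphi no; R no; JavaScript no; VBScript no; XRegExp YES; Python no; Ruby 1.9; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no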
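Feature: Unicode category
    Syntax: \p{IsCategory}
    Description: Matches a single Unicode code point in the specified Unicode category.
    Example: \p{IsLetter} matches à encoded as U+00E0; \p{IsSymbol} matches ©
    Support: JGsoft YES; .NET no; Java no; Perl YES; PCRE no; PCRE2 no; PHP no; Delphi no; R no; JavaScript no; VBScript no; XRegExp no; Python no; Ruby no; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no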
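Feature: Unicode script
    Syntax: \p{Script}
    Description: Matches a single Unicode code point that is part of the specified Unicode script. Each Unicode code point is part of exactly one script. Scripts never contain unassigned code points.
    Example: \p{Greek} matches Ω
    Support: JGsoft YES; .NET no; Java no; Perl YES; PCRE 6.5; PCRE2 YES; PHP 5.1.3; Delphi YES; R YES; JavaScript no; VBScript no; XRegExp YES; Python no; Ruby 1.9; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no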
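Feature: Unicode script
    Syntax: \p{IsScript}
    Description: Matches a single Unicode code point that is part of the specified Unicode script. Each Unicode code point is part of exactly one script. Scripts never contain unassigned code points.
    Example: \p{IsGreek} matches Ω
    Support: JGsoft YES; .NET no; Java 7; Perl YES; PCRE no; PCRE2 no; PHP no; Delphi no; R no; JavaScript no; VBScript no; XRegExp no; Python no; Ruby no; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no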
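Feature: Unicode block
    Syntax: \p{Block}
    Description: Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points.
    Example: \p{Arrows} matches any of the code points from U+2190 until U+21FF (← until ⇿)
    Support: JGsoft YES; .NET no; Java no; Perl YES; PCRE no; PCRE2 no; PHP no; Delphi no; R no; JavaScript no; VBScript no; XRegExp no; Python no; Ruby no; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no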
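Feature: Unicode block
    Syntax: \p{InBlock}
    Description: Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points.
    Example: \p{InArrows} matches any of the code points from U+2190 until U+21FF (← until ⇿)
    Support: JGsoft YES; .NET no; Java YES; Perl YES; PCRE no; PCRE2 no; PHP no; Delphi no; R no; JavaScript no; VBScript no; XRegExp 2–4; Python no; Ruby 2.0; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no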
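Feature: Unicode block
    Syntax: \p{IsBlock}
    Description: Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points.
    Example: \p{IsArrows} matches any of the code points from U+2190 until U+21FF (← until ⇿)
    Support: JGsoft YES; .NET YES; Java no; Perl YES; PCRE no; PCRE2 no; PHP no; Delphi no; R no; JavaScript no; VBScript no; XRegExp no; Python no; Ruby no; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML YES; XPath YES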
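Feature: Negated Unicode property
    Syntax: \P{Property}
    Description: Matches a single Unicode code point that does not have the specified property (category, script, or block).
    Example: \P{L} matches ©
    Support: JGsoft YES; .NET YES; Java YES; Perl YES; PCRE 5.0; PCRE2 YES; PHP 5.0.5; Delphi YES; R YES; JavaScript no; VBScript no; XRegExp YES; Python no; Ruby 1.9; std::regex no; Boost ECMA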
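Feature: Negated Unicode property
    Syntax: \p{^Property}
    Description: Matches a single Unicode code point that does not have the specified property (category, script, or block).
    Example: \p{^L} matches ©
    Support: JGsoft YES; .NET no; Java no; Perl YES; PCRE 5.0; PCRE2 YES; PHP 5.0.5; Delphi YES; R YES; JavaScript no; VBScript no; XRegExp YES; Python no; Ruby 1.9; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no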
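Feature: Unicode property
    Syntax: \P{^Property}
    Description: Matches a single Unicode code point that does have the specified property (category, script, or block). Double negative is taken as positive.
    Example: \P{^L} matches q
    Support: JGsoft V2; .NET no; Java no; Perl YES; PCRE 5.0; PCRE2 YES; PHP 5.0.5; Delphi YES; R YES; JavaScript no; VBScript no; XRegExp no; Python no; Ruby 1.9; std::regex no; Boost no; Tcl ARE no; POSIX BRE no; POSIX ERE no; GNU BRE no; GNU ERE no; Oracle no; XML no; XPath no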
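The table lists Python as not supporting any of the \p tokens, so the following is only a rough sketch of what the category and block entries mean, written with Python's standard `unicodedata` and `re` modules rather than with real \p syntax. The helper `has_category` is a hypothetical name introduced for illustration, and the explicit range for the Arrows block is an assumed stand-in for \p{InArrows}.

```python
import re
import unicodedata


def has_category(char: str, category: str) -> bool:
    """Hypothetical helper: True if the code point's general category
    (for example "Ll" or "So") starts with the given letter or code."""
    return unicodedata.category(char).startswith(category)


a_grave = "\u00E0"         # à, general category Ll (lowercase letter)
copyright_sign = "\u00A9"  # ©, general category So (other symbol)

# Positive tokens such as \pL, \p{L}, \pS, \p{S}.
letter_hit = has_category(a_grave, "L")
symbol_hit = has_category(copyright_sign, "S")

# Negated tokens such as \PL, \P{L}, \p{^L}: © is accepted, being no letter.
negated_hit = not has_category(copyright_sign, "L")

# \P{^L}: the double negative is taken as positive, so "q" is accepted.
double_negated_hit = not (not has_category("q", "L"))

# A block is a contiguous code point range; Arrows spans U+2190..U+21FF.
arrows = re.compile(r"[\u2190-\u21FF]")
arrow_hit = arrows.fullmatch("\u2190") is not None
non_arrow_hit = arrows.fullmatch("q") is not None

print(letter_hit, symbol_hit, negated_hit, double_negated_hit,
      arrow_hit, non_arrow_hit)
```

Flavors that support \p natively perform these lookups inside the regex engine; the sketch only mirrors the matching rules described in the entries above.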
