Pitfalls |
Catastrophic Backtracking |
Too Many Repetitions |
Denial of Service |
Making Everything Optional |
Repeated Capturing Group |
Mixing Unicode & 8-bit |
Since regular expressions deal with text rather than with numbers, matching a number in a given range takes a little extra care. You can’t just write [0-255] to match a number between 0 and 255. Though a valid regex, it matches something entirely different. [0-255] is a character class with three elements: the character range 0-2, the character 5 and the character 5 (again). This character class matches a single digit 0, 1, 2 or 5, just like [0125].
Since regular expressions work with text, a regular expression engine treats 0 as a single character, and 255 as three characters. To match all characters from 0 to 255, we’ll need a regex that matches between one and three characters.
The regex [0-9] matches single-digit numbers 0 to 9. [1-9][0-9] matches double-digit numbers 10 to 99. That’s the easy part.
Matching the three-digit numbers is a little more complicated, since we need to exclude numbers 256 through 999. 1[0-9][0-9] takes care of 100 to 199. 2[0-4][0-9] matches 200 through 249. Finally, 25[0-5] adds 250 till 255.
As you can see, you need to split up the numeric range in ranges with the same number of digits, and each of those ranges that allow the same variation for each digit. In the 3-digit range in our example, numbers starting with 1 allow all 10 digits for the following two digits, while numbers starting with 2 restrict the digits that are allowed to follow.
Putting this all together using alternation we get: [0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]. This matches the numbers we want, with one caveat: regular expression searches usually allow partial matches, so our regex would match 123 in 12345. There are two solutions to this.
If you’re searching for these numbers in a larger document or input string, use word boundaries to require a non-word character (or no character at all) to precede and to follow any valid match. The regex then becomes \b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b. Since the alternation operator has the lowest precedence of all, the parentheses are required to group the alternatives together. This way the regex engine will try to match the first word boundary, then try all the alternatives, and then try to match the second word boundary after the numbers it matched. Regular expression engines consider all alphanumeric characters, as well as the underscore, as word characters.
If you’re using the regular expression to validate input, you’ll probably want to check that the entire input consists of a valid number. To do this, replace the word boundaries with anchors to match the start and end of the string: ^([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$.
Here are a few more common ranges that you may want to match:
| Quick Start | Tutorial | Tools & Languages | Examples | Reference | Book Reviews |
| Regular Expressions Examples | Numeric Ranges | Floating Point Numbers | Email Addresses | IP Addresses | Valid Dates | Numeric Dates to Text | Credit Card Numbers | Matching Complete Lines | Deleting Duplicate Lines | Programming | Two Near Words |
| Catastrophic Backtracking | Too Many Repetitions | Denial of Service | Making Everything Optional | Repeated Capturing Group | Mixing Unicode & 8-bit |
Page URL: https://www.regular-expressions.info/numericranges.html
Page last updated: 2 September 2021
Site last updated: 06 November 2024
Copyright © 2003-2024 Jan Goyvaerts. All rights reserved.