If you've programmed in Perl or any other language with built-in regular-expression capabilities, then you probably know how much easier regular expressions make text processing and pattern matching. If you're unfamiliar with the term, a regular expression is simply a string of characters that defines a pattern used to search for a matching string. Many languages, including Perl, PHP, Python, JavaScript, and JScript, now support regular expressions for text processing, and some text editors use regular expressions for powerful search-and-replace functionality.

What about Java? At the time of this writing, a Java Specification Request that includes a regular expression library for text processing has been approved; you can expect to see it in a future version of the JDK.

But what if you need a regular expression library now? Luckily, you can download the open source Jakarta ORO library.

In this article, I'll first give you a short primer on regular expressions, and then I'll show you how to use regular expressions with the open source Jakarta-ORO API.

Suppose you want to search for a string with the word "cat" in it; your regular expression would simply be "cat". Words that merely contain it, such as "catalog" and "sophisticated", would also match, as would "Catherine" if your search is case-insensitive:

Matches: cat, catalog, Catherine, sophisticated

The period notation

Imagine you are playing Scrabble and need a three-letter word starting with the letter "t" and ending with the letter "n". Imagine also that you have an English dictionary and will search through its entire contents for a match using a regular expression. To form such a regular expression, you would use a wildcard notation: the period (.) character. The regular expression would then be "t.n" and would match "tan", "Ten", "tin", and "ton"; it would also match "t#n", "tpn", and even "t n", as well as many other nonsensical words. This is because the period matches any single character, including the space and the tab character (though by default, in most engines, not a line break):

Matches: tan, Ten, tin, ton, t n, t#n, tpn, etc.

The bracket notation

To solve the problem of the period's indiscriminate matches, you can specify the characters you consider meaningful with the bracket ("[]") expression, so that only those characters match the regular expression. Thus, "t[aeio]n" would match just "tan", "Ten", "tin", and "ton". "Toon" would not match because the bracket notation matches only a single character:

Matches: tan, Ten, tin, ton

The OR operator

If you want to match "toon" in addition to all the words matched in the previous section, you can use the "|" notation, which is basically an OR operator. To match "toon", use the regular expression "t(a|e|i|o|oo)n". You cannot use the bracket notation here because it will only match a single character. You can also use parentheses for groupings (more on that later).

Let's say you want to search for a social security number in a text file. The format for US social security numbers is 999-99-9999. The regular expression you would use to match this is shown in Figure 1. Inside brackets, the hyphen ("-") has special meaning: in "[0-9]" it indicates a range that matches any digit from 0 to 9. To make it unambiguous that the hyphens separating the digit groups are literal, escape the "-" character with a backslash ("\") when matching the literal hyphens in a social security number.

Figure 1. Matches: All social security numbers of the form 123-12-1234

If, in your search, you wish to make the hyphen optional (if, say, you consider both 999-99-9999 and 999999999 acceptable formats), you can use the "?" quantifier notation. Figure 2 shows that regular expression:

Figure 2. Matches: All social security numbers of the forms 123-12-1234 and 123121234

One format for US car plate numbers consists of four numeric characters followed by two letters. The regular expression first comprises the numeric part, "[0-9]{4}", followed by the letter part, "[A-Z]{2}". Figure 3 shows the complete regular expression:

Figure 3. Matches: Typical US car plate numbers, such as 8836KV

The NOT notation

The "^" notation is also called the NOT notation. If used in brackets, "^" indicates the character you don't want to match. For example, the expression in Figure 4 matches all words except those that start with the letter X.

Figure 4. Matches: All words except those that start with the letter X

The parentheses and space notations

Say you're trying to extract the birth month from a person's birthdate. The typical birthdate is in the following format: June 26, 1951. The regular expression to match the string would be like the one in Figure 5:

Figure 5. Matches: All dates with the format of Month DD, YYYY

The new "\s" notation is the space notation and matches all blank spaces, including tabs. If the string matches perfectly, how do you extract the month field? You simply put parentheses around the month field, creating a group, and later retrieve the value using the ORO API (discussed in a following section).
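
The notations covered in the primer can be tried out with a small sketch. It uses java.util.regex, the regular expression package that later shipped with the JDK, purely as a stand-in for Jakarta ORO so that it runs without extra downloads. The class name, method names, and the pattern strings for the social security number, car plate, NOT, and birthdate examples are assumptions reconstructed from the descriptions in the text; they are not necessarily the exact expressions in the article's figures.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexPrimer {
    // Reconstructed patterns (assumptions, not the article's exact figures).
    static final String SSN = "[0-9]{3}\\-[0-9]{2}\\-[0-9]{4}";
    static final String SSN_OPTIONAL_HYPHEN = "[0-9]{3}\\-?[0-9]{2}\\-?[0-9]{4}";
    static final String PLATE = "[0-9]{4}[A-Z]{2}";
    static final String NOT_X = "[^X][a-z]+";
    // The parentheses create group 1 around the month field.
    static final String BIRTHDATE = "([A-Za-z]+)\\s+[0-9]{1,2},\\s*[0-9]{4}";

    /** True if the whole input matches the regular expression. */
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    /** True if the pattern occurs anywhere in the input, ignoring case. */
    static boolean containsIgnoreCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .find();
    }

    /** Returns the month of a "Month DD, YYYY" date, or null if no match. */
    static String birthMonth(String date) {
        Matcher m = Pattern.compile(BIRTHDATE).matcher(date);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("cat", "Catherine"));        // true
        System.out.println(matchesWhole("t.n", "t#n"));                    // true
        System.out.println(matchesWhole("t[aeio]n", "toon"));              // false
        System.out.println(matchesWhole("t(a|e|i|o|oo)n", "toon"));        // true
        System.out.println(matchesWhole(SSN, "123-12-1234"));              // true
        System.out.println(matchesWhole(SSN_OPTIONAL_HYPHEN, "123121234")); // true
        System.out.println(matchesWhole(PLATE, "8836KV"));                 // true
        System.out.println(matchesWhole(NOT_X, "Xray"));                   // false
        System.out.println(birthMonth("June 26, 1951"));                   // June
    }
}
```

Whole-string matching is used for most checks so that "toon" is clearly rejected by the bracket pattern and accepted by the OR pattern; a dictionary search like the Scrabble example would more likely test one word at a time in the same way.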