var re = new RegExp("pattern"[, "switch"]);
// Here pattern is the regular expression and switch is an optional string of search options.
Regular expression literals, e.g. var re = /ab+c/, should be used when the pattern stays constant for the lifetime of the script. Such regular expressions are compiled when the script is loaded, so they execute faster.
A constructor call, e.g. var re = new RegExp("ab+c"), should be used when the pattern is going to change or is not known in advance. If you intend to use the same regular expression several times, it makes sense to compile it once with the 'compile' method so that later searches run more efficiently. Note that 'compile' is deprecated in current engines; keeping the RegExp object in a variable and reusing it has the same effect.
When a regular expression is created from a string, keep in mind that the quoted pattern is subject to the usual escape sequences, as in any other string constant, so every backslash has to be doubled.
For example, the following two expressions are equivalent:
var re = /\w+/g;
var re = new RegExp("\\w+", "g"); // In the string, "\\" becomes "\"
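Note: a regular expression literal cannot be empty, because the two characters // in a row start a comment. To create an empty regular expression, use /(?:)/. The pattern /.?/ also matches the empty string, but it consumes one character whenever one is available.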
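The two ways of creating a regular expression can be compared side by side. The sketch below is only an illustration: the variable names are arbitrary, 'word' is a hypothetical stand-in for a value known only at run time, and console.log is used instead of document.write so that the code also runs outside a browser.

```javascript
// Literal form: the pattern is fixed when the script is parsed.
var literal = /\w+/g;

// Constructor form: the pattern is a string, so the backslash is doubled.
var built = new RegExp("\\w+", "g");

// Both objects hold the same pattern text and the same options.
console.log(literal.source === built.source); // true
console.log(literal.flags === built.flags);   // true

// The constructor is handy when the pattern is assembled at run time.
// `word` is a hypothetical value, e.g. something typed by the user.
var word = "script";
var dynamic = new RegExp(word, "i");
console.log(dynamic.test("Learning JavaScript")); // true
```

If 'word' could contain special characters, it would have to be escaped before being passed to the constructor; that step is left out of this sketch.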
Regular expressions are used by the 'exec' and 'test' methods of the RegExp object and by the 'match', 'replace', 'search' and 'split' methods of the String object. If we only need to check whether a given string contains a substring that matches the pattern, we use 'test' or 'search'. If we need to extract the matching substring (or substrings), we use 'exec' or 'match'. The 'replace' method finds a matching substring and replaces it with another string, and the 'split' method breaks a string into several substrings, using either a regular expression or an ordinary text string as the separator. Detailed information on using regular expressions is given in the descriptions of the corresponding methods.
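As a rough illustration of how these six methods differ, the sketch below applies each of them to one sample string. The string and the variable names are made up for the example, and console.log stands in for document.write.

```javascript
var text = "one two  three";
var pattern = /\s+/;

// Checking only whether a match exists.
var found = pattern.test(text);      // true
var position = text.search(pattern); // 3, the index of the first match

// Extracting the matched text.
var viaExec = /t\w+/.exec(text);     // first match only: "two" at index 4
var viaMatch = text.match(/t\w+/g);  // all matches: ["two", "three"]

// Replacing and splitting.
var replaced = text.replace(/\s+/g, "-"); // "one-two-three"
var pieces = text.split(pattern);         // ["one", "two", "three"]

console.log(found, position);
console.log(viaExec[0], viaExec.index);
console.log(viaMatch.join(","));
console.log(replaced);
console.log(pieces.join(","));
```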
Syntax of regular expressions
A regular expression may consist of ordinary characters only; in that case it matches exactly that sequence of characters in a string. For example, the expression /com/ matches the substring "com" in each of the following strings: "computer", "welcome", "telecom". However, the flexibility and power of regular expressions come from special characters, which are listed in the following table.
Special characters in regular expressions:
\ - For characters that are usually treated literally, indicates that the next character is special. For example, /n/ matches the letter n, and /\n/ matches a line feed character.
\ - For characters that are usually treated as special, indicates that the character is to be treated literally. For example, /^/ matches the beginning of the string, and /\^/ simply matches the character ^. /\\/ matches a backslash.
^ - Matches the beginning of the string.
$ - Matches the end of the string.
* - Matches the preceding character or group zero or more times.
+ - Matches the preceding character or group one or more times.
? - Matches the preceding character or group zero or one time.
. - Matches any character except a line break.
(pattern) - Matches pattern and remembers the match.
(?:pattern) - Matches pattern but does not remember the match. Used for grouping parts of a pattern, e.g. /ca(?:t|ttle)/ is a shorter way of writing /cat|cattle/.
(?=pattern) - Positive lookahead: matches only if pattern matches at this position. The lookahead text is neither remembered nor included in the match. For example, /Windows (?=95|98|NT|2000)/ matches "Windows " in the string "Windows 98", but does not match in the string "Windows 3.1". After a match, the search continues from the position right after the matched text, not after the lookahead text.
(?!pattern) - Negative lookahead: matches only if pattern does not match at this position. Nothing is remembered. For example, /Windows (?!95|98|NT|2000)/ matches "Windows " in the string "Windows 3.1", but does not match in the string "Windows 98". After a match, the search continues from the position right after the matched text, not after the lookahead text.
x|y - Matches x or y.
{n} - n is a nonnegative integer. Matches exactly n occurrences of the preceding character or group.
{n,} - n is a nonnegative integer. Matches n or more occurrences of the preceding character or group. /x{1,}/ is equivalent to /x+/. /x{0,}/ is equivalent to /x*/.
{n,m} - n and m are nonnegative integers. Matches at least n and at most m occurrences of the preceding character or group. /x{0,1}/ is equivalent to /x?/.
[xyz] - Matches any one of the characters in the square brackets.
[^xyz] - Matches any character except those in the square brackets.
[a-z] - Matches any character in the given range.
[^a-z] - Matches any character outside the given range.
\b - Matches a word boundary, that is, the position between a word character and a non-word character such as a space or a line break.
\B - Matches any position that is not a word boundary.
\cX - Matches the control character Ctrl+X. E.g., /\cI/ is equivalent to /\t/.
\d - Matches a digit. Equivalent to [0-9].
\D - Matches any character that is not a digit. Equivalent to [^0-9].
\f - Matches a form feed character (FF).
\n - Matches a line feed character (LF).
\r - Matches a carriage return character (CR).
\s - Matches a whitespace character. For ASCII text, equivalent to /[ \f\n\r\t\v]/.
\S - Matches any non-whitespace character. For ASCII text, equivalent to /[^ \f\n\r\t\v]/.
\t - Matches a tab character (HT).
\v - Matches a vertical tab character (VT).
\w - Matches a Latin letter, a digit or an underscore. Equivalent to /[A-Za-z0-9_]/.
\W - Matches any character except a Latin letter, a digit or an underscore. Equivalent to /[^A-Za-z0-9_]/.
\n - n is a positive integer. Matches the n-th remembered substring (a backreference). Groups are numbered by counting left parentheses. It is treated as \0n if the number of left parentheses is less than n.
\0n - n is an octal number up to 377. Matches the character with octal code n. E.g., /\011/ is equivalent to /\t/.
\xn - n is a two-digit hexadecimal number. Matches the character with hex code n. E.g., /\x31/ is equivalent to /1/.
\un - n is a four-digit hexadecimal number. Matches the Unicode character with hex code n. For example, /\u00A9/ is equivalent to /©/.
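Several entries of the table can be tried out directly. The following sketch reuses the patterns quoted in the table where possible; the test strings and variable names are invented for the illustration, and console.log replaces document.write.

```javascript
// Non-capturing group: /ca(?:t|ttle)/ accepts the same strings as /cat|cattle/.
var grouped = /ca(?:t|ttle)/.test("cattle");            // true

// Positive and negative lookahead.
var ahead = /Windows (?=95|98|NT|2000)/;
var notAhead = /Windows (?!95|98|NT|2000)/;
var aheadText = ahead.exec("Windows 98")[0];            // "Windows " (no "98")
var aheadOld = ahead.test("Windows 3.1");               // false
var notAheadOld = notAhead.test("Windows 3.1");         // true

// Quantifier with bounds: at least 2, at most 3 occurrences.
var bounded = "xxxx".match(/x{2,3}/)[0];                // "xxx"

// Character range and escapes.
var lower = /^[a-z]+$/.test("regexp");                  // true
var hex = /\x31/.test("1");                             // true
var ctrl = /\cI/.test("\t");                            // true

// Backreference: \1 repeats whatever the first group matched.
var doubled = /(\w)\1/.exec("letter")[0];               // "tt"

console.log(grouped, aheadText, aheadOld, notAheadOld);
console.log(bounded, lower, hex, ctrl, doubled);
```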
Regular expressions are evaluated in the same way as other JavaScript expressions, that is, according to operator precedence: operations with higher priority are performed first. Operations of equal priority are performed from left to right. The following table lists the operations of regular expressions in descending order of priority. Operations on the same line have equal priority.
Operations:
\
() (?:) (?=) (?!) []
* + ? . {n} {n,} {n,m}
^ $ \metacharacter
|
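The practical effect of this ordering shows up most often with alternation, which has the lowest priority. The sketch below is an illustration with made-up patterns and variable names, not part of the original table.

```javascript
// Alternation binds last, so /^ab|cd$/ reads as "(^ab) or (cd$)".
var loose = /^ab|cd$/;
// Grouping makes both anchors apply to each alternative.
var strict = /^(?:ab|cd)$/;

var looseResult = loose.test("abXYZ");   // true: the string starts with "ab"
var strictResult = strict.test("abXYZ"); // false: the whole string must be "ab" or "cd"
var strictExact = strict.test("cd");     // true

// Quantifiers bind tighter than concatenation: in /ab+/ only "b" is repeated.
var repeatB = /^ab+$/.test("abbb");       // true
var repeatAb = /^ab+$/.test("abab");      // false
var groupAb = /^(?:ab)+$/.test("abab");   // true

console.log(looseResult, strictResult, strictExact);
console.log(repeatB, repeatAb, groupAb);
```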
Search options:
When creating a regular expression we can specify additional search options:
* i (ignore case). Do not distinguish between lowercase and uppercase letters.
* g (global search). Find all occurrences of the pattern, not only the first one.
* m (multiline). Multi-line search.
* Any combination of these three options, e.g. ig or gim.
Let us give a few examples. Regular expressions are case-sensitive by default. So the following script
var s = "Learning JavaScript language";
var re = /JAVA/;
var result = re.test(s) ? "\" matches " : "\" does not match ";
document.write("String \"" + s + result + "the pattern " + re);
displays the following text on the screen:
String "Learning JavaScript language" does not match the pattern /JAVA/
Now if we change the second line of the example to var re = /JAVA/i;, the following text appears on the screen:
String "Learning JavaScript language" matches the pattern /JAVA/i
Now let's look at the global search option. It is usually used with the 'replace' method, which searches for the pattern and replaces the substring it finds with a new one. By default this method replaces only the first substring found and returns the result. Consider the following script:
var s = "We write scripts in JavaScript, " +
"but JavaScript is not the only scripting language.";
var re = /JavaScript/;
document.write(s.replace(re, "VBScript"));
It displays text that is clearly not the desired result:
We write scripts in VBScript, but JavaScript is not the only scripting language.
To replace all occurrences of the string "JavaScript" with "VBScript", we need to change the regular expression to var re = /JavaScript/g;. The resulting string looks as follows:
We write scripts in VBScript, but VBScript is not the only scripting language.
Finally, the multiline search option applies to strings that consist of several lines separated by line break characters. By default, ^ and $ refer only to the start and the end of the whole string, regardless of any line breaks inside it. This option lifts that restriction, so that a pattern can be anchored to every line of the source string. It changes the interpretation of the following special characters in regular expressions:
* Normally the ^ character matches only at the very beginning of the string. With the multiline option it also matches at the position immediately after any line break.
* Normally the $ character matches only at the very end of the string. With the multiline option it also matches at the position immediately before any line break.
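The two rules above can be checked with a short two-line string. This is a minimal sketch with an invented sample string; console.log is used so that it runs outside a browser as well.

```javascript
var lines = "first line\nsecond line";

// Without m, ^ matches only at the very start of the string.
var plainStart = /^second/.test(lines);   // false
// With m, ^ also matches right after a line break.
var multiStart = /^second/m.test(lines);  // true

// $ behaves the same way at line ends.
var plainEnds = lines.match(/line$/g).length;  // 1: only the end of the string
var multiEnds = lines.match(/line$/gm).length; // 2: the end of each line

console.log(plainStart, multiStart, plainEnds, multiEnds);
```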
Remembering matched substrings
If part of a regular expression is enclosed in parentheses, the matching substring is remembered for later use. To access remembered substrings, use the properties $1, ..., $9 of the RegExp object or the elements of the array returned by the 'exec' and 'match' methods. In the latter case the number of remembered substrings is unlimited.
For example, the following script uses the 'replace' method to swap the words in a string. The references $1 and $2 in the replacement text stand for the remembered substrings.
var re = /(\w+)\s(\w+)/;
var str = "Regular Expressions";
document.write(str.replace(re, "$2, $1"));
This script will display the following text on the screen:
Expressions, Regular
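The same remembered substrings are also available through the array returned by 'exec'. The sketch below reuses the pattern and string from the example; the names 'm', 'swapped' and 'shouted' and the replacement function are additions for illustration only.

```javascript
var re = /(\w+)\s(\w+)/;
var str = "Regular Expressions";

// exec returns an array: the whole match first, then each remembered group.
var m = re.exec(str);
console.log(m[0]); // "Regular Expressions"
console.log(m[1]); // "Regular"
console.log(m[2]); // "Expressions"

// In a replacement string, $1 and $2 refer to the same groups.
var swapped = str.replace(re, "$2, $1");
console.log(swapped); // "Expressions, Regular"

// A replacement function receives the groups as arguments.
var shouted = str.replace(re, function (whole, first, second) {
  return second.toUpperCase() + " " + first;
});
console.log(shouted); // "EXPRESSIONS Regular"
```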