InternetUnicodeHTMLCSSScalable Vector Graphics (SVG)Extensible Markup Language (xml) ASP.Net TOCASP.NetMiscellaneous FeatureASP.NET ScriptingASP.NET Run-time Object System.Text.RegularExpressionsRegular Expression EngineRegular Expression ResultRegular Expression PatternRegex CompilationRegexOptions EnumRegexMatchTimeoutException Class Regular Expression Behavior Draft for Information Only
ContentRegular Expression Backtracking
Regular Expression BacktrackingBacktracking occurs when a regular expression pattern contains optional quantifiers or alternation constructs, and the regular expression engine returns to a previous saved state to continue its search for a match. Backtracking is central to the power of regular expressions; it makes it possible for expressions to be powerful and flexible, and to match very complex patterns. At the same time, this power comes at a cost. Backtracking is often the single most important factor that affects the performance of the regular expression engine. Fortunately, the developer has control over the behavior of the regular expression engine and how it uses backtracking. This topic explains how backtracking works and how it can be controlled. In general, a Nondeterministic Finite Automaton (NFA) engine like .NET regular expression engine places the responsibility for crafting efficient, fast regular expressions on the developer. This topic contains the following sections: Linear Comparison Without BacktrackingIf a regular expression pattern has no optional quantifiers or alternation constructs, the regular expression engine executes in linear time. That is, after the regular expression engine matches the first language element in the pattern with text in the input string, it tries to match the next language element in the pattern with the next character or group of characters in the input string. This continues until the match either succeeds or fails. In either case, the regular expression engine advances by one character at a time in the input string. If a regular expression pattern includes no optional quantifiers or alternation constructs, the maximum number of comparisons required to match the regular expression pattern with the input string is roughly equivalent to the number of characters in the input string.In other words, the regular expression engine runs in near-linear time if it contains no optional quantifiers or alternation constructs. Backtracking with Optional Quantifiers or Alternation ConstructsWhen a regular expression includes optional quantifiers or alternation constructs, the evaluation of the input string is no longer linear. Pattern matching with an NFA engine is driven by the language elements in the regular expression and not by the characters to be matched in the input string. Therefore, the regular expression engine tries to fully match optional or alternative subexpressions. When it advances to the next language element in the subexpression and the match is unsuccessful, the regular expression engine can abandon a portion of its successful match and return to an earlier saved state in the interest of matching the regular expression as a whole with the input string. This process of returning to a previous saved state to find a match is known as backtracking. Generally, if a regular expression pattern has a single alternation construct or a single optional quantifier, the number of comparison operations required to match the pattern is more than twice the number of characters in the input string. Backtracking with Nested Optional QuantifiersThe number of comparison operations required to match a regular expression pattern can increase exponentially if the pattern includes a large number of alternation constructs, if it includes nested alternation constructs, or, most commonly, if it includes nested optional quantifiers. the regular expression engine took about twice as long to find that an input string did not match the pattern as it did to identify a matching string. This is because an unsuccessful match always represents a worst-case scenario. The regular expression engine must use the regular expression to follow all possible paths through the data before it can conclude that the match is unsuccessful, and the nested parentheses create many additional paths through the data. Comparison of the input string with the regular expression continues in this way until the regular expression engine has tried all possible combinations of matches, and then concludes that there is no match. Because of the nested quantifiers, this comparison is an O(2n) or an exponential operation, where n is the number of characters in the input string. This means that in the worst case, an input string of 30 characters requires approximately 1,073,741,824 comparisons, and an input string of 40 characters requires approximately 1,099,511,627,776 comparisons. If you use strings of these or even greater lengths, regular expression methods can take an extremely long time to complete when they process input that does not match the regular expression pattern. Controlling BacktrackingBacktracking lets you create powerful, flexible regular expressions. However, as the previous section showed, these benefits may be coupled with unacceptably poor performance. To prevent excessive backtracking, you should define a time-out interval when you instantiate a Regex object or call a static regular expression matching method. This is discussed in the next section. In addition, .NET supports three regular expression language elements that limit or suppress backtracking and that support complex regular expressions with little or no performance penalty: nonbacktracking subexpressions, lookbehind assertions, and lookahead assertions. Defining a Time-out IntervalStarting with the .NET Framework 4.5, you can set a time-out value that represents the longest interval the regular expression engine will search for a single match before it abandons the attempt and throws a RegexMatchTimeoutException exception. You specify the time-out interval by supplying a TimeSpan value to the Regex.Regex(String, RegexOptions, TimeSpan) constructor for instance regular expressions. In addition, each static pattern matching method has an overload with a TimeSpan parameter that allows you to specify a time-out value. By default, the time-out interval is set to Regex.InfiniteMatchTimeout and the regular expression engine does not time out. We recommend that you always set a time-out interval if your regular expression relies on backtracking. A RegexMatchTimeoutException exception indicates that the regular expression engine was unable to find a match within the specified time-out interval but does not indicate why the exception was thrown. The reason might be excessive backtracking, but it is also possible that the time-out interval was set too low given the system load at the time the exception was thrown. When you handle the exception, you can choose to abandon further matches with the input string or increase the time-out interval and retry the matching operation. Nonbacktracking SubexpressionThe (?> subexpression) language element suppresses backtracking in a subexpression. It is useful for preventing the performance problems associated with failed matches. Lookbehind Assertions.NET includes two language elements, (?<=subexpression) and (?<!subexpression), that match the previous character or characters in the input string. Both language elements are zero-width assertions; that is, they determine whether the character or characters that immediately precede the current character can be matched by subexpression, without advancing or backtracking. (?<= subexpression ) is a positive lookbehind assertion; that is, the character or characters before the current position must match subexpression. (?<!subexpression) is a negative lookbehind assertion; that is, the character or characters before the current position must not match subexpression. Both positive and negative lookbehind assertions are most useful when subexpression is a subset of the previous subexpression. Lookahead Assertions.NET includes two language elements, (?=subexpression) and (?!subexpression), that match the next character or characters in the input string. Both language elements are zero-width assertions; that is, they determine whether the character or characters that immediately follow the current character can be matched by subexpression, without advancing or backtracking. (?= subexpression ) is a positive lookahead assertion; that is, the character or characters after the current position must match subexpression. (?!subexpression) is a negative lookahead assertion; that is, the character or characters after the current position must not match subexpression. Both positive and negative lookahead assertions are most useful when subexpression is a subset of the next subexpression.
Scanning of Valid Email FormatExamples of Scanning of Valid Email Format.
ASP.NET Code Input:
HTML Web Page Embedded Output: See also
Source/Referencee©sideway ID: 190800006 Last Updated: 8/6/2019 Revision: 0 Ref: ![]() References
![]() Latest Updated Links
![]() ![]() ![]() ![]() ![]() |
![]() Home 5 Business Management HBR 3 Information Recreation Hobbies 8 Culture Chinese 1097 English 339 Travel 18 Reference 79 Computer Hardware 254 Software Application 213 Digitization 37 Latex 52 Manim 205 KB 1 Numeric 19 Programming Web 289 Unicode 504 HTML 66 CSS 65 SVG 46 ASP.NET 270 OS 431 DeskTop 7 Python 72 Knowledge Mathematics Formulas 8 Set 1 Logic 1 Algebra 84 Number Theory 206 Trigonometry 31 Geometry 34 Calculus 67 Engineering Tables 8 Mechanical Rigid Bodies Statics 92 Dynamics 37 Fluid 5 Control Acoustics 19 Natural Sciences Matter 1 Electric 27 Biology 1 |
Copyright © 2000-2025 Sideway . All rights reserved Disclaimers last modified on 06 September 2019