InternetUnicodeHTMLCSSScalable Vector Graphics (SVG)Extensible Markup Language (xml) ASP.Net TOCASP.NetMiscellaneous FeatureASP.NET ScriptingASP.NET Run-time Object System.Text.RegularExpressionsRegular Expression EngineRegular Expression ResultRegular Expression PatternRegex CompilationRegexOptions EnumRegexMatchTimeoutException Class Regular Expression BehaviorBacktrackingCompilation and ReuseThread SafetyRegular Expression Object Model Draft for Information Only
ContentRegular Expression Best Practices
Regular Expression Best PracticesThe regular expression engine in .NET is a text searching tool that processes text based on pattern matches rather than on comparing and matching literal text. However, in some cases, the regular expression engine can appear to be very slow due to the excessive backtracking processes caused by the regular expression pattern design. Input SourceIn general, there are two types of input string, constrained or unconstrained. Constrained input is text that originates from a known or reliable source and follows a predefined format. Unconstrained input is text that originates from an unreliable source, such as a web user, and may not follow a predefined or expected format. Regular expression patterns are typically written to match valid input. That is, developers examine the text that they want to match and then write a regular expression pattern that matches it. Developers then determine whether this pattern requires correction or further elaboration by testing it with multiple valid input items. When the pattern matches all presumed valid inputs, it is declared to be production-ready and can be included in a released application. This makes a regular expression pattern suitable for matching constrained input. However, it does not make it suitable for matching unconstrained input. To match unconstrained input, a regular expression must be able to efficiently handle three kinds of text:
The last text type is especially problematic for a regular expression that has been written to handle constrained input. If that regular expression also relies on extensive backtracking, the regular expression engine can spend an inordinate amount of time (in some cases, many hours or days) processing seemingly innocuous text. Because this regular expression was developed solely by considering the format of input to be matched, it fails to take account of input that does not match the pattern. This, in turn, can allow unconstrained input that nearly matches the regular expression pattern to significantly degrade performance. To solve this problem, you can do the following:
Handle Object Instantiation AppropriatelyAt the heart of .NET’s regular expression object model is the System.Text.RegularExpressions.Regex class, which represents the regular expression engine. Often, the single greatest factor that affects regular expression performance is the way in which the Regex engine is used. Defining a regular expression involves tightly coupling the regular expression engine with a regular expression pattern. That coupling process, whether it involves instantiating a Regex object by passing its constructor a regular expression pattern or calling a static method by passing it the regular expression pattern along with the string to be analyzed, is by necessity an expensive one. For a more detailed discussion of the performance implications of using interpreted and compiled regular expressions, see Optimizing Regular Expression Performance, Part II: Taking Charge of Backtracking in the BCL Team blog. You can couple the regular expression engine with a particular regular expression pattern and then use the engine to match text in several ways:
The particular way in which you call regular expression matching methods can have a significant impact on your application. The following sections discuss when to use static method calls, interpreted regular expressions, and compiled regular expressions to improve your application's performance. The form of the method call (static, interpreted, compiled) affects performance if the same regular expression is used repeatedly in method calls, or if an application makes extensive use of regular expression objects. Static Regular ExpressionsStatic regular expression methods are recommended as an alternative to repeatedly instantiating a regular expression object with the same regular expression. Unlike regular expression patterns used by regular expression objects, either the operation codes or the compiled Microsoft intermediate language (MSIL) from patterns used in instance method calls is cached internally by the regular expression engine. A very inefficient implementation of the IsValidCurrency method is shown in the following example. Note that each method call reinstantiates a Regex object with the same pattern. This, in turn, means that the regular expression pattern must be recompiled each time the method is called. You should replace this inefficient code with a call to the static Regex.IsMatch(String, String) method. This eliminates the need to instantiate a Regex object each time you want to call a pattern-matching method, and enables the regular expression engine to retrieve a compiled version of the regular expression from its cache. By default, the last 15 most recently used static regular expression patterns are cached. For applications that require a larger number of cached static regular expressions, the size of the cache can be adjusted by setting the Regex.CacheSize property. Interpreted vs. Compiled Regular ExpressionsRegular expression patterns that are not bound to the regular expression engine through the specification of the Compiled option are interpreted. When a regular expression object is instantiated, the regular expression engine converts the regular expression to a set of operation codes. When an instance method is called, the operation codes are converted to MSIL and executed by the JIT compiler. Similarly, when a static regular expression method is called and the regular expression cannot be found in the cache, the regular expression engine converts the regular expression to a set of operation codes and stores them in the cache. It then converts these operation codes to MSIL so that the JIT compiler can execute them. Interpreted regular expressions reduce startup time at the cost of slower execution time. Because of this, they are best used when the regular expression is used in a small number of method calls, or if the exact number of calls to regular expression methods is unknown but is expected to be small. As the number of method calls increases, the performance gain from reduced startup time is outstripped by the slower execution speed. Regular expression patterns that are bound to the regular expression engine through the specification of the Compiled option are compiled. This means that, when a regular expression object is instantiated, or when a static regular expression method is called and the regular expression cannot be found in the cache, the regular expression engine converts the regular expression to an intermediary set of operation codes, which it then converts to MSIL. When a method is called, the JIT compiler executes the MSIL. In contrast to interpreted regular expressions, compiled regular expressions increase startup time but execute individual pattern-matching methods faster. As a result, the performance benefit that results from compiling the regular expression increases in proportion to the number of regular expression methods called. To summarize, we recommend that you use interpreted regular expressions when you call regular expression methods with a specific regular expression relatively infrequently. You should use compiled regular expressions when you call regular expression methods with a specific regular expression relatively frequently. The exact threshold at which the slower execution speeds of interpreted regular expressions outweigh gains from their reduced startup time, or the threshold at which the slower startup times of compiled regular expressions outweigh gains from their faster execution speeds, is difficult to determine. It depends on a variety of factors, including the complexity of the regular expression and the specific data that it processes. To determine whether interpreted or compiled regular expressions offer the best performance for your particular application scenario, you can use the Stopwatch class to compare their execution times. Regular Expressions: Compiled to an Assembly.NET also enables you to create an assembly that contains compiled regular expressions. This moves the performance hit of regular expression compilation from run time to design time. However, it also involves some additional work: You must define the regular expressions in advance and compile them to an assembly. The compiler can then reference this assembly when compiling source code that uses the assembly’s regular expressions. Each compiled regular expression in the assembly is represented by a class that derives from Regex. To compile regular expressions to an assembly, you call the Regex.CompileToAssembly(RegexCompilationInfo[], AssemblyName) method and pass it an array of RegexCompilationInfo objects that represent the regular expressions to be compiled, and an AssemblyName object that contains information about the assembly to be created. We recommend that you compile regular expressions to an assembly in the following situations:
If you are using compiled regular expressions to optimize performance, you should not use reflection to create the assembly, load the regular expression engine, and execute its pattern-matching methods. This requires that you avoid building regular expression patterns dynamically, and that you specify any pattern-matching options (such as case-insensitive pattern matching) at the time the assembly is created. It also requires that you separate the code that creates the assembly from the code that uses the regular expression. Take Charge of BacktrackingOrdinarily, the regular expression engine uses linear progression to move through an input string and compare it to a regular expression pattern. However, when indeterminate quantifiers such as *, +, and ? are used in a regular expression pattern, the regular expression engine may give up a portion of successful partial matches and return to a previously saved state in order to search for a successful match for the entire pattern. This process is known as backtracking. For more information on backtracking, see Details of Regular Expression Behavior and Backtracking. For a detailed discussion of backtracking, see Optimizing Regular Expression Performance, Part II: Taking Charge of Backtracking in the BCL Team blog. Support for backtracking gives regular expressions power and flexibility. It also places the responsibility for controlling the operation of the regular expression engine in the hands of regular expression developers. Because developers are often not aware of this responsibility, their misuse of backtracking or reliance on excessive backtracking often plays the most significant role in degrading regular expression performance. In a worst-case scenario, execution time can double for each additional character in the input string. In fact, by using backtracking excessively, it is easy to create the programmatic equivalent of an endless loop if input nearly matches the regular expression pattern; the regular expression engine may take hours or even days to process a relatively short input string. Often, applications pay a performance penalty for using backtracking despite the fact that backtracking is not essential for a match. Because a word boundary is not the same as, or a subset of, a word character, there is no possibility that the regular expression engine will cross a word boundary when matching word characters. This means that for this regular expression, backtracking can never contribute to the overall success of any match -- it can only degrade performance, because the regular expression engine is forced to save its state for each successful preliminary match of a word character. If you determine that backtracking is not necessary, you can disable it by using the (?>subexpression) language element. The following example parses an input string by using two regular expressions. The first, \b\p{Lu}\w*\b, relies on backtracking. The second, \b\p{Lu}(?>\w*)\b, disables backtracking. As the output from the example shows, they both produce the same result. In many cases, backtracking is essential for matching a regular expression pattern to input text. However, excessive backtracking can severely degrade performance and create the impression that an application has stopped responding. In particular, this happens when quantifiers are nested and the text that matches the outer subexpression is a subset of the text that matches the inner subexpression. In addition to avoiding excessive backtracking, you should use the timeout feature to ensure that excessive backtracking does not severely degrade regular expression performance. For more information, see the Use Time-out Values section. In these cases, you can optimize regular expression performance by removing the nested quantifiers and replacing the outer subexpression with a zero-width lookahead or lookbehind assertion. Lookahead and lookbehind assertions are anchors; they do not move the pointer in the input string, but instead look ahead or behind to check whether a specified condition is met. The regular expression language in .NET includes the following language elements that you can use to eliminate nested quantifiers. For more information, see Grouping Constructs.
Use Time-out ValuesIf your regular expressions processes input that nearly matches the regular expression pattern, it can often rely on excessive backtracking, which impacts its performance significantly. In addition to carefully considering your use of backtracking and testing the regular expression against near-matching input, you should always set a time-out value to ensure that the impact of excessive backtracking, if it occurs, is minimized. The regular expression time-out interval defines the period of time that the regular expression engine will look for a single match before it times out. The default time-out interval is Regex.InfiniteMatchTimeout, which means that the regular expression will not time out. You can override this value and define a time-out interval as follows:
If you have defined a time-out interval and a match is not found at the end of that interval, the regular expression method throws a RegexMatchTimeoutException exception. In your exception handler, you can choose to retry the match with a longer time-out interval, abandon the match attempt and assume that there is no match, or abandon the match attempt and log the exception information for future analysis. Capture Only When NecessaryRegular expressions in .NET support a number of grouping constructs, which let you group a regular expression pattern into one or more subexpressions. The most commonly used grouping constructs in .NET regular expression language are (subexpression), which defines a numbered capturing group, and (?<name>subexpression), which defines a named capturing group. Grouping constructs are essential for creating backreferences and for defining a subexpression to which a quantifier is applied. However, the use of these language elements has a cost. They cause the GroupCollection object returned by the Match.Groups property to be populated with the most recent unnamed or named captures, and if a single grouping construct has captured multiple substrings in the input string, they also populate the CaptureCollection object returned by the Group.Captures property of a particular capturing group with multiple Capture objects. Often, grouping constructs are used in a regular expression only so that quantifiers can be applied to them, and the groups captured by these subexpressions are not subsequently used. When you use subexpressions only to apply quantifiers to them, and you are not interested in the captured text, you should disable group captures. For example, the (?:subexpression) language element prevents the group to which it applies from capturing matched substrings. You can disable captures in one of the following ways:
Scanning of Valid Email FormatExamples of Scanning of Valid Email Format.
ASP.NET Code Input:
HTML Web Page Embedded Output: See also
Source/Referencee©sideway ID: 190800010 Last Updated: 8/10/2019 Revision: 0 Ref: ![]() References
![]() Latest Updated Links
![]() ![]() ![]() ![]() ![]() |
![]() Home 5 Business Management HBR 3 Information Recreation Hobbies 8 Culture Chinese 1097 English 339 Travel 18 Reference 79 Computer Hardware 254 Software Application 213 Digitization 37 Latex 52 Manim 205 KB 1 Numeric 19 Programming Web 289 Unicode 504 HTML 66 CSS 65 SVG 46 ASP.NET 270 OS 431 DeskTop 7 Python 72 Knowledge Mathematics Formulas 8 Set 1 Logic 1 Algebra 84 Number Theory 206 Trigonometry 31 Geometry 34 Calculus 67 Engineering Tables 8 Mechanical Rigid Bodies Statics 92 Dynamics 37 Fluid 5 Control Acoustics 19 Natural Sciences Matter 1 Electric 27 Biology 1 |
Copyright © 2000-2025 Sideway . All rights reserved Disclaimers last modified on 06 September 2019