Filter a list of strings using a regular expression, but with wildcards (* and ?)

Keywords: wildcard, regular expression, filter, string, list | Updated: 2023-09-27 18:34:05

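I would like to know whether a wildcard expression that uses * and ? can be converted into a regular expression, so that I can check whether it matches some strings.

In other words, if I apply the filter (case-insensitive) *bl?e* to these strings: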

["Blue", "Black", "Red", "Light blue", "Light black"]

I would like to get:

["Blue", "Light blue"]

Does anyone know how to do this with a regex? Is there a better way than using a regex?

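Added to clarify what I mean...

OK! As usual, I thought I had asked a perfectly clear question, and the answers made me realize I had botched it completely. I want to write a function that filters a collection according to an expression (passed as a parameter to my function) that follows the same rules as DOS ("*" and "?"). I thought a regex would be a good idea. Am I right that a regex is the way to go? Also, I am using C#, and I wonder whether there is anything already available to me that would do the job directly.

I also looked at (good answers there) "How do I specify a wildcard (for ANY character) in a C# regex statement?"

In the end I used the Glob class from the .NET Patterns and Practices library.

But for reference, here is my code to convert a glob expression into a regex: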

using System.Text;
using System.Text.RegularExpressions;
namespace HQ.Util.General
{
    public class RegexUtil
    {
        public const string RegExMetaChars = @"*?(){}[]+-^$.|\"; // Do not change the order. The algorithm depends on it (the first two chars must be the DOS-like wildcard chars)
        // ******************************************************************
        /// <summary>
        /// Convert a filter expression with '*' (wildcard for any number of chars) and '?' (wildcard for one char) into a valid regex and
        /// escape any special regex character
        /// </summary>
        /// <param name="dosLikeExpressionFilter"></param>
        /// <returns></returns>
        public static string DosLikeExpressionFilterToRegExFilterExpression(string dosLikeExpressionFilter)
        {
            StringBuilder regex = new StringBuilder();
            regex.Append("(?i)"); // Case insensitive
            int startIndex = 0;
            int count = dosLikeExpressionFilter.Length;
            while (startIndex < count)
            {
                int metaIndex = RegExMetaChars.IndexOf(dosLikeExpressionFilter[startIndex]);
                if (metaIndex >= 0)
                {
                    if (metaIndex == 0)
                    {
                        regex.Append(".*");
                    }
                    else if (metaIndex == 1)
                    {
                        regex.Append(".");
                    }
                    else
                    {
                        regex.Append("\\"); // Escape the regex metacharacter with a backslash
                        regex.Append(dosLikeExpressionFilter[startIndex]);
                    }
                }
                else
                {
                    regex.Append(dosLikeExpressionFilter[startIndex]);
                }
                startIndex++;
            }
            return regex.ToString();
        }
        // ******************************************************************
        /// <summary>
        /// See 'DosLikeExpressionFilterToRegExFilterExpression' description to see what type of Regex is returned
        /// </summary>
        /// <param name="dosLikeExpressionFilter"></param>
        /// <returns></returns>
        public static Regex DosLikeExpressionFilterToRegEx(string dosLikeExpressionFilter)
        {
            return new Regex(DosLikeExpressionFilterToRegExFilterExpression(dosLikeExpressionFilter));
        }
        // ******************************************************************
    }
}


               Any single character    Any number of characters   Character range
Glob syntax            ?                           *                    [0-9]
Regex syntax           .                           .*                   [0-9]

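So Bl?e (glob) becomes Bl.e (regex), and *Bl?e* becomes .*Bl.e.*

As Joey correctly points out, you can (usually, depending on the regex engine) prefix the regex with (?i) to make it case-insensitive.

Note, however, that many characters that have no special meaning in glob patterns do have a special meaning in regexes, so you cannot convert a glob to a regex with a plain search-and-replace.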
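One way to handle that caveat in C# is to escape the whole pattern first and only then turn the escaped wildcards back into regex wildcards. The following is a minimal sketch under that assumption; the names WildcardFilterSketch, WildcardToRegex and Filter are made up for this illustration, and the pattern is anchored so that it has to match the whole string.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not taken from any of the answers in this thread.
public static class WildcardFilterSketch
{
    // Builds a case-insensitive regex from a pattern that uses * and ?.
    public static Regex WildcardToRegex(string wildcard)
    {
        // Regex.Escape turns "*" into "\*" and "?" into "\?" and also
        // neutralizes every other regex metacharacter in the input.
        string escaped = Regex.Escape(wildcard);

        // Turn the escaped wildcards into their regex counterparts.
        string body = escaped.Replace(@"\*", ".*").Replace(@"\?", ".");

        // Anchor the pattern so it must cover the whole string.
        return new Regex("^" + body + "$", RegexOptions.IgnoreCase);
    }

    // Returns the items that match the wildcard pattern, in their original order.
    public static List<string> Filter(IEnumerable<string> items, string wildcard)
    {
        Regex regex = WildcardToRegex(wildcard);
        return items.Where(item => regex.IsMatch(item)).ToList();
    }
}
```

Calling WildcardFilterSketch.Filter with the list from the question and the pattern *bl?e* should keep only "Blue" and "Light blue". This sketch has no way to search for a literal * or ?.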

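I needed to solve the same problem (filtering an arbitrary list of strings with * and ? wildcard patterns taken from user input), but with the extension that the star or question mark can also be escaped so that they can be searched for literally.

Since the SQL LIKE operator (where the wildcards are % and _) usually offers a backslash for escaping, I took the same approach. This complicates the usual approach of calling Regex.Escape() and then replacing * with .* and ? with . (see the many other answers to that question).

The following code outlines a method that provides a regex for a given wildcard pattern. It is implemented as an extension method for C# strings. The doc tags and comments should fully explain the code: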

using System.Text.RegularExpressions;
public static class MyStringExtensions
{
    /// <summary>Interpret this string as wildcard pattern and create a corresponding regular expression. 
    /// Rules for simple wildcard matching are:
    /// * Matches any character zero or more times.
    /// ? Matches any character exactly one time.
    /// \ Backslash can be used to escape above wildcards (and itself) for an explicit match,
    /// e.g. \* would then match a single star, \? matches a question mark and \\ matches a backslash.
    /// If \ is not followed by star, question mark or backslash it also matches a single backslash.
    /// Character set matching (by use of square brackets []) is NOT supported in this implementation.
    /// </summary>
    /// <param name="wildPat">This string to be used as wildcard match pattern.</param>
    /// <param name="caseSens">Optional parameter for case sensitive matching - default is case insensitive.</param>
    /// <returns>New instance of a regular expression performing the requested matches.
    /// If input string is null or empty, null is returned.</returns>
    public static Regex CreateWildcardRegEx(this string wildPat, bool caseSens = false)
    {
        if (string.IsNullOrEmpty(wildPat))
           return null;
        // 1. STEP: Escape all special characters used in Regex later to avoid unwanted behavior.
        // Regex.Escape() prepends a backslash to any of following characters: \*+?|{[()^$.# and white space 
        wildPat = Regex.Escape(wildPat);
        // 2. STEP: Replace all three possible occurring escape sequences defined for our 
        // wildcard pattern with temporary sub strings that CANNOT exist after 1. STEP anymore.
        // Prepare some constant strings used below - @ in C# makes literal strings really literal - a backslash needs not be repeated!
        const string esc    = @"\\";    // Matches a backslash in a Regex
        const string any    = @"\*";    // Matches a star in a Regex
        const string sgl    = @"\?";    // Matches a question mark in a Regex
        const string tmpEsc = @"||\";   // Instead of doubled | any character Regex.Escape() escapes would do (except \ itself!)
        const string tmpAny =  "||*";   // See comment above
        const string tmpSgl =  "||?";   // See comment above
        // Watch that string.Replace() in C# will NOT stop replacing after the first match but continues instead...
        wildPat = wildPat.Replace(Regex.Escape(esc), tmpEsc)
                         .Replace(Regex.Escape(any), tmpAny)
                         .Replace(Regex.Escape(sgl), tmpSgl);
        // 3. STEP: Substitute our (in 1. STEP escaped) simple wildcards with the Regex counterparts.
        const string regAny = ".*";             // Matches any character zero or more times in a Regex
        wildPat = wildPat.Replace(any, regAny)
                         .Replace(sgl, ".");    // . matches any character in a Regex
        // 4. STEP: Revert the temporary replacements of 2. STEP (in reverse order) and replace with what a Regex really needs to match
        wildPat = wildPat.Replace(tmpSgl, sgl)
                         .Replace(tmpAny, any)
                         .Replace(tmpEsc, esc);
        // 5. STEP: (Optional, for performance) - Simplify runs of consecutive * wildcards (cases of ******* or similar)
        // Replace with the regAny string - Use a single Regex.Replace() instead of string.Contains() with string.Replace() in a while loop 
        wildPat = Regex.Replace(wildPat, @"(\.\*){2,}", regAny);
        // 6. STEP: Finalize the Regex with begin and end line tags
        return new Regex('^' + wildPat + '$', caseSens ? RegexOptions.None : RegexOptions.IgnoreCase);
        // 2. and 4. STEP would be unnecessary if we did not want the ability to escape * and ? characters for search
    }
}
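The same escaping rules can also be sketched as a single left-to-right scan over the pattern, which avoids the temporary placeholder strings. This is an alternative illustration and not the method of the answer above; the names WildcardScanSketch and ToRegex are hypothetical, and like the code above it does not support character sets in square brackets.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical single-pass variant of the wildcard-to-regex conversion.
public static class WildcardScanSketch
{
    public static Regex ToRegex(string pattern, bool caseSensitive = false)
    {
        var sb = new StringBuilder("^");
        for (int i = 0; i < pattern.Length; i++)
        {
            char c = pattern[i];
            bool escapesNext = c == '\\' && i + 1 < pattern.Length &&
                (pattern[i + 1] == '*' || pattern[i + 1] == '?' || pattern[i + 1] == '\\');

            if (escapesNext)
            {
                // \* \? and \\ stand for a literal star, question mark or backslash.
                sb.Append(Regex.Escape(pattern[i + 1].ToString()));
                i++; // Skip the character that was just consumed.
            }
            else if (c == '*')
            {
                sb.Append(".*"); // Any character, zero or more times.
            }
            else if (c == '?')
            {
                sb.Append('.'); // Any single character.
            }
            else
            {
                // Everything else is matched literally, including a lone backslash.
                sb.Append(Regex.Escape(c.ToString()));
            }
        }
        sb.Append('$');
        return new Regex(sb.ToString(),
            caseSensitive ? RegexOptions.None : RegexOptions.IgnoreCase);
    }
}
```

With this sketch, the pattern a\*b is meant to match only the text a*b, while a*b matches any string that starts with a and ends with b.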

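Try this regex:

^([\w,\s]*bl\we[\w,\s]*)

It basically recognizes any group of words and whitespace that contains a word starting with "bl" and ending with "e", with exactly one character in between. Or

^([\w,\s]*bl(\w+)e[\w,\s]*)

if you want to recognize any word that starts with "bl" and ends with "e".

Another option is to use some approximate (fuzzy) matching algorithm on the strings. I am not sure whether that is exactly what you are looking for.

Just for reference... this is the code I actually use: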

using System.Text;
using System.Text.RegularExpressions;
namespace HQ.Util.General
{
    /*
        Usage:
           _glob = new FilterGlob(filterExpression, _caseSensitive);            

            public bool IsMatch(string s)
            {
                return _glob.IsMatch(s);
            }
    */

    /// <summary>
    /// Glob stands for pattern matching. Supported characters are "?" and "*".
    /// </summary>
    public class FilterGlob
    {
        private readonly Regex pattern;
        /// <summary>
        /// Constructs a new <see cref="T:Microsoft.Practices.Unity.InterceptionExtension.Glob"/> instance that matches the given pattern.
        /// 
        /// </summary>
        /// <param name="pattern">The pattern to use. See <see cref="T:Microsoft.Practices.Unity.InterceptionExtension.Glob"/> summary for
        ///             details of the patterns supported.</param><param name="caseSensitive">If true, perform a case sensitive match.
        ///             If false, perform a case insensitive comparison.</param>
        public FilterGlob(string pattern, bool caseSensitive = true)
        {
            this.pattern = FilterGlob.GlobPatternToRegex(pattern, caseSensitive);
        }
        /// <summary>
        /// Checks to see if the given string matches the pattern.
        /// 
        /// </summary>
        /// <param name="s">String to check.</param>
        /// <returns>
        /// True if it matches, false if it doesn't.
        /// </returns>
        public bool IsMatch(string s)
        {
            return this.pattern.IsMatch(s);
        }
        private static Regex GlobPatternToRegex(string pattern, bool caseSensitive)
        {
            StringBuilder stringBuilder = new StringBuilder(pattern);
            string[] strArray = new string[9]
            {
                "\\",
                ".",
                "$",
                "^",
                "{",
                "(",
                "|",
                ")",
                "+"
            };
            foreach (string oldValue in strArray)
            {
                stringBuilder.Replace(oldValue, "\\" + oldValue);
            }
            stringBuilder.Replace("*", ".*");
            stringBuilder.Replace("?", ".");
            stringBuilder.Insert(0, "^");
            stringBuilder.Append("$");
            RegexOptions options = caseSensitive ? RegexOptions.None : RegexOptions.IgnoreCase;
            return new Regex(((object)stringBuilder).ToString(), options);
        }
    }
}