Regex: finding all occurrences of a pattern in a string
Keywords: pattern, find, string, Regex | Updated: 2023-09-27 18:21:48
I'm having trouble finding all occurrences of a pattern in a string.
Consider this string:
string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
I want to get back the 2 occurrences (so I can decode them later):
=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?=
and
=?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=
With the following regex code, I get only 1 match: the complete string.
var charSetOccurences = new Regex(@"=\?.*\?B\?.*\?=", RegexOptions.IgnoreCase);
var charSetMatches = charSetOccurences.Matches(input);
foreach (Match match in charSetMatches)
{
charSet = match.Groups[0].Value.Replace("=?", "").Replace("?B?", "").Replace("?b?", "");
}
Do you know what I'm missing?
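When the regexp parser sees the .* character sequence, it matches every character up to the end of the string and then backtracks one character at a time (greedy matching). To avoid this, you can either use non-greedy matching or explicitly list the characters that may appear in the string:
"=\?[a-zA-Z0-9?=-]*\?B\?[a-zA-Z0-9?=-]*\?="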
The non-regex way:
string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
string[] charSetOccurences = msg.Split(new string[]{ " " }, StringSplitOptions.None);
foreach (string s in charSetOccurences)
{
string charSet = s.Replace("=?", "").Replace("?B?", "").Replace("?b?", "");
Console.WriteLine(charSet);
}
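See it on ideone.
If you still want to use a regex, make .* lazy by appending ? to it. A previous answer already mentioned this, but it seems you still didn't get the matches?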
string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
var charSetOccurences = new Regex(@"=\?.*?\?B\?.*?\?=", RegexOptions.IgnoreCase);
var charSetMatches = charSetOccurences.Matches(msg);
foreach (Match match in charSetMatches)
{
string charSet = match.Groups[0].Value.Replace("=?", "").Replace("?B?", "").Replace("?b?", "");
Console.WriteLine(charSet);
}
See another ideone.
The output is the same in both cases:
windows-1258UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?=
windows-1258IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=
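To see the difference between the original greedy pattern and the lazy one, the following minimal sketch counts the matches each produces on the string from the question. The class name GreedyVsLazyDemo and the helper CountMatches are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Sample header value taken from the question.
    public const string Msg =
        "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= " +
        "=?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";

    // Hypothetical helper: number of matches of a pattern in the input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        // Greedy: .* runs to the end of the string and backtracks,
        // so both encoded words end up inside a single match.
        Console.WriteLine(CountMatches(Msg, @"=\?.*\?B\?.*\?="));   // prints 1

        // Lazy: .*? stops at the first position where the rest of the
        // pattern can match, so each encoded word is matched separately.
        Console.WriteLine(CountMatches(Msg, @"=\?.*?\?B\?.*?\?=")); // prints 2
    }
}
```

The only change between the two patterns is the ? after each .*, which is what turns one whole-string match into two.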
Edit: following the update, here is an all-in-one solution to the problem
string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
var charSetOccurences = new Regex(@"=\?.*?\?[BQ]\?.*?\?=", RegexOptions.IgnoreCase);
MatchCollection matches = charSetOccurences.Matches(msg);
foreach (Match match in matches)
{
string[] encoding = match.Groups[0].Value.Split(new string[]{ "?" }, StringSplitOptions.None);
string charSet = encoding[1];
string encodeType = encoding[2];
string encodedString = encoding[3];
Console.WriteLine("Charset: " + charSet);
Console.WriteLine("Encoding type: " + encodeType);
Console.WriteLine("Encoded String: " + encodedString + "\n");
}
Returns:
Charset: windows-1258
Encoding type: B
Encoded String: UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz
Charset: windows-1258
Encoding type: B
Encoded String: IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=
See this.
Or, since we already have the regex, we can use capture groups:
string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
var charSetOccurences = new Regex(@"=\?(.*?)\?([BQ])\?(.*?)\?=", RegexOptions.IgnoreCase);
MatchCollection matches = charSetOccurences.Matches(msg);
foreach (Match match in matches)
{
Console.WriteLine("Charset: " + match.Groups[1].Value);
Console.WriteLine("Encoding type: " + match.Groups[2].Value);
Console.WriteLine("Encoded String: " + match.Groups[3].Value + "\n");
}
This returns the same output.
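.* is greedy, so the first .* matches everything from the first =? up to the last ?B?.
You need to use non-greedy matching
=\?.*?\?B\?.*?\?=
or exclude ? from the character class
=\?[^?]*\?B\?[^?]*\?=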
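As a rough illustration of the second option, the sketch below uses the negated character class [^?]* together with named groups to pull out the charset, encoding type and encoded text of each word. The class name EncodedWordDemo, the method Parse, and the group names charset, enc and text are assumptions made for this example. Accepting Q as well as B follows the [BQ] variant used in an earlier answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EncodedWordDemo
{
    // [^?]* cannot cross a '?', so each field stops at its own delimiter
    // and no lazy quantifier is needed.
    private static readonly Regex Pattern = new Regex(
        @"=\?(?<charset>[^?]*)\?(?<enc>[BQ])\?(?<text>[^?]*)\?=",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns one "charset|enc|text" string per match.
    public static List<string> Parse(string input)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            result.Add(m.Groups["charset"].Value + "|" +
                       m.Groups["enc"].Value + "|" +
                       m.Groups["text"].Value);
        }
        return result;
    }

    public static void Main()
    {
        string msg =
            "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= " +
            "=?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";

        // Prints one line per encoded word found in msg.
        foreach (string line in Parse(msg))
        {
            Console.WriteLine(line);
        }
    }
}
```

Because the negated class makes the field boundaries explicit, the pattern does not depend on backtracking behaviour to keep the two words apart.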