Regex to extract "From" from mail archives, including the name and line breaks
Updated: 2023-09-27 18:24:00
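I have mail archives in the following format, and my goal is to parse them and store them in a database. I have included several samples below to show what the data looks like. The only thing to pay attention to is the "From" line.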
From: FirstName LastName <FirstName.MiddleName.LastName@someemail.com>
In-Reply-To: <fc7b93ca4dab.531f4e68@my.bcit.ca>
-------------------------------------------------
From: "FirstName.MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName"
<somemeail@something.otherthing.es>
Subject: Re: Some Random Data
In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>
-------------------------------------------------
From: "FirstName MiddleName LastName" <LastName@someemail.com>
Subject: Some Random Subject
-------------------------------------------------
From: "FirstName.MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName"
<somemeail@something.otherthing.es
>
Subject: Re: Some Random Data
In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>
-------------------------------------------------
From: "FirstName.MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName"
<
somemeail@something.otherthing.es
>
Subject: Re: Some Random Data
In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>
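So far I have noticed that all the headers except "From" are consistent and always appear on a single line, but "From" can wrap across several lines. I am using the following regular expression in my C# code to extract "From":

match = Regex.Match(msg, @"(?<=From:).*", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

I also tried the expression below, but it broke the other records:

match = Regex.Match(msg, @"(?<=From:).*.\s*.*\s*(>)", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

What I want to do is:
- Find the line that starts with From:, but don't capture "From:" itself, i.e. (?<=From:)
- Continue from there until you reach ">", including everything in between: spaces, line breaks, and so on.

I'm having a hard time coming up with this expression. I have read answers about matching line breaks, such as c-sharp-regex-match-any-text-between-tags-included-new-lines, but couldn't get it working in my code.

Full sample code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program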
{
    static void Main(string[] args)
    {
        foreach (var demoText in TestData())
        {
            // '.' does not match '\n', so this only captures the rest of the "From:" line itself
            var match = Regex.Match(demoText, @"(?<=From:).*", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
            if (match.Success)
            {
                string fromField = match.Value.Replace(System.Environment.NewLine, " ");
                // Found From - extract the email address
                match = Regex.Match(fromField, @"(?<=<)[^<>]+(?=>)", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
                Console.WriteLine("Email Address:" + match.Value);
                // Extract the name
                match = Regex.Match(fromField, @".*(?=<)", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
                Console.WriteLine("Name:" + match.Value);
            }
            else
            {
                Console.WriteLine("*** Match not found in data: " + demoText);
            }
        }
        Console.WriteLine("All done, press any key to close.");
        Console.ReadLine();
    }

    // Each record is separated by the 'ñ' character; the verbatim string keeps the line breaks
    static IEnumerable<string> TestData()
    {
        return @"
From: FirstName LastName <FirstName.MiddleName.LastName@someemail.com>
In-Reply-To: <fc7b93ca4dab.531f4e68@my.bcit.ca>ñ
From: ""FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName""
<somemeail@something.otherthing.es>
Subject: Re: Some Randome Data
In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>ñ
From: ""FirstName MiddleName LastName"" <LastName@someemail.com>
Subject: Some Randome Subject ñ
From: ""FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName""
<somemeail@something.otherthing.es
>
Subject: Re: Some Randome Data
In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>ñ
From: ""FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName""
<
somemeail@something.otherthing.es
>
Subject: Re: Some Randome Data
In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>
".Split('ñ').Select(item => item.Trim());
    }
}
(?<=From:)((?:(?!>).)*)>
Try this. Don't forget to set the s (DOTALL) flag, which is RegexOptions.Singleline in .NET. See the demo:
http://regex101.com/r/kM7rT8/14
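Below is a minimal C# sketch of how the pattern above might be applied. It assumes RegexOptions.Singleline as the .NET counterpart of the s/DOTALL flag. TemperedDotExtractor and ExtractFromFields are hypothetical names, not part of the answer. Without Singleline, '.' stops at '\n', so a From field whose '>' sits on a later line would not match at all.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: applies the tempered-dot pattern from the answer above.
public static class TemperedDotExtractor
{
    // (?<=From:)     start right after "From:" without including it in the match
    // ((?:(?!>).)*)  any run of characters that contains no '>'
    // >              the '>' that closes the address
    // RegexOptions.Singleline lets '.' match '\n' as well (the "s"/DOTALL flag).
    private static readonly Regex FromPattern = new Regex(
        @"(?<=From:)((?:(?!>).)*)>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns each From field's text, from just after "From:" through the closing '>'.
    public static List<string> ExtractFromFields(string archive)
    {
        var results = new List<string>();
        foreach (Match m in FromPattern.Matches(archive))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

The line breaks inside a wrapped field are kept in the returned text, so they can still be normalized before the value is stored.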
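Assuming the name part cannot contain any angle brackets, you can use:
(?<=\bFrom:)[^>]+>
Note: apart from the case-insensitive option (if you need it), you don't need any particular option for this to work, because the negated class [^>] also matches line breaks.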
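Because [^>] also matches '\r' and '\n', the pattern above handles wrapped From fields with default options. What follows is a minimal sketch under that assumption. NegatedClassExtractor and ExtractFromFields are hypothetical names. Collapsing the whitespace is an extra step added here for illustration and is not part of the answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: applies the negated-class pattern from the answer above.
public static class NegatedClassExtractor
{
    // \b keeps "From:" from matching inside a longer word; [^>]+ matches every
    // character except '>', including '\r' and '\n', so no Singleline option is needed.
    private static readonly Regex FromPattern = new Regex(@"(?<=\bFrom:)[^>]+>");

    // Returns each From field through its closing '>', with line breaks and runs
    // of whitespace collapsed to single spaces (an extra step, assumed convenient for storage).
    public static List<string> ExtractFromFields(string archive)
    {
        var results = new List<string>();
        foreach (Match m in FromPattern.Matches(archive))
        {
            results.Add(Regex.Replace(m.Value, @"\s+", " ").Trim());
        }
        return results;
    }
}
```

Because the match runs up to the first '>', a name that itself contains '>' would be cut short. That is the assumption this answer states.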
If you want to do the same thing but extract the name and the email in one go, you can use this:
\bFrom:\s*(?:"(?<name>[^"]+)"|(?<name>[^<]+?))\s+<\s*(?<email>[^>]+?)\s*>
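Here is a sketch of reading both groups in C#. It relies on .NET allowing the same group name in two alternatives, so Groups["name"] holds whichever branch matched (quoted or unquoted). FromFieldParser and Parse are hypothetical names. The tuple return type is just one possible shape for rows headed to the database.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: applies the one-pass name/email pattern from the answer above.
public static class FromFieldParser
{
    // Both alternatives use the group name "name"; .NET allows duplicate group
    // names, so Groups["name"] holds the capture from whichever branch matched.
    // Double quotes are doubled ("") because this is a C# verbatim string.
    private static readonly Regex FromPattern = new Regex(
        @"\bFrom:\s*(?:""(?<name>[^""]+)""|(?<name>[^<]+?))\s+<\s*(?<email>[^>]+?)\s*>");

    // Returns a (Name, Email) pair for every From field in the archive text.
    public static List<(string Name, string Email)> Parse(string archive)
    {
        var results = new List<(string Name, string Email)>();
        foreach (Match m in FromPattern.Matches(archive))
        {
            results.Add((m.Groups["name"].Value, m.Groups["email"].Value));
        }
        return results;
    }
}
```

For a quoted name, the surrounding double quotes are left out of the name group. Encoded words such as =?iso-8859-1?b?TWFydO1uZXo=?= are returned undecoded.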