Regex to extract "From" from a mail archive, including the name and line breaks

Keywords: include name, archive, line breaks, extract, Regex | Updated: 2023-09-27 18:24:00

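I have mail archives in the following format, and my goal is to parse them and store them in a database. I have included several samples below to show what the data looks like. The only header of interest is the "From" line.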

    From:         FirstName LastName <FirstName.MiddleName.LastName@someemail.com>
    In-Reply-To:  <fc7b93ca4dab.531f4e68@my.bcit.ca>
    -------------------------------------------------
    From:         "FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName"
                    <somemeail@something.otherthing.es>
    Subject:      Re: Some Random Data
    In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>
    -------------------------------------------------
    From:         "FirstName MiddleName LastName" <LastName@someemail.com>
    Subject:      Some Random Subject
    -------------------------------------------------
    From:         "FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName"
                    <somemeail@something.otherthing.es
                    >
    Subject:      Re: Some Random Data
    In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>
    -------------------------------------------------
    From:         "FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName"
                    <
                    somemeail@something.otherthing.es
                    >
    Subject:      Re: Some Random Data
    In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>

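So far I have noticed that every header except "From" is consistent and always appears on a single line, but the "From" header can be spread over several lines.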

I am using the following regular expression in my C# code to extract "From".

    match = Regex.Match(msg, @"(?<=From:)", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

I also tried the expression below, but it breaks the other records.

    match = Regex.Match(msg, @"(?<=From:).*.\s*.*\s*(>)", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

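I would like to do the following:
- Find the line that starts with From:, but do not capture that prefix, i.e. (?<=From:)
- Then keep going until ">" is reached, including everything in between: spaces, line breaks, and so on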

I am having a hard time coming up with this expression.

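I have already read the related questions on matching line breaks with a regex, including c-sharp-regex-match-any-text-between-tags-included-new-lines, but I could not get it working in my code.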

Full sample code

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            foreach (var demoText in TestData())
            {
                // Note: without RegexOptions.Singleline, '.' does not match '\n',
                // so only the first line of a multi-line From header is captured here.
                var match = Regex.Match(demoText, @"(?<=From:).*", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
                if (match.Success)
                {
                    string fromField = match.Value.Replace(System.Environment.NewLine, " ");
                    // Found From - extract the email address
                    match = Regex.Match(fromField, @"(?<=<)+[^<>]+(?=>)+", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
                    Console.WriteLine("Email Address:" + match.Value);
                    // Extract the name
                    match = Regex.Match(fromField, @".*(?=<)", RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
                    Console.WriteLine("Name:" + match.Value);
                }
                else
                {
                    Console.WriteLine("*** Match not found in data: " + demoText);
                }
            }
            Console.WriteLine("All done, press any key to close.");
            Console.ReadLine();
        }

        // Test records are separated by 'ñ' inside one verbatim string.
        static IEnumerable<string> TestData()
        {
            return @"
    From:         FirstName LastName <FirstName.MiddleName.LastName@someemail.com>
    In-Reply-To:  <fc7b93ca4dab.531f4e68@my.bcit.ca>ñ

    From:         ""FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName""
                    <somemeail@something.otherthing.es>
    Subject:      Re: Some Random Data
    In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>ñ

    From:         ""FirstName MiddleName LastName"" <LastName@someemail.com>
    Subject:      Some Random Subject ñ
    From:         ""FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName""
                    <somemeail@something.otherthing.es
                    >
    Subject:      Re: Some Random Data
    In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>ñ

    From:         ""FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName""
                    <
                    somemeail@something.otherthing.es
                    >
    Subject:      Re: Some Random Data
    In-Reply-To: <42043F8EC804DB48A3C4AF477195328F272CB9@exchange.something.local>
    ".Split('ñ').Select(item => item.Trim());
        }
    }


    (?<=From:)((?:(?!>).)*)>

Try this. Don't forget to set the s (DOTALL) flag, which in .NET is RegexOptions.Singleline. See the demo.

http://regex101.com/r/kM7rT8/14
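To show how that pattern might be wired into C#, here is a minimal sketch. It assumes the DOTALL flag maps to `RegexOptions.Singleline`; the class name `SinglelineDemo`, the helper `ExtractFrom`, and the sample message are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class SinglelineDemo
{
    // Hypothetical helper: returns the raw text between "From:" and the first '>'
    // (the '>' itself is included), or an empty string when nothing matches.
    public static string ExtractFrom(string message)
    {
        Match match = Regex.Match(
            message,
            @"(?<=From:)((?:(?!>).)*)>",
            // Singleline lets '.' match '\n', so the header may span several lines.
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return match.Success ? match.Value : string.Empty;
    }

    static void Main()
    {
        // Illustrative input modelled on the multi-line samples in the question.
        string message =
            "From:         \"FirstName MiddleName LastName\"\n" +
            "                <LastName@someemail.com\n" +
            "                >\n" +
            "Subject:      Some Random Subject";

        string raw = ExtractFrom(message);

        // Collapse line breaks and runs of whitespace into single spaces.
        string flattened = Regex.Replace(raw, @"\s+", " ").Trim();

        // Prints: "FirstName MiddleName LastName" <LastName@someemail.com >
        Console.WriteLine(flattened);
    }
}
```

The whitespace is collapsed after matching rather than inside the pattern, which keeps the expression itself identical to the one suggested above.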

Assuming the name part cannot contain any angle brackets, you can use:

    (?<=\bFrom:)[^>]+>

Note: apart from the case-insensitive option (if you need it), no particular option is required to make this work.
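The reason no extra option is needed is that a negated character class such as `[^>]` matches line breaks on its own. A small sketch of that idea follows, run over an archive that holds several records; the class `NegatedClassDemo`, the helper `ExtractAllFrom`, and the sample archive are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class NegatedClassDemo
{
    // Hypothetical helper: returns every From header value found in the archive text,
    // with line breaks and repeated whitespace collapsed to single spaces.
    public static List<string> ExtractAllFrom(string archive)
    {
        var results = new List<string>();
        // [^>] also matches '\n', so RegexOptions.Singleline is not required.
        foreach (Match m in Regex.Matches(archive, @"(?<=\bFrom:)[^>]+>", RegexOptions.IgnoreCase))
        {
            results.Add(Regex.Replace(m.Value, @"\s+", " ").Trim());
        }
        return results;
    }

    static void Main()
    {
        // Illustrative archive with one single-line and one folded From header.
        string archive =
            "From:         FirstName LastName <FirstName.MiddleName.LastName@someemail.com>\n" +
            "In-Reply-To:  <fc7b93ca4dab.531f4e68@my.bcit.ca>\n" +
            "-------------------------------------------------\n" +
            "From:         \"FirstName MiddleName LastName\"\n" +
            "                <LastName@someemail.com>\n" +
            "Subject:      Some Random Subject\n";

        foreach (string from in ExtractAllFrom(archive))
        {
            Console.WriteLine(from);
        }
    }
}
```

With the sample archive above, this should print one line per record, each holding the display name followed by the bracketed address.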

If you want to do the same thing and extract the name and the email address in one pass, you can use this:

    \bFrom:\s*(?:"(?<name>[^"]+)"|(?<name>[^<]+?))\s+<\s*(?<email>[^>]+?)\s*>
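As a rough illustration of reading the two named groups from C#, the sketch below embeds the pattern in a verbatim string, where each `"` has to be doubled. The class `NamedGroupsDemo`, the helper `TryParseFrom`, and the sample message are hypothetical; the sketch relies on .NET allowing the same group name (`name`) in both alternatives.

```csharp
using System;
using System.Text.RegularExpressions;

class NamedGroupsDemo
{
    // The pattern from the answer; quotes are doubled because this is a verbatim string.
    static readonly Regex FromHeader = new Regex(
        @"\bFrom:\s*(?:""(?<name>[^""]+)""|(?<name>[^<]+?))\s+<\s*(?<email>[^>]+?)\s*>",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: fills name and email from the first From header, if any.
    public static bool TryParseFrom(string message, out string name, out string email)
    {
        Match m = FromHeader.Match(message);
        name = m.Success ? m.Groups["name"].Value : string.Empty;
        email = m.Success ? m.Groups["email"].Value : string.Empty;
        return m.Success;
    }

    static void Main()
    {
        // Illustrative input modelled on the last sample in the question.
        string message =
            "From:         \"FirstName. MiddleName =?iso-8859-1?b?TWFydO1uZXo=?= LastName\"\n" +
            "                <\n" +
            "                somemeail@something.otherthing.es\n" +
            "                >\n" +
            "Subject:      Re: Some Random Data";

        if (TryParseFrom(message, out string name, out string email))
        {
            Console.WriteLine("Name: " + name);
            Console.WriteLine("Email: " + email);   // Email: somemeail@something.otherthing.es
        }
        else
        {
            Console.WriteLine("No From header found.");
        }
    }
}
```

Because the surrounding quotes and the whitespace inside the angle brackets sit outside the groups, the captured values need no further trimming in this sample.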