Easiest way to get every word of an email (text file) into an array in C#

Updated: 2023-09-27 18:02:02

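I'm building a phishing scanner for a class project, and I'm stuck on getting an email saved in a text file copied correctly into an array for later processing. What I want is for each word to have its own array index.

Here is my sample email: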

Subject: Insufficient Funds Notice
Date: September 25, 2013
Insufficient Funds Notice
Unfortunately, on 09/25/2013 your available balance in your Wells Fargo account XXXXXX4653 was insufficient to cover one or more of your checks, Debit Card purchases, or other transactions. 
An important notice regarding one or more of your payments is now available in your Messages & Alerts inbox. 
To read the message, click here, and first confirm your identity. 
Please make deposits to cover your payments, fees, and any other withdrawals or transactions you have initiated. If you have already taken care of this, please disregard this notice. 
We appreciate your business and thank you for your prompt attention to this matter. 
If you have questions after reading the notice in your inbox, please refer to the contact information in the notice. Please do not reply to this automated email. 
Sincerely, 
Wells Fargo Online Customer Service 
wellsfargo.com | Fraud Information Center
4f57e44c-5d00-4673-8eae-9123909604b6

I don't need any punctuation; I only need the words and numbers.

Here is the code I have written for it so far.

    StreamReader sr1 = new StreamReader(lblDisplaySelectedFilePath.Text);
    string line = sr1.ReadToEnd();
    words = line.Split(' ');
    int wordslowercount = 0;
    foreach (string word in words)
    {
        words[wordslowercount] = word.ToLower();
        wordslowercount = wordslowercount + 1;   
    }

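The problem with the code above is that the words I get are either strung together or contain "\r" or "\n" in the array. Here is an example of an array element I don't want:

"notice\r\ndate:" I don't want the \r, the \n, or the colon. Also, the two words should be at separate indexes.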


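The regular expression \W lets you split the string and build a list of words. It matches any non-word character, so whitespace and punctuation act as separators and are not included in the results.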

    Regex.Split(inputString, "\\W").Where(x => !string.IsNullOrWhiteSpace(x));

    using System;
    using System.Text.RegularExpressions;
    public class Example
    {
        static string CleanInput(string strIn)
        {
            // Replace invalid characters with empty strings. 
            try {
               return Regex.Replace(strIn, @"[^\w\.@-]", "", 
                                    RegexOptions.None, TimeSpan.FromSeconds(1.5)); 
            }
            // If we timeout when replacing invalid characters,  
            // we should return Empty. 
            catch (RegexMatchTimeoutException) {
               return String.Empty;   
            }
        }
    }

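As a rough sketch of how the `\W` split might be applied to the whole file: the class name `EmailTokenizer`, the `Tokenize` helper, and the fallback sample string are made up for illustration, and splitting on `\W+` instead of `\W` is a small variation that collapses runs of separators such as ": " or "\r\n" into one.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
public static class EmailTokenizer
{
    // Splits on runs of non-word characters (whitespace, punctuation,
    // line breaks) and lowercases every remaining token.
    public static string[] Tokenize(string text)
    {
        return Regex.Split(text, @"\W+")
                    // Leading or trailing separators produce empty strings.
                    .Where(token => token.Length > 0)
                    .Select(token => token.ToLowerInvariant())
                    .ToArray();
    }

    public static void Main(string[] args)
    {
        // Pass the path of the saved email as the first argument;
        // the fallback text is only a stand-in for a real file.
        string text = args.Length > 0
            ? File.ReadAllText(args[0])
            : "Insufficient Funds Notice\r\nDate: September 25, 2013";

        foreach (string word in Tokenize(text))
        {
            Console.WriteLine(word);
        }
    }
}
```

Two side effects to keep in mind: `\w` includes the underscore, so underscores stay inside tokens, and anything joined by punctuation is broken apart, so "wellsfargo.com" becomes "wellsfargo" and "com".
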
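Using line.Split(null) will split on whitespace. From the C# String.Split method documentation:

If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.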
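
Splitting on whitespace alone still leaves punctuation attached to words ("notice," or "date:"). The following is a minimal sketch, under the assumption that trimming non-alphanumeric characters from both ends of each token is acceptable; `WhitespaceTokenizer`, `Tokenize`, and `TrimNonAlphanumeric` are illustrative names, not library APIs. The `(char[])null` cast is there to pick the `char[]` overload of `Split` explicitly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answers.
public static class WhitespaceTokenizer
{
    // Splits on any whitespace (spaces, tabs, \r, \n), then strips
    // punctuation from the start and end of each token.
    public static string[] Tokenize(string text)
    {
        string[] rawTokens = text.Split((char[])null,
                                        StringSplitOptions.RemoveEmptyEntries);
        var words = new List<string>();
        foreach (string raw in rawTokens)
        {
            string cleaned = TrimNonAlphanumeric(raw).ToLowerInvariant();
            // Tokens made only of punctuation (for example "|") end up empty.
            if (cleaned.Length > 0)
            {
                words.Add(cleaned);
            }
        }
        return words.ToArray();
    }

    // Removes leading and trailing characters that are not letters or digits.
    private static string TrimNonAlphanumeric(string token)
    {
        int start = 0;
        int end = token.Length - 1;
        while (start <= end && !char.IsLetterOrDigit(token[start])) start++;
        while (end >= start && !char.IsLetterOrDigit(token[end])) end--;
        return token.Substring(start, end - start + 1);
    }

    public static void Main()
    {
        string sample = "Notice\r\nDate: September 25, 2013";
        foreach (string word in Tokenize(sample))
        {
            Console.WriteLine(word);
        }
    }
}
```

Unlike the `\W` split, this keeps interior punctuation, so "wellsfargo.com" and the hyphenated identifier at the end of the sample email each stay as a single token. Which behavior is preferable depends on what the scanner needs to match.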