Detecting chat messages with Regex
Updated: 2023-09-27 18:22:05
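I'm currently developing a program to properly view WhatsApp chats exported as .txt files (sent via email), and I'm trying to write a parser for them. I've spent the last 3 hours experimenting with Regex without finding a solution, since I've barely used Regex before.
The messages look like this: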
```
16.08.2015, 18:30 - Person 1: Some multiline text here
still in the message
16.08.2015, 18:31 - Person 2: some other message which could be multiline
16.08.2015, 18:33 - Person 1: once again
```
I'm trying to split them correctly by matching them with a Regex, like this:
```csharp
List<string> messages = Regex.Matches(parseable, @"REGEXHERE").Cast<Match>().Select(m => m.Value).ToList();
```
They should end up like this:
```csharp
messages[0] = "16.08.2015, 18:30 - Person 1: Some multiline text here\nstill in the message";
messages[1] = "16.08.2015, 18:31 - Person 2: some other message which could be multiline";
messages[2] = "16.08.2015, 18:33 - Person 1: once again";
```
I've been trying very messy regular expressions that look like `\d\d\.\d\d\. [...]`
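For comparison, the single `Regex.Matches` call the question is after can be sketched with a lookahead. This is a sketch assuming every message header has the `dd.MM.yyyy, HH:mm - ` shape shown above; `ChatSplitter` and `SplitMessages` are hypothetical names. `RegexOptions.Multiline` lets `^` match at each line start, `RegexOptions.Singleline` lets `.` cross newlines, and the lazy `.*?` stops just before the next header line or the end of the input:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ChatSplitter
{
    // Hypothetical helper: each match starts at a line beginning with
    // "dd.MM.yyyy, HH:mm - " and lazily runs up to (but not including)
    // the newline before the next such line, or to the end of the input.
    static readonly Regex MessageRegex = new Regex(
        @"^\d\d\.\d\d\.\d{4}, \d\d:\d\d - .*?(?=\n\d\d\.\d\d\.\d{4}, \d\d:\d\d - |\z)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    public static List<string> SplitMessages(string parseable)
    {
        // Normalize Windows line endings so continuation lines are joined with '\n'.
        string text = parseable.Replace("\r\n", "\n");
        return MessageRegex.Matches(text)
            .Cast<Match>()
            // The last message may carry a trailing newline from the file; drop it.
            .Select(m => m.Value.TrimEnd('\n'))
            .ToList();
    }
}
```

A continuation line that itself starts with a date and time in this format would be mistaken for a new message; any approach keyed on the header pattern shares that limitation.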
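I wouldn't use a single regex. Instead, I'd just read line by line with a StringReader or StreamReader: check whether the line currently being processed is a "chat start" line (using a regex); if it is, check whether the lines that follow are not "chat start" lines, and keep track of whether you should append to the current message or yield a new one. I wrote a quick extension method to demonstrate this: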
```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class ChatReader
{
    // Matches a "chat start" line, e.g. "16.08.2015, 18:30 - Person 1: ..."
    static readonly Regex rgx = new Regex(@"^\d\d\.\d\d\.\d\d\d\d, \d\d:\d\d - .*?:");

    public static IEnumerable<string> ReadChatMessages(this TextReader reader)
    {
        // Locals, not static fields, so concurrent or repeated calls don't share state.
        string prevLine = null; // the message collected so far
        string currLine;
        while ((currLine = reader.ReadLine()) != null)
        {
            if (rgx.IsMatch(currLine))
            {
                // A new message starts: emit the one collected so far.
                if (prevLine != null)
                    yield return prevLine;
                prevLine = currLine;
            }
            else
            {
                // Continuation line: append it to the current message.
                prevLine = prevLine == null ? currLine : prevLine + '\n' + currLine;
            }
        }
        // Emit the last message (nothing for an empty input).
        if (prevLine != null)
            yield return prevLine;
    }
}
```
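It can be used like this:
```csharp
// reader can be any TextReader, e.g. a StreamReader over the exported chat file
// (ToList() requires using System.Linq;)
List<string> chatMessages = reader.ReadChatMessages().ToList();
```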
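Once the chat is split into messages, each one can be broken into timestamp, sender and text with named groups. This is a sketch assuming the `dd.MM.yyyy, HH:mm - Sender: text` layout from the question (exports from other phone locales may differ); `ChatMessage` and `ChatMessageParser` are hypothetical types, and messages without a `Sender: ` part on their first line yield `null`:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical record type for one parsed message.
public sealed class ChatMessage
{
    public DateTime Timestamp { get; }
    public string Sender { get; }
    public string Text { get; }

    public ChatMessage(DateTime timestamp, string sender, string text)
    {
        Timestamp = timestamp;
        Sender = sender;
        Text = text;
    }
}

public static class ChatMessageParser
{
    // Named groups for the timestamp, the sender (restricted to the first line)
    // and the body; Singleline lets the body span continuation lines.
    static readonly Regex Header = new Regex(
        @"^(?<stamp>\d\d\.\d\d\.\d{4}, \d\d:\d\d) - (?<sender>[^\n]*?): (?<text>.*)",
        RegexOptions.Singleline);

    // Returns null when the message does not follow the assumed layout.
    public static ChatMessage Parse(string message)
    {
        Match m = Header.Match(message);
        if (!m.Success)
            return null;
        DateTime stamp = DateTime.ParseExact(
            m.Groups["stamp"].Value, "dd.MM.yyyy, HH:mm", CultureInfo.InvariantCulture);
        return new ChatMessage(stamp, m.Groups["sender"].Value, m.Groups["text"].Value);
    }
}
```

For example, `ChatMessageParser.Parse(chatMessages[0])` on the sample chat would give sender `Person 1` and a two-line text body.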