Regex: how to get words, spaces and punctuation from a string
Updated: 2023-09-27 18:13:44
Basically, I want to go through any sentence and split it into words, spaces and punctuation, for example:
string sentence = "How was your day - Andrew, Jane?";
string[] separated = SeparateSentence(sentence);
The separated array should then contain:
[1] = "How"
[2] = " "
[3] = "was"
[4] = " "
[5] = "your"
[6] = " "
[7] = "day"
[8] = " "
[9] = "-"
[10] = " "
[11] = "Andrew"
[12] = ","
[13] = " "
[14] = "Jane"
[15] = "?"
Right now I can only grab the words, using the regex \w(?<!\d)[\w'-]*. How can I split the sentence into smaller parts that match the output above?
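One way to get that exact list is to describe each kind of token as an alternative and collect every match. This is a minimal sketch of the SeparateSentence method from the question, assuming a word is a run of \w characters, consecutive whitespace forms one token, and any other character is a token on its own. The method body is hypothetical; only Regex.Matches is the real .NET API.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical implementation of the SeparateSentence method named in the question.
// At each position the alternatives are tried left to right:
//   \w+      a run of word characters ("How", "Andrew")
//   \s+      a run of whitespace (" ")
//   [^\w\s]  one character that is neither ("-", ",", "?")
static string[] SeparateSentence(string sentence) =>
    Regex.Matches(sentence, @"\w+|\s+|[^\w\s]")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

string[] separated = SeparateSentence("How was your day - Andrew, Jane?");
for (int i = 0; i < separated.Length; i++)
    Console.WriteLine($"[{i + 1}] = \"{separated[i]}\"");
```

Because the last alternative matches a single character, something like "?!" comes out as two tokens; change it to [^\w\s]+ if runs of punctuation should stay together.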
Edit: the string doesn't contain any of the following:
.
solid-form
8, 1, 2
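Take a look at this:
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Each alternative consumes one token: whitespace, digits, word characters, or a
// single other character. The + repeats the group across the whole string, and
// Group.Captures keeps every repetition, not just the last one.
string pattern = @"^(\s+|\d+|\w+|[^\d\s\w])+$";
string input = "How was your 7 day - Andrew, Jane?";
List<string> words = new List<string>();
Regex regex = new Regex(pattern);
Match match = regex.Match(input);
if (match.Success)
{
    foreach (Capture capture in match.Groups[1].Captures)
        words.Add(capture.Value);
}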
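A different route to a similar token list, not taken from the answer above: when the pattern passed to Regex.Split contains a capturing group, .NET also places the captured delimiters in the result array. This sketch assumes a "delimiter" is any single non-word character (\W); consecutive delimiters leave empty strings between them, which are filtered out.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "How was your 7 day - Andrew, Jane?";

// Split on every single non-word character. Wrapping \W in a capturing group
// makes Regex.Split keep each delimiter in the output. Adjacent delimiters
// such as " -" produce empty strings between them, so those are dropped.
string[] parts = Regex.Split(input, @"(\W)")
                      .Where(s => s.Length > 0)
                      .ToArray();

Console.WriteLine(string.Join("|", parts));
```

Unlike the \s+ alternative in the answer, this splits a run of several spaces into one token per space.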
Why not something like this? It's tailored to your test case, but if you add more punctuation to it, this might be what you're looking for.
(\w+|[,-?])
Edit: ah, borrowing from Andre's answer, this is what I had in mind:
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Match either a run of word characters or one character in the range ',' through '?'.
string pattern = @"(\w+|[,-?])";
string input = "How was your 7 day - Andrew, Jane?";
List<string> words = new List<string>();
Regex regex = new Regex(pattern);
foreach (Match m in regex.Matches(input))
    words.Add(m.Groups[1].Value);
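Two details of the (\w+|[,-?]) pattern are worth checking before adapting it. Inside a character class, ",-?" is a range from ',' (U+002C) to '?' (U+003F), so it also matches . / 0-9 : ; < = >; escaping the hyphen gives three literal characters instead. And neither form matches whitespace, so the spaces from the question's expected output never appear. The \s+ alternative below is a hypothetical extension, not part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// ",-?" inside brackets is a range; "\-" makes the hyphen literal.
string rangeClass   = @"[,-?]";
string literalClass = @"[,\-?]";

string sample = "a.b;c";
Console.WriteLine(Regex.IsMatch(sample, rangeClass));   // True: '.' and ';' fall in the range
Console.WriteLine(Regex.IsMatch(sample, literalClass)); // False: only ',', '-' and '?' match

// Adding a whitespace alternative brings the spaces back into the token list.
string withSpaces = @"(\w+|\s+|[,\-?])";
string[] tokens = Regex.Matches("How was your 7 day - Andrew, Jane?", withSpaces)
                       .Cast<Match>()
                       .Select(m => m.Groups[1].Value)
                       .ToArray();
Console.WriteLine(string.Join("|", tokens));
```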
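I'd suggest implementing a simple lexer (if there is such a thing) that reads one character at a time and produces the output you're looking for. It's not the simplest solution, but it has the advantage of scaling if your use case gets more complicated, as @AndreCalil suggested.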
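A minimal sketch of such a character-at-a-time lexer, assuming three token kinds: letters/digits (grouped), whitespace (grouped) and everything else (one character per token). The Tokenize name is hypothetical. Note that char.IsLetterOrDigit does not treat '_' as a word character, unlike \w.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical hand-written tokenizer: scans the input once, one character at a time.
// 'w' = letter or digit, 's' = whitespace, 'p' = anything else (punctuation).
static List<string> Tokenize(string text)
{
    var tokens = new List<string>();
    var current = new StringBuilder();
    char currentKind = '\0'; // kind of the characters in the buffer

    foreach (char c in text)
    {
        char kind = char.IsLetterOrDigit(c) ? 'w' : char.IsWhiteSpace(c) ? 's' : 'p';

        // Flush the buffer when the kind changes, or when the new character is
        // punctuation (punctuation characters are never merged together).
        if (current.Length > 0 && (kind != currentKind || kind == 'p'))
        {
            tokens.Add(current.ToString());
            current.Clear();
        }
        current.Append(c);
        currentKind = kind;
    }

    // Emit whatever is left at the end of the input.
    if (current.Length > 0)
        tokens.Add(current.ToString());
    return tokens;
}

List<string> result = Tokenize("How was your day - Andrew, Jane?");
Console.WriteLine(string.Join("|", result));
```

New rules (apostrophes inside words, ordinals, abbreviations) become extra branches in the kind decision instead of an ever-longer regex, which is the scalability argument above.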