Regex: How to get words, spaces and punctuation from a string

Keywords: words, spaces, punctuation, get, string, Regex | Updated: 2023-09-27 18:13:44

Basically, I want to break a sentence into its parts and iterate over them, for example:

string sentence = "How was your day - Andrew, Jane?";
string[] separated = SeparateSentence(sentence);

The separated array should contain:

[1] = "How"
[2] = " "
[3] = "was"
[4] = " "
[5] = "your"
[6] = " "
[7] = "day"
[8] = " "
[9] = "-"
[10] = " "
[11] = "Andrew"
[12] = ","
[13] = " "
[14] = "Jane"
[15] = "?"

At the moment I can only grab the words, using the Regex "\w(?<!\d)[\w'-]*". How can I split the sentence into smaller parts, as in the output example?

Edit: The string does not contain any of the following:

  • Solid forms

  • 8, 1, 2


Take a look at this:

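        // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
        // Group 1 matches a whitespace run, a digit run, a word run, or one other character.
        // The group repeats, so .NET records every repetition in Groups[1].Captures.
        string pattern = @"^(\s+|\d+|\w+|[^\d\s\w])+$";
        string input = "How was your 7 day - Andrew, Jane?";
        List<string> words = new List<string>();
        Regex regex = new Regex(pattern);
        Match match = regex.Match(input);
        if (match.Success)
        {
            foreach (Capture capture in match.Groups[1].Captures)
                words.Add(capture.Value);
        }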

Why not something like this? It is tailored to your test case, but if you add more punctuation to it, it may be what you want.

(\w+|[,-?])

Edit: Ah, stealing from Andre's answer, this is what I had in mind:
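// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
// Note: inside the class, ",-?" is a range (U+002C to U+003F), so it also matches
// '-', '.', '/', digits, ':' and ';'. Whitespace is not matched and is skipped.
string pattern = @"(\w+|[,-?])";
string input = "How was your 7 day - Andrew, Jane?";
List<string> words = new List<string>();
Regex regex = new Regex(pattern);
// Matches returns an empty collection when nothing matches, so no IsMatch check is needed.
foreach (Match m in regex.Matches(input))
    words.Add(m.Groups[1].Value);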
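
The pattern above drops whitespace, while the question asks for spaces as separate items. Here is a minimal sketch of one way to keep them, assuming each run of whitespace should be one item and every other non-word character its own item. SeparateSentence is the name used in the question; the RegexTokenizer class name is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexTokenizer
{
    // A run of word characters, or a run of whitespace, or one other character.
    // Every character falls into exactly one of these classes, so nothing is skipped.
    private static readonly Regex TokenPattern = new Regex(@"\w+|\s+|[^\w\s]");

    public static string[] SeparateSentence(string sentence)
    {
        return TokenPattern.Matches(sentence)
                           .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                           .Select(m => m.Value)
                           .ToArray();
    }
}
```

Calling RegexTokenizer.SeparateSentence(sentence) on the sentence from the question should give the fifteen items listed there, and joining the items back together reproduces the input.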

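I would suggest implementing a simple lexer (if such a thing exists) that reads one character at a time and produces the output you are looking for. It is not the simplest solution, but it has the advantage of scaling if your use case becomes more complex, as @AndreCalil suggested.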
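
The lexer idea could be sketched roughly as follows. This is only an illustration under assumptions: letters, digits and underscores are grouped into words, whitespace is grouped into runs, and every other character is emitted on its own. The names SimpleLexer, Kind, Classify and Tokenize are hypothetical and do not come from the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SimpleLexer
{
    private enum Kind { Word, Space, Other }

    // Assumption: "word" means letters, digits and underscore.
    private static Kind Classify(char c)
    {
        if (char.IsLetterOrDigit(c) || c == '_') return Kind.Word;
        if (char.IsWhiteSpace(c)) return Kind.Space;
        return Kind.Other;
    }

    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        Kind currentKind = Kind.Other;

        foreach (char c in input)
        {
            Kind kind = Classify(c);
            // Emit the buffered token when the character class changes,
            // or when the new character is punctuation (always a token by itself).
            if (current.Length > 0 && (kind != currentKind || kind == Kind.Other))
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
            current.Append(c);
            currentKind = kind;
        }

        // Emit whatever is left in the buffer at the end of the input.
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }
}
```

Because the classification lives in one small method, handling a more complex case later (for example, treating an apostrophe inside a word as part of that word) would mean changing Classify or the flush condition rather than rewriting a pattern.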