Need a C# regex to get pairs of words in a sentence

Keywords: words, sentence, Regex, need | Updated: 2023-09-27 18:02:19

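Is there a regular expression that can take the following sentence:

"I want this split up into pairs"

and produce the following list:

"I want", "want this", "this split", "split up", "up into", "into pairs"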


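Because each word has to be reused (the second word of one pair is the first word of the next), you need a lookahead assertion: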

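using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
string subjectString = "I want this split up into pairs";
var resultList = new List<string>();
Regex regexObj = new Regex(
    @"(     # Match and capture in backreference no. 1:
     \w+    # one or more word characters (letters, digits, underscore)
     \s+    # one or more whitespace characters.
    )       # End of capturing group 1.
    (?=     # Assert that there follows...
     (\w+)  # another word; capture that into backref 2.
    )       # End of lookahead.",
    RegexOptions.IgnorePatternWhitespace);
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    // Group 1 ("I ") + group 2 ("want") gives "I want", and so on.
    resultList.Add(matchResult.Groups[1].Value + matchResult.Groups[2].Value);
    matchResult = matchResult.NextMatch();
}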
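To see why the lookahead matters, here is a minimal sketch (a top-level C# program, assuming .NET with LINQ) that runs a plain consuming pattern and the lookahead pattern from this answer on the question's sentence. The variable names `consuming` and `overlapping` are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string sentence = "I want this split up into pairs";

// Without a lookahead the second word is consumed by each match,
// so the next match cannot reuse it: only non-overlapping pairs come back.
List<string> consuming = Regex.Matches(sentence, @"\w+\s+\w+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

// With the lookahead only the first word and its whitespace are consumed;
// the second word is captured inside (?=...) and stays available.
List<string> overlapping = Regex.Matches(sentence, @"(\w+\s+)(?=(\w+))")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value + m.Groups[2].Value)
    .ToList();

Console.WriteLine(string.Join(" | ", consuming));
Console.WriteLine(string.Join(" | ", overlapping));
```

The first line printed is `I want | this split | up into`; the second contains all six pairs the question asks for.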

For triples:

Regex regexObj = new Regex(
    @"(     # Match and capture in backreference no. 1:
     \w+    # one or more word characters
     \s+    # one or more whitespace characters.
    )       # End of capturing group 1.
    (?=     # Assert that there follows...
     (      # and capture...
      \w+   # another word,
      \s+   # whitespace,
      \w+   # and a third word.
     )      # End of capturing group 2.
    )       # End of lookahead.",
    RegexOptions.IgnorePatternWhitespace);
// Same loop as above: Groups[1] + Groups[2] now yields "I want this", "want this split", ...

And so on.
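The "and so on" can be generated instead of written by hand. A minimal sketch, assuming a hypothetical helper `WordGroups(text, n)` for n of at least 2: it builds the same lookahead pattern, with the `\s+\w+` tail repeated n - 2 times inside the lookahead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every run of n consecutive words,
// using the same lookahead idea as the pair and triple regexes above.
static List<string> WordGroups(string text, int n)
{
    if (n < 2)
        throw new ArgumentOutOfRangeException(nameof(n), "n must be at least 2");

    // Group 1 consumes the first word plus whitespace; the lookahead captures
    // the remaining n - 1 words (one \w+ followed by n - 2 "\s+\w+" repeats).
    var regex = new Regex(@"(\w+\s+)(?=(\w+(?:\s+\w+){" + (n - 2) + "}))");

    return regex.Matches(text)
        .Cast<Match>()
        .Select(m => m.Groups[1].Value + m.Groups[2].Value)
        .ToList();
}

string sentence = "I want this split up into pairs";
Console.WriteLine(string.Join(" | ", WordGroups(sentence, 2)));
Console.WriteLine(string.Join(" | ", WordGroups(sentence, 3)));
```

For n = 2 the quantifier is `{0}`, so the pattern reduces to the pair regex; for n = 3 it matches the triple regex.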

You could do:

using System.Linq;
var myWords = myString.Split(' ');
// Pair each word (except the last) with the word at the next index.
var myPairs = myWords.Take(myWords.Length - 1)
    .Select((w, i) => w + " " + myWords[i + 1]);
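One caveat with `Split(' ')`: two spaces in a row produce an empty string in the array, and that empty "word" then shows up in the pairs. A sketch of a variant, assuming you want runs of whitespace ignored, that splits with `StringSplitOptions.RemoveEmptyEntries` and swaps the indexed `Select` for `Enumerable.Zip`:

```csharp
using System;
using System.Linq;

string myString = "I want  this split up into pairs"; // note the double space

// A null separator array means "split on whitespace"; RemoveEmptyEntries
// drops the empty string that the double space would otherwise produce.
string[] myWords = myString.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

// Zip pairs each word with its successor (myWords.Skip(1)); Zip stops at the
// shorter sequence, so the last word is not paired with anything.
var myPairs = myWords.Zip(myWords.Skip(1), (first, second) => first + " " + second).ToList();

Console.WriteLine(string.Join(" | ", myPairs));
```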

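You could use string.Split() and combine the results:

using System.Collections.Generic;
var words = myString.Split(new char[] { ' ' });
var pairs = new List<string>();
for (int i = 0; i < words.Length - 1; i++)
{
    // Join with a space; words[i] + words[i + 1] alone would give "Iwant".
    pairs.Add(words[i] + " " + words[i + 1]);
}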
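The same loop extends to groups of n words. A minimal sketch using a hypothetical helper `WordWindows(allWords, n)` and the `string.Join(separator, value, startIndex, count)` overload:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper extending the for-loop answer to groups of n words.
static List<string> WordWindows(string[] allWords, int n)
{
    var result = new List<string>();
    // i + n <= allWords.Length keeps the window inside the array;
    // for n == 2 this is the same bound as i < allWords.Length - 1.
    for (int i = 0; i + n <= allWords.Length; i++)
    {
        // Joins allWords[i] through allWords[i + n - 1] with single spaces.
        result.Add(string.Join(" ", allWords, i, n));
    }
    return result;
}

string[] words = "I want this split up into pairs".Split(' ');
Console.WriteLine(string.Join(" | ", WordWindows(words, 2)));
Console.WriteLine(string.Join(" | ", WordWindows(words, 3)));
```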

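To use only regexes, with no post-processing of a split array, we can reuse Tim Pietzcker's answer but run two regexes in succession.

We run Tim Pietzcker's original regex, and also the same regex preceded by a lookbehind, which makes the regex start capturing from the second word.

If you combine the results of the two regexes, you get all the pairs in the text.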

Regex regexObj1 = new Regex(
    @"(     # Match and capture in backreference no. 1:
     \w+    # one or more word characters
     \s+    # one or more whitespace characters.
    )       # End of capturing group 1.
    (?=     # Assert that there follows...
     (\w+)  # another word; capture that into backref 2.
    )       # End of lookahead.",
    RegexOptions.IgnorePatternWhitespace);
Match matchResult1 = regexObj1.Match(subjectString);
while (matchResult1.Success) {
    resultList.Add(matchResult1.Groups[1].Value + matchResult1.Groups[2].Value);
    matchResult1 = matchResult1.NextMatch();
}
Regex regexObj2 = new Regex(
    @"(?<=   # Assert that this position is preceded by (not captured):
     \w+\s+  # the first word followed by whitespace.
    )
    (        # Match and capture in backreference no. 1:
     \w+     # one or more word characters
     \s+     # one or more whitespace characters.
    )        # End of capturing group 1.
    (?=      # Assert that there follows...
     (\w+)   # another word; capture that into backref 2.
    )        # End of lookahead.",
    RegexOptions.IgnorePatternWhitespace);
Match matchResult2 = regexObj2.Match(subjectString);
while (matchResult2.Success) {
    resultList.Add(matchResult2.Groups[1].Value + matchResult2.Groups[2].Value);
    matchResult2 = matchResult2.NextMatch();
}
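To check what each regex contributes, here is a minimal sketch that runs the two patterns (without the inline comments) on the question's sentence and merges the results. `Pairs`, `fromFirst`, `fromSecond` and `combined` are illustrative names, and merging with `Enumerable.Union` (which drops duplicates) is my choice, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string subjectString = "I want this split up into pairs";

// Same patterns as regexObj1 and regexObj2, without IgnorePatternWhitespace comments.
var regexObj1 = new Regex(@"(\w+\s+)(?=(\w+))");
var regexObj2 = new Regex(@"(?<=\w+\s+)(\w+\s+)(?=(\w+))");

// Collects "group 1 + group 2" for every match of a regex.
static List<string> Pairs(Regex regex, string text) =>
    regex.Matches(text)
        .Cast<Match>()
        .Select(m => m.Groups[1].Value + m.Groups[2].Value)
        .ToList();

List<string> fromFirst = Pairs(regexObj1, subjectString);
List<string> fromSecond = Pairs(regexObj2, subjectString);

// Union keeps the first occurrence of each pair and drops duplicates.
List<string> combined = fromFirst.Union(fromSecond).ToList();

Console.WriteLine(string.Join(" | ", fromFirst));
Console.WriteLine(string.Join(" | ", fromSecond));
Console.WriteLine(string.Join(" | ", combined));
```

On this sentence `fromFirst` already holds all six pairs, because the lookahead leaves the second word unconsumed; `fromSecond` is the same list without "I want". That overlap is why the sketch merges with `Union` rather than concatenating.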

For triples:

You need to add a third regex to the program:

Regex regexObj3 = new Regex(
    @"(?<=         # Assert that this position is preceded by (not captured):
     \w+\s+\w+\s+  # the first and second words, each followed by whitespace.
    )
    (              # Match and capture in backreference no. 1:
     \w+           # one or more word characters
     \s+           # one or more whitespace characters.
    )              # End of capturing group 1.
    (?=            # Assert that there follows...
     (\w+)         # another word; capture that into backref 2.
    )              # End of lookahead.",
    RegexOptions.IgnorePatternWhitespace);
Match matchResult3 = regexObj3.Match(subjectString);
while (matchResult3.Success) {
    resultList.Add(matchResult3.Groups[1].Value + matchResult3.Groups[2].Value);
    matchResult3 = matchResult3.NextMatch();
}
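The lookbehind offset generalizes: requiring k preceding words makes the matches start at word k + 1. A sketch with a hypothetical helper `PairsAfter(text, skip)`, where skip = 1 corresponds to regexObj2 and skip = 2 to regexObj3:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: the pair regex with a lookbehind requiring `skip`
// preceding words. skip = 0 is regexObj1, 1 is regexObj2, 2 is regexObj3.
static List<string> PairsAfter(string text, int skip)
{
    string lookbehind = skip > 0 ? @"(?<=(?:\w+\s+){" + skip + "})" : "";
    var regex = new Regex(lookbehind + @"(\w+\s+)(?=(\w+))");
    return regex.Matches(text)
        .Cast<Match>()
        .Select(m => m.Groups[1].Value + m.Groups[2].Value)
        .ToList();
}

string subjectString = "I want this split up into pairs";
for (int k = 0; k <= 2; k++)
    Console.WriteLine($"skip {k}: " + string.Join(" | ", PairsAfter(subjectString, k)));
```

The captured groups stay the same size in every variant, so each one still returns pairs; only the starting word moves.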