Need a C# Regex to get pairs of words in a sentence
Keywords: words, sentence, Regex, need | Updated: 2023-09-27 18:02:19
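Is there a regular expression that can take the following sentence:
"I want this split up into pairs"
and produce the following list:
"I want", "want this", "this split", "split up", "up into", "into pairs"
Because each word has to be reused in the next pair, you need a lookahead assertion. The lookahead captures the following word without consuming it, so the next match can start at that word: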
// subjectString is the input text; resultList is a List<string>.
Regex regexObj = new Regex(
    @"(          # Match and capture in backreference no. 1:
       \w+       # one or more alphanumeric characters
       \s+       # one or more whitespace characters.
      )          # End of capturing group 1.
      (?=        # Assert that there follows...
       (\w+)     # another word; capture that into backref 2.
      )          # End of lookahead.",
    RegexOptions.IgnorePatternWhitespace);
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    resultList.Add(matchResult.Groups[1].Value + matchResult.Groups[2].Value);
    matchResult = matchResult.NextMatch();
}
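For reference, the lookahead approach above can be wrapped into a self-contained helper. This is only a sketch: the class name WordPairs and the method name Extract are made up for illustration, and the pattern is the same one as above written on a single line without comments.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class WordPairs
{
    // Group 1 consumes a word plus its trailing whitespace.
    // The lookahead captures the next word in group 2 without consuming it,
    // so the following match can start at that word.
    private static readonly Regex PairRegex = new Regex(@"(\w+\s+)(?=(\w+))");

    public static List<string> Extract(string subjectString)
    {
        var resultList = new List<string>();
        for (Match m = PairRegex.Match(subjectString); m.Success; m = m.NextMatch())
        {
            // Group 1 keeps the original whitespace between the two words.
            resultList.Add(m.Groups[1].Value + m.Groups[2].Value);
        }
        return resultList;
    }
}
```

Calling WordPairs.Extract("I want this split up into pairs") should give the six pairs listed in the question. Note that the separator is whatever whitespace appeared in the input, so runs of spaces are preserved inside each pair.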
For groups of three:
Regex regexObj = new Regex(
    @"(          # Match and capture in backreference no. 1:
       \w+       # one or more alphanumeric characters
       \s+       # one or more whitespace characters.
      )          # End of capturing group 1.
      (?=        # Assert that there follows...
       (         # and capture...
        \w+      # another word,
        \s+      # whitespace,
        \w+      # word.
       )         # End of capturing group 2.
      )          # End of lookahead.",
    RegexOptions.IgnorePatternWhitespace);
And so on.
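The step from pairs to triples suggests a general pattern for groups of n words: the lookahead has to capture one word followed by n - 2 further "whitespace, word" repetitions. A minimal sketch, assuming hypothetical names (WordGroups, Extract) and building the pattern with a counted quantifier:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class WordGroups
{
    public static List<string> Extract(string subjectString, int n)
    {
        if (n < 2) throw new ArgumentOutOfRangeException(nameof(n), "n must be at least 2");

        // Group 1: first word plus trailing whitespace (consumed).
        // Group 2: the next n - 1 words, captured inside a lookahead (not consumed).
        string pattern = @"(\w+\s+)(?=(\w+(?:\s+\w+){" + (n - 2) + "}))";
        var regex = new Regex(pattern);

        var result = new List<string>();
        for (Match m = regex.Match(subjectString); m.Success; m = m.NextMatch())
        {
            result.Add(m.Groups[1].Value + m.Groups[2].Value);
        }
        return result;
    }
}
```

With n = 2 this reduces to the pair pattern, and with n = 3 it is equivalent to the triple pattern above.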
You could do this (Take and Select need using System.Linq):
var myWords = myString.Split(' ');
var myPairs = myWords.Take(myWords.Length - 1)
                     .Select((w, i) => w + " " + myWords[i + 1]);
You can use string.Split()
and combine the results:
var words = myString.Split(new char[] { ' ' });
var pairs = new List<string>();
for (int i = 0; i < words.Length - 1; i++)
{
    // Join with a space so the result is "I want", not "Iwant".
    pairs.Add(words[i] + " " + words[i + 1]);
}
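One thing to watch with the Split approach: splitting on a single space produces empty entries when the input contains consecutive spaces. A possible variant, sketched here with the hypothetical names SplitPairs and Extract, passes a null separator array (which makes Split use whitespace as the delimiter) together with StringSplitOptions.RemoveEmptyEntries:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the names are illustrative only.
public static class SplitPairs
{
    public static List<string> Extract(string myString)
    {
        // A null separator array splits on whitespace characters;
        // RemoveEmptyEntries drops the empty tokens produced by runs of spaces.
        string[] words = myString.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        var pairs = new List<string>();
        for (int i = 0; i < words.Length - 1; i++)
        {
            // Each word is paired with its right-hand neighbour.
            pairs.Add(words[i] + " " + words[i + 1]);
        }
        return pairs;
    }
}
```

Unlike the regex version, this normalizes the separator to a single space and does not treat punctuation as a word boundary.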
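To use only RegEx with no post-processing, we can reuse Tim Pietzcker's answer but run two regexes one after the other.
The first is Tim Pietzcker's original pattern; the second is the same pattern preceded by a lookbehind, which makes the regex start capturing from the second word.
If you combine the results of the two regexes, you get all the pairs in the text.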
Regex regexObj1 = new Regex(
    @"(          # Match and capture in backreference no. 1:
       \w+       # one or more alphanumeric characters
       \s+       # one or more whitespace characters.
      )          # End of capturing group 1.
      (?=        # Assert that there follows...
       (\w+)     # another word; capture that into backref 2.
      )          # End of lookahead.",
    RegexOptions.IgnorePatternWhitespace);
Match matchResult = regexObj1.Match(subjectString);
while (matchResult.Success) {
    resultList.Add(matchResult.Groups[1].Value + matchResult.Groups[2].Value);
    matchResult = matchResult.NextMatch();
}
Regex regexObj2 = new Regex(
    @"(?<=       # Assert that the match is preceded by (not captured):
       \w+\s+    # the first word followed by whitespace
      )
      (          # Match and capture in backreference no. 1:
       \w+       # one or more alphanumeric characters
       \s+       # one or more whitespace characters.
      )          # End of capturing group 1.
      (?=        # Assert that there follows...
       (\w+)     # another word; capture that into backref 2.
      )          # End of lookahead.",
    RegexOptions.IgnorePatternWhitespace);
Match matchResult1 = regexObj1.Match(subjectString);
Match matchResult2 = regexObj2.Match(subjectString);
And so on. For groups of three:
You need to add a third RegEx to your program:
Regex regexObj3 = new Regex(
    @"(?<=               # Assert that the match is preceded by (not captured):
       \w+\s+\w+\s+      # the first and second words, each followed by whitespace
      )
      (                  # Match and capture in backreference no. 1:
       \w+               # one or more alphanumeric characters
       \s+               # one or more whitespace characters.
      )                  # End of capturing group 1.
      (?=                # Assert that there follows...
       (\w+)             # another word; capture that into backref 2.
      )                  # End of lookahead.",
    RegexOptions.IgnorePatternWhitespace);
Match matchResult1 = regexObj1.Match(subjectString);
Match matchResult2 = regexObj2.Match(subjectString);
Match matchResult3 = regexObj3.Match(subjectString);