How to get the words of a string 3 by 3, with overlap
Keywords: string, words, overlap | Updated: 2023-09-27 18:13:56
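Suppose I have a sentence like this:
Regex for taking out words out of a string from a specific position
I need to write a regex that, combined with a for loop, takes the first 3 words from the beginning of the sentence on the first (0th) iteration.
As the loop advances, the regex moves on to the next part of the sentence: it skips the first word and takes the next 3 words of the string.
For example: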
1st loop I'd get: "Regex for taking";
2nd loop I'd get: "for taking out";
3rd loop I'd get: "taking out words";
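and so on, until the end of the string.
I already know how to take the first word out of a string, but that's about as far as I've got. I'm new to regex; this is what I have so far: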
^([\w\-]+)
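The loop the question describes can also be written directly, with one regex per iteration. This is a minimal sketch, not taken from either answer: the helper name `TakeThreeAfter` is hypothetical, and it assumes a word is a run of `[\w-]` characters separated by whitespace. On iteration `i` the pattern skips exactly `i` words with a `{i}` quantifier and captures the next three:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the three words that follow the first `skip`
// words of `text`, or null when fewer than three words remain.
string? TakeThreeAfter(string text, int skip)
{
    // ^\s*                      start of the text, optional leading whitespace
    // (?:[\w-]+\s+){skip}       skip exactly `skip` words and the whitespace after them
    // ([\w-]+(?:\s+[\w-]+){2})  capture the next three words
    string pattern = $@"^\s*(?:[\w-]+\s+){{{skip}}}([\w-]+(?:\s+[\w-]+){{2}})";
    Match m = Regex.Match(text, pattern);
    return m.Success ? m.Groups[1].Value : null;
}

var sentence = "Regex for taking out words out of a string from a specific position";
var results = new List<string>();

// Loop i = 0, 1, 2, ... until the regex can no longer find three words
for (int i = 0; ; i++)
{
    string? triad = TakeThreeAfter(sentence, i);
    if (triad is null)
        break;
    results.Add(triad);
    Console.WriteLine($"Loop {i + 1}: \"{triad}\"");
}
```

Each iteration rescans the sentence from the start, so the total work grows quadratically with the number of words; the answers below avoid that by extracting the words once.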
Here is a non-regex solution:
public static IEnumerable<List<string>> StrangeLoop(string source)
{
    // Split(null) splits on whitespace; if the word separators are
    // anything other than whitespace, change the parameters for Split
    var words = source.Split(null);
    // Every start index that still has two more words after it
    for (int i = 0; i < words.Length - 2; i++)
    {
        yield return new List<string>() { words[i], words[i + 1], words[i + 2] };
    }
}
var sentence = "Regex for taking out words out of a string from a specific position";
foreach (var triad in StrangeLoop(sentence))
{
    // use triad: a List<string> holding three consecutive words
}
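One caveat with `Split(null)`: it splits on whitespace but keeps empty entries, so two consecutive spaces put an empty string into a triad. A minimal sketch of a variant that drops them, assuming that behavior is unwanted (the name `Triads` is hypothetical). Passing an empty separator array also means "split on whitespace", and `StringSplitOptions.RemoveEmptyEntries` removes the empty strings:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of StrangeLoop: whitespace split that skips empty entries
IEnumerable<string[]> Triads(string source)
{
    string[] words = source.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i + 2 < words.Length; i++)
        yield return new[] { words[i], words[i + 1], words[i + 2] };
}

// A double space and a tab: a plain whitespace Split would yield an empty "word" here
var messy = "Regex  for taking\tout words";
var triads = new List<string[]>(Triads(messy));

foreach (var t in triads)
    Console.WriteLine(string.Join(" ", t));
```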
I suggest separating data generation (a regex, or even just a Split(' ')) from data representation (a sliding window):
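using System;
using System.Collections.Generic;

// Extension methods must live in a non-nested static class
public static class EnumerableExtensions {
  // Yields overlapping windows of windowSize consecutive items.
  // Note: as an iterator, the argument checks run on first enumeration,
  // not at the call site.
  public static IEnumerable<T[]> SlidingWindow<T>(this IEnumerable<T> source,
                                                  int windowSize) {
    if (null == source)
      throw new ArgumentNullException(nameof(source));
    else if (windowSize <= 0)
      throw new ArgumentOutOfRangeException(nameof(windowSize),
        "Window size must be a positive value");

    List<T> window = new List<T>(windowSize);

    foreach (var item in source) {
      if (window.Count >= windowSize) {
        // Window is full: emit it, then slide by dropping the oldest item
        yield return window.ToArray();
        window.RemoveAt(0);
      }
      window.Add(item);
    }

    // Emits the last window; it is partial only if the source is shorter
    // than windowSize. Use (window.Count >= windowSize) to suppress partials.
    if (window.Count > 0)
      yield return window.ToArray();
  }
}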
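If partial windows are never wanted, and the argument checks should fire at the call site rather than on first enumeration, the same idea can be sketched as below. This is an assumption-labeled variant, not the answer's code: the name `FullWindows` is hypothetical, and it swaps the `List<T>` for a `Queue<T>` so that dropping the oldest item is O(1) instead of shifting the remaining elements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of SlidingWindow: eager validation, full windows only,
// and a Queue<T> as the window buffer.
IEnumerable<T[]> FullWindows<T>(IEnumerable<T> source, int windowSize)
{
    // This outer function is not an iterator, so these checks run at call time
    if (source is null)
        throw new ArgumentNullException(nameof(source));
    if (windowSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(windowSize), "Window size must be positive");
    return Iterate();

    IEnumerable<T[]> Iterate()
    {
        var window = new Queue<T>(windowSize);
        foreach (var item in source)
        {
            window.Enqueue(item);
            if (window.Count > windowSize)
                window.Dequeue();              // slide: drop the oldest item
            if (window.Count == windowSize)
                yield return window.ToArray(); // copied oldest-first
        }
    }
}

var full = FullWindows(new[] { 1, 2, 3, 4 }, 3).ToList();
var none = FullWindows(new[] { 1, 2 }, 3).ToList();   // shorter than the window

foreach (var w in full)
    Console.WriteLine($"[{string.Join(", ", w)}]");
Console.WriteLine($"windows for a 2-item source: {none.Count}");
```

Splitting validation into a non-iterator outer function with an inner iterator is the usual C# way to make argument errors surface immediately.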
With SlidingWindow, all you have to do is generate the matches as usual and then represent them differently (just one extra line):
using System;
using System.Linq;
using System.Text.RegularExpressions;

var sentence = "Regex for taking out words out of a string from a specific position";

// Regex solution: get all matches as usual...
var result = Regex
    .Matches(sentence, @"[\w\-]+") // you don't want the ^ anchor here
    .OfType<Match>()
    .Select(match => match.Value)
    .SlidingWindow(3);             // ...and represent them as sliding windows

var test = String.Join(Environment.NewLine, result
    .Select(line => $"[{string.Join(" ", line)}]"));

Console.Write(test);
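For completeness, a single regex can also return the overlapping triples by itself, without SlidingWindow. This swaps in a different technique from the answer: a zero-width lookahead with a capturing group. Because the lookahead consumes no characters, the next match may start at the following word, so the captured triples overlap. A minimal sketch, assuming words are runs of `[\w-]` separated by whitespace:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var sentence = "Regex for taking out words out of a string from a specific position";

// (?<![\w-])  match only at the start of a word (no word character or '-' before it)
// (?=( ... )) zero-width lookahead that captures the next three words without consuming them
var pattern = @"(?<![\w-])(?=([\w-]+\s+[\w-]+\s+[\w-]+))";

var triples = Regex.Matches(sentence, pattern)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

foreach (var t in triples)
    Console.WriteLine($"[{t}]");
```

The trade-off is readability: the split-then-window approach keeps the pattern trivial and puts the overlap logic in ordinary code.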
The output is:
[Regex for taking]
[for taking out]
[taking out words]
[out words out]
[words out of]
[out of a]
[of a string]
[a string from]
[string from a]
[from a specific]
[a specific position]
If you happen to switch from the regex to a simple Split, it is just as easy:
// Split solution: as usual + final representation as sliding window
var result = sentence
.Split(' ') // just split...
.SlidingWindow(3); // ... and represent as sliding windows
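When the words are already in an array, the windows can also be produced by index, without a helper: n words give n - 3 + 1 windows, one per start index. A minimal sketch, not part of either answer, assuming .NET Core 3.0 or later for the C# 8 range syntax:

```csharp
using System;
using System.Linq;

var sentence = "Regex for taking out words out of a string from a specific position";

// RemoveEmptyEntries keeps runs of spaces from producing empty "words"
string[] words = sentence.Split(' ', StringSplitOptions.RemoveEmptyEntries);
const int size = 3;

// One window per start index; Math.Max guards inputs shorter than the window,
// since Enumerable.Range throws on a negative count
var windows = Enumerable.Range(0, Math.Max(0, words.Length - size + 1))
    .Select(start => words[start..(start + size)]) // range on an array copies the slice
    .ToList();

foreach (var w in windows)
    Console.WriteLine($"[{string.Join(" ", w)}]");
```

Unlike SlidingWindow as written, this yields no partial window when the sentence has fewer than three words.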