How to get the words of a string three at a time, with overlap

Keywords: string, words, overlap | Updated: 2023-09-27 18:13:56

Suppose I have a sentence like this:

Regex for taking out words out of a string from a specific position

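I need to write a regex that, combined with a for loop, takes the first 3 words from the start of the sentence on the first (0th) iteration.

As the loop progresses, the regex should move on through the sentence: on each iteration it skips one more word and takes the next 3 words in the string.

For example: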

1st loop I'd get: "Regex for taking";
2nd loop I'd get: "for taking out";
3rd loop I'd get: "taking out words";

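And so on, until the end of the string.

I already know how to take the first word out of a string, but that is about it. I'm new to regex, and this is what I have so far: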

^([\w\-]+)

Here is a non-regex solution.

public static IEnumerable<List<string>> StrangeLoop(string source)
{
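    // A null char[] separator splits on any whitespace. The cast avoids an
    // overload ambiguity that Split(null) can hit on newer .NET versions.
    // If words are separated by something else, pass those separators instead.
    var words = source.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);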
    for (int i = 0; i < words.Length - 2; i++)
    {
        yield return new List<string>() { words[i], words[i + 1], words[i + 2] };
    }
}
var sentence = "Regex for taking out words out of a string from a specific position";
foreach (var triad in StrangeLoop(sentence))
{
    //use triad
}
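
If a regex-only approach is still preferred, overlapping matches can be obtained by putting the capture group inside a lookahead, so that each match consumes no characters. The following is a minimal sketch, not a definitive implementation: the class name `OverlappingTriads` and the method `Triads` are hypothetical, and it assumes words consist of `[\w\-]` characters and are separated by whitespace only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OverlappingTriads
{
    // Hypothetical helper: returns every run of three consecutive words.
    public static List<string> Triads(string sentence)
    {
        // (?<![\w\-])  only try positions that are at the start of a word
        // (?=( ... ))  the lookahead consumes nothing, so matches may overlap;
        //              group 1 captures three words separated by whitespace
        const string pattern = @"(?<![\w\-])(?=([\w\-]+(?:\s+[\w\-]+){2}))";

        var result = new List<string>();
        foreach (Match match in Regex.Matches(sentence, pattern))
        {
            // The match itself is empty; the text is in the capture group.
            result.Add(match.Groups[1].Value);
        }
        return result;
    }
}
```

Calling `OverlappingTriads.Triads(sentence)` on the sample sentence gives "Regex for taking", "for taking out", and so on. Note that the captured text keeps whatever whitespace separated the words in the input.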

I suggest separating data generation (a regex, or even just a Split(' ')) from data representation (the sliding window):

public static IEnumerable<T[]> SlidingWindow<T>(this IEnumerable<T> source,
                                                int windowSize) {
  if (null == source)
    throw new ArgumentNullException(nameof(source));
  else if (windowSize <= 0)
    throw new ArgumentOutOfRangeException(nameof(windowSize), 
      "Window size must be a positive value");
  List<T> window = new List<T>(windowSize);
  foreach (var item in source) {
    if (window.Count >= windowSize) {
      yield return window.ToArray();
      window.RemoveAt(0);
    }
    window.Add(item);
  }
  // Or (window.Count >= windowSize) if you don't want partial windows 
  if (window.Count > 0)
    yield return window.ToArray();
}

With SlidingWindow in place, all you have to do is generate the matches as usual and then represent them differently (just one extra line).

var sentence = "Regex for taking out words out of a string from a specific position";
// Regex solution: get all matches as usual...
var result = Regex
  .Matches(sentence, @"[\w\-]+") // you don't want ^ anchor
  .OfType<Match>()
  .Select(match => match.Value)
  .SlidingWindow(3); // and represent them as sliding windows..
var test = String.Join(Environment.NewLine, result
  .Select(line => $"[{string.Join(" ", line)}]")); 
Console.Write(test);

The output is:

[Regex for taking]
[for taking out]
[taking out words]
[out words out]
[words out of]
[out of a]
[of a string]
[a string from]
[string from a]
[from a specific]
[a specific position]

If you ever decide to switch from the regex to a simple Split, the change is easy:

// Split solution: as usual + final representation as sliding window 
var result = sentence
  .Split(' ')        // just split...
  .SlidingWindow(3); // ... and represent as sliding windows
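
To show how the pieces fit together, here is a self-contained variant of the Split approach. It is a sketch under stated assumptions rather than the answer's exact code: the names `SlidingWindowDemo`, `FullWindows`, and `Triads` are hypothetical, it returns full windows only (no partial window for short input), and it swaps the `List<T>` buffer for a `Queue<T>` so that dropping the oldest item does not shift the remaining elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Extension methods must be declared in a static class.
public static class SlidingWindowDemo
{
    // Hypothetical variant of SlidingWindow that yields only full windows.
    public static IEnumerable<T[]> FullWindows<T>(this IEnumerable<T> source, int windowSize)
    {
        // Validate here (not inside the iterator) so errors surface at call time.
        if (source == null)
            throw new ArgumentNullException(nameof(source));
        if (windowSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(windowSize),
                "Window size must be a positive value");
        return Iterate(source, windowSize);
    }

    private static IEnumerable<T[]> Iterate<T>(IEnumerable<T> source, int windowSize)
    {
        var window = new Queue<T>(windowSize);
        foreach (var item in source)
        {
            window.Enqueue(item);
            if (window.Count == windowSize)
            {
                yield return window.ToArray(); // snapshot in insertion order
                window.Dequeue();              // slide forward by one item
            }
        }
    }

    // Splits on spaces and joins each window of three words back into a string.
    public static List<string> Triads(string sentence)
    {
        return sentence
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .FullWindows(3)
            .Select(words => string.Join(" ", words))
            .ToList();
    }
}
```

With this variant, an input of fewer than three words produces no windows at all. That matches the asker's example but differs from the partial-window default of SlidingWindow above.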