Counting repeated words in a string without disturbing their order

Keywords: word, string, count, order | Updated: 2023-09-27 18:02:36

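I have a string like var str = "S3;S4;S3;S4;S5;S5;S4;S4;S4" and I want to turn it into a list like this:

{ {"S3" : 1}, {"S4" : 1}, {"S3" : 1}, {"S4" : 1}, {"S5" : 2}, {"S4" : 3} }

Basically, a count of each word as it occurs in the sequence. I tried LINQ's group by, but it only gives me a sorted list of the unique words. Is there a way to keep the order and count only how many times a word repeats consecutively?

Thanks for any suggestions or help!

This is what I have so far: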

var text = "S3;S4;S5;S5;S4;S4;S3;S3;S3;S4;";
var list = text.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
var grouped = from state in list group state by state.ToState() into g select new { Name = g.Key, Count = g.Count() };

By the way, I am trying to do this with LINQ...

Have a look at serhiyb's answer for a non-LINQ/Regex approach and Xiaoy312's answer for a LINQ/Regex one. Very nice solutions!
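
To illustrate why a plain GroupBy is not enough here, the following is a minimal sketch (the class and method names are hypothetical, not from the question). GroupBy collects every occurrence of a word into one group, so separate runs of the same word get merged:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GroupByDemo
{
    // Hypothetical helper: counts words with GroupBy, ignoring where they occur.
    public static List<KeyValuePair<string, int>> CountByGroup(string text)
    {
        return text
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(word => word)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .ToList();
    }

    static void Main()
    {
        var counts = CountByGroup("S3;S4;S3;S4;S5;S5;S4;S4;S4");
        // One entry per distinct word (three in total), not one entry per run (six).
        Console.WriteLine(string.Join(",", counts.Select(p => "{" + p.Key + "," + p.Value + "}")));
    }
}
```

Each distinct word shows up once with its total count, which is why the answers keep track of the previous element instead.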

This can be done by mixing Regex with a little LINQ:

Regex.Matches("S3;S4;S5;S5;S4;S4;S3;S3;S3;S4;", @"(?<key>.+?)(?<repeated>;\k<key>)*;")
    .Cast<Match>()
    .Select(x => new
    {
        Key = x.Groups["key"].Value,
        Count = 1 + x.Groups["repeated"].Captures.Count
    })

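The Regex matches the following:

  • (?<key>.+?) matches any text, as little as possible, and stores it in the named group key
  • (?<repeated>;\k<key>)* matches any number of repetitions of the previously matched key, each preceded by a semicolon

Result: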

Key Count
S3 1 
S4 1 
S5 2 
S4 2 
S3 3 
S4 1 
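
Note that the pattern ends with a literal ;, so every word, including the last one, has to be followed by a semicolon. The string in the question has no trailing semicolon, so the sketch below appends one before matching. The wrapper class and method names are hypothetical; the pattern is the one shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexRuns
{
    // Hypothetical wrapper around the pattern from the answer.
    public static List<KeyValuePair<string, int>> CountRuns(string text)
    {
        // The pattern requires a ';' after every word, so add one if it is missing.
        if (!text.EndsWith(";", StringComparison.Ordinal))
            text += ";";

        return Regex.Matches(text, @"(?<key>.+?)(?<repeated>;\k<key>)*;")
            .Cast<Match>()
            .Select(m => new KeyValuePair<string, int>(
                m.Groups["key"].Value,
                // One capture per extra repetition, plus the first occurrence.
                1 + m.Groups["repeated"].Captures.Count))
            .ToList();
    }

    static void Main()
    {
        foreach (var run in CountRuns("S3;S4;S3;S4;S5;S5;S4;S4;S4"))
            Console.WriteLine(run.Key + " " + run.Value);
    }
}
```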

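LINQ is not a good fit for a task like this. The only LINQ method that lets you keep some state while processing the elements of a sequence is Aggregate, and that is really just a LINQ way of writing a foreach loop. Anyway, here it is: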

var result = list.Aggregate(
    Enumerable.Repeat(new { Name = default(string), Count = default(int) }, 0).ToList(),
    (res, name) =>
    {
        int last = res.Count - 1;
        if (last >= 0 && res[last].Name == name)
            res[last] = new { Name = name, Count = res[last].Count + 1 };
        else
            res.Add(new { Name = name, Count = 1 });
        return res;
    });

Without LINQ or Regex, a plain loop does the job:

var text = "S3;S4;S5;S5;S4;S4;S3;S3;S3;S4;";
var list = text.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
var result = new List<KeyValuePair<string, int>>();
var current = list[0];
var len = 1;
for (int i = 1; i < list.Length; ++i)
{
    if (current == list[i]){
        ++len;
    }
    else{
        result.Add(new KeyValuePair<string, int>(current, len));
        current = list[i];
        len = 1;
    }
}
result.Add(new KeyValuePair<string, int>(current, len));
Console.WriteLine(string.Join(",", result.Select(p => "{" +p.Key + "," + p.Value + "}" )));

Live demo: https://dotnetfiddle.net/aOOrHb

And a solution using a LINQ "hack":

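    var text = "S3;S4;S5;S5;S4;S4;S3;S3;S3;S4;";
    // The empty sentinel at the end makes the last comparison flush the final run.
    var list = text.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries).Concat(new [] {string.Empty});
    var groupIndex = 1; // length of the run currently being scanned
    var result = list
        .Zip(list.Skip(1), (prev, cur) =>
        {
            // Still inside a run: count it and emit a placeholder that is filtered out below.
            if (cur == prev) { ++groupIndex; return new KeyValuePair<string, int>(string.Empty, 0); }
            // The run ended: emit it once with its full length, then reset the counter.
            var run = new KeyValuePair<string, int>(prev, groupIndex);
            groupIndex = 1;
            return run;
        }).ToList()
        .Where(p => !string.IsNullOrEmpty(p.Key));
    Console.WriteLine(string.Join(",", result.Select(p => "{" +p.Key + "," + p.Value + "}" )));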

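A truly LINQ-style way of doing this does not need to hold the whole sequence in memory. It takes a helper method and a few more lines, but IMHO it is easier to read and understand.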

var repetitions = "S3;S4;S5;S5;S4;S4;S3;S3;S3;S4"
    .Split(";".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
    .CountRepetitions();
foreach (var kv in repetitions)
{
    Console.WriteLine(kv.ToString());
}

The CountRepetitions method:

static class RepetitionHelper
{ 
    public static IEnumerable<KeyValuePair<string, int>> CountRepetitions(this IEnumerable<string> list)
    {
        string last = null;
        int count = 1;
        foreach(string current in list)
        {
            if (last == null)
                last = current; // first element in the sequence
            else if (last == current)
                count++;        // repetition
            else
            {
                yield return new KeyValuePair<string, int>(last, count);
                count = 1;
                last = current;
            }
        }
        if (last != null) 
            yield return new KeyValuePair<string, int>(last, count); 
    } 
}
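
The same streaming idea can be sketched generically. The variant below is an illustration, not part of the answer above: the names RunLengthExtensions and CountRuns and the optional comparer parameter are assumptions. It drives the enumerator directly, so it does not rely on null to mean "no previous element", and it accepts any element type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RunLengthExtensions
{
    // Hypothetical generic variant: yields (value, run length) pairs lazily.
    public static IEnumerable<KeyValuePair<T, int>> CountRuns<T>(
        this IEnumerable<T> source, IEqualityComparer<T> comparer = null)
    {
        comparer = comparer ?? EqualityComparer<T>.Default;
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext())
                yield break;                // empty sequence: no runs

            T last = e.Current;
            int count = 1;
            while (e.MoveNext())
            {
                if (comparer.Equals(last, e.Current))
                {
                    count++;                // same value: the run continues
                }
                else
                {
                    yield return new KeyValuePair<T, int>(last, count);
                    last = e.Current;       // start a new run
                    count = 1;
                }
            }
            yield return new KeyValuePair<T, int>(last, count); // flush the final run
        }
    }

    static void Main()
    {
        var words = "S3;S4;S5;S5;S4;S4;S3;S3;S3;S4".Split(';');
        foreach (var run in words.CountRuns())
            Console.WriteLine(run.Key + " " + run.Value);

        // A comparer changes what counts as a repetition, e.g. ignoring case.
        var relaxed = new[] { "a", "A", "b" }.CountRuns(StringComparer.OrdinalIgnoreCase).ToList();
        Console.WriteLine(relaxed.Count);
    }
}
```

Because results are yielded as soon as a run ends, only the current value and its count are kept in memory at any time.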