Counting repeated words in a string without disturbing their order
Keywords: word, string, count, order | Updated: 2023-09-27 18:02:36
I have a string like var str = "S3;S4;S3;S4;S5;S5;S4;S4;S4"
and I want to split it into a list like this:
{ {"S3" : 1}, {"S4" : 1}, {"S3" : 1}, {"S4" : 1}, {"S5" : 2}, {"S4" : 3} }
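Basically, it is a count of each consecutive run of a word in the sequence. I tried LINQ group by, but that only gives me a list of the unique words with their total counts. Is there a way to preserve the order and count only how many times a word repeats consecutively?
Thanks for any suggestions or help!
This is what I have so far: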
var text = "S3;S4;S5;S5;S4;S4;S3;S3;S3;S4;";
var list = text.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
var grouped = from state in list group state by state.ToState() into g select new { Name = g.Key, Count = g.Count() };
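By the way, I am trying to do this with LINQ...
Please see serhiyb's answer for a solution without LINQ or Regex, and Xiaoy312's answer for one that uses LINQ and Regex. Both are very nice solutions!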
This can be done with a mix of Regex and a little LINQ:
Regex.Matches("S3;S4;S5;S5;S4;S4;S3;S3;S3;S4;", @"(?<key>.+?)(?<repeated>;\k<key>)*;")
.Cast<Match>()
.Select(x => new
{
Key = x.Groups["key"].Value,
Count = 1 + x.Groups["repeated"].Captures.Count
})
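The Regex matches the following:
(?<key>.+?)
lazily matches any text and stores it in the named group key
(?<repeated>;\k<key>)*
matches any number of repetitions of the previously matched key, each preceded by the ; separator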
Result:
Key Count
S3 1
S4 1
S5 2
S4 2
S3 3
S4 1
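For readers who want to run the Regex approach end to end, here is a minimal sketch that wraps it in a method. The names RegexRunCounter and CountRuns are made up for illustration and are not part of the original answer. The pattern uses the .NET named backreference \k<key> and requires every run to end with a ; separator, so the sketch appends one when the input lacks it (as the string in the question does); otherwise the last run would not be matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class RegexRunCounter
{
    public static List<KeyValuePair<string, int>> CountRuns(string text)
    {
        // The pattern needs a trailing ';' after every run, including the last one.
        if (!text.EndsWith(";", StringComparison.Ordinal))
            text += ";";

        return Regex.Matches(text, @"(?<key>.+?)(?<repeated>;\k<key>)*;")
            .Cast<Match>()
            .Select(m => new KeyValuePair<string, int>(
                m.Groups["key"].Value,
                // One for the key itself plus one per captured repetition.
                1 + m.Groups["repeated"].Captures.Count))
            .ToList();
    }
}
```

Calling RegexRunCounter.CountRuns("S3;S4;S3;S4;S5;S5;S4;S4;S4") should yield the pairs the question asks for. The sketch assumes the entries themselves never contain the ; character and that there are no empty entries.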
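LINQ is not a great fit for this kind of task. The only LINQ method that lets you keep some state while processing the elements of a sequence is Aggregate, but that is just a LINQ way of writing a foreach loop. Anyway, here it is: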
var result = list.Aggregate(
Enumerable.Repeat(new { Name = default(string), Count = default(int) }, 0).ToList(),
(res, name) =>
{
int last = res.Count - 1;
if (last >= 0 && res[last].Name == name)
res[last] = new { Name = name, Count = res[last].Count + 1 };
else
res.Add(new { Name = name, Count = 1 });
return res;
});
var text = "S3;S4;S5;S5;S4;S4;S3;S3;S3;S4;";
var list = text.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
var result = new List<KeyValuePair<string, int>>();
var current = list[0];
var len = 1;
for (int i = 1; i < list.Length; ++i)
{
if (current == list[i]){
++len;
}
else{
result.Add(new KeyValuePair<string, int>(current, len));
current = list[i];
len = 1;
}
}
result.Add(new KeyValuePair<string, int>(current, len));
Console.WriteLine(string.Join(",", result.Select(p => "{" +p.Key + "," + p.Value + "}" )));
Live demo: https://dotnetfiddle.net/aOOrHb
And a solution using a LINQ "hack":
var text = "S3;S4;S5;S5;S4;S4;S3;S3;S3;S4;";
var list = text.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries).Concat(new [] {string.Empty});
var groupIndex = 1;
var result = list
.Skip(1)
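.Zip(list, (cur, prev) =>
{
    // Still inside a run: keep counting and emit a placeholder that is filtered out below.
    if (cur == prev) { ++groupIndex; return new KeyValuePair<string, int>(string.Empty, 0); }
    // The run ended: emit it once with its final length, then reset the counter.
    // (Emitting on every repeated pair would report runs of three or more twice.)
    var pair = new KeyValuePair<string, int>(prev, groupIndex);
    groupIndex = 1;
    return pair;
}).ToList()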
.Where(p => !string.IsNullOrEmpty(p.Key));
Console.WriteLine(string.Join(",", result.Select(p => "{" +p.Key + "," + p.Value + "}" )));
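A truly LINQ-style way to do this does not need to hold the whole sequence in memory. It takes a helper method and a few more lines, but IMHO it is easier to read and understand.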
var repetitions = "S3;S4;S5;S5;S4;S4;S3;S3;S3;S4"
.Split(";".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.CountRepetitions();
foreach (var kv in repetitions)
{
Console.WriteLine(kv.ToString());
}
And the CountRepetitions method:
static class RepetitionHelper
{
public static IEnumerable<KeyValuePair<string, int>> CountRepetitions(this IEnumerable<string> list)
{
string last = null;
int count = 1;
foreach(string current in list)
{
if (last == null)
last = current; // first element in the sequence
else if (last == current)
count++; // repetition
else
{
yield return new KeyValuePair<string, int>(last, count);
count = 1;
last = current;
}
}
if (last != null)
yield return new KeyValuePair<string, int>(last, count);
}
}
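The helper above uses null to mean "no element seen yet", which is fine for the output of Split but would mishandle a sequence that really contains null entries. As a possible variation, here is a sketch of a generic version that drives the enumerator directly instead of relying on a sentinel value. The names RunLengthExtensions and CountRuns are hypothetical and not part of the original answer; it stays lazy in the same way, yielding each run as soon as the run ends.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical generic variant, for illustration only.
static class RunLengthExtensions
{
    public static IEnumerable<KeyValuePair<T, int>> CountRuns<T>(this IEnumerable<T> source)
    {
        var comparer = EqualityComparer<T>.Default;
        using (var e = source.GetEnumerator())
        {
            // Empty sequence: nothing to report.
            if (!e.MoveNext())
                yield break;

            T last = e.Current;
            int count = 1;
            while (e.MoveNext())
            {
                if (comparer.Equals(last, e.Current))
                {
                    count++; // same value again: the run continues
                }
                else
                {
                    // The run ended: report it and start a new one.
                    yield return new KeyValuePair<T, int>(last, count);
                    last = e.Current;
                    count = 1;
                }
            }
            // Report the final run.
            yield return new KeyValuePair<T, int>(last, count);
        }
    }
}
```

It can be used the same way as CountRepetitions, for example "S3;S4;S5;S5".Split(';').CountRuns(), and it also works for sequences of other types such as int.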