LINQ: aggregating 30-minute intervals into hours
Keywords: hour, minute, aggregate, LINQ | Updated: 2023-09-27 18:33:41
I'm no LINQ expert, and I have the following data supplied by a third party:
Start: 6:00
End: 6:30
value: 1
Start: 7:00
End: 7:30
value: 1
Start: 8:00
End: 8:30
value: 1
Start: 9:00
End: 9:30
value: 1
Start: 10:00
End: 10:30
value: 1
Start: 11:00
End: 11:30
value: 1
Start: 12:00
End: 12:30
value: 1
Start: 13:00
End: 13:30
value: 1
Start: 14:00
End: 14:30
value: 1
...
Start: 05:00
End: 05:30
value: 1
This data goes on for a week, then for 30 days and 365 days.
I need to combine each pair of 30-minute blocks into one hour.
For example:
Start: 6:00
End: 7:00
Value: 2
Start: 7:00
End: 8:00
Value: 2
......
Assuming Start, End and Value make up one row, can someone help me with how to achieve the above?
This query groups by the given AggregationType and uses the second parameter, checkType, to filter out incomplete groups.
private enum AggregationType { Year = 1, Month = 2, Day = 3, Hour = 4 }

private IList<Data> RunQuery(AggregationType groupType, AggregationType checkType)
{
    // The actual query which does the trick
    var result =
        from d in testList
        group d by new {
            d.Start.Year,
            Month = (int)groupType >= (int)AggregationType.Month ? d.Start.Month : 1,
            Day = (int)groupType >= (int)AggregationType.Day ? d.Start.Day : 1,
            Hour = (int)groupType >= (int)AggregationType.Hour ? d.Start.Hour : 1
        } into g
        // The where clause checks how much data needs to be in the group
        where CheckAggregation(g.Count(), checkType)
        select new Data() { Start = g.Min(m => m.Start), End = g.Max(m => m.End), Value = g.Sum(m => m.Value) };
    return result.ToList();
}

private bool CheckAggregation(int groupCount, AggregationType checkType)
{
    int requiredCount = 1;
    switch (checkType)
    {
        // For a year, the required count is multiplied by 12 months
        case AggregationType.Year:
            requiredCount = requiredCount * 12;
            goto case AggregationType.Month;
        // For a month, the required count is multiplied by the days in the month
        case AggregationType.Month:
            // I use 30 but this depends on the given month and year
            requiredCount = requiredCount * 30;
            goto case AggregationType.Day;
        // For a day, the required count is multiplied by 24 hours
        case AggregationType.Day:
            requiredCount = requiredCount * 24;
            goto case AggregationType.Hour;
        // For an hour, the required count is multiplied by 2 (because of the 30-minute slots)
        case AggregationType.Hour:
            requiredCount = requiredCount * 2;
            break;
    }
    return groupCount == requiredCount;
}
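For the hourly case on its own, a shorter query may be enough. The following is a minimal sketch and not part of the answer above: Slot, HourlyAggregation and ToHours are hypothetical names, and it assumes that a complete hour consists of exactly two slots whose Start falls within that hour.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical row type mirroring Start / End / Value from the question
public class Slot
{
    public DateTime Start { get; set; }
    public DateTime End { get; set; }
    public int Value { get; set; }
}

public static class HourlyAggregation
{
    // Groups slots by the hour their Start falls in and keeps only complete hours
    public static List<Slot> ToHours(IEnumerable<Slot> slots)
    {
        return slots
            // Key: Start truncated to the full hour
            .GroupBy(s => new DateTime(s.Start.Year, s.Start.Month, s.Start.Day, s.Start.Hour, 0, 0))
            // Assumption: two 30-minute slots make a complete hour
            .Where(g => g.Count() == 2)
            .OrderBy(g => g.Key)
            .Select(g => new Slot
            {
                Start = g.Key,
                End = g.Key.AddHours(1),
                Value = g.Sum(s => s.Value)
            })
            .ToList();
    }

    public static void Main()
    {
        var day = new DateTime(2023, 1, 1);
        var slots = new List<Slot>
        {
            new Slot { Start = day.AddHours(6), End = day.AddHours(6.5), Value = 1 },
            new Slot { Start = day.AddHours(6.5), End = day.AddHours(7), Value = 1 },
            // Incomplete hour: only the first half of 07:00 is present
            new Slot { Start = day.AddHours(7), End = day.AddHours(7.5), Value = 1 }
        };

        foreach (var h in ToHours(slots))
        {
            Console.WriteLine("{0} - {1}: {2}",
                h.Start.ToString("HH:mm", CultureInfo.InvariantCulture),
                h.End.ToString("HH:mm", CultureInfo.InvariantCulture),
                h.Value);
        }
        // prints "06:00 - 07:00: 2"
    }
}
```

The 07:00 slot is dropped because its group contains only one element. Removing the Where clause would keep incomplete hours, but End would then still be reported as the full hour.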
If needed, here is some test data:
class Data
{
    public DateTime Start { get; set; }
    public DateTime End { get; set; }
    public int Value { get; set; }
}

// Just set up some test data similar to your example
IList<Data> testList = new List<Data>();
DateTime date = DateTime.Parse("6:00");
// This loop fills in some data over several years, months and days
for (int year = date.Year; year > 2010; year--)
{
    for (int month = date.Month; month > 0; month--)
    {
        for (int day = date.Day; day > 0; day--)
        {
            for (int hour = date.Hour; hour > 0; hour--)
            {
                DateTime testDate = date.AddHours(-hour).AddDays(-day).AddMonths(-month).AddYears(-(date.Year - year));
                // Two 30-minute slots per hour
                testList.Add(new Data() { Start = testDate, End = testDate.AddMinutes(30), Value = 1 });
                testList.Add(new Data() { Start = testDate.AddMinutes(30), End = testDate.AddHours(1), Value = 1 });
            }
        }
    }
}
Here is the code. It looks a bit ugly because of the switch statement. It would be better to refactor it, but it should show the idea.
var items = input.Split('\n');
Func<string, string> f = s =>
{
    // Split each line into key and value at the first ':'
    var strings = s.Split(new[] {':'}, 2);
    var key = strings[0];
    var value = strings[1];
    switch (key.ToLower())
    {
        case "start":
            return s;
        case "value":
            // Adds 1, which matches the expected output only while every slot has value 1
            return String.Format("{0}: {1}", key, Int32.Parse(value) + 1);
        case "end":
            // Moves the end of the slot 30 minutes later
            return String.Format("{0}: {1:h:mm}", key,
                DateTime.Parse(value) +
                TimeSpan.FromMinutes(30));
        default:
            return "";
    }
};
var resultItems = items.Select(f);
Console.Out.WriteLine("result = {0}",
    String.Join(Environment.NewLine, resultItems));
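It is actually hard to solve this completely with pure LINQ. To make life easier, you need to write at least one helper method that transforms the enumerable. Take a look at the example below. Here I use an IEnumerable of TimeInterval and a custom Split method (implemented with a C# iterator) that joins two consecutive elements in a Tuple: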
class TimeInterval
{
    // Public properties so that the object initializers below compile
    public DateTime Start { get; set; }
    public DateTime End { get; set; }
    public int Value { get; set; }
}

IEnumerable<TimeInterval> ToHourlyIntervals(
    IEnumerable<TimeInterval> halfHourlyIntervals)
{
    return
        from pair in Split(halfHourlyIntervals)
        select new TimeInterval
        {
            Start = pair.Item1.Start,
            End = pair.Item2.End,
            Value = pair.Item1.Value + pair.Item2.Value
        };
}
static IEnumerable<Tuple<T, T>> Split<T>(
    IEnumerable<T> source)
{
    using (var enumerator = source.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            T first = enumerator.Current;
            if (enumerator.MoveNext())
            {
                T second = enumerator.Current;
                yield return Tuple.Create(first, second);
            }
        }
    }
}
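To make the pairing behaviour concrete, here is a small demonstration that reuses the Split method as written above inside a hypothetical PairingDemo class. It also shows what happens when the source has an odd number of elements.

```csharp
using System;
using System.Collections.Generic;

public static class PairingDemo
{
    // Same iterator as above: yields consecutive elements two at a time
    public static IEnumerable<Tuple<T, T>> Split<T>(IEnumerable<T> source)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                T first = enumerator.Current;
                if (enumerator.MoveNext())
                {
                    T second = enumerator.Current;
                    yield return Tuple.Create(first, second);
                }
            }
        }
    }

    public static void Main()
    {
        foreach (var pair in Split(new[] { 1, 2, 3, 4, 5 }))
        {
            Console.WriteLine("{0}, {1}", pair.Item1, pair.Item2);
        }
        // prints "1, 2" then "3, 4"; the trailing 5 has no partner and is dropped
    }
}
```

Note that the pairing is purely positional. It relies on the input being ordered and free of gaps, so a missing half-hour slot would shift every following pair.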
The same can be applied to the first part of the problem (extracting the half-hourly TimeInterval objects from a list of strings):
IEnumerable<TimeInterval> ToHalfHourlyIntervals(
    IEnumerable<string> inputLines)
{
    return
        from triple in TripleSplit(inputLines)
        select new TimeInterval
        {
            Start = DateTime.Parse(triple.Item1.Replace("Start: ", "")),
            End = DateTime.Parse(triple.Item2.Replace("End: ", "")),
            // Strip the "value: " prefix before parsing, as with Start and End
            Value = Int32.Parse(triple.Item3.Replace("value: ", ""))
        };
}
Here I use a custom TripleSplit method that returns a Tuple<T, T, T> (it will be easy to write). Once that is done, the complete solution looks like this:
// Read data lazily from disk (or any other source)
var lines = File.ReadLines(path);
var halfHourlyIntervals = ToHalfHourlyIntervals(lines);
var hourlyIntervals = ToHourlyIntervals(halfHourlyIntervals);
foreach (var interval in hourlyIntervals)
{
    // process
}
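The nice thing about this solution is that it is completely deferred. It processes one line at a time, which lets you work through arbitrarily large sources without any danger of an out-of-memory exception. That seems important given your stated requirement:
This data goes on for a week, then for 30 days and 365 days.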
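The TripleSplit method mentioned above is not shown in the answer. One possible way to write it, following the same iterator pattern as Split, is sketched below. The class name TripleSplitSketch is hypothetical, and dropping an incomplete trailing group is an assumption rather than something the answer specifies.

```csharp
using System;
using System.Collections.Generic;

public static class TripleSplitSketch
{
    // Yields consecutive elements three at a time; an incomplete
    // trailing group is silently dropped (assumption)
    public static IEnumerable<Tuple<T, T, T>> TripleSplit<T>(IEnumerable<T> source)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                T first = enumerator.Current;
                if (!enumerator.MoveNext()) yield break;
                T second = enumerator.Current;
                if (!enumerator.MoveNext()) yield break;
                T third = enumerator.Current;
                yield return Tuple.Create(first, second, third);
            }
        }
    }

    public static void Main()
    {
        var lines = new[]
        {
            "Start: 6:00", "End: 6:30", "value: 1",
            "Start: 6:30", "End: 7:00", "value: 1"
        };

        foreach (var triple in TripleSplit(lines))
        {
            Console.WriteLine("{0} | {1} | {2}", triple.Item1, triple.Item2, triple.Item3);
        }
        // prints:
        // Start: 6:00 | End: 6:30 | value: 1
        // Start: 6:30 | End: 7:00 | value: 1
    }
}
```

Like Split, this stays lazy: each triple is produced only when the consumer asks for it, so the deferred behaviour described above is preserved.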