查询以从每个组中删除早于n个活动日期的记录

本文关键字:活动日 活动 日期 记录 删除 查询 | 更新日期: 2023-09-27 18:25:10

我有一个数据集,需要每天修剪。它是从一个定期将记录写入表的进程中填充的。

我目前有一个简单的查询可以做到这一点:

DELETE FROM dataTable WHERE entryDate < dateadd(day, -5, GETDATE())

但问题是,这个过程是不可靠的;可能有几天根本没有写入任何数据。

因此,我真正需要的是一个查询,它可以追溯到写入数据的5天(可能是非连续的),而不是5个日历天。

例如,如果我运行以下查询:

SELECT cast(entryDate  as date) as LogDate
  FROM dataTable
  group by category, cast(entryDate as date)
  order by cast(entryDate as date) desc

我可能会得到一个结果:

Category    Date
Foo        2015-11-30
Foo        2015-11-29
Foo        2015-11-26
Foo        2015-11-25
Foo        2015-11-21
Foo        2015-11-20  <-- Start Pruning here, not the 25th.
Foo        2015-11-19
Foo        2015-11-18
Bar        2015-11-30
Bar        2015-11-29
Bar        2015-11-28
Bar        2015-11-27
Bar        2015-11-26
Bar        2015-11-25  <-- This one is OK to prune at the 25th.
Bar        2015-11-24
Bar        2015-11-23

在删除之前,我需要查询一直返回到第20个。

查询以从每个组中删除早于n个活动日期的记录

您可以使用row_number来获取表有条目的最后5天。然后根据生成的数字进行删除。

SQL Fiddle

with rownums as (SELECT row_number() over(partition by category order by cast(entryDate as date) desc) as rn
                 ,*
                 FROM dataTable
)
delete from rownums where rn <= 5 --use > 5 for records prior to the last 5 days

如果每天可以有多个条目,请使用dense_rank对行进行编号。

with rownums as (SELECT dense_rank() over(partition by category order by cast(entryDate as date) desc) as rn
                     ,*
                 FROM dataTable)
delete from rownums where rn > 5;

试试这样的方法。

;WITH orderedDates (LogDate, RowNum)
AS 
(
SELECT [CACHEDATE] AS LogDate, ROW_NUMBER() OVER (ORDER BY CACHEDATE DESC) AS RowNum
FROM
dataTable
GROUP BY CACHEDATE
)
DELETE dataTable
WHERE CACHEDATE IN
(SELECT LogDate FROM orderedDates
WHERE ROWNUM > 5) --or however many you need to go back