分割逗号分隔的字符串并选择top " searchtag "按计数排序

本文关键字:quot searchtag 排序 top 选择 分隔 字符 串并 字符串 分割 | 更新日期: 2023-09-27 18:19:03

我正在使用以下数据集:

ID  SearchTags
1   Cats,Birds,Dogs,Snakes,Roosters
2   Mice,Chickens,Cats,Lizards
3   Birds,Zebras,Sheep,Horses,Monkeys,Chimps
4   Lions,Tigers,Bears,Chickens
5   Cats,Goats,Pandas
6   Birds,Zebras,Sheep,Horses
7   Rats,Dogs,Hawks,Eagles,Tigers
8   Cats,Tigers,Dogs,Pandas
9   Dogs,Beavers,Sharks,Vultures
10  Cats,Bears,Bats,Leopards,Chickens

我需要查询出最流行的搜索标签列表。

我有一个查询,它将返回最流行的搜索标签,但它返回整个单词列表。(这是我预料到的)。是否有可能分割(,)上的SearchTags列并生成最受欢迎的标签列表,以便我最终得到如下列表/计数?:

Cats    5
Dogs    4
Chickens    3
Tigers  3
Bears   2
Sharks  1
etc...

而不是我现在得到的:

Cats,Birds,Dogs,Snakes,Roosters 1
Dogs,Beavers,Sharks,Vultures    1
Cats,Bears,Bats,Leopards,Chickens 1
etc...

下面是返回单词列表的查询。

SELECT SearchTags, COUNT(*) AS TagCount
FROM Animals
GROUP BY SearchTags
ORDER BY TagCount DESC

我正在使用SQL Server。我更喜欢查询,但如果需要,可以创建存储过程。

感谢您提供的任何帮助

分割逗号分隔的字符串并选择top " searchtag "按计数排序

你已经用c#和LINQ标记了这个问题,如果你有DataTable中的数据,那么你可以这样做:

DataTable dt = GetDataTableFromDB();
var query = dt.AsEnumerable()
               .Select(r => r.Field<string>("SearchTags").Split(','))
               .SelectMany(r => r)
               .GroupBy(r => r)
               .Select(grp => new
                   {
                       Key = grp.Key,
                       Count = grp.Count()
                   });

如果你设置了LINQ TO SQL,那么你可以这样做:

var query = db.YourTable
               .Select(r=> r.SearchTags)
               .AsEnumerable()
               .Where(r=> !string.IsNullOrWhiteSpace(r))
               .Select(r => r.Split(','))
               .SelectMany(r => r)
               .GroupBy(r => r)
               .Select(grp => new
                   {
                       Key = grp.Key,
                       Count = grp.Count()
                   });
           });

这将加载内存中的所有SearchTags,然后您将能够应用Split

您还可以在数据库端过滤掉SearchTags的null或空字符串值,如:

var query = db.YourTable
               .Where(r=> r.SearchTags != null && r.SearchTags.Trim() != "")
               .Select(r=> r.SearchTags)
               .AsEnumerable()
               .Select(r => r.Split(','))
               .SelectMany(r => r)
               .GroupBy(r => r)
               .Select(grp => new
                   {
                       Key = grp.Key,
                       Count = grp.Count()
                   });
           });

以上将从数据库端返回的集合中过滤掉空字符串或空字符串/仅为空白,并且将更有效地工作。

要过滤日期,请执行:

DateTime dt = DateTime.Today.AddDays(-14);
var query = db.YourTable
               .Where(r=> r.SearchTags != null && 
                      r.SearchTags.Trim() != "" &&
                      r.MediaDate >= dt)
               .Select(r=> r.SearchTags)
               .AsEnumerable()
               .Select(r => r.Split(','))
               .SelectMany(r => r)
               .GroupBy(r => r)
               .Select(grp => new
                   {
                       Key = grp.Key,
                       Count = grp.Count()
                   });
           });

假设您想要TSQL…

有许多用于分割字符串的TSQL函数,但是任何使用XQuery的函数都是迄今为止最快的,而不是过多的循环函数。

我在一个有10-15K CSV值的表的生产系统中使用了类似的东西,它在几秒钟内运行,而旧的循环函数有时需要一分钟。

无论如何,这里有一个快速的演示来帮助你。

DECLARE @DATA TABLE (ID INT, SEARCHTAGS VARCHAR(100))
INSERT INTO @DATA
SELECT 1,'Cats,Birds,Dogs,Snakes,Roosters' UNION ALL
SELECT 2,'Mice,Chickens,Cats,Lizards' UNION ALL
SELECT 3,'Birds,Zebras,Sheep,Horses,Monkeys,Chimps' UNION ALL
SELECT 4,'Lions,Tigers,Bears,Chickens' UNION ALL
SELECT 5,'Cats,Goats,Pandas' UNION ALL
SELECT 6,'Birds,Zebras,Sheep,Horses' UNION ALL
SELECT 7,'Rats,Dogs,Hawks,Eagles,Tigers' UNION ALL
SELECT 8,'Cats,Tigers,Dogs,Pandas' UNION ALL
SELECT 9,'Dogs,Beavers,Sharks,Vultures' UNION ALL
SELECT 10,'Cats,Bears,Bats,Leopards,Chickens'
;WITH TagList AS
(
SELECT ID, Split.a.value('.', 'VARCHAR(max)') AS String
FROM  (SELECT ID, 
              CAST ('<M>' + REPLACE(CAST(SEARCHTAGS AS VARCHAR), ',', '</M><M>') + '</M>' AS XML) AS String  
       FROM @DATA) AS A 
CROSS APPLY String.nodes ('/M') AS Split(a)
)
SELECT TOP (10) String, COUNT(*) AS [SearchCount]
FROM TagList
GROUP BY String
ORDER BY [SearchCount] DESC

注意:如果你能在c#中处理,任何与字符串操作有关的事情几乎总是更快…因此,Habib的答案可能比TSQL解决方案更有效。