SQL Server匹配单词短语和顺序相关性的最佳方法
本文关键字:相关性 顺序 最佳 方法 短语 Server 单词 SQL | 更新日期: 2023-09-27 18:00:34
根据参数中单词的数量(计数)/匹配情况对sql varchar列进行排名的最佳方法是什么,有四个不同的唯一标准。这可能不是一个微不足道的问题,但我面临的挑战是使用我的标准根据"最佳匹配"对行进行排序。
列:description varchar(100)参数:@MyParameter varchar(100)
具有此订单偏好的输出:
- 完全匹配(整个字符串匹配)-总是第一个
- 以开头(根据匹配的参数长度递减)
- 单词数排名,相邻单词排名越高,匹配的单词数相同
- 单词在任何位置匹配(不连续)
单词可能不完全匹配,因为一个单词的部分匹配是允许的,而且很可能,出租人的值应该应用于部分单词以进行排名,但不是关键的(例如,pot将匹配以下每个单词:pot、potter、potholder、depot、depotting)。以其他单词开头的匹配应该比没有后续匹配的匹配排名更高,但这不是交易杀手/超级重要。
我想要一个方法来对列"以"参数中的值开始的位置进行排序。假设我有以下字符串:
'This is my value string as a test template to rank on.'
在第一种情况下,我希望得到字数最多的列/行的排名。
排名第二的是一开始根据出现次数(最佳匹配)排名为:
'This is my string as a test template to rank on.' - first
'This is my string as a test template to rank on even though not exact.'-second
'This is my string as a test template to rank' - third
'This is my string as a test template to' - next
'This is my string as a test template' - next etc.
第二:(可能是第一组数据之后的第二组数据(以开头)-这是所需的
我想根据@MyParameter中出现的单词数对行进行排名(排序),其中连续单词的排名高于相同的单词数。
因此,对于上面的示例字符串,由于具有相同字数的连续字符串(单词在一起)的"更好匹配",'is my string as shown'
的排名将高于'is not my other string as'
。匹配度较高的行(出现的字数)将按降序排列为最匹配的行。
如果可能的话,我希望在一个查询中完成此操作。
结果中不应出现两行。
出于性能考虑,表中的行数不会超过10000行。
表中的值是相当静态的,几乎没有变化,但并非完全如此
我现在无法更改结构,但稍后会考虑(如单词/短语表)
为了稍微复杂一点,单词列表在两个表中——但我可以为此创建一个视图,但在给定相同匹配的情况下,一个表结果(较小的列表)应该出现在第二个较大的数据集结果之前——这些表和一个表中都会有重复,我只想要不同的值。选择DISTINCT并不容易,因为我想返回一列(sourceTable),这很可能会使行不同,在这种情况下,只从第一个(较小的)表中选择,但所有其他列都需要DISTINCT(不要在"不同"评估中考虑该列。
表中的Psuedo列:
procedureCode VARCHAR(50),
description VARCHAR(100), -- this is the sort/evaluation column
category VARCHAR(50),
relvu VARCHAR(50),
charge VARCHAR(15),
active bit
sourceTable VARCHAR(50) - just shows which table it comes from of the two
不存在像ID列那样的唯一索引
在要排除的第三个表中匹配NOT SELECT * FROM (select * from tableone where procedureCode not in (select procedureCode from tablethree))
UNION ALL
(select * from tabletwo where procedureCode not in (select procedureCode from tablethree))
编辑:为了解决这个问题,我创建了一个表值参数,如下所示:
0 Gastric Intubation & Aspiration/Lavage, Treatmen
1 Gastric%Intubation%Aspiration%Lavage%Treatmen
2 Gastric%Intubation%Aspiration%Lavage
3 Gastric%Intubation%Aspiration
4 Gastric%Intubation
5 Gastric
6 Intubation%Aspiration%Lavage%Treatmen
7 Intubation%Aspiration%Lavage
8 Intubation%Aspiration
9 Intubation
10 Aspiration%Lavage%Treatmen
11 Aspiration%Lavage
12 Aspiration
13 Lavage%Treatmen
14 Lavage
15 Treatmen
其中实际短语在第0行
这是我目前的尝试:
CREATE PROCEDURE [GetProcedureByDescription]
(
@IncludeMaster BIT,
@ProcedureSearchPhrases CPTFavorite READONLY
)
AS
DECLARE @myIncludeMaster BIT;
SET @myIncludeMaster = @IncludeMaster;
CREATE TABLE #DistinctMatchingCpts
(
procedureCode VARCHAR(50),
description VARCHAR(100),
category VARCHAR(50),
rvu VARCHAR(50),
charge VARCHAR(15),
active VARCHAR(15),
sourceTable VARCHAR(50),
sequenceSet VARCHAR(2)
)
IF @myIncludeMaster = 0
BEGIN -- Excluding master from search
INSERT INTO #DistinctMatchingCpts (sourceTable, procedureCode, description , category ,charge, active, rvu, sequenceSet
)
SELECT DISTINCT sourceTable, procedureCode, description, category ,charge, active, rvu, sequenceSet
FROM (
SELECT TOP 1
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''01'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] = PP.[LEVEL]
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
ORDER BY PP.CODE
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM([CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''02'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%''
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''03'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] LIKE ''%'' + PP.[LEVEL] + ''%''
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
) AS CPTS
ORDER BY
procedureCode, sourceTable, [description]
END -- Excluded master from search
ELSE
BEGIN -- Including master in search, but present favorites before master for each code
-- Get matching procedures, ordered by code, source (favorites first), and description.
-- There probably will be procedures with duplicated code+description, so we will filter
-- duplicates shortly.
INSERT INTO #DistinctMatchingCpts (sourceTable, procedureCode, description , category ,charge, active, rvu, sequenceSet)
SELECT DISTINCT sourceTable, procedureCode, description, category ,charge, active, rvu, sequenceSet
FROM (
SELECT TOP 1
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''00'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] = PP.[LEVEL]
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
ORDER BY PP.CODE
UNION ALL
SELECT TOP 1
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''2MasterCPT'' AS sourceTable,
''00'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [MASTERCPT] AS CPT
ON CPT.[LEVEL] = PP.[LEVEL]
WHERE
CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
ORDER BY PP.CODE
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''01'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] = PP.[LEVEL]
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''2MasterCPT'' AS sourceTable,
''01'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [MASTERCPT] AS CPT
ON CPT.[LEVEL] = PP.[LEVEL]
WHERE
CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
UNION ALL
SELECT TOP 1
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''02'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%''
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
ORDER BY PP.CODE
UNION ALL
SELECT TOP 1
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''2MasterCPT'' AS sourceTable,
''02'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [MASTERCPT] AS CPT
ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%''
WHERE
CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
ORDER BY PP.CODE
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''03'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%''
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''2MasterCPT'' AS sourceTable,
''03'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [MASTERCPT] AS CPT
ON CPT.[LEVEL] LIKE PP.[LEVEL] + ''%''
WHERE
CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[COMBO])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
''True'' AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''0CPTMore'' AS sourceTable,
''04'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [CPTMORE] AS CPT
ON CPT.[LEVEL] LIKE ''%'' + PP.[LEVEL] + ''%''
WHERE
(CPT.[COMBO] IS NULL OR CPT.[COMBO] NOT IN (''Editor'',''MOD'',''CATEGORY'',''Types'',''Bundles''))
AND CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
UNION ALL
SELECT
LTRIM(RTRIM(CPT.[CODE])) AS procedureCode,
LTRIM(RTRIM(CPT.[LEVEL])) AS description,
LTRIM(RTRIM(CPT.[CATEGORY])) AS category,
LTRIM(RTRIM(CPT.[CHARGE])) AS charge,
COALESCE(CASE [ACTIVE] WHEN 1 THEN ''True'' WHEN 0 THEN ''False'' WHEN '''' THEN ''False'' ELSE ''False'' END,''True'') AS active,
LTRIM(RTRIM([RVU])) AS rvu,
''2MasterCPT'' AS sourceTable,
''04'' AS sequenceSet
FROM
@ProcedureSearchPhrases PP
INNER JOIN [MASTERCPT] AS CPT
ON CPT.[LEVEL] LIKE ''%'' + PP.[LEVEL] + ''%''
WHERE
CPT.[CODE] IS NOT NULL
AND CPT.[CODE] NOT IN (''0'', '''')
AND CPT.[CODE] NOT IN (SELECT CPTE.[CODE] FROM CPT AS CPTE WHERE CPTE.[CODE] IS NOT NULL)
) AS CPTS
ORDER BY
sequenceSet, sourceTable, [description]
END
/* Final select - uses artificial ordering from the insertion ORDER BY */
SELECT procedureCode, description, category, rvu, charge, active FROM
(
SELECT TOP 500 *-- procedureCode, description, category, rvu, charge, active
FROM #DistinctMatchingCpts
ORDER BY sequenceSet, sourceTable, description
) AS CPTROWS
DROP TABLE #DistinctMatchingCpts
然而,这不符合单词计数的最佳匹配标准(如样本中的第1行值),该单词计数应与该行中找到的最佳(最多)单词计数相匹配。
如果有区别的话,我可以完全控制表值参数的形式/格式。
如果有用的话,我会将这个结果返回到c#程序中。
您需要能够拆分字符串来解决这个问题。我更喜欢TSQL中的数字表方法来拆分字符串
为了让我下面的代码工作(以及我的分割功能),你需要做一个时间表设置:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
一旦设置了Numbers表,就创建这个分割函数:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
@SplitOn char(1) --REQUIRED, the character to split the @List string on
,@List varchar(8000)--REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
----------------
--SINGLE QUERY-- --this will not return empty rows
----------------
SELECT
ListValue
FROM (SELECT
LTRIM(RTRIM(SUBSTRING(List2, number+1, CHARINDEX(@SplitOn, List2, number+1)-number - 1))) AS ListValue
FROM (
SELECT @SplitOn + @List + @SplitOn AS List2
) AS dt
INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
WHERE SUBSTRING(List2, number, 1) = @SplitOn
) dt2
WHERE ListValue IS NOT NULL AND ListValue!=''
);
GO
您可以随意创建自己的split函数,但您仍然需要Numbers表才能使我的解决方案发挥作用。
现在,您可以轻松地将CSV字符串拆分成一个表并加入其中:
select * from dbo.FN_ListToTable(',','1,2,3,,,4,5,6777,,,')
输出:
ListValue
-----------------------
1
2
3
4
5
6777
(6 row(s) affected)
现在试试这个:
DECLARE @BaseTable table (RowID int primary key, RowValue varchar(100))
set nocount on
INSERT @BaseTable VALUES ( 1,'The cows came home empty handed')
INSERT @BaseTable VALUES ( 2,'This is my string as a test template to rank') -- third
INSERT @BaseTable VALUES ( 3,'pencil pen paperclip eraser')
INSERT @BaseTable VALUES ( 4,'wow')
INSERT @BaseTable VALUES ( 5,'no dice here')
INSERT @BaseTable VALUES ( 6,'This is my string as a test template to rank on even though not exact.') -- second
INSERT @BaseTable VALUES ( 7,'apple banana pear grape lemon orange kiwi strawberry peach watermellon')
INSERT @BaseTable VALUES ( 8,'This is my string as a test template') -- 5th
INSERT @BaseTable VALUES ( 9,'rat cat bat mat sat fat hat pat ')
INSERT @BaseTable VALUES (10,'house home pool roll')
INSERT @BaseTable VALUES (11,'This is my string as a test template to') -- 4th
INSERT @BaseTable VALUES (12,'talk wisper yell scream sing hum')
INSERT @BaseTable VALUES (13,'This is my string as a test template to rank on.') -- first
INSERT @BaseTable VALUES (14,'aaa bbb ccc ddd eee fff ggg hhh')
INSERT @BaseTable VALUES (15,'three twice three once twice three')
set nocount off
DECLARE @SearchValue varchar(100)
SET @SearchValue='This is my value string as a test template to rank on.'
;WITH SplitBaseTable AS --expand each @BaseTable row into one row per word
(SELECT
b.RowID, b.RowValue, s.ListValue
FROM @BaseTable b
CROSS APPLY dbo.FN_ListToTable(' ',b.RowValue) AS s
)
, WordMatchCount AS --for each @BaseTable row that has has a word in common withe the search string, get the count of matching words
(SELECT
s.RowID,COUNT(*) AS CountOfWordMatch
FROM dbo.FN_ListToTable(' ',@SearchValue) v
INNER JOIN SplitBaseTable s ON v.ListValue=s.ListValue
GROUP BY s.RowID
HAVING COUNT(*)>0
)
, SearchLen AS --get one row for each possible length of the search string
(
SELECT
n.Number,SUBSTRING(@SearchValue,1,n.Number) AS PartialSearchValue
FROM Numbers n
WHERE n.Number<=LEN(@SearchValue)
)
, MatchLen AS --for each @BaseTable row, get the max starting length that matches the search string
(
SELECT
b.RowID,MAX(l.Number) MatchStartLen
FROM @BaseTable b
LEFT OUTER JOIN SearchLen l ON LEFT(b.RowValue,l.Number)=l.PartialSearchValue
GROUP BY b.RowID
)
SELECT --return the final search results
b.RowValue,w.CountOfWordMatch,m.MatchStartLen
FROM @BaseTable b
LEFT OUTER JOIN WordMatchCount w ON b.RowID=w.RowID
LEFT OUTER JOIN MatchLen m ON b.RowID=m.RowID
WHERE w.CountOfWordMatch>0
ORDER BY w.CountOfWordMatch DESC,m.MatchStartLen DESC,LEN(b.RowValue) DESC,b.RowValue ASC
输出:
RowValue CountOfWordMatch MatchStartLen
----------------------------------------------------------------------- ---------------- -------------
This is my string as a test template to rank on. 11 11
This is my string as a test template to rank on even though not exact. 10 11
This is my string as a test template to rank 10 11
This is my string as a test template to 9 11
This is my string as a test template 8 11
(5 row(s) affected)
它对字符串单词的开头进行匹配,这有点不同,因为它查看字符串开头匹配的字符数。
一旦你做到了这一点,你可以尝试通过为SplitBaseTable创建一些静态索引表来优化它。可能在@BaseTable上使用触发器。
听起来您正在寻找一种匹配算法,如果不使用存储过程,可能很难创建该算法。根据过去的经验,有一些编辑距离算法(如Levenstein)在确定相似性方面非常有用。这些返回一个数字,有时是字符串之间的差值,您可以在这个数字上创建自己的加权方程来打分。然后,您可以创建排名或分数阈值,以降低假阴性/阳性。
我不久前也有类似的问题。我试图回答的问题是,在两个不同的列之间有多少单词匹配,以及根据匹配单词的最高百分比进行排名。我无法理解,但马丁给了我一个绝妙的答复。
在这里查看他对我问题的回答。
所有问题的一个答案是:使用sphynxhttp://sphinxsearch.com并且不要在SQL中解决这个问题。
Sphynx是开源的,可与所有数据库和操作系统配合使用。
这就是craigslist正在使用的。
这是在发表这篇文章时最好的外部全文搜索系统。它将根据您所要求的相关性对结果进行排序,并且您不需要花哨的SQL表或SQL过程。试试看。