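C# full-text search string formatting: remove all adjacent duplicates from the string and add "AND"/"OR"
Keywords: string, OR, AND, search, full-text search, format, remove | Updated: 2023-09-27 18:30:22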
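I'm looking for a way to format a user-entered search string in C# before calling a SQL query.
Full-text indexing is enabled on the table, and the query looks like this: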
select [title] from publications where contains([title], @searchString)
Main requirements:
1) add 'OR' by default between two words (e.g. c and c-1 below)
2) remove adjacent duplicates from the search string (e.g. a, b, b-1 and e below)
3) remove 'AND' / 'OR' at the end of the string (e.g. d below)
Examples:
input => output
a) "oyster and oyster or fish and clean water" => "oyster or fish and clean OR water"
b) "oyster and and fish and clean water" => "oyster and fish and clean OR water"
b-1) "oyster oyster fish fish clean and water" => "oyster or fish or clean and water"
c) "oyster fish" => "oyster or fish"
c-1) "oyster fish clean water" => "oyster or fish or clean or water"
d) "oyster and" => "oyster"
e) "oyster and oyster" => "oyster"
Current code (which fails for cases a, b and b-1; works for c-1, d and e):
    string Format(string str)
    {
        List<string> searchKeywords = new List<string> { "and", "or" };
        //convert to lower case
        str = str.Replace(",", " ").ToLower();
        Regex regex = new Regex(@"[ ]{2,}", RegexOptions.None);
        //remove extra whitespace with space
        str = regex.Replace(str, @" ");
        //split string
        string[] strArray = str.Split(' ');
        List<string> outputArray = new List<string>();
        string output = "";
        string prevStr = "";
        string currStr = "";
        bool keywordFlag = false;
        bool duplicateFlag = false;
        //remove adjacent keyword or same words
        foreach (var item in strArray)
        {
            currStr = item.Trim();
            keywordFlag = searchKeywords.Contains(prevStr) && searchKeywords.Contains(currStr);
            duplicateFlag = outputArray.Contains(currStr) && !searchKeywords.Contains(currStr);
            if (!currStr.Equals(prevStr) && !keywordFlag && !duplicateFlag)
            {
                outputArray.Add(currStr);
                prevStr = currStr;
            }
        }
        if (outputArray.Count() == 2 && searchKeywords.Contains(outputArray[1]))
        {
            outputArray.Remove(outputArray[1]);
        }
        output = string.Join(" ", outputArray);
        if (output.Contains(" ") && !output.Contains("and") && !output.Contains("or"))
        {
            return string.Join(" or ", output.Split(' ').Select(I => I.Trim()));
        }
        return output;
    }
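Current output:
oyster and fish and clean water
oyster and fish and clean water
oyster fish clean and water
oyster or fish or clean or water
oyster or fish
oyster
oyster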
I'm not sure whether this answer is correct. Many thanks to @saggio for the suggestions.
    private string FormatSearchString(string str)
    {
        List<string> searchKeywords = new List<string> { "and", "or" };
        //convert to lower case
        str = str.Replace(",", " ").ToLower();
        Regex regex = new Regex(@"[ ]{2,}", RegexOptions.None);
        //remove extra whitespace with space
        str = regex.Replace(str, @" ");
        //split string
        string[] strArray = str.Split(' ');
        List<string> outputArray = new List<string>();
        string output = "";
        string prevStr = "";
        string currStr = "";
        bool keywordFlag = false;
        bool duplicateFlag = false;
        //remove adjacent keyword or same words
        foreach (var item in strArray)
        {
            currStr = item.Trim();
            keywordFlag = searchKeywords.Contains(prevStr) && searchKeywords.Contains(currStr);
            duplicateFlag = outputArray.Contains(currStr) && !searchKeywords.Contains(currStr);
            if (!currStr.Equals(prevStr) && !keywordFlag && !duplicateFlag)
            {
                if (!searchKeywords.Contains(prevStr) && !searchKeywords.Contains(currStr) && prevStr != "")
                {
                    outputArray.Add("or");
                }
                outputArray.Add(currStr);
                prevStr = currStr;
            }
        }
        if (outputArray.Count() == 2)
        {
            if (searchKeywords.Contains(outputArray[0]))
                outputArray.Remove(outputArray[0]);
            else
                outputArray.Remove(outputArray[1]);
        }
        output = string.Join(" ", outputArray);
        return output;
    }
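Since you haven't shown what you have done so far, I'll assume you haven't started on a solution, so here is a high-level algorithm:
Split the search string on each space; in this case, use String.Split(' ').
Loop over the resulting string array with foreach and build the result with string concatenation, applying these rules:
If a word has already been used and it is not 'or' or 'and', don't add it to the resulting string.
If the previous word is 'or' or 'and' and the current word is one too, don't add it to the resulting string.
If the previous word is not 'or' or 'and' and the current word isn't either, add 'or' to the resulting string before the current word.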
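The three rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's code: the class name SearchStringSketch and the method name Build are hypothetical, the operator set mirrors the question's searchKeywords list, and commas are treated as separators as in the question's code. It deliberately leaves a leading or trailing operator in place, since the algorithm above does not cover that.

```csharp
using System;
using System.Collections.Generic;

public static class SearchStringSketch
{
    // Assumption: only "and" and "or" count as operators, as in the question.
    private static readonly HashSet<string> Operators =
        new HashSet<string> { "and", "or" };

    // Hypothetical helper that applies the three rules in a single pass.
    public static string Build(string input)
    {
        // Split on spaces and commas; empty entries absorb repeated separators.
        string[] tokens = input.ToLowerInvariant()
            .Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);

        var result = new List<string>();
        var seenWords = new HashSet<string>();

        foreach (string token in tokens)
        {
            bool isOperator = Operators.Contains(token);
            bool prevIsOperator = result.Count > 0
                && Operators.Contains(result[result.Count - 1]);

            if (isOperator)
            {
                // Rule 2: drop an operator that directly follows another operator.
                if (prevIsOperator) continue;
                result.Add(token);
            }
            else
            {
                // Rule 1: drop a word that has already been used.
                if (!seenWords.Add(token)) continue;

                // Rule 3: two words in a row get "or" between them.
                if (result.Count > 0 && !prevIsOperator) result.Add("or");
                result.Add(token);
            }
        }

        return string.Join(" ", result);
    }
}
```

With this sketch, Build("oyster fish clean water") gives "oyster or fish or clean or water" and Build("oyster oyster fish fish clean and water") gives "oyster or fish or clean and water". For case a it gives "oyster and fish and clean or water", because the first operator after a dropped duplicate is the one that is kept; that differs from the output the question expects for that case.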
Edit: now that the code has been posted, I can see what is going wrong.
This condition:
    if (output.Contains(" ") && !output.Contains("and") && !output.Contains("or"))
    {
        return string.Join(" or ", output.Split(' ').Select(I => I.Trim()));
    }
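is only reached when the output contains no 'and' or 'or' at all, so an input that already contains an operator never gets 'or' inserted between its adjacent words.
Instead, add the 'or' inside the foreach loop and get rid of that condition.
For example: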
    foreach (var item in strArray)
    {
        currStr = item.Trim();
        keywordFlag = searchKeywords.Contains(prevStr) && searchKeywords.Contains(currStr);
        duplicateFlag = outputArray.Contains(currStr) && !searchKeywords.Contains(currStr);
        if (!currStr.Equals(prevStr) && !keywordFlag && !duplicateFlag)
        {
            if (!searchKeywords.Contains(prevStr) && !searchKeywords.Contains(currStr) && prevStr != "")
            {
                outputArray.Add("or");
            }
            outputArray.Add(currStr);
            prevStr = currStr;
        }
    }
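Also, when you check whether the array contains only 2 tokens, you only consider the case where the user put 'or' or 'and' after the word. What happens if they enter "or Oyster" as the input string? The resulting string would be just 'or'.
You need to account for this: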
    if (outputArray.Count() == 2)
    {
        if(searchKeywords.Contains(outputArray[0]))
            outputArray.Remove(outputArray[0]);
        else if(searchKeywords.Contains(outputArray[1]))
            outputArray.Remove(outputArray[1]);
    }
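The check above only covers a list of exactly two tokens, so a longer input such as "oyster or fish and" would still end with an operator. One possible generalization, offered as a sketch rather than as part of the original answer, is to strip operators from both ends of the token list whatever its length. The class name OperatorTrimSketch and the method name TrimOperators are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class OperatorTrimSketch
{
    // Assumption: same operator set as the question's searchKeywords list.
    private static readonly HashSet<string> Operators =
        new HashSet<string> { "and", "or" };

    // Hypothetical helper: removes operators from both ends of the list, in place.
    public static void TrimOperators(List<string> tokens)
    {
        // Drop leading operators, e.g. "or oyster" becomes "oyster".
        while (tokens.Count > 0 && Operators.Contains(tokens[0]))
        {
            tokens.RemoveAt(0);
        }

        // Drop trailing operators, e.g. "oyster and" becomes "oyster".
        while (tokens.Count > 0 && Operators.Contains(tokens[tokens.Count - 1]))
        {
            tokens.RemoveAt(tokens.Count - 1);
        }
    }
}
```

Calling TrimOperators(outputArray) just before the string.Join call would take the place of the two-token check. RemoveAt is used instead of Remove so that the element at that exact position is removed, not the first element with an equal value.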