C# full-text search string formatting: remove all adjacent duplicates and add "AND"/"OR"

Last updated: 2023-09-27 18:30:22

I'm looking for a way to format a user-entered search string in C# before passing it to a SQL query.

Full-text indexing is enabled on the table, and the query looks like this:

    select [title] from publications where contains([title], @searchString)

Main requirements:

1) Add 'OR' by default between two words (e.g. cases c and c-1 below)
2) Remove adjacent duplicates from the search string (e.g. cases a, b, b-1, and e below)
3) Remove a trailing 'AND' or 'OR' at the end of the string (e.g. case d below)

Examples (input => output):

    a)   "oyster and oyster or fish and clean water" => "oyster or fish and clean OR water"
    b)   "oyster and and fish and clean water" => "oyster and fish and clean OR water"
    b-1) "oyster oyster fish fish clean and water" => "oyster or fish or clean and water"
    c)   "oyster fish" => "oyster or fish"
    c-1) "oyster fish clean water" => "oyster or fish or clean or water"
    d)   "oyster and" => "oyster"
    e)   "oyster and oyster" => "oyster"

Current code (which fails for cases a, b, and b-1; works for c-1, d, and e):

    string Format(string str)
    {
        List<string> searchKeywords = new List<string> { "and", "or" };
        //convert to lower case
        str = str.Replace(",", " ").ToLower();
        Regex regex = new Regex(@"[ ]{2,}", RegexOptions.None);
        //remove extra whitespace with space
        str = regex.Replace(str, @" ");
        //split string 
        string[] strArray = str.Split(' ');
        List<string> outputArray = new List<string>();
        string output = "";
        string prevStr = "";
        string currStr = "";
        bool keywordFlag = false;
        bool duplicateFlag = false;
        //remove adjacent keyword or same words
        foreach (var item in strArray)
        {
            currStr = item.Trim();
            keywordFlag = searchKeywords.Contains(prevStr) && searchKeywords.Contains(currStr);
            duplicateFlag = outputArray.Contains(currStr) && !searchKeywords.Contains(currStr);
            if (!currStr.Equals(prevStr) && !keywordFlag && !duplicateFlag)
            {
                outputArray.Add(currStr);
                prevStr = currStr;
            }
        }
        if (outputArray.Count() == 2 && searchKeywords.Contains(outputArray[1]))
        {
            outputArray.Remove(outputArray[1]);
        }
        output = string.Join(" ", outputArray);
        if (output.Contains(" ") && !output.Contains("and") && !output.Contains("or"))
        {
            return string.Join(" or ", output.Split(' ').Select(I => I.Trim()));
        }
        return output;
    }


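Current output:

    oyster and fish and clean water
    oyster and fish and clean water
    oyster fish clean and water
    oyster or fish or clean or water
    oyster or fish
    oyster
    oyster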


I'm not sure this answer is entirely correct. Many thanks to @saggio for the suggestions.

    private string FormatSearchString(string str)
    {
        List<string> searchKeywords = new List<string> { "and", "or" };
        //convert to lower case
        str = str.Replace(",", " ").ToLower();
        Regex regex = new Regex(@"[ ]{2,}", RegexOptions.None);
        //remove extra whitespace with space
        str = regex.Replace(str, @" ");
        //split string 
        string[] strArray = str.Split(' ');
        List<string> outputArray = new List<string>();
        string output = "";
        string prevStr = "";
        string currStr = "";
        bool keywordFlag = false;
        bool duplicateFlag = false;
        //remove adjacent keyword or same words
        foreach (var item in strArray)
        {
            currStr = item.Trim();
            keywordFlag = searchKeywords.Contains(prevStr) && searchKeywords.Contains(currStr);
            duplicateFlag = outputArray.Contains(currStr) && !searchKeywords.Contains(currStr);
            if (!currStr.Equals(prevStr) && !keywordFlag && !duplicateFlag)
            {
                if (!searchKeywords.Contains(prevStr) && !searchKeywords.Contains(currStr) && prevStr != "")
                {
                    outputArray.Add("or");
                }
                outputArray.Add(currStr);
                prevStr = currStr;
            }
        }
        if (outputArray.Count() == 2)
        {
            if (searchKeywords.Contains(outputArray[0]))
                outputArray.Remove(outputArray[0]);
            else
                outputArray.Remove(outputArray[1]);
        }
        output = string.Join(" ", outputArray);
        return output;
    }

Since you haven't shown what you've done so far, I'll assume you haven't started on a solution, so here is a high-level algorithm:

Split the search string on each space using `String.Split(' ')`.

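Loop over the resulting string array with `foreach` and build the result by string concatenation. If a word has already been used and is not `or`/`and`, don't add it to the result. If the previous word is `or`/`and` and the current word is too, don't add it to the result. If the previous word is not `or`/`and` and the current word isn't either, add `or` to the result before the current word.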
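The steps above can be sketched as a single pass over the tokens. This is a minimal sketch rather than the poster's code: the class name `SearchStringFormatter` and the `lastTerm`/`pendingOp` variables are hypothetical. It also makes one assumption that differs from the description above: a word is dropped only when it repeats the immediately preceding term, and the operator that joined the two is dropped with it. That choice was made so the results match the expected outputs listed in the question.

```csharp
using System;
using System.Collections.Generic;

public static class SearchStringFormatter
{
    // Operators recognized in the user's input (compared case-insensitively).
    private static readonly HashSet<string> Operators =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "and", "or" };

    public static string Format(string input)
    {
        var output = new List<string>();
        string lastTerm = null;   // most recently emitted search term
        string pendingOp = null;  // operator waiting for its right-hand term

        // Treat commas like spaces and drop empty tokens from repeated separators.
        var tokens = (input ?? string.Empty).ToLowerInvariant()
            .Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var token in tokens)
        {
            if (Operators.Contains(token))
            {
                // Ignore a leading operator and any operator that directly follows another.
                if (lastTerm != null && pendingOp == null)
                    pendingOp = token;
                continue;
            }

            if (token == lastTerm)
            {
                // Adjacent duplicate: drop it together with the operator that joined it.
                pendingOp = null;
                continue;
            }

            if (lastTerm != null)
                output.Add(pendingOp ?? "or"); // default to "or" between two terms

            output.Add(token);
            lastTerm = token;
            pendingOp = null;
        }

        // A trailing operator is still pending here and is simply never emitted.
        return string.Join(" ", output);
    }
}
```

Under these assumptions, `SearchStringFormatter.Format("oyster and oyster or fish and clean water")` returns `oyster or fish and clean or water`, and `SearchStringFormatter.Format("oyster and")` returns `oyster`. Operators are always emitted in lower case.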

Edit: now that the code has been posted, I can see what is going wrong.

This condition:

    if (output.Contains(" ") && !output.Contains("and") && !output.Contains("or"))
    {
        return string.Join(" or ", output.Split(' ').Select(I => I.Trim()));
    }

is only entered when the output contains no instance of `and` or `or` at all.
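To illustrate why that branch is skipped for an input like case b-1, here is a small sketch that compares the question's substring test with a token-based one. The class and method names are hypothetical and not part of the original code. It also shows a side effect worth knowing: `String.Contains` matches substrings, so a word such as "sand" counts as containing "and".

```csharp
using System;
using System.Linq;

public static class OperatorCheckDemo
{
    // The question's check: true when "and" or "or" occurs anywhere in the string.
    public static bool HasOperatorSubstring(string output) =>
        output.Contains("and") || output.Contains("or");

    // Token-based check: true only when a whole word is "and" or "or".
    public static bool HasOperatorToken(string output) =>
        output.Split(' ').Any(t => t == "and" || t == "or");
}
```

For `"oyster fish clean and water"` both checks return true, so the `" or "` join never runs and the missing operators are not filled in. For `"sand world"` the substring check returns true while the token check returns false.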

Check whether an `or` needs to be added inside the `foreach` loop instead, and get rid of that condition.

For example:

            foreach (var item in strArray)
            {
                currStr = item.Trim();
                keywordFlag = searchKeywords.Contains(prevStr) && searchKeywords.Contains(currStr);
                duplicateFlag = outputArray.Contains(currStr) && !searchKeywords.Contains(currStr);
                if (!currStr.Equals(prevStr) && !keywordFlag && !duplicateFlag)
                {
                    if (!searchKeywords.Contains(prevStr) && !searchKeywords.Contains(currStr) && prevStr != "")
                    {
                        outputArray.Add("or");
                    }
                    outputArray.Add(currStr);
                    prevStr = currStr;
                }
            }

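Also, when you check whether the array holds only 2 tokens, you only consider the case where `or`/`and` comes after the word. What happens if the user enters `or Oyster` as the input string? The resulting string would be just `or`.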

You need to account for that:

            if (outputArray.Count() == 2)
            {
                if(searchKeywords.Contains(outputArray[0]))
                    outputArray.Remove(outputArray[0]);
                else if(searchKeywords.Contains(outputArray[1]))
                    outputArray.Remove(outputArray[1]);
            }
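
The two-token check above only covers inputs that reduce to exactly two tokens. As a possible generalization, the sketch below strips operators from both ends of a token list of any length. The `OperatorTrimmer` class is a hypothetical helper, and it assumes the tokens have already been lower-cased as in the code above.

```csharp
using System;
using System.Collections.Generic;

public static class OperatorTrimmer
{
    private static readonly HashSet<string> Operators =
        new HashSet<string> { "and", "or" };

    // Removes operators from both ends of the token list, in place.
    public static void TrimOperators(List<string> tokens)
    {
        // Strip leading operators, e.g. "or oyster" -> "oyster".
        while (tokens.Count > 0 && Operators.Contains(tokens[0]))
            tokens.RemoveAt(0);

        // Strip trailing operators, e.g. "oyster and fish or" -> "oyster and fish".
        while (tokens.Count > 0 && Operators.Contains(tokens[tokens.Count - 1]))
            tokens.RemoveAt(tokens.Count - 1);
    }
}
```

Calling `OperatorTrimmer.TrimOperators(outputArray)` just before `string.Join` would handle inputs such as `oyster and fish or`, which the two-token check leaves untouched.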