Removing multiple instances of a set of words from a string

Keywords: instance, word, set, string, single, remove | Updated: 2023-09-27 18:02:17

I have a string street that may contain:

street= "Siegfriedst strasse st 16.";
street= "Frontos strasse s .";

I want to remove the extra "st", "strasse" and "s".

I am using:

 street= street.Replace("(", "").Replace(")", "").Replace(".", "").
                             Replace("-", "").Replace("strasse","").
                             Replace("st","").Replace("s","");

But I don't want to remove the "st" from "Siegfriedst" or the "s" from "Frontos".


Perhaps this is what you want. It is not clear whether you only want to remove duplicate words or also substrings:

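// Requires: using System.Collections.Generic; using System.Linq;
public static string RemoveDuplicates(string input, params string[] wordsToCheck)
{
    var wordSet = new HashSet<string>(wordsToCheck);
    int taken = 0;
    // Keep every word that is not in the set, plus the first word that is.
    // Where (rather than Select with "") avoids leaving doubled spaces behind.
    var newWords = input.Split()
        .Where(w => !wordSet.Contains(w) || ++taken == 1);
    return string.Join(" ", newWords);
}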

Usage:

string text = RemoveDuplicates("Siegfriedst strasse st 16.", "st", "strasse", "s");

Result: Siegfriedst strasse 16.

street = street.Replace(".", " ")          // To better enable pattern matching
               .Replace(" strasse ", " ")  // Replace with a space so neighbouring words are not merged
               .Replace(" st ", " ")
               .Replace(" s ", " ")
               .Replace("  ", " ")
               .Trim();  // Trim() removes leading and trailing whitespace

// Alternative: strip the punctuation first, then the standalone words
street = street.Replace("(", "")
               .Replace(")", "")
               .Replace(".", "")
               .Replace("-", "")
               .Replace(" strasse "," ")
               .Replace(" st "," ")
               .Replace(" s "," ");

You can write a helper method like this:

public static string Cleanup(string text, string[] exclude)
{
    string[] parts = text.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    List<string> words = new List<string>();
    foreach(string part in parts)
    {
        if (!exclude.Contains(part))
        {
            words.Add(part);
        }
    }
    return string.Join(" ", words.ToArray());
}

and then use it like this:

string street = Cleanup("Siegfriedst strasse st 16.", new string[] { "strasse", "st", "s", " " });

You can use a regular expression. This matches the whole words "strasse", "st" or "s", but not parts of words:

using System.Text.RegularExpressions;
Regex rgx = new Regex(@"\b(strasse|st|s)\b|\(|\)|\.");
street = rgx.Replace(street, "");
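
Replacing the matches with an empty string leaves the surrounding spaces in place, so the result can contain doubled or trailing spaces. A minimal sketch of one way to tidy that up follows. The `StreetCleaner` class and its `Clean` method are hypothetical names, not part of the answer above, and the pattern also strips the hyphen because the question's original Replace chain does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StreetCleaner
{
    // Whole-word "strasse", "st" or "s", plus literal parentheses, dots and hyphens.
    private static readonly Regex Noise =
        new Regex(@"\b(strasse|st|s)\b|[().-]");

    // Runs of whitespace left behind after the removal.
    private static readonly Regex Spaces = new Regex(@"\s+");

    // Hypothetical helper (an assumption for illustration, not from the answer).
    public static string Clean(string street)
    {
        string stripped = Noise.Replace(street, "");
        return Spaces.Replace(stripped, " ").Trim();
    }
}
```

With the two inputs from the question, `StreetCleaner.Clean("Siegfriedst strasse st 16.")` gives "Siegfriedst 16" and `StreetCleaner.Clean("Frontos strasse s .")` gives "Frontos". Matching is case-sensitive here, as in the answer above.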

Given that you are chaining Replace calls (and therefore do not expect any additional strings), I suggest the following LINQ query:

street = street.Split(' ').Where(s => s != "strasse" && s != "st" && s != "s").Aggregate((x, y) => x + " " + y);
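
One caveat with the query above: `Aggregate` without a seed throws an `InvalidOperationException` when the filtered sequence is empty, for example when the input consists only of "strasse". A sketch of a variant that joins with `string.Join` instead is shown below. `StreetFilter` and `RemoveWords` are illustrative names and not from the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreetFilter
{
    // Hypothetical helper: drops every space-separated token found in 'unwanted'.
    public static string RemoveWords(string street, params string[] unwanted)
    {
        var unwantedSet = new HashSet<string>(unwanted);
        var kept = street
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !unwantedSet.Contains(word));
        // string.Join returns "" for an empty sequence instead of throwing.
        return string.Join(" ", kept);
    }
}
```

For example, `StreetFilter.RemoveWords("Siegfriedst strasse st 16.", "strasse", "st", "s")` gives "Siegfriedst 16." (the dot stays because "16." is a single token), and an input of just "strasse" gives an empty string.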