Removing multiple instances of a set of words from a string
Updated: 2023-09-27 18:02:17
I have a string street, which may contain:
street= "Siegfriedst strasse st 16.";
street= "Frontos strasse s .";
I want to remove the extra "st", "strasse" and "s".
I am using:
street= street.Replace("(", "").Replace(")", "").Replace(".", "").
Replace("-", "").Replace("strasse","").
Replace("st","").Replace("s","");
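But I don't want to remove "st" from "Siegfriedst" or "s" from "Frontos".
Maybe this is what you want; it isn't clear whether you only want to remove duplicate words or substrings: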
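// Requires: using System.Collections.Generic; using System.Linq;
public static string RemoveDuplicates(string input, params string[] wordsToCheck)
{
    var wordSet = new HashSet<string>(wordsToCheck);
    int taken = 0;
    // Keep every word that is not in the set, plus the first word that is.
    // Where (rather than Select with "") avoids leaving double spaces behind.
    var newWords = input.Split()
        .Where(w => !wordSet.Contains(w) || ++taken == 1);
    return string.Join(" ", newWords);
}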
Usage:
string text = RemoveDuplicates("Siegfriedst strasse st 16.", "st", "strasse", "s");
Result: Siegfriedst strasse 16.
street = street.Replace(".", " ")   // To better enable pattern matching
    .Replace(" strasse ", " ")      // Replace with a space so neighbouring words are not joined
    .Replace(" st ", " ")
    .Replace(" s ", " ")
    .Replace("  ", " ")             // Collapse a double space into a single one
    .Trim();                        // Trim() removes the leading and trailing whitespace
street = street.Replace("(", "")
.Replace(")", "")
.Replace(".", "")
.Replace("-", "")
.Replace(" strasse "," ")
.Replace(" st "," ")
.Replace(" s "," ");
You can write a helper method like this:
public static string Cleanup(string text, string[] exclude)
{
string[] parts = text.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
List<string> words = new List<string>();
foreach(string part in parts)
{
if (!exclude.Contains(part))
{
words.Add(part);
}
}
return string.Join(" ", words.ToArray());
}
and then use it like this:
string street = Cleanup("Siegfriedst strasse st 16.", new string[] { "strasse", "st", "s", " " });
You can use a regular expression. This matches the whole words "strasse", "st" or "s", but not parts of words:
using System.Text.RegularExpressions;
Regex rgx = new Regex(@"\b(strasse|st|s)\b|\(|\)|\.");
street = rgx.Replace(street, "");
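Removing matches with an empty string leaves the surrounding spaces in place, so the result can contain runs of spaces. A minimal sketch, assuming you also want to collapse whitespace afterward; the `StreetCleaner` class and its `Clean` method are hypothetical names, not part of the original answer:

```csharp
using System;
using System.Text.RegularExpressions;

public static class StreetCleaner
{
    // Whole words "strasse", "st" or "s", plus literal parentheses and dots.
    private static readonly Regex Noise = new Regex(@"\b(strasse|st|s)\b|[().]");

    // One or more whitespace characters in a row.
    private static readonly Regex Spaces = new Regex(@"\s+");

    public static string Clean(string street)
    {
        // Step 1: drop the unwanted words and punctuation.
        string stripped = Noise.Replace(street, "");
        // Step 2: collapse the gaps that step 1 left behind, then trim the ends.
        return Spaces.Replace(stripped, " ").Trim();
    }
}
```

With the two sample inputs from the question, `StreetCleaner.Clean("Siegfriedst strasse st 16.")` should give `Siegfriedst 16` and `StreetCleaner.Clean("Frontos strasse s .")` should give `Frontos`. The `\b` anchors are what keep the trailing "st" in "Siegfriedst" and the "s" in "Frontos" untouched.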
Based on the fact that you use chained Replace operations (and so do not expect any extra strings), I suggest the following LINQ query:
street = street.Split(' ').Where(s => s != "strasse" && s != "st" && s != "s").Aggregate((x, y) => x + " " + y);
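One caveat with the query above: `Aggregate` without a seed throws an `InvalidOperationException` when the filtered sequence is empty, which happens if every word in the input is one of the unwanted ones. A minimal sketch of the same idea using `string.Join` instead, which returns an empty string in that case; the `StreetFilter` class, the `RemoveWords` method and the `Unwanted` array are hypothetical names chosen for illustration:

```csharp
using System;
using System.Linq;

public static class StreetFilter
{
    // Words to drop when they appear as whole, space-separated tokens.
    private static readonly string[] Unwanted = { "strasse", "st", "s" };

    public static string RemoveWords(string street)
    {
        var kept = street
            // RemoveEmptyEntries ignores repeated spaces in the input.
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !Unwanted.Contains(word));

        // string.Join handles zero, one or many elements without throwing.
        return string.Join(" ", kept);
    }
}
```

Like the original query, this only compares whole tokens, so punctuation is left alone: `StreetFilter.RemoveWords("Siegfriedst strasse st 16.")` should give `Siegfriedst 16.` with the dot still attached.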