Check whether a string contains any of a list of substrings, and save the matching substrings

Updated: 2023-09-27 17:49:27

Here is my situation: I have a string that represents a text

string myText = "Text to analyze for words, bar, foo";   

and a list of words to search for

List<string> words = new List<string> {"foo", "bar", "xyz"};

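I would like to know the most efficient way, if one exists, to get the list of words that are contained in the text, something like this: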

List<string> matches = myText.findWords(words);


This query needs no special analysis beyond calling the Contains method, so you could try this:

string myText = "Text to analyze for words, bar, foo";
List<string> words = new List<string> { "foo", "bar", "xyz" };
var result = words.Where(i => myText.Contains(i)).ToList();
// result: foo, bar (matches come back in the order of the words list)
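
One caveat: `Contains` does an ordinal, case-sensitive substring search, so "bar" would also be reported for a text that only contains "barn", and "Foo" in the text would not match "foo". Below is a minimal sketch of a case-insensitive variant. The names `SubstringSearch` and `FindContained` are hypothetical and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class SubstringSearch
{
    // Returns the candidates that occur anywhere in text, ignoring case.
    // This is still substring matching, not whole-word matching.
    public static List<string> FindContained(string text, IEnumerable<string> candidates)
    {
        return candidates
            .Where(w => text.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

Calling `SubstringSearch.FindContained(myText, words)` returns the matches in the order of `words`, the same as the `Where`/`Contains` query.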

You can use a HashSet<string> and intersect the two sets:

string myText = "Text to analyze for words, bar, foo"; 
string[] splitWords = myText.Split(' ', ',');
HashSet<string> hashWords = new HashSet<string>(splitWords,
                                                StringComparer.OrdinalIgnoreCase);
HashSet<string> words = new HashSet<string>(new[] { "foo", "bar" },
                                            StringComparer.OrdinalIgnoreCase);
hashWords.IntersectWith(words);
// hashWords is modified in place and now holds only the matches: "bar" and "foo"

A regular expression solution:

var words = new string[]{"Lucy", "play", "soccer"};
var text = "Lucy loves going to the field and play soccer with her friend";
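// Escape each word so that any regex metacharacters in it are matched literally
var match = new Regex(String.Join("|", words.Select(Regex.Escape))).Match(text);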
var result = new List<string>();
while (match.Success) {
    result.Add(match.Value);
    match = match.NextMatch();
}
//Result ["Lucy", "play", "soccer"]
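
Note that an alternation like this matches anywhere in the text, so "play" would also be found inside "playful", and a word that occurs twice is added twice. As a hedged sketch, word boundaries (`\b`) and `Distinct` can tighten this up. The names `WholeWordSearch` and `FindWholeWords` are hypothetical, and the sketch assumes every search word starts and ends with a letter, digit, or underscore, since `\b` only behaves as expected next to word characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class WholeWordSearch
{
    // Returns each search word that appears in text as a whole word,
    // once per word, in order of first appearance in the text.
    public static List<string> FindWholeWords(string text, IEnumerable<string> words)
    {
        // Escape each word so regex metacharacters are matched literally.
        string alternation = string.Join("|", words.Select(Regex.Escape));
        if (alternation.Length == 0)
        {
            return new List<string>(); // nothing to search for
        }

        // \b anchors the alternation to word boundaries on both sides.
        string pattern = @"\b(?:" + alternation + @")\b";

        return Regex.Matches(text, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .ToList();
    }
}
```

Unlike the `Where`/`Contains` approach, the result order here follows the text rather than the word list.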

Playing off the idea that you want to be able to call myText.findWords(words), you can write an extension method on the string class that does what you want.

public static class StringExtensions
{
    public static List<string> findWords(this string str, List<string> words)
    {
        return words.Where(str.Contains).ToList();
    }
}

Usage:

string myText = "Text to analyze for words, bar, foo";
List<string> words = new List<string> { "foo", "bar", "xyz" };
List<string> matches = myText.findWords(words);
Console.WriteLine(String.Join(", ", matches.ToArray()));
Console.ReadLine();

Result:

foo, bar

Here is a simple solution that accounts for whitespace and punctuation:

static void Main(string[] args)
{
    string sentence = "Text to analyze for words, bar, foo";            
    var words = Regex.Split(sentence, @"\W+");
    var searchWords = new List<string> { "foo", "bar", "xyz" };
    var foundWords = words.Intersect(searchWords);
    foreach (var item in foundWords)
    {
        Console.WriteLine(item);
    }
    Console.ReadLine();
}