Extracting sentences that contain a keyword

Keywords: sentence, extract, keyword | Updated: 2023-09-27 18:12:07

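I have this problem:

Write a program that extracts from a text all sentences containing a particular word. Assume that sentences are separated from each other by the character "." and that words are separated from each other by any non-letter character.

Sample text: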

We are living in a yellow submarine. We don't have anything else. Inside the submarine is very tight. So we are drinking all the day. We will move out of it in 5 days.

Sample result:

We are living in a yellow submarine.
We will move out of it in 5 days. 

This is my code so far.

public static string Extract(string str, string keyword)
    {
        string[] arr = str.Split('.');
        string answer = string.Empty;
        foreach(string sentence in arr)
        {
            var iter = sentence.GetEnumerator();
            while(iter.MoveNext())
            {
                if(iter.Current.ToString() == keyword)
                    answer += sentence;
            }
        }
        return answer;
    }

It doesn't work. I call it with the following code:

string example = "We are living in a yellow submarine. We don't have anything else. Inside the submarine is very tight. So we are drinking all the day. We will move out of it in 5 days.";
string keyword = "in";
string answer = Extract(example, keyword);
Console.WriteLine(answer);

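It outputs nothing. The problem is probably the iterator part, since I'm not familiar with iterators.

Anyway, the hint for this problem says we should use the Split and IndexOf methods.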


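sentence.GetEnumerator() returns a CharEnumerator, so you are examining every character of every sentence. A single character will never equal the string "in", which is why it doesn't work. You need to look at each word in each sentence and compare it with the term you are searching for.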
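To make the point concrete, here is a small sketch of what enumerating a string actually yields. EnumeratorDemo and Elements are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // Hypothetical helper: collects every element the string enumerator yields,
    // converted to a string the same way the question's code does.
    public static List<string> Elements(string sentence)
    {
        var result = new List<string>();
        foreach (char c in sentence)   // enumerating a string yields one char at a time
        {
            result.Add(c.ToString());  // always a one-character string
        }
        return result;
    }
}
```

Calling EnumeratorDemo.Elements("in 5") gives the four one-character strings "i", "n", " " and "5", so a comparison with "in" can never succeed.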

Try:

public static string Extract(string str, string keyword)
{
    string[] arr = str.Split('.');
    string answer = string.Empty;
    foreach(string sentence in arr)
    {
        // Add any other punctuation characters needed to split the sentence into words.
        // words.Contains is the LINQ extension method, so the file needs "using System.Linq;".
        string[] words = sentence.Split(new char[] { ' ', ',' });
        if (words.Contains(keyword))
        {
            answer += sentence;
        }
    }
    return answer;
}

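Your code uses the iterator to walk through each sentence character by character. Unless the keyword is a single-character word (e.g. "I" or "a"), there will be no match.

One way to solve this is to use LINQ to check whether a sentence contains the keyword, like this: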

foreach(string sentence in arr)
{
    if (sentence.Split(' ').Any(w => w == keyword))
        answer += sentence + ". ";
}

Demo on ideone.

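Another approach is to use a regular expression that matches only on word boundaries. Note that you cannot use a plain Contains method, because that produces "false positives" (i.e. it finds sentences in which the keyword is embedded in a longer word).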
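A minimal sketch of the word-boundary idea, assuming a case-sensitive match is wanted. RegexExtractor and its Extract method are hypothetical names for this illustration; Regex.Escape is added as a precaution in case the keyword contains regex metacharacters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexExtractor
{
    // Hypothetical helper: returns the sentences in which keyword appears as a whole word.
    public static List<string> Extract(string text, string keyword)
    {
        // \b matches only at a word boundary, so "in" does not match inside "living".
        string pattern = @"\b" + Regex.Escape(keyword) + @"\b";

        return text.Split('.')
                   .Select(s => s.Trim())                                  // drop the space left after each period
                   .Where(s => s.Length > 0 && Regex.IsMatch(s, pattern))  // skip the empty tail after the last period
                   .Select(s => s + ".")                                   // restore the period removed by Split
                   .ToList();
    }
}
```

With the sample text and the keyword "in", this should return the two sentences listed in the question's sample result. "Inside" is skipped because the match is case-sensitive; passing RegexOptions.IgnoreCase to Regex.IsMatch would change that.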

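Another thing to watch is the use of += for concatenation. This approach is inefficient, because it creates many temporary throwaway objects. A better way to achieve the same result is to use StringBuilder.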
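As a rough sketch of the StringBuilder suggestion, the loop could look like the following. BuilderExtractor is a hypothetical name, and splitting words on spaces and commas only is an assumption carried over from the earlier answer.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BuilderExtractor
{
    // Hypothetical helper: builds the result with StringBuilder instead of +=.
    public static string Extract(string text, string keyword)
    {
        var answer = new StringBuilder();
        foreach (string raw in text.Split('.'))
        {
            string sentence = raw.Trim();
            // Assumption: words are separated by spaces and commas only; extend as needed.
            string[] words = sentence.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
            if (words.Contains(keyword))
            {
                // Append mutates the builder's buffer rather than allocating a new string each time.
                answer.Append(sentence).Append('.').AppendLine();
            }
        }
        return answer.ToString();
    }
}
```

Each matching sentence ends up on its own line, terminated by Environment.NewLine. For a handful of sentences the difference from += is negligible; it matters mainly for long inputs.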

string input = "We are living in a yellow submarine. We don't have anything else. Inside the submarine is very tight. So we are drinking all the day. We will move out of it in 5 days.";
var lookup = input.Split('.')
                .Select(s => s.Split().Select(w => new { w, s }))
                .SelectMany(x => x)
                .ToLookup(x => x.w, x => x.s);
foreach(var sentence  in lookup["in"])
{
    Console.WriteLine(sentence);
}

I would split the input on periods and then search each sentence for the given word.

string metin = "We are living in a yellow submarine. We don't have anything else. Inside the submarine is very tight. So we are drinking all the day. We will move out of it in 5 days.";
string[] metinDizisi = metin.Split('.');
string answer = string.Empty;
for (int i = 0; i < metinDizisi.Length; i++)
{
    if (metinDizisi[i].Contains(" in "))
    {
        answer += metinDizisi[i];
    }
}
Console.WriteLine(answer);

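You can use sentence.Contains(keyword) to check whether the string contains the word you are looking for. Note that this also matches the keyword when it is embedded in a longer word.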

public static string Extract(string str, string keyword)
    {
        string[] arr = str.Split('.');
        string answer = string.Empty;
        foreach(string sentence in arr)
            if(sentence.Contains(keyword))
                answer+=sentence;
        return answer;
    }
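The question's hint mentions IndexOf, and the task defines words as delimited by non-letter characters. One possible sketch, under those assumptions, checks the characters on either side of each IndexOf hit. IndexOfExtractor and ContainsWord are hypothetical names, not part of any answer above.

```csharp
using System;

public static class IndexOfExtractor
{
    // Hypothetical helper: true if keyword occurs in sentence with no letter
    // immediately before or after it.
    public static bool ContainsWord(string sentence, string keyword)
    {
        if (string.IsNullOrEmpty(keyword)) return false;

        int pos = sentence.IndexOf(keyword, StringComparison.Ordinal);
        while (pos >= 0)
        {
            int end = pos + keyword.Length;
            bool startOk = pos == 0 || !char.IsLetter(sentence[pos - 1]);
            bool endOk = end == sentence.Length || !char.IsLetter(sentence[end]);
            if (startOk && endOk) return true;

            // Keep scanning from the next character for a later occurrence.
            pos = sentence.IndexOf(keyword, pos + 1, StringComparison.Ordinal);
        }
        return false;
    }
}
```

Replacing sentence.Contains(keyword) with IndexOfExtractor.ContainsWord(sentence, keyword) in the loop would skip sentences such as "So we are drinking all the day", where "in" appears only inside "drinking".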

You can split on periods to get a collection of sentences, and then filter them with a regular expression that contains the keyword.

var results = example.Split('.')
    .Where(s => Regex.IsMatch(s, String.Format(@"\b{0}\b", keyword)));