Extracting sentences by keyword
Keywords: sentence, extract, keyword | Updated: 2023-09-27 18:12:07
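I have this problem:
Write a program that extracts from a text all sentences that contain a particular word. Assume that sentences are separated from each other by the character "." and that words are separated from one another by non-letter characters.
Sample text: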
We are living in a yellow submarine. We don't have anything else. Inside the submarine is very tight. So we are drinking all the day. We will move out of it in 5 days.
Sample result:
We are living in a yellow submarine.
We will move out of it in 5 days.
Here is my code so far.
public static string Extract(string str, string keyword)
{
    string[] arr = str.Split('.');
    string answer = string.Empty;
    foreach(string sentence in arr)
    {
        var iter = sentence.GetEnumerator();
        while(iter.MoveNext())
        {
            if(iter.Current.ToString() == keyword)
                answer += sentence;
        }
    }
    return answer;
}
It doesn't work. I call it with the following code:
string example = "We are living in a yellow submarine. We don't have anything else. Inside the submarine is very tight. So we are drinking all the day. We will move out of it in 5 days.";
string keyword = "in";
string answer = Extract(example, keyword);
Console.WriteLine(answer);
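It outputs nothing. The problem is probably in the iterator part, since I'm not familiar with iterators.
Anyway, the hint for this problem says we should use the Split and IndexOf methods.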
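sentence.GetEnumerator() returns a CharEnumerator, so you are checking each character in each sentence. A single character will never equal the string "in", which is why it doesn't work. You need to look at each word in each sentence and compare it with the term you are looking for.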
Try:
// Requires "using System.Linq;" for Contains on the array
public static string Extract(string str, string keyword)
{
    string[] arr = str.Split('.');
    string answer = string.Empty;
    foreach(string sentence in arr)
    {
        //Add any other required punctuation characters for splitting words in the sentence
        string[] words = sentence.Split(new char[] { ' ', ',' });
        if(words.Contains(keyword))
        {
            answer += sentence;
        }
    }
    return answer;
}
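Your code uses an iterator to walk through each sentence character by character. Unless the keyword is a single-character word (e.g. "I" or "a"), there will be no match.
One way to solve this is to use LINQ to check whether a sentence contains the keyword, like this: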
foreach(string sentence in arr)
{
    if(sentence.Split(' ').Any(w => w == keyword))
        answer += sentence+". ";
}
Demo on ideone.
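Another approach is to use a regular expression that only matches on word boundaries. Note that you cannot use the plain Contains method, because doing so produces "false positives" (i.e. sentences in which the keyword is embedded in a longer word).
Another thing to note is the use of += for concatenation. This is inefficient, because each concatenation creates a new temporary string object that is thrown away almost immediately. A better way to achieve the same result is to use a StringBuilder.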
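The two suggestions above can be combined in one routine. The following is only a minimal sketch, assuming case-sensitive matching is acceptable; the class name RegexSentenceExtractor is made up for illustration, and Regex.Escape is added on the assumption that the keyword may contain regex metacharacters.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is illustrative only.
public static class RegexSentenceExtractor
{
    public static string Extract(string text, string keyword)
    {
        // \b matches only at word boundaries, so "in" is not found
        // inside "living", "submarine" or "drinking".
        // Regex.Escape treats the keyword as literal text (an added assumption).
        var wholeWord = new Regex(@"\b" + Regex.Escape(keyword) + @"\b");

        // StringBuilder avoids creating a new string on every append.
        var result = new StringBuilder();

        foreach (string sentence in text.Split('.'))
        {
            if (wholeWord.IsMatch(sentence))
            {
                // Drop the leading space left by the split and restore the period.
                result.Append(sentence.Trim()).Append(". ");
            }
        }

        return result.ToString().TrimEnd();
    }
}
```

Called with the sample text and the keyword "in", this should return only the first and the last sentence. Matching is case-sensitive here; passing RegexOptions.IgnoreCase to the Regex constructor would relax that.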
string input = "We are living in a yellow submarine. We don't have anything else. Inside the submarine is very tight. So we are drinking all the day. We will move out of it in 5 days.";
var lookup = input.Split('.')
                  .Select(s => s.Split().Select(w => new { w, s }))
                  .SelectMany(x => x)
                  .ToLookup(x => x.w, x => x.s);
foreach(var sentence in lookup["in"])
{
    Console.WriteLine(sentence);
}
I would split the input on periods and then search each sentence for the given word.
string metin = "We are living in a yellow submarine. We don't have anything else. Inside the submarine is very tight. So we are drinking all the day. We will move out of it in 5 days.";
string[] metinDizisi = metin.Split('.');
string answer = string.Empty;
for (int i = 0; i < metinDizisi.Length; i++)
{
    if (metinDizisi[i].Contains(" in "))
    {
        answer += metinDizisi[i];
    }
}
Console.WriteLine(answer);
You can use sentence.Contains(keyword) to check whether the string contains the word you are looking for.
public static string Extract(string str, string keyword)
{
    string[] arr = str.Split('.');
    string answer = string.Empty;
    foreach(string sentence in arr)
        if(sentence.Contains(keyword))
            answer+=sentence;
    return answer;
}
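Note that a plain substring check also accepts sentences where the keyword is only part of a longer word: with the sample text and the keyword "in", every sentence matches because of words such as "anything", "submarine" and "drinking". Since the exercise hints at Split and IndexOf, one possible variant checks the characters on either side of each IndexOf hit. This is only a sketch under the problem's own assumption that words are delimited by non-letter characters; the names IndexOfSentenceExtractor and ContainsWord are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper class; the names are illustrative only.
public static class IndexOfSentenceExtractor
{
    // True if keyword occurs in sentence with no letter directly before or after it.
    public static bool ContainsWord(string sentence, string keyword)
    {
        if (keyword.Length == 0) return false;

        int index = sentence.IndexOf(keyword, StringComparison.Ordinal);
        while (index >= 0)
        {
            int end = index + keyword.Length;
            bool startsWord = index == 0 || !char.IsLetter(sentence[index - 1]);
            bool endsWord = end == sentence.Length || !char.IsLetter(sentence[end]);
            if (startsWord && endsWord) return true;

            // This hit was embedded in a longer word; keep searching after it.
            index = sentence.IndexOf(keyword, index + 1, StringComparison.Ordinal);
        }
        return false;
    }

    public static string Extract(string text, string keyword)
    {
        var result = new StringBuilder();
        foreach (string sentence in text.Split('.'))
        {
            if (ContainsWord(sentence, keyword))
            {
                // Drop the leading space left by the split and restore the period.
                result.Append(sentence.Trim()).Append(". ");
            }
        }
        return result.ToString().TrimEnd();
    }
}
```

With the sample text and the keyword "in", this variant should keep only the first and the last sentence.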
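You can split on periods to get a collection of sentences, and then filter those sentences with a regular expression that contains the keyword.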
// Requires "using System.Linq;" and "using System.Text.RegularExpressions;"
var results = example.Split('.')
    .Where(s => Regex.IsMatch(s, String.Format(@"\b{0}\b", keyword)));