Find the most frequently occurring words in a string in C#
Keywords: word, string, find | Updated: 2023-09-27 18:19:28
I am trying to find the words that occur most often in a string.
For example:
Hello World This is a great world, This World is simply great
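From the string above, I am trying to compute a result like this:
- World, 3
- Great, 2
- Hello, 1
- This, 2
but ignoring any word shorter than 3 characters, for example `is`, which occurs twice.
I tried looking into Dictionary<key, value> pairs, and I tried looking into LINQ's GroupBy extension. I know the solution lies somewhere between the two, but I just can't get my head around the algorithm and how to get this done.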
Using LINQ and Regex:
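// Requires: using System.Linq; using System.Text.RegularExpressions;
var counts = Regex.Split("Hello World This is a great world, This World is simply great".ToLower(), @"\W+")
    .Where(s => s.Length >= 3)            // ignore words shorter than 3 characters
    .GroupBy(s => s)                      // one group per distinct word
    .OrderByDescending(g => g.Count());   // most frequent first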
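For anyone who wants to run the query above end to end, here is a minimal sketch. The class name `RegexWordCounter`, the method `CountWords`, and the `minLength` parameter are my own names for illustration, not part of the answer; the threshold of 3 follows the question's "ignore words shorter than 3 characters".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordCounter
{
    // Hypothetical helper: returns (word, count) pairs, most frequent first.
    public static List<KeyValuePair<string, int>> CountWords(string text, int minLength = 3)
    {
        return Regex.Split(text.ToLowerInvariant(), @"\W+")   // split on runs of non-word characters
            .Where(w => w.Length >= minLength)                 // also drops any empty entries
            .GroupBy(w => w)                                   // one group per distinct word
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)                   // stable sort: ties keep first-seen order
            .ToList();
    }
}

public static class RegexDemo
{
    public static void Main()
    {
        const string input = "Hello World This is a great world, This World is simply great";
        foreach (var pair in RegexWordCounter.CountWords(input))
            Console.WriteLine("{0}, {1}", pair.Key, pair.Value);
    }
}
```

For the question's sample string this should print world, 3; this, 2; great, 2; hello, 1; simply, 1, one pair per line. Lower-casing first is what lets "World" and "world," end up in the same group.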
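I would avoid LINQ, Regex and the like here, because it sounds like you are trying to find an algorithm and understand it, rather than call some function that does the work for you.
That is not to say those aren't valid solutions. They are. Definitely.
Try this: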
Dictionary<string, int> dictionary = new Dictionary<string, int>();
string sInput = "Hello World, This is a great World. I love this great World";
sInput = sInput.Replace(",", ""); // Just cleaning up a bit
sInput = sInput.Replace(".", ""); // Just cleaning up a bit
string[] arr = sInput.Split(' '); // Create an array of words
foreach (string word in arr) // Let's loop over the words
{
    if (word.Length >= 3) // If it meets our criteria of at least 3 letters
    {
        if (dictionary.ContainsKey(word)) // If it's in the dictionary
            dictionary[word] = dictionary[word] + 1; // Increment the count
        else
            dictionary[word] = 1; // Put it in the dictionary with a count of 1
    }
}
// Note: keys are case-sensitive here, so "This" and "this" are counted separately.
foreach (KeyValuePair<string, int> pair in dictionary) // Loop through the dictionary
    Response.Write(string.Format("Key: {0}, Pair: {1}<br />", pair.Key, pair.Value));
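A possible variation on the loop above, as a sketch only: it makes the dictionary case-insensitive and then scans it for the highest count, still without LINQ. `DictionaryWordCounter`, `CountWords`, `MostFrequent` and `minLength` are hypothetical names I picked for illustration, and the separator list covers only the punctuation that appears in the sample string.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryWordCounter
{
    // Hypothetical helper: counts words of at least minLength characters.
    public static Dictionary<string, int> CountWords(string text, int minLength = 3)
    {
        // OrdinalIgnoreCase makes "This" and "this" share one entry.
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        char[] separators = { ' ', ',', '.' };  // assumption: only these separators occur
        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (word.Length < minLength) continue;
            int current;
            counts.TryGetValue(word, out current);  // current is 0 when the word is new
            counts[word] = current + 1;
        }
        return counts;
    }

    // Linear scan for the entry with the highest count; returns null for an empty dictionary.
    public static string MostFrequent(Dictionary<string, int> counts)
    {
        string best = null;
        int bestCount = 0;
        foreach (KeyValuePair<string, int> pair in counts)
        {
            if (pair.Value > bestCount)
            {
                best = pair.Key;
                bestCount = pair.Value;
            }
        }
        return best;
    }
}

public static class DictionaryDemo
{
    public static void Main()
    {
        var counts = DictionaryWordCounter.CountWords(
            "Hello World, This is a great World. I love this great World");
        string top = DictionaryWordCounter.MostFrequent(counts);
        Console.WriteLine("{0}, {1}", top, counts[top]);
    }
}
```

`TryGetValue` does a single lookup where `ContainsKey` followed by the indexer does two; that is a small convenience, not a change to the algorithm. If two words tie for the highest count, which one `MostFrequent` returns depends on the dictionary's enumeration order.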
I wrote a string processor class. You can use it.
Example:
metaKeywords = bodyText.Process(blackListWords: prepositions).OrderByDescending().TakeTop().GetWords().AsString();
Class:
// Requires: using System; using System.Collections.Generic; using System.Linq;
public static class StringProcessor
{
    private static List<String> PrepositionList;

    // Normalizes Persian kaf (U+06A9) and Farsi yeh (U+06CC) to their Arabic
    // counterparts (U+0643, U+064A) so that both spellings count as one word.
    public static string ToNormalString(this string strText)
    {
        if (String.IsNullOrEmpty(strText)) return String.Empty;
        char chNormalKaf = (char)1603;
        char chNormalYah = (char)1610;
        char chNonNormalKaf = (char)1705;
        char chNonNormalYah = (char)1740;
        string result = strText.Replace(chNonNormalKaf, chNormalKaf);
        result = result.Replace(chNonNormalYah, chNormalYah);
        return result;
    }

    public static List<KeyValuePair<String, Int32>> Process(this String bodyText,
        List<String> blackListWords = null,
        int minimumWordLength = 3,
        char splitor = ' ',
        bool perWordIsLowerCase = true)
    {
        string[] btArray = bodyText.ToNormalString().Split(splitor);
        long numberOfWords = btArray.LongLength;
        Dictionary<String, Int32> wordsDic = new Dictionary<String, Int32>(1);
        foreach (string word in btArray)
        {
            if (word != null)
            {
                string lowerWord = word;
                if (perWordIsLowerCase)
                    lowerWord = word.ToLower();
                // Strip common punctuation, "<br>" tags and newlines before counting.
                var normalWord = lowerWord.Replace(".", "").Replace("(", "").Replace(")", "")
                    .Replace("?", "").Replace("!", "").Replace(",", "")
                    .Replace("<br>", "").Replace(":", "").Replace(";", "")
                    .Replace("،", "").Replace("-", "").Replace("\n", "").Trim();
                // Keep words of at least minimumWordLength characters that are not blacklisted.
                if (normalWord.Length >= minimumWordLength && !normalWord.IsMemberOfBlackListWords(blackListWords))
                {
                    if (wordsDic.ContainsKey(normalWord))
                    {
                        var cnt = wordsDic[normalWord];
                        wordsDic[normalWord] = ++cnt;
                    }
                    else
                    {
                        wordsDic.Add(normalWord, 1);
                    }
                }
            }
        }
        List<KeyValuePair<String, Int32>> keywords = wordsDic.ToList();
        return keywords;
    }

    public static List<KeyValuePair<String, Int32>> OrderByDescending(this List<KeyValuePair<String, Int32>> list, bool isBasedOnFrequency = true)
    {
        List<KeyValuePair<String, Int32>> result = null;
        if (isBasedOnFrequency)
            result = list.OrderByDescending(q => q.Value).ToList();
        else
            result = list.OrderByDescending(q => q.Key).ToList();
        return result;
    }

    public static List<KeyValuePair<String, Int32>> TakeTop(this List<KeyValuePair<String, Int32>> list, Int32 n = 10)
    {
        List<KeyValuePair<String, Int32>> result = list.Take(n).ToList();
        return result;
    }

    public static List<String> GetWords(this List<KeyValuePair<String, Int32>> list)
    {
        List<String> result = new List<String>();
        foreach (var item in list)
        {
            result.Add(item.Key);
        }
        return result;
    }

    public static List<Int32> GetFrequency(this List<KeyValuePair<String, Int32>> list)
    {
        List<Int32> result = new List<Int32>();
        foreach (var item in list)
        {
            result.Add(item.Value);
        }
        return result;
    }

    public static String AsString<T>(this List<T> list, string seprator = ", ")
    {
        // string.Join puts the separator between items only, so there is no trailing ", ".
        return string.Join(seprator, list);
    }

    private static bool IsMemberOfBlackListWords(this String word, List<String> blackListWords)
    {
        bool result = false;
        if (blackListWords == null) return false;
        foreach (var w in blackListWords)
        {
            if (w.ToNormalString().Equals(word))
            {
                result = true;
                break;
            }
        }
        return result;
    }
}
const string input = "Hello World This is a great world, This World is simply great";
var words = input
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
    .Where(w => w.Length >= 3)
    .GroupBy(w => w)
    .OrderByDescending(g => g.Count());
foreach (var word in words)
    Console.WriteLine("{0}x {1}", word.Count(), word.Key);
// 2x World
// 2x This
// 2x great
// 1x Hello
// 1x world,
// 1x simply
This isn't perfect, since it doesn't trim the comma, but it at least shows you how to do the grouping and filtering.
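One way to deal with the untrimmed comma, sketched under a few assumptions: trim a fixed set of punctuation characters from each token before filtering, and pass a case-insensitive comparer to GroupBy so that "world," joins the "World" group. `TrimmedWordCounter`, `CountWords`, `minLength` and the `Punctuation` list are names and choices I made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TrimmedWordCounter
{
    // Assumption: these are the only punctuation marks that need trimming.
    private static readonly char[] Punctuation = { ',', '.', '!', '?', ';', ':' };

    // Hypothetical helper: returns (word, count) pairs, most frequent first.
    public static List<KeyValuePair<string, int>> CountWords(string text, int minLength = 3)
    {
        return text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.Trim(Punctuation))                    // "world," becomes "world"
            .Where(w => w.Length >= minLength)
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)   // "world" joins the "World" group
            .OrderByDescending(g => g.Count())
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .ToList();
    }
}

public static class TrimDemo
{
    public static void Main()
    {
        const string input = "Hello World This is a great world, This World is simply great";
        foreach (var pair in TrimmedWordCounter.CountWords(input))
            Console.WriteLine("{0}x {1}", pair.Value, pair.Key);
    }
}
```

With the sample input this should report World three times, This and great twice each, and Hello and simply once each. Each group's key is the first spelling encountered, which is why the output keeps the capitalized "World".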
string words = "Hello World This is a great world, This World is simply great".ToLower();
var results = words.Split(' ').Where(x => x.Length >= 3) // ignore words shorter than 3 characters
    .GroupBy(x => x)
    .Select(x => new { Count = x.Count(), Word = x.Key })
    .OrderByDescending(x => x.Count);
foreach (var item in results)
    Console.WriteLine(String.Format("{0} occurred {1} times", item.Word, item.Count));
Console.ReadLine();
To get the word with the most occurrences:
var mostFrequent = results.First().Word;
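Two things to keep in mind with `First()`: it returns only one word even when several share the top count, and it throws an InvalidOperationException when the sequence is empty. If that matters, something along these lines could return every word tied for first place. This is a sketch; `TopWords` and `MostFrequent` are hypothetical names, and the input is assumed to be already split and filtered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopWords
{
    // Hypothetical helper: returns every word that shares the highest count,
    // or an empty list when there are no words at all.
    public static List<string> MostFrequent(IEnumerable<string> words)
    {
        var groups = words
            .GroupBy(w => w)
            .Select(g => new { Word = g.Key, Count = g.Count() })
            .ToList();
        if (groups.Count == 0)
            return new List<string>();

        int max = groups.Max(g => g.Count);
        return groups.Where(g => g.Count == max)
                     .Select(g => g.Word)
                     .ToList();
    }
}

public static class TopWordsDemo
{
    public static void Main()
    {
        string[] words = { "this", "great", "this", "great", "hello" };
        Console.WriteLine(string.Join(", ", TopWords.MostFrequent(words)));
    }
}
```

In the demo, "this" and "great" both occur twice, so both are returned.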
You should be able to do this with LINQ:
// actualString is the input string to analyze.
string[] splitString = actualString.Split(' ');
var arrayCount = splitString.GroupBy(a => a);
foreach (var r in arrayCount)
{
    Console.WriteLine("This " + r.Key + " appeared " + r.Count() + " times in a string.");
}
This can be solved in many different ways. Link for reference.