Splitting a string into an array of words

Keywords: word, array, split, string | Updated: 2023-09-27 18:30:28

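I want to split a string into an array of words without using string.Split. I have tried the code below. It separates the words correctly, but I cannot work out how to assign the results to the array.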

string str = "Hello, how are you?";
string tmp = "";
int word_counter = 0;
for (int i = 0; i < str.Length; i++)
{
     if (str[i] == ' ')
     {
         word_counter++;
     }
}
string[] words = new string[word_counter+1];
for (int i = 0; i < str.Length; i++)
{
    if (str[i] != ' ')
    {
        tmp = tmp + str[i];
        continue;
    }
    // here is the problem, i cant assign every tmp in the array
    for (int j = 0; j < words.Length; j++)
    {
        words[j] = tmp;
    }
    tmp = "";
}


You just need an index pointer so that you can put your items into the array one after another:

string str = "Hello, how are you?";
string tmp = "";
int word_counter = 0;
for (int i = 0; i < str.Length; i++) {
    if (str[i] == ' ') {
        word_counter++;
    }
}
string[] words = new string[word_counter + 1];
int currentWordNo = 0; //at this index pointer
for (int i = 0; i < str.Length; i++) {
    if (str[i] != ' ') {
        tmp = tmp + str[i];
        continue;
    }
    words[currentWordNo++] = tmp; //change your loop to this
    tmp = "";
}
words[currentWordNo++] = tmp; //do this for the last assignment

In my example, the index pointer is named currentWordNo.

Try using a regular expression, like this:

  // requires: using System.Linq; using System.Text.RegularExpressions;
  string str = "Hello, how are you?";
  // words == ["Hello", "how", "are", "you"] 
  string[] words = Regex.Matches(str, @"\w+")
    .OfType<Match>()
    .Select(m => m.Value)
    .ToArray();

String.Split is not a good option here, because there are too many characters to split on: ' ' (space), '.', ',', ';', '!' and so on.
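To make the problem concrete, here is a minimal sketch of the separator-list approach. The class and method names (SplitDemo, SplitOnSeparators) are hypothetical and only serve the illustration. Any punctuation mark missing from the list stays glued to a word:

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: splits on a fixed, hand-written list of separators.
    public static string[] SplitOnSeparators(string text)
    {
        char[] separators = { ' ', '.', ',', ';', '!' };
        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string[] words = SplitOnSeparators("Hello, how are you?");
        // '?' is not in the separator list, so it stays attached to the last word.
        Console.WriteLine(string.Join("|", words)); // Hello|how|are|you?
    }
}
```

Every new punctuation character would have to be added to the list by hand, which is why matching the words themselves tends to be more robust than listing separators.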

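A word is not just whatever sits between spaces. There is punctuation to consider, as well as non-breaking spaces and so on. Take a look at an input like this: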

  string str = "Bad(very bad) input to test. . .";

Note:

  1. There is no space after "Bad"
  2. There is a non-breaking space
  3. There are additional spaces after the full stops

The correct output should be

  ["Bad", "very", "bad", "input", "to", "test"] 
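As a quick check, the regular expression approach can be run against this input. This is only a sketch: the class and method names (RegexWordsDemo, ExtractWords) are hypothetical, and the position of the non-breaking space (written as \u00A0, between "input" and "to") is an assumption, since it is not visible in the string above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordsDemo
{
    // Collects every run of word characters (\w+) as one word.
    public static string[] ExtractWords(string text)
    {
        return Regex.Matches(text, @"\w+")
            .OfType<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        // Assumption: the non-breaking space (\u00A0) sits between "input" and "to".
        string str = "Bad(very bad) input\u00A0to test. . .";
        Console.WriteLine(string.Join(", ", ExtractWords(str)));
        // Bad, very, bad, input, to, test
    }
}
```

Parentheses, full stops, ordinary spaces and the non-breaking space are all non-word characters, so none of them needs to be listed explicitly.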

You can also use a List to build up the list of words:

    // requires: using System.Collections.Generic;
    string str = "Hello, how are you?";
    string tmp = "";
    List<string> ListOfWords = new List<string>();
    for (int i = 0; i < str.Length; i++)
    {
        if (str[i] != ' ')
        {
            tmp = tmp + str[i];
            continue;
        }
        // a space ends the current word, so add it to the list
        ListOfWords.Add(tmp);
        tmp = "";
    }
    // add the last word, which is not followed by a space
    ListOfWords.Add(tmp);

This way you avoid counting the words up front, and the code is simpler. Use ListOfWords[x] to read any word.
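The list-based loop can also be adapted to cope with punctuation and repeated spaces while still avoiding string.Split. The following is a minimal sketch, not a definitive implementation: the names ManualWordSplitter and SplitWords are hypothetical, and it assumes that a word is any run of letters or digits as reported by char.IsLetterOrDigit (unlike the regex class \w, this excludes the underscore).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ManualWordSplitter
{
    // Splits text into runs of letters/digits without calling string.Split.
    public static string[] SplitWords(string text)
    {
        var words = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(c); // still inside a word
            }
            else if (current.Length > 0)
            {
                // a non-word character ends the current word
                words.Add(current.ToString());
                current.Clear();
            }
        }
        // flush the last word if the text does not end with a separator
        if (current.Length > 0)
        {
            words.Add(current.ToString());
        }
        return words.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SplitWords("Hello, how are you?")));
        // Hello, how, are, you
    }
}
```

Because a word is only added when the buffer is non-empty, consecutive separators do not produce empty entries, and the StringBuilder avoids allocating a new string for every appended character.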