Splitting a string on connecting words

Keywords: string, split, connecting words | Updated: 2023-09-27 18:15:02

I need to split several strings in an array on connecting words, i.e. on, in, from, and so on.

string sampleString = "what was total sales for pencils from Japan in 1999";

Expected result:

what was total sales
for pencils
from Japan
in 1999

I know how to split a string on a single word, but not on several words at once:

string[] stringArray = sampleString.Split(new string[] {"of"}, StringSplitOptions.None);

Any suggestions?


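For this particular case, you can do it with a regular expression.

You have to use what is called a lookahead pattern, because otherwise the words you split on would be removed from the result.

Here is a small LINQPad program that demonstrates this: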

void Main()
{
    string sampleString = "what was total sales for pencils from Japan in 1999";
    // The second \b sits inside the lookahead so that only whole words match.
    Regex.Split(sampleString, @"\b(?=(?:of|for|in|from)\b)").Dump();
}
Output:

what was total sales  
for pencils  
from Japan  
in 1999 

However, as I said in the comments, it will trip over things such as place names that contain any of the words you split on, so:

string sampleString = "what was total sales for pencils from the Isle of Islay in 1999";
Regex.Split(sampleString, @"\b(?=(?:of|for|in|from)\b)").Dump();
Output:

what was total sales  
for pencils  
from the Isle  
of Islay  
in 1999 

The regular expression can be rewritten like this, to make it more expressive for future maintenance:

Regex.Split(sampleString, @"
    \b          # Must be a word boundary here
                # makes sure we don't match inside a longer word, like 'comfort'
    (?=         # lookahead group, will match, but not be consumed/zero length
        (?:     # List of words, separated by the OR operator, |
            of
            |for
            |in
            |from
        )
        \b      # Also a word boundary after the word, so 'fortune' does not match
    )", RegexOptions.IgnorePatternWhitespace).Dump();

You may also want to add RegexOptions.IgnoreCase to the options, to match "Of" and "OF", etc.
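
For use outside LINQPad, the same idea can be sketched as a plain console program. This is only an illustration under a few assumptions: the class name ConjunctionSplitter and the helper SplitOnWords are made up for this example, the word list is passed in by the caller, and each piece is trimmed because the split leaves the trailing space on the preceding piece. It builds the lookahead pattern with the closing word boundary inside the lookahead and applies RegexOptions.IgnoreCase.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ConjunctionSplitter
{
    // Hypothetical helper (not from the original answer): splits input in
    // front of each whole-word occurrence of any of the given words.
    public static string[] SplitOnWords(string input, params string[] words)
    {
        // Escape each word so regex metacharacters are treated literally.
        string alternation = string.Join("|", words.Select(Regex.Escape));

        // \b        word boundary at the split position
        // (?= ... ) zero-width lookahead, so the word stays in the result
        // inner \b  the word must end here, so "in" does not match "India"
        string pattern = @"\b(?=(?:" + alternation + @")\b)";

        return Regex.Split(input, pattern, RegexOptions.IgnoreCase)
                    .Select(part => part.Trim())      // drop the leftover spaces
                    .Where(part => part.Length > 0)   // drop an empty leading piece
                    .ToArray();
    }

    public static void Main()
    {
        string sampleString = "What was total sales For pencils FROM Japan in 1999";

        foreach (string part in SplitOnWords(sampleString, "of", "for", "in", "from"))
        {
            Console.WriteLine(part);
        }
        // Prints:
        // What was total sales
        // For pencils
        // FROM Japan
        // in 1999
    }
}
```

Note that this does nothing about the "Isle of Islay" problem described above; a regular expression alone cannot tell that "of" is part of a name there.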