Extracting common prefixes from a list of strings

Keywords: prefix, extract, string, list | Updated: 2023-09-27 18:06:53

I have a list of strings, for example:

{ abc001, abc002, abc003, cdef001, cdef002, cdef004, ghi002, ghi001 }

I want to get all of the distinct common prefixes; for the list above, that would be:

{ abc, cdef, ghi }

How can I do this?

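using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var list = new List<string> {
    "abc001", "abc002", "abc003", "cdef001",
    "cdef002", "cdef004", "ghi002", "ghi001"
};
// Take the leading run of non-digit characters from each string, then de-duplicate.
var prefixes = list.Select(x => Regex.Match(x, @"^[^\d]+").Value).Distinct();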

It may be a good idea to write a helper class to represent your data. For example:

public class PrefixedNumber
{
    // One or more letters (the prefix) followed by one or more digits (the index).
    private static readonly Regex parser = new Regex(@"^(\p{L}+)(\d+)$");
    public PrefixedNumber(string source) // you may want a static Parse method.
    {
        Match parsed = parser.Match(source); // think about an error here when it doesn't match
        Prefix = parsed.Groups[1].Value;
        Index = parsed.Groups[2].Value;
    }
    public string Prefix { get; set; }
    public string Index { get; set; }
}

Of course, you will want to come up with a better name and more suitable access modifiers.
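
The comments in the class above suggest a static parse method and some handling for input that does not match. One possible shape is sketched below. The TryParse name, the private constructor, and the read-only properties are assumptions made for illustration; they are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public sealed class PrefixedNumber
{
    // Letters followed by digits, e.g. "abc001".
    private static readonly Regex Parser = new Regex(@"^(\p{L}+)(\d+)$");

    private PrefixedNumber(string prefix, string index)
    {
        Prefix = prefix;
        Index = index;
    }

    public string Prefix { get; }
    public string Index { get; }

    // Hypothetical helper: reports failure instead of silently producing
    // empty values when the input is not letters followed by digits.
    public static bool TryParse(string source, out PrefixedNumber result)
    {
        Match parsed = Parser.Match(source ?? string.Empty);
        if (!parsed.Success)
        {
            result = null;
            return false;
        }
        result = new PrefixedNumber(parsed.Groups[1].Value, parsed.Groups[2].Value);
        return true;
    }
}
```

With this sketch, PrefixedNumber.TryParse("abc001", out var p) returns true with p.Prefix equal to "abc" and p.Index equal to "001", while an input such as "abc" returns false.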

Now the task is easy:

List<string> data = new List<string> { "abc001", "abc002", "abc003", "cdef001",
                                       "cdef002", "cdef004", "ghi002", "ghi001" };
var groups = data.Select(str => new PrefixedNumber(str))
                 .GroupBy(prefixed => prefixed.Prefix);

The result is all of the data, parsed and grouped by prefix.
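
To make the grouped result easier to consume, it can be materialised into a dictionary. The following is a minimal, self-contained sketch. The PrefixGrouping class and GroupByPrefix method are hypothetical names, and skipping non-matching strings is an assumption rather than something the answer specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixGrouping
{
    // Same pattern as the helper class: letters, then digits.
    private static readonly Regex Parser = new Regex(@"^(\p{L}+)(\d+)$");

    // Hypothetical helper: maps each prefix to the numeric suffixes seen with it.
    // Strings that do not match the pattern are skipped (an assumption).
    public static Dictionary<string, List<string>> GroupByPrefix(IEnumerable<string> data)
    {
        return data.Select(s => Parser.Match(s))
                   .Where(m => m.Success)
                   .GroupBy(m => m.Groups[1].Value, m => m.Groups[2].Value)
                   .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

For the list in the question, the dictionary has the keys abc, cdef and ghi, and the entry for abc holds 001, 002 and 003 in input order.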

You can use a regular expression to select the text part and then add it to a HashSet<string>, so that duplicates are never added:

using System.Collections.Generic;
using System.Text.RegularExpressions;

// Simulate your real list
List<string> myList = new List<string>(new string[] { "abc001", "abc002", "cdef001" });
string pattern = @"^(\D*)\d+$";
// \D* matches any run of non-digit characters; \d+ requires at least one digit after it.
// Note: if you also want to capture a string such as "abc" on its own, with no digits
// after it, use the pattern "^(\D*)$" instead.
Regex regex = new Regex(pattern);
HashSet<string> matchesStrings = new HashSet<string>();
foreach (string item in myList)
{
    var match = regex.Match(item);
    // Only record the prefix when the whole string matched the pattern.
    if (match.Success)
    {
        matchesStrings.Add(match.Groups[1].Value);
    }
}

Result:

abc, cdef

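Assuming your prefixes consist only of alphabetic characters and end at the first non-alphabetic character, you can use the following LINQ expression: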

List<string> listOfStrings = new List<String>() 
  { "abc001d", "abc002", "abc003", "cdef001", "cdef002", "cdef004", "ghi002", "ghi001" }; 
var prefixes = (from s in listOfStrings
                select new string(s.TakeWhile(c => char.IsLetter(c)).ToArray())).Distinct();