Extracting colors and sizes from a string

Keywords: color, extract, string | Updated: 2023-09-27 18:07:33

Given the product names below, my task is to extract every color and size.

Example: Nike Relay Women's Running Capris - **Black**, **L/XS**

Color = Black
Size = [XS,L]

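What is the best way to do this? My idea is to build a dictionary of all colors and sizes and then match against it.

But there must be a better and more maintainable way. The biggest problem I see is that there are so many different combinations: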

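  1. Nautica S Blue Bone Woven Pajama Pants
  2. Nike Relay Women's Running Capris - Black XS
  3. Nautica Mens J-Class Pajama Pants-Small NAVY
  4. Nautica J-Class Woven Pajama Pant LMaritime Navy
  5. Nike Legend Tank - Womens - Black/Black
  6. Nike 3PK DF Cushion No Show Tab Socks - Womens - Black/White/Black
  7. Stance Casual Socks - Men's Mahalo, L/XL
  8. Nautica Wrinkle Resistant Dress Pant 30x30 Grey
  9. Nautica Wrinkle Resistant Dress Pant 36x30 Black
  10. Nautica Wrinkle Resistant Dress Pant 33x32 Black
  11. RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's BluestoneL
  12. RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone
  13. RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone S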


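This is laborious, but it serves the purpose. The whole idea is that you must have a List/Collection of the available colors and sizes, then iterate over them one by one and check each product name against them.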

// requires: using System.ComponentModel;
enum ColorBase
{
    [Description("Black")] // added: the most common color in the sample data
    Black,
    [Description("Blue")]
    Blue,
    [Description("White")]
    White,
    [Description("Grey")]
    Grey,
    [Description("Magenta")]
    Magenta,
    [Description("Pale")]
    Pale,
    [Description("Maritime Navy")]
    MaritimeNavy,
    [Description("Navy")]
    Navy,
    [Description("Bluestone")]
    Bluestone,
}
enum SizeBase
{
    [Description("XL")]
    XL,
    [Description("XXL")]
    XXL,
    [Description("L")]
    L,
    [Description("M")]
    M,
    [Description("S")]
    S,
    [Description("XS")]
    XS,
    [Description("30X30")] // enum identifiers cannot start with a digit, hence the S prefix
    S30X30,
    [Description("36X30")]
    S36X30,
    [Description("33X32")]
    S33X32
}

A helper method using System.Reflection that returns the Description of the enum values declared above:

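// requires: using System; using System.ComponentModel; using System.Reflection;
public static string GetEnumDescription(Enum value)
{
    // Look up the enum member by name so that its attributes can be read.
    FieldInfo fi = value.GetType().GetField(value.ToString());
    if (fi == null)
        return value.ToString(); // not a named member, e.g. an undefined numeric value
    DescriptionAttribute[] attributes =
        (DescriptionAttribute[])fi.GetCustomAttributes(
        typeof(DescriptionAttribute),
        false);
    // Fall back to the member name when no [Description] is present.
    return attributes.Length > 0 ? attributes[0].Description : value.ToString();
}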

And here is how they are used:

// requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
static void Main(string[] args)
{
    List<string> capries = new List<string>{"Nautica S Blue Bone Woven Pajama Pants",
                                            "Nike Relay Women's Running Capris - Black, XS",
                                            "Nautica Mens J-Class Pajama Pants-Small, NAVY",
                                            "Nautica J-Class Woven Pajama Pant L, Maritime Navy",
                                            "Nike Legend Tank - Womens - Black/Black",
                                            "Nike 3PK DF Cushion No Show Tab Socks - Womens - Black/White/Black",
                                            "Stance Casual Socks - Men's Mahalo, L/XL",
                                            "Nautica Wrinkle Resistant Dress Pant 30x30, Grey",
                                            "Nautica Wrinkle Resistant Dress Pant 36x30, Black",
                                            "Nautica Wrinkle Resistant Dress Pant 33x32, Black",
                                            "RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone, L",
                                            "RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone, M",
                                            "RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone, S",
                                            };
    Console.WriteLine("Available Parameters");
    foreach (var caprie in capries)
    {
        // Collect matches per product, so one product's matches never suppress another's.
        List<string> colors = FindDescriptions(caprie, typeof(ColorBase));
        List<string> sizes = FindDescriptions(caprie, typeof(SizeBase));
        Console.Write("Color : {0}", string.Join("/", colors));
        if (sizes.Count > 0)
            Console.Write(" ,Size : {0}", string.Join("/", sizes));
        Console.WriteLine();
    }
}

// Returns the descriptions of all members of enumType that occur in text as whole words.
static List<string> FindDescriptions(string text, Type enumType)
{
    List<string> found = new List<string>();
    foreach (Enum value in Enum.GetValues(enumType))
    {
        string item = Program.GetEnumDescription(value);
        // WORD level precision, ignoring case: "S" must not match inside "Socks" or "Women's"
        string pattern = @"(?<![\w'])" + Regex.Escape(item) + @"(?![\w'])";
        if (Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase))
            found.Add(item);
    }
    return found;
}
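
One thing the lists above do not address is overlapping names: a short entry such as "Navy" also occurs inside a longer color name, and a one-letter size such as "S" occurs inside ordinary words. A minimal sketch of one way to handle this, assuming plain string vocabularies rather than the enums (the class and method names here are hypothetical, not part of the code above): try longer terms first and blank out whatever they matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class VocabularyMatcher
{
    // Returns every vocabulary term found in text as a whole word.
    // Longer terms are tried first and their matches are blanked out,
    // so a shorter term cannot match inside a longer one.
    public static List<string> FindTerms(string text, IEnumerable<string> vocabulary)
    {
        var found = new List<string>();
        foreach (string term in vocabulary.OrderByDescending(t => t.Length))
        {
            // Lookarounds rather than \b, so the "s" in "Women's" is not read as a size.
            string pattern = @"(?<![\w'])" + Regex.Escape(term) + @"(?![\w'])";
            if (Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase))
            {
                found.Add(term);
                text = Regex.Replace(text, pattern, " ", RegexOptions.IgnoreCase);
            }
        }
        return found;
    }
}
```

With a color vocabulary of "Navy", "Maritime Navy" and "Blue", the name "Nautica J-Class Woven Pajama Pant L, Maritime Navy" should yield only "Maritime Navy", because the longer term consumes the text before "Navy" is tried.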

I would build the regular expressions in layers. I have built a system like this that works well, although it was for log parsing.

// basic definitions (match them with RegexOptions.IgnoreCase):
string colorsRegex = "(?:black|red|blue|orange|navy|cyan|white)";
string sizesRegex = "(?:small|large|medium)";
string sizesShortRegex = "(?:s|m|l|xl|xxl|xxxl)";
// some more complex definitions
// always start the array with the most complex regex, so that as much is captured as possible ("blue-green" instead of just "blue")
// note: the '-' inside the character class is escaped so that it is not read as a range
string[] colorFinders = {"(" + colorsRegex + @"[/\- ]+)+", colorsRegex};
string[] sizesFinders = {"(" + sizesRegex + @"[/\- ]+)+", "(" + sizesShortRegex + @"[/\- ]+){2,}", sizesRegex};
// match the string for each complex definition

For every line that the system does not match (or does not match correctly), build a dedicated "finder". Repeat until all of the data is matched.
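
As a rough illustration of that loop, here is a minimal runnable sketch of layered finders. The class name, the helper names, and the exact patterns are my own assumptions, not the original system: each finder array runs from most to least specific, and the first finder that matches wins.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LayeredFinder
{
    const string Color = "(?:black|red|blue|orange|navy|cyan|white)";
    const string SizeWord = "(?:small|large|medium)";
    const string SizeShort = "(?:xxxl|xxl|xl|xs|s|m|l)";
    const string Sep = @"[/\- ]+";
    // Whole-word guards; the apostrophe keeps the "s" of "Men's" from being a size.
    const string Start = @"(?<![\w'])";
    const string End = @"(?![\w'])";

    // Most specific finder first, so "Black/White/Black" wins over plain "Black".
    static readonly string[] ColorFinders =
    {
        Start + Color + "(?:" + Sep + Color + ")+" + End,
        Start + Color + End,
    };

    static readonly string[] SizeFinders =
    {
        Start + SizeShort + "(?:" + Sep + SizeShort + ")+" + End, // "L/XL"
        Start + SizeWord + End,                                   // "Small"
        Start + SizeShort + End,                                  // "XS"
    };

    // Returns the text matched by the first finder that succeeds, or null.
    public static string FirstMatch(string text, string[] finders)
    {
        foreach (string finder in finders)
        {
            Match m = Regex.Match(text, finder, RegexOptions.IgnoreCase);
            if (m.Success) return m.Value;
        }
        return null;
    }

    public static string FindColor(string text) { return FirstMatch(text, ColorFinders); }
    public static string FindSize(string text) { return FirstMatch(text, SizeFinders); }
}
```

A line for which FindColor or FindSize returns null is a candidate for a new, more specific finder placed earlier in the array.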

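Watch out for invalid cross-matches. Log unmatched lines in both test and production. Remember to look out for partial matches, and exclude any parts of the string that could confuse the algorithm (imagine a company named "blue moon", which would always be matched).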
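
A small sketch of those two precautions, under stated assumptions: the exclusion list, the class name, and the method names are hypothetical, and the color pattern simply reuses the illustrative vocabulary from the snippet above. Known confusing phrases are blanked out before matching, and lines that still have no color are collected so they can be logged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchAudit
{
    // Hypothetical exclusion list: phrases that contain a color word but are not a color.
    static readonly string[] IgnoredPhrases = { "Blue Moon" };

    // Illustrative color vocabulary, matched as whole words.
    const string ColorPattern = @"\b(?:black|red|blue|orange|navy|cyan|white)\b";

    // Replaces every ignored phrase with a space so it cannot take part in a match.
    public static string RemoveIgnored(string text)
    {
        foreach (string phrase in IgnoredPhrases)
            text = Regex.Replace(text, Regex.Escape(phrase), " ", RegexOptions.IgnoreCase);
        return text;
    }

    // Returns the lines in which no color was found after the exclusions were applied.
    public static List<string> FindUnmatched(IEnumerable<string> lines)
    {
        var unmatched = new List<string>();
        foreach (string line in lines)
        {
            if (!Regex.IsMatch(RemoveIgnored(line), ColorPattern, RegexOptions.IgnoreCase))
                unmatched.Add(line);
        }
        return unmatched;
    }
}
```

In a real system the returned lines would go to whatever log is already in place, which gives a running list of inputs that need a new finder.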