Extracting colors and sizes from a string
Keywords: color, extraction, string | Updated: 2023-09-27 18:07:33
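Given the product names below, my task is to extract all the colors and sizes.
Example: Nike Relay Women's Running Capris - **Black**, **L/XS**
Color = Black
Size = [XS,L]
What is the best way to do this? I was thinking of building a dictionary of all colors and sizes and then matching against it.
But there has to be a better, more maintainable way. The biggest problem I see is that there are so many different combinations: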
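- Nautica S Blue Bone Woven Pajama Pants
- Nike Relay Women's Running Capris - Black, XS
- Nautica Mens J-Class Pajama Pants-Small, NAVY
- Nautica J-Class Woven Pajama Pant L, Maritime Navy
- Nike Legend Tank - Womens - Black/Black
- Nike 3PK DF Cushion No Show Tab Socks - Womens - Black/White/Black
- Stance Casual Socks - Men's Mahalo, L/XL
- Nautica Wrinkle Resistant Dress Pant 30x30, Grey
- Nautica Wrinkle Resistant Dress Pant 36x30, Black
- Nautica Wrinkle Resistant Dress Pant 33x32, Black
- RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's BluestoneL
- RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone
- RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone S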
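This is time-consuming, but it serves the purpose. The whole idea is that you need a List/Collection of the available colors and sizes, and then you iterate over them one by one and check each product name.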
enum ColorBase
{
    [Description("Black")] // requires: using System.ComponentModel;
    Black,
    [Description("Blue")]
    Blue,
    [Description("White")]
    White,
    [Description("Grey")]
    Grey,
    [Description("Magenta")]
    Magenta,
    [Description("Pale")]
    Pale,
    [Description("Maritime Navy")]
    MaritimeNavy,
    [Description("Navy")]
    Navy,
    [Description("Bluestone")]
    Bluestone,
}
enum SizeBase
{
    [Description("XL")]
    XL,
    [Description("XXL")]
    XXL,
    [Description("L")]
    L,
    [Description("M")]
    M,
    [Description("S")]
    S,
    [Description("XS")]
    XS,
    [Description("30X30")]
    S30X30,
    [Description("36X30")]
    S36X30,
    [Description("33X32")]
    S33X32
}
A helper method, using System.Reflection, that returns the Description of a value of the enums declared above:
public static string GetEnumDescription(Enum value)
{
    FieldInfo fi = value.GetType().GetField(value.ToString());
    DescriptionAttribute[] attributes =
        (DescriptionAttribute[])fi.GetCustomAttributes(
            typeof(DescriptionAttribute),
            false);
    if (attributes != null &&
        attributes.Length > 0)
        return attributes[0].Description;
    else
        return value.ToString();
}
And this is how they are used:
// Requires: using System; using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
static void Main(string[] args)
{
    List<string> capries = new List<string>
    {
        "Nautica S Blue Bone Woven Pajama Pants",
        "Nike Relay Women's Running Capris - Black, XS",
        "Nautica Mens J-Class Pajama Pants-Small, NAVY",
        "Nautica J-Class Woven Pajama Pant L, Maritime Navy",
        "Nike Legend Tank - Womens - Black/Black",
        "Nike 3PK DF Cushion No Show Tab Socks - Womens - Black/White/Black",
        "Stance Casual Socks - Men's Mahalo, L/XL",
        "Nautica Wrinkle Resistant Dress Pant 30x30, Grey",
        "Nautica Wrinkle Resistant Dress Pant 36x30, Black",
        "Nautica Wrinkle Resistant Dress Pant 33x32, Black",
        "RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone, L",
        "RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone, M",
        "RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone, S",
    };
    Console.WriteLine("Available Parameters");
    foreach (var caprie in capries)
    {
        // Split into words for WORD-level precision, so that "S" does not match inside "Stance" or "Men's"
        string[] words = caprie.Split(new[] { ' ', ',', '/' }, StringSplitOptions.RemoveEmptyEntries);
        List<string> colors = new List<string>();
        List<string> sizes = new List<string>();
        foreach (ColorBase colorBase in Enum.GetValues(typeof(ColorBase)))
        {
            string item = Program.GetEnumDescription(colorBase);
            // A color may span several words, so match the whole phrase on word boundaries, ignoring case
            if (Regex.IsMatch(caprie, @"\b" + Regex.Escape(item) + @"\b", RegexOptions.IgnoreCase))
                colors.Add(item);
        }
        foreach (SizeBase sizeBase in Enum.GetValues(typeof(SizeBase)))
        {
            string item = Program.GetEnumDescription(sizeBase);
            // A size must be a whole word of the product name (case-insensitive, so "30x30" matches "30X30")
            if (words.Contains(item, StringComparer.OrdinalIgnoreCase))
                sizes.Add(item);
        }
        // One line per product; several colors or sizes are joined with '/'
        Console.WriteLine("Color : {0} ,Size : {1}", string.Join("/", colors), string.Join("/", sizes));
    }
}
I would build the regular expressions in layers. I have built a system like this that works well, although it was for log parsing.
// basic definitions ((?:...) is a non-capturing group):
String colorsRegex = "(?:black|red|blue|orange|navy|cyan|white)";
String sizesRegex = "(?:small|large|medium)";
String sizesShortRegex = "(?:s|m|l|xl|xxl|xxxl)";
// some more complex definitions
// always start the array with the most complex regex, so that as much is captured as possible ("blue-green" instead of just "blue")
// the hyphen goes last in the character class so that it is read literally, not as a range
String[] colorFinders = {"("+colorsRegex+"[/ -]+)+", colorsRegex};
String[] sizesFinders = {"("+sizesRegex+"[/ -]+)+", "("+sizesShortRegex+"[/ -]+){2,}", sizesRegex};
// match the string for each complex definition
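For every line that the system fails to match (or matches incorrectly), build a dedicated "finder". Repeat until all of the data is matched.
Watch out for invalid cross-matches. Log unmatched lines in both test and production environments. Remember to look out for partial matches, and exclude any parts of the string that could confuse the algorithm (imagine a company called "blue moon", which would always be matched).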
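The layered idea can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the system described above: the class name `LayeredFinderDemo`, the `Find` helper, the vocabularies, the `Excluded` list, and the "Blue Moon Cotton Tee" product name are all hypothetical. Each finder array is ordered from most to least specific, excluded phrases are blanked out before matching, and a `null` result marks a line that should be logged and given a new finder.

```csharp
using System;
using System.Text.RegularExpressions;

static class LayeredFinderDemo
{
    // Hypothetical vocabularies; extend them as unmatched lines show up in the logs.
    const string Color = "(?:black|white|grey|blue|bluestone|maritime navy|navy)";
    const string Size = @"(?:xxl|xl|xs|s|m|l|small|medium|large|\d+x\d+)";

    // Stricter than \b: an apostrophe also counts as part of a word,
    // so the "s" in "Men's" is not reported as size S.
    const string Start = @"(?<![\w'])";
    const string End = @"(?![\w'])";

    // Most specific finder first: "Black/White/Black" before a single "Black".
    public static readonly Regex[] ColorFinders =
    {
        new Regex(Start + Color + @"(?:\s*/\s*" + Color + ")+" + End, RegexOptions.IgnoreCase),
        new Regex(Start + Color + End, RegexOptions.IgnoreCase),
    };

    public static readonly Regex[] SizeFinders =
    {
        new Regex(Start + Size + @"(?:\s*/\s*" + Size + ")+" + End, RegexOptions.IgnoreCase),
        new Regex(Start + Size + End, RegexOptions.IgnoreCase),
    };

    // Phrases that would confuse the finders, such as a brand that contains a color name.
    static readonly string[] Excluded = { "Blue Moon" };

    // Returns the text matched by the first finder that succeeds, or null if none does.
    public static string Find(string input, Regex[] finders)
    {
        foreach (string phrase in Excluded)
            input = Regex.Replace(input, Regex.Escape(phrase), " ", RegexOptions.IgnoreCase);

        foreach (Regex finder in finders)
        {
            Match match = finder.Match(input);
            if (match.Success)
                return match.Value;
        }
        return null; // the caller should log this line as unmatched
    }

    static void Main()
    {
        // prints "Black/White/Black"
        Console.WriteLine(Find("Nike 3PK DF Cushion No Show Tab Socks - Womens - Black/White/Black", ColorFinders));
        // prints "L/XL"
        Console.WriteLine(Find("Stance Casual Socks - Men's Mahalo, L/XL", SizeFinders));
        // prints "Black", because the excluded brand "Blue Moon" is removed first
        Console.WriteLine(Find("Blue Moon Cotton Tee - Black, M", ColorFinders));
    }
}
```

A name such as "BluestoneL", where the size is glued to the color, is not matched by any of these finders; under this approach it would show up as a logged miss and get a dedicated finder of its own.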