What is wrong with my pattern string (regular expression, C#)?
Updated: 2023-09-27 18:28:16
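I'm having trouble parsing a string and want to solve it with a regular expression. The input always has the same form as the following string: %function_name%(IN:param1,…,paramN;OUT:param1,..,paramN)
I wrote a pattern:
string pattern = @"[A-za-z][A-za-z0-9]*\(IN:\s*(([A-za-z][A-za-z0-9](,|;))+|;)\s*OUT:(\s*[A-za-z][A-za-z0-9],?)*\)";
This pattern matches my input string, but what I actually want as output is two string arrays. The first array must contain the input parameters (those after "IN:", i.e. IN: param1, ..., paramN), and the second array must contain the names of the output parameters. Parameter names can contain digits and "_".
A few examples of real input strings: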
Add_func(IN:port_0,IN_port_1;OUT:OUT_port99)
Some_func(IN:;OUT:abc_P1)
Some_func2(IN:input_portA;OUT:)
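Please tell me how to write a correct pattern.
You can use this pattern, which captures every function together with its individual parameters in a single pass: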
(?<funcName>\w+)\(IN: ?|OUT: ?|\G(?<inParam>[^,;()]+)?(?=[^)(;]*;)\s*[,;]\s*|\G(?<outParam>[^,()]+)(?=[^;]*\s*\))\s*[,)]\s*
Pattern details:
(?<funcName>\w+)\(IN: ?      # capture the function name and match "(IN: "
|                            # OR
OUT: ?                       # match "OUT: "
|                            # OR
\G(?<inParam>[^,;()]+)?      # contiguous match that captures an IN param (may be empty)
(?=[^)(;]*;)                 # check that it is always followed by ";"
\s*[,;]\s*                   # match "," or ";" (to stay contiguous)
|                            # OR
\G(?<outParam>[^,()]+)       # contiguous match that captures an OUT param
(?=[^;]*\s*\))               # check that it is always followed by ")"
\s*[,)]\s*                   # match "," (to stay contiguous) or ")"
(To get a cleaner result, you have to walk through the matches, for example with foreach, and skip the empty entries.)
Example code:
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string subject = @"Add_func(IN: port_0, in_port_1; OUT: out_port99)
Some_func(IN:;OUT: abc_P1)
shift_data(IN:po1_p0;OUT: po1_p1, po1_p2)
Some_func2(IN: input_portA;OUT:)";
        string pattern = @"(?<funcName>\w+)\(IN: ?|OUT: ?|\G(?<inParam>[^,;()]+)?(?=[^)(;]*;)\s*[,;]\s*|\G(?<outParam>[^,()]+)(?=[^;]*\s*\))\s*[,)]\s*";
        Match m = Regex.Match(subject, pattern);
        // Each successful match fills at most one of the three named groups.
        while (m.Success)
        {
            if (m.Groups["funcName"].ToString() != "")
            {
                Console.WriteLine("\nfunction name: " + m.Groups["funcName"]);
            }
            if (m.Groups["inParam"].ToString() != "")
            {
                Console.WriteLine("IN param: " + m.Groups["inParam"]);
            }
            if (m.Groups["outParam"].ToString() != "")
            {
                Console.WriteLine("OUT param: " + m.Groups["outParam"]);
            }
            m = m.NextMatch();
        }
    }
}
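Another approach is to match all the IN parameters as one string and all the OUT parameters as another, and then split those strings on \s*,\s*
Example: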
string pattern = @"(?<funcName>\w+)\(\s*IN:\s*(?<inParams>[^;]*?)\s*;\s*OUT\s*:\s*(?<outParams>[^)]*?)\s*\)";
Match m = Regex.Match(subject, pattern);
while (m.Success)
{
    // The group is named "funcName" in the pattern, so it must be read under that name.
    string functionName = m.Groups["funcName"].ToString();
    // Note: splitting an empty list yields one empty entry, not an empty array.
    string[] inParams = Regex.Split(m.Groups["inParams"].ToString(), @"\s*,\s*");
    string[] outParams = Regex.Split(m.Groups["outParams"].ToString(), @"\s*,\s*");
    // Why not construct a "function" object to store all these values
    m = m.NextMatch();
}
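For reference, here is a self-contained sketch of the split approach that returns the two arrays the question asks for. The `SignatureParser` class, the `ParseSignatures` helper and the tuple shape are illustrative names, not part of the answer above. Empty entries are filtered out, on the assumption that a function with no IN or OUT parameters should get an empty array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class SignatureParser
{
    // Same pattern as above: one match per function, with the IN and OUT lists captured whole.
    private const string Pattern =
        @"(?<funcName>\w+)\(\s*IN:\s*(?<inParams>[^;]*?)\s*;\s*OUT\s*:\s*(?<outParams>[^)]*?)\s*\)";

    // Hypothetical helper: returns (name, IN params, OUT params) for every function found.
    public static List<(string Name, string[] In, string[] Out)> ParseSignatures(string subject)
    {
        var result = new List<(string Name, string[] In, string[] Out)>();
        foreach (Match m in Regex.Matches(subject, Pattern))
        {
            result.Add((m.Groups["funcName"].Value,
                        SplitList(m.Groups["inParams"].Value),
                        SplitList(m.Groups["outParams"].Value)));
        }
        return result;
    }

    // Split on commas with optional surrounding whitespace, then drop empty entries.
    private static string[] SplitList(string list) =>
        Regex.Split(list, @"\s*,\s*").Where(s => s.Length > 0).ToArray();
}
```

Calling `SignatureParser.ParseSignatures(subject)` on the sample text gives one tuple per function, so the results can be stored or inspected without tracking state across matches.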
The way to do this is with capturing groups. Named capturing groups are the easiest to work with:
using System.Linq;
using System.Text.RegularExpressions;

// a regex surrounded by parens is a capturing group
// a regex surrounded by (?<name> ... ) is a named capturing group
// here I've tried to surround the relevant parts of the pattern with named groups
// (the A-za-z ranges are written as A-Za-z, and "_", "*" and optional whitespace
// are added so that the sample inputs from the question actually match)
var pattern = @"[A-Za-z][A-Za-z0-9_]*\(IN:\s*(((?<inValue>[A-Za-z][A-Za-z0-9_]*)\s*(,|;)\s*)+|;)\s*OUT:(\s*(?<outValue>[A-Za-z][A-Za-z0-9_]*)\s*,?)*\s*\)";
// get all the matches. ExplicitCapture is just an optimization which tells the engine that it
// doesn't have to save state for non-named capturing groups
var matches = Regex.Matches(input: input, pattern: pattern, options: RegexOptions.ExplicitCapture)
    // convert from IEnumerable to IEnumerable<Match>
    .Cast<Match>()
    // for each match, select out the captured values
    .Select(m => new {
        // m.Groups["inValue"] gets the named capturing group "inValue"
        // for groups that match multiple times in a single match (as in this case), we access
        // group.Captures, which records each capture of the group. .Cast converts to IEnumerable<T>,
        // at which point we can select out capture.Value, which is the actual captured text
        inValues = m.Groups["inValue"].Captures.Cast<Capture>().Select(c => c.Value).ToArray(),
        outValues = m.Groups["outValue"].Captures.Cast<Capture>().Select(c => c.Value).ToArray()
    })
    .ToArray();
I think this is what you want:
[A-Za-z][A-Za-z0-9_]*\(IN:((?:\s*(?:[A-Za-z][A-Za-z0-9_]*(?:[,;])))+|;)\s*OUT:(\s*[A-Za-z][A-Za-z0-9_]*,?)*\)
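There were some problems with the grouping, and you missed the whitespace between multiple IN parameters. You also didn't allow the underscores that appear in your examples.
The pattern above works for all of your examples.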
Add_func(IN: port_0, in_port_1; OUT: out_port99)
will capture:
port_0, in_port_1
and out_port99
Some_func(IN:;OUT: abc_P1)
will capture:
;
and abc_P1
Some_func2(IN: input_portA; OUT:)
will capture:
input_portA and an empty string
Once you have these capture groups, you can split them on commas to get your arrays.
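A minimal sketch of that last step, using the pattern from this answer written with `A-Za-z` ranges. The `GroupSplitter` class and its `Split` method are hypothetical names. Two details are assumptions about how the groups behave rather than something the answer states: group 1 keeps its separators and leading whitespace, so the pieces are trimmed, and group 2 sits inside a `*` repetition, so `Groups[2].Value` only holds the last OUT parameter and the sketch reads `Captures` instead.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class GroupSplitter
{
    private const string Pattern =
        @"[A-Za-z][A-Za-z0-9_]*\(IN:((?:\s*(?:[A-Za-z][A-Za-z0-9_]*(?:[,;])))+|;)\s*OUT:(\s*[A-Za-z][A-Za-z0-9_]*,?)*\)";

    private static readonly char[] Separators = { ',', ';' };

    // Hypothetical helper: returns the IN and OUT parameter arrays for one signature.
    public static (string[] In, string[] Out) Split(string signature)
    {
        Match m = Regex.Match(signature, Pattern);
        if (!m.Success) return (new string[0], new string[0]);

        // Group 1 holds the whole IN list with its separators, e.g. " port_0, in_port_1;".
        string[] inParams = m.Groups[1].Value
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())
            .ToArray();

        // Group 2 repeats once per OUT parameter, so each one is a separate capture.
        string[] outParams = m.Groups[2].Captures.Cast<Capture>()
            .Select(c => c.Value.Trim().TrimEnd(','))
            .ToArray();

        return (inParams, outParams);
    }
}
```

For a signature with an empty IN or OUT list, the corresponding array comes back empty rather than containing an empty string.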