Regex problem: optional phrases with different formats

Updated: 2023-09-27 17:50:03

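I have a file from which I want to parse specific values. How can I combine the following three regular expressions so that they return one set of entries per test, whether or not the test has measurements or an error, and include the measurements and the error when they are present? There can be any number of tests, and a test can have any number of measurements, but a test with an error has exactly one error and no other measurements. I have tried many combinations without success. I think I need lookahead and alternation, but I haven't found the right combination. FYI, the regular expressions are stored in a database and used by a C# application. Thanks in advance!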

Input file:

<event>
<common>
    <event_start_time>2014-01-29T17:30:36</event_start_time>
    <operator>10586546</operator>
    <shift>A</shift>
    <program>PPM</program>
    <program_revision>eo01</program_revision>
</common>
<test_instance>
    <teststart startid = "ABCDEF">
        <test>MB</test>
        <test_start_time>2014-01-29T17:30:39</test_start_time>
        <exe>HelloWorld</exe>
        <subtest>CheckVersion</subtest>
        <subtest_number>1</subtest_number>
    </teststart>
    <testend endid = "ABCDEF">
        <test_result>PASS</test_result>
        <test_duration duration_units="millisec">1000</test_duration>
    </testend>
    <teststart startid = "CDEFG">
        <test>MB</test>
        <test_start_time>2014-01-29T17:30:40</test_start_time>
        <exe>HelloWorld</exe>
        <subtest>Program1</subtest>
        <subtest_number>2</subtest_number>
    </teststart>
    <measurement measid = "CDEFG">
        <measurement_name>CycleCounter </measurement_name>
        <numeric_measurement> 1</numeric_measurement>
        <measurement_time>2014-01-29T17:30:50</measurement_time>
    </measurement>
    <measurement measid = "CDEFG">
        <measurement_name>Counter </measurement_name>
        <numeric_measurement> 1</numeric_measurement>
        <measurement_time>2014-01-29T17:30:50</measurement_time>
    </measurement>
    <testend endid = "CDEFG">
        <test_result>PASS</test_result>
        <test_duration duration_units="millisec">10000</test_duration>
    </testend>
    <teststart startid = "xYZABC">
        <test>MB</test>
        <test_start_time>2014-01-29T17:36:01</test_start_time>
        <exe>HelloWorld</exe>
        <subtest>Check2</subtest>
        <subtest_number>17</subtest_number>
    </teststart>
    <measurement measid = "xYZABC">
        <measurement_name>ERROR1</measurement_name>
        <error_code>31001717</error_code>
        <error_message>MB:FAILED_CHECK_TEST</error_message>
        <measurement_time>2014-01-29T17:36:50</measurement_time>
        <measurement_result>FAIL</measurement_result>
    </measurement>
    <testend endid = "xYZABC">
        <test_result>FAIL</test_result>
        <test_duration duration_units="millisec">49000</test_duration>
    </testend>
</test_instance>
<event_duration duration_units="sec">374</event_duration>
<event_result>FAIL</event_result>

To parse the test section I use this regex, which works:
\<teststart\sstartid\s=\s"
(?<tid>.*?)"\>
.*\n
.*\<test\>
(?<testid>.*?)\<
.*\n
.*\<test_start_time\>
(?<teststartdate>.*?)T
(?<teststarttime>.*?)\</.*\n
.*?
\<exe\>
(?<texe>.*?)\<.*\n
(.*?\n)*?
.*?\<testend.*?\n
.*?\<test_result\>
(?<result>.*?)\<.*\n
.*?duration_units="
(?<dunits>.{1}).*?
\>
(?<duration>.*?)\<
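The question does not show the C# host, so here is a minimal Python sketch of the test-section pattern for checking it in isolation. It assumes Python is an acceptable stand-in. The pattern uses backslash escapes and Python's (?P<name>...) spelling of .NET's (?<name>...) named groups. It is compiled with re.VERBOSE, which ignores the line breaks and behaves like .NET's RegexOptions.IgnorePatternWhitespace; this assumes the breaks in the question are only layout. The sample is the CDEFG test from the input file, trimmed to one measurement. TEST_PATTERN and SAMPLE are illustrative names.

```python
import re

# The question's test-section pattern. Named groups use Python's (?P<name>...)
# syntax; re.VERBOSE drops the layout whitespace (assumed to be formatting only).
TEST_PATTERN = re.compile(r'''
    \<teststart\sstartid\s=\s"
    (?P<tid>.*?)"\>
    .*\n
    .*\<test\>
    (?P<testid>.*?)\<
    .*\n
    .*\<test_start_time\>
    (?P<teststartdate>.*?)T
    (?P<teststarttime>.*?)\</.*\n
    .*?
    \<exe\>
    (?P<texe>.*?)\<.*\n
    (.*?\n)*?
    .*?\<testend.*?\n
    .*?\<test_result\>
    (?P<result>.*?)\<.*\n
    .*?duration_units="
    (?P<dunits>.{1}).*?
    \>
    (?P<duration>.*?)\<
''', re.VERBOSE)

# The CDEFG test from the input file, trimmed to one measurement.
SAMPLE = """\
    <teststart startid = "CDEFG">
        <test>MB</test>
        <test_start_time>2014-01-29T17:30:40</test_start_time>
        <exe>HelloWorld</exe>
        <subtest>Program1</subtest>
        <subtest_number>2</subtest_number>
    </teststart>
    <measurement measid = "CDEFG">
        <measurement_name>CycleCounter </measurement_name>
        <numeric_measurement> 1</numeric_measurement>
        <measurement_time>2014-01-29T17:30:50</measurement_time>
    </measurement>
    <testend endid = "CDEFG">
        <test_result>PASS</test_result>
        <test_duration duration_units="millisec">10000</test_duration>
    </testend>
"""

match = TEST_PATTERN.search(SAMPLE)
print(match.groupdict())
```

The lazy (.*?\n)*? step skips the lines between </teststart> and <testend>. That is why the measurement in the middle does not disturb the match.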

To parse the measurements I use this regex, which works:
.*?\<measurement\smeasid\s=\s"
(?<measid>.*?)"\>.*\r\n
(.*?\r\n)*?
.*?
\<measurement_name\>
(?<measurename>.*?)\<.*\r\n
.*?
\<numeric_measurement\>
(?<measurenum>[^/s].*?)\<.*\r\n
.*?
\<measurement_time\>
(?<measureDate>[^/s].*?)T
(?<measureTime>[^/s].*?)\<.*\r\n
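Here is a similar Python sketch for the measurement pattern, used as a stand-in for the C# host. It assumes three things: (?P<name>...) replaces .NET's (?<name>...), re.VERBOSE replaces a single-line pattern, and the input has Windows (CRLF) line endings, since every line in the pattern ends in \r\n. The class [^/s] is copied unchanged. MEASUREMENT_PATTERN and SAMPLE are illustrative names.

```python
import re

# The question's measurement pattern with Python-style named groups.
# [^/s] is kept as written: it matches any character except '/' and 's'.
MEASUREMENT_PATTERN = re.compile(r'''
    .*?\<measurement\smeasid\s=\s"
    (?P<measid>.*?)"\>.*\r\n
    (.*?\r\n)*?
    .*?
    \<measurement_name\>
    (?P<measurename>.*?)\<.*\r\n
    .*?
    \<numeric_measurement\>
    (?P<measurenum>[^/s].*?)\<.*\r\n
    .*?
    \<measurement_time\>
    (?P<measureDate>[^/s].*?)T
    (?P<measureTime>[^/s].*?)\<.*\r\n
''', re.VERBOSE)

# The two CDEFG measurements from the input file, converted to CRLF endings.
SAMPLE = """\
    <measurement measid = "CDEFG">
        <measurement_name>CycleCounter </measurement_name>
        <numeric_measurement> 1</numeric_measurement>
        <measurement_time>2014-01-29T17:30:50</measurement_time>
    </measurement>
    <measurement measid = "CDEFG">
        <measurement_name>Counter </measurement_name>
        <numeric_measurement> 1</numeric_measurement>
        <measurement_time>2014-01-29T17:30:50</measurement_time>
    </measurement>
""".replace("\n", "\r\n")

for m in MEASUREMENT_PATTERN.finditer(SAMPLE):
    print(m.group("measid"), repr(m.group("measurename")), repr(m.group("measurenum")))
```

measurenum comes back as ' 1', with its leading space. This happens because [^/s] excludes only the characters '/' and 's', not whitespace, so it accepts the space after <numeric_measurement>.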

To parse the errors I use this regex, which works:

.*?\<measurement\smeasid\s=\s"
(?<measid>.*?)"\>.*\r\n
.*?\<measurement_name\>
(?<measurename>.*?)\<.*\r\n
.*?\<error_code\>
(?<sterrcode>.*?)\<.*\r\n
.*?\<error_message\>
(?<sterrmsg>.*?)\<.*\r\n
.*?\<measurement_time\>
(?<measureDate>[^/s].*?)T
(?<measureTime>[^/s].*?)\<.*\r\n
.*?\<measurement_result\>
(?<measureResult>[^/s].*?)\<.*\r\n
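The error pattern can be checked the same way. This Python sketch assumes three things: (?P<name>...) stands in for .NET's (?<name>...), re.VERBOSE absorbs the layout, and the input uses CRLF line endings. It feeds the pattern a numeric measurement followed by the ERROR1 measurement. The pattern has no line-skipping step, so each field must sit on the line right after the previous one, and only the error block matches. ERROR_PATTERN and SAMPLE are illustrative names.

```python
import re

# The question's error pattern with Python-style named groups.
ERROR_PATTERN = re.compile(r'''
    .*?\<measurement\smeasid\s=\s"
    (?P<measid>.*?)"\>.*\r\n
    .*?\<measurement_name\>
    (?P<measurename>.*?)\<.*\r\n
    .*?\<error_code\>
    (?P<sterrcode>.*?)\<.*\r\n
    .*?\<error_message\>
    (?P<sterrmsg>.*?)\<.*\r\n
    .*?\<measurement_time\>
    (?P<measureDate>[^/s].*?)T
    (?P<measureTime>[^/s].*?)\<.*\r\n
    .*?\<measurement_result\>
    (?P<measureResult>[^/s].*?)\<.*\r\n
''', re.VERBOSE)

# A numeric measurement, then the error measurement (CRLF line endings).
SAMPLE = """\
    <measurement measid = "CDEFG">
        <measurement_name>Counter </measurement_name>
        <numeric_measurement> 1</numeric_measurement>
        <measurement_time>2014-01-29T17:30:50</measurement_time>
    </measurement>
    <measurement measid = "xYZABC">
        <measurement_name>ERROR1</measurement_name>
        <error_code>31001717</error_code>
        <error_message>MB:FAILED_CHECK_TEST</error_message>
        <measurement_time>2014-01-29T17:36:50</measurement_time>
        <measurement_result>FAIL</measurement_result>
    </measurement>
""".replace("\n", "\r\n")

# The numeric measurement fails at <error_code>, so only one match remains.
errors = [m.groupdict() for m in ERROR_PATTERN.finditer(SAMPLE)]
print(errors)
```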

Disclaimer: yes, I know the input is XML, but I can't change the application to deserialize it; it uses regular expressions.

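Answer: you can put capturing groups inside zero-width assertions (lookaheads) and make each lookahead optional. A lookahead consumes no input, so each one scans ahead independently and captures its part if that part is present. The trailing ? lets the match succeed when the part is absent: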

(?=.*?(?<foo>a))?(?=.*?(?<bar>b))?

Applied to either

ab
ba

it reports

group "foo" = "a"
group "bar" = "b"

Applied to

a

it reports

group "foo" = "a"
group "bar" = (does not exist)
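Below is a Python sketch of how these optional lookaheads might apply to the question's input. Several choices are assumptions, not part of the answer:

- Python stands in for .NET.
- The optional lookahead is written (?=(?:...)?), which behaves like the answer's (?=...)? form.
- A tempered token, (?:(?!<testend).)*?, keeps each lookahead inside the current test.
- The repeating measurements are collected in a separate findall pass. Python's re keeps only the last capture of a repeated group; .NET additionally exposes every capture through Group.Captures.

TOY, TEST_RE, MEAS_RE and parse_tests are hypothetical names.

```python
import re

# The answer's two-letter example in Python syntax. Each lookahead always
# succeeds without consuming input; its group stays None if the part is absent.
TOY = re.compile(r'(?=(?:.*?(?P<foo>a))?)(?=(?:.*?(?P<bar>b))?)')

# Lazily advance, but never past the start of the current test's <testend>.
IN_TEST = r'(?:(?!<testend).)*?'

# One match per test. The single optional error is captured by a lookahead
# that IN_TEST keeps from reaching the error of a later test.
TEST_RE = re.compile(rf'''
    <teststart\sstartid\s=\s"(?P<tid>[^"]*)">
    (?=(?:{IN_TEST}<error_code>(?P<errcode>[^<]*)</error_code>
          \s*<error_message>(?P<errmsg>[^<]*))?)
    {IN_TEST}<testend[^>]*>\s*<test_result>(?P<result>[^<]*)
''', re.DOTALL | re.VERBOSE)

# Second pass (not part of the answer): collect every numeric measurement.
MEAS_RE = re.compile(r'<measurement_name>([^<]*)</measurement_name>\s*'
                     r'<numeric_measurement>\s*([^<]*)')


def parse_tests(text):
    """Return one dict per test: id, result, measurements, optional error."""
    tests = []
    for m in TEST_RE.finditer(text):
        error = None
        if m.group("errcode") is not None:
            error = (m.group("errcode"), m.group("errmsg"))
        tests.append({
            "tid": m.group("tid"),
            "result": m.group("result"),
            # m.group(0) spans this test from <teststart> to its result.
            "measurements": MEAS_RE.findall(m.group(0)),
            "error": error,
        })
    return tests


# The test_instance part of the input file, trimmed to the tags used here.
SAMPLE = """
<teststart startid = "ABCDEF">
    <test>MB</test>
</teststart>
<testend endid = "ABCDEF">
    <test_result>PASS</test_result>
</testend>
<teststart startid = "CDEFG">
    <test>MB</test>
</teststart>
<measurement measid = "CDEFG">
    <measurement_name>CycleCounter </measurement_name>
    <numeric_measurement> 1</numeric_measurement>
</measurement>
<measurement measid = "CDEFG">
    <measurement_name>Counter </measurement_name>
    <numeric_measurement> 1</numeric_measurement>
</measurement>
<testend endid = "CDEFG">
    <test_result>PASS</test_result>
</testend>
<teststart startid = "xYZABC">
    <test>MB</test>
</teststart>
<measurement measid = "xYZABC">
    <measurement_name>ERROR1</measurement_name>
    <error_code>31001717</error_code>
    <error_message>MB:FAILED_CHECK_TEST</error_message>
</measurement>
<testend endid = "xYZABC">
    <test_result>FAIL</test_result>
</testend>
"""

for test in parse_tests(SAMPLE):
    print(test)
```

Tests with and without an error come out of the same pattern; only the optional groups differ. Bounding the lookahead is what makes this safe. An unbounded .*? with DOTALL would let a test without an error (ABCDEF here) pick up the next failing test's error.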