Conditionally ignoring the last word

Keywords: last, word, conditional | Updated: 2023-09-27 18:20:03

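Can anyone help? (Also posted on the RegexBuddy forum.)

I have a relatively large, auto-generated regular expression (listed in full at the bottom) in which the following fragment is repeated many times: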

    # Add words to word list
    (?<_KC1>(?:(?:\w|[ \t\'/]|\[\w*\])*?))

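Its purpose is to "grab" the words and text that sit between the more recognizable fragments. These captures are later aggregated in code to give a list of the words found across the whole match.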
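To illustrate the aggregation step, here is a minimal sketch in Python. It is an assumption on my part, not the original code: the question does not say which language consumes the regex, the pattern is cut down to a few fragments, and the `word_list` helper and the `Expiry` group are hypothetical names. Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`.

```python
import re

# Word-list fragment from the question, unchanged apart from being a Python string.
WORDS = r"(?:\w|[ \t'/]|\[\w*\])*?"

# Cut-down stand-in for the full pattern: product, words, expiry, words, strike.
PATTERN = re.compile(
    r"\b(?P<ProductSymbol>CL|Brent|GasOil|WTI|LO|BRT)\b"
    rf"(?P<_KC1>{WORDS})"
    r"[ \t:]+"
    r"(?P<Expiry>(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ \t]?(?:20)?\d\d)"
    r"[ \t]+"
    rf"(?P<_KC2>{WORDS})"
    r"(?P<Strike>[+-]?\d+(?:\.\d+)?)"
)

def word_list(match):
    """Collect the words from every _KC* capture, in group-name order."""
    words = []
    for name, text in sorted(match.groupdict().items()):
        if name.startswith("_KC") and text:
            words.extend(text.split())
    return words

m = PATTERN.search("WTI American: Jun12 Call 110.00")
print(word_list(m))  # words captured between the recognized fragments
```

Sorting by group name only keeps the captures in order while there are fewer than ten `_KC` groups; a real implementation would probably order them by match position instead.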

The problem I have is with the first choice in the alternation, namely:

    # Pair of Strike prices
    (?<Strike>[+|-]?\d+(?:\.\d+)?)/(?<Strike2>[+|-]?\d+(?:\.\d+)?)
    # Add to Word List (but not 'x' as last word) !!!!!!!!!!!! This is what needs changing
    (?<_KC3>(?:(?:\w|[ \t\'/]|\[\w*\])*?))
    # Cross price
    (?:x[ \t]?-?(?<Cross>[+|-]?\d+(?:\.\d+)?)x?)?

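As you can see, the "Cross price" always starts with an "x", so what I need is a pattern as close as possible to the first fragment I mentioned, but one that ignores the last word if it happens to be "x". There are two further complications: 1) the "Cross price" is itself optional, and 2) an "x" on its own can match as a "Futures expiry date" in Reuters date-code form.

I have tried negative lookarounds and so on, but whatever I do, I break something else. I suspect the answer may lie in an if-then-else conditional, but I am not sure.

For example:

WTI American: Jun12 110.00/140.00 [1x2] Call Spread x 102.50 350-365

The "Pair of Strike prices" returns "110.00/140.00" as expected.

But the word list picks up "[1x2] Call Spread x", and "102.50", which should be the "Cross price", is then matched later in the expression as the "Bid" part of the "Bid/Offer spread".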
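The behaviour described above can be reproduced with a cut-down version of the expression. The sketch below is a Python port made for illustration only (so `(?P<name>...)` instead of `(?<name>...)`), keeps just the strike pair, the two word-list groups, the optional cross price and the bid/offer part, and writes the sign class as `[+-]` (the question's `[+|-]` also admits a literal `|`).

```python
import re

NUM = r"[+-]?\d+(?:\.\d+)?"
WORDS = r"(?:\w|[ \t'/]|\[\w*\])*?"   # word-list fragment as in the question

SIMPLIFIED = re.compile(
    rf"(?P<Strike>{NUM})/(?P<Strike2>{NUM})"
    rf"(?P<_KC3>{WORDS})"
    rf"(?:x[ \t]?-?(?P<Cross>{NUM})x?)?"      # optional cross price
    rf"(?P<_KC7>{WORDS})"
    r"[ \t,]+"
    rf"(?P<Bid>{NUM})[ \t]*(?:/|-| )[ \t]*(?P<Offer>{NUM})"
)

quote = "WTI American: Jun12 110.00/140.00 [1x2] Call Spread x 102.50 350-365"
m = SIMPLIFIED.search(quote)

# The lazy _KC3 group first tries to match nothing, the optional cross price is
# skipped because no "x" follows the strikes directly, and the second word-list
# group is then free to swallow the "x" on its way to the first number it can
# hand over as a Bid.
print(m.group("Cross"), m.group("Bid"), m.group("Offer"))
print(repr(m.group("_KC7")))
```

In this cut-down form the stray "x" ends up in `_KC7` rather than `_KC3`, which suggests that changing only the fragment before the cross price is not enough: any later word-list group can still absorb the "x".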

Any help would be appreciated.

Cheers, Simon

# Match this group (optional)
(?:
    # Match one of the product symbols or their aliases
    \b(?<ProductSymbol>CL|Brent|GasOil|WTI|LO|BRT)\b
    # Add words to word list
    (?<_KC1>(?:(?:\w|[ \t\'/]|\[\w*\])*?))
    # Skip over whitespace plus any of these characters [:]
    [ \t:]+
)?
# Futures expiry date
(?<=[ \t]|\|^)(?<FuturesExpiryPeriod>(?<_MY>(?<_MYP>(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?))[ \t]?(?<_MYY>(?:20)?\d\d))|(?<_CE>Cal-?(?<_CEY>(?:20)?\d\d))|(?<_QF>Q(?:uarter)?(?<_QFP>1|2|3|4)[ \t]*(?<_QFY>(?:20)?\d\d))|(?<_QL>(?<_QLP>1|2|3|4)[ \t]*Q(?:uarter)?[ \t]*(?<_QLY>(?:20)?\d\d))|(?<_HY>(?<_HYP>1|2)[ \t]*H(?:alf)?[ \t]*(?<_HYY>(?:20)?\d\d))|(?<_ER>(?<_ERP>[FGHJKMNQUVXZ])(?<_ERY>\d{0,2}))[ \t]*)
# Skip over whitespace
[ \t]+
# Add words to word list
(?<_KC2>(?:(?:\w|[ \t\'/]|\[\w*\])*?))
# Match one of the following choices (in order):
(?:
    (?: # First choice
        # Pair of Strike prices
        (?<Strike>[+|-]?\d+(?:\.\d+)?)/(?<Strike2>[+|-]?\d+(?:\.\d+)?)
        # Add to Word List (but not 'x' as last word) !!!!!!!!!!!! This is what needs changing
        (?<_KC3>(?:(?:\w|[ \t\'/]|\[\w*\])*?))
        # Cross price
        (?:x[ \t]?-?(?<Cross>[+|-]?\d+(?:\.\d+)?)x?)?
    )
    |
    (?: # Second choice
        # Cross price
        (?:x[ \t]?-?(?<Cross>[+|-]?\d+(?:\.\d+)?)x?)
        # Add words to word list
        (?<_KC4>(?:(?:\w|[ \t\'/]|\[\w*\])*?))
        # Pair of Strike prices
        (?<Strike>[+|-]?\d+(?:\.\d+)?)/(?<Strike2>[+|-]?\d+(?:\.\d+)?)?
    )
    |
    (?: # Third choice
        # Single Strike price
        (?<Strike>[+|-]?\d+(?:\.\d+)?)
        # Add to Word List (but not 'x' as last word) !!!!!!!!!!!! This is what needs changing
        (?<_KC5>(?:(?:\w|[ \t\'/]|\[\w*\])*?))
        # Cross price
        (?:x[ \t]?-?(?<Cross>[+|-]?\d+(?:\.\d+)?)x?)?
    )
    |
    (?: # Fourth choice
        # Cross price
        (?:x[ \t]?-?(?<Cross>[+|-]?\d+(?:\.\d+)?)x?)
        # Add words to word list
        (?<_KC6>(?:(?:\w|[ \t\'/]|\[\w*\])*?))
        # Single Strike price
        (?<Strike>[+|-]?\d+(?:\.\d+)?)?
    )
)
# Add words to word list
(?<_KC7>(?:(?:\w|[ \t\'/]|\[\w*\])*?))
# Skip over whitespace plus any of these characters [,]
[ \t,]+
# Bid/Offer spread
(?<Bid>[+|-]?\d+(?:\.\d+)?)[ \t]*(?:/|-|\ )[ \t]*(?<Offer>[+|-]?\d+(?:\.\d+)?)
# Look for any other keywords in brackets (optional)
(?:
    # Skip over whitespace
    [ \t]*
    # <pattern>
    \(
    # Add words to word list
    (?<_KC8>(?:(?:\w|[ \t\'/]|\[\w*\])*?))
    # <pattern>
    \)
)?
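One possible way to change the fragments marked "This is what needs changing" in the listing above is to put a negative lookahead in front of each word-list token, so that the token refuses to start on an "x" that begins a cross price. The sketch below is a Python port of a cut-down pattern, offered as an assumption rather than a tested fix for the full expression: group syntax is `(?P<name>...)`, the sign class is written `[+-]`, and matching is case-sensitive. It applies the lookahead to the word-list group after the cross price as well as the one before it, since otherwise the later group can still consume the "x".

```python
import re

NUM = r"[+-]?\d+(?:\.\d+)?"

# Same alternatives as the original fragment, but each repetition is guarded:
# it may not begin at a stand-alone "x" that is followed by a number.
WORDS = rf"(?:(?!\bx[ \t]?-?{NUM})(?:\w|[ \t'/]|\[\w*\]))*?"

TEMPERED = re.compile(
    rf"(?P<Strike>{NUM})/(?P<Strike2>{NUM})"
    rf"(?P<_KC3>{WORDS})"
    rf"(?:x[ \t]?-?(?P<Cross>{NUM})x?)?"      # still optional
    rf"(?P<_KC7>{WORDS})"
    r"[ \t,]+"
    rf"(?P<Bid>{NUM})[ \t]*(?:/|-| )[ \t]*(?P<Offer>{NUM})"
)

with_cross = TEMPERED.search(
    "WTI American: Jun12 110.00/140.00 [1x2] Call Spread x 102.50 350-365"
)
without_cross = TEMPERED.search("110.00/140.00 Call Spread 350-365")

for m in (with_cross, without_cross):
    print(m.group("Cross"), m.group("Bid"), m.group("Offer"))
```

Because the guard requires a number after the "x", an "x" that is just a word is still collected into the word list, and the "x" inside "[1x2]" is untouched because the bracket is consumed as a single unit. Whether this interacts safely with the Reuters expiry codes in the full expression has not been checked here.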


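If you are reading this from a file or something similar, you are better off parsing it with a tool such as awk. Avoid complicated regexes, because they can cause problems in less common scenarios. Cheers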
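The same field-by-field idea can be sketched without awk. The following Python example is only an illustration of that approach under simplifying assumptions: it handles the part of the quote after the expiry date, expects the bid/offer to arrive as a single token such as "350-365", and uses a hypothetical `parse_tail` helper that is not part of the question.

```python
import re

NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")
PAIR = re.compile(r"([+-]?\d+(?:\.\d+)?)/([+-]?\d+(?:\.\d+)?)")
SPREAD = re.compile(r"([+-]?\d+(?:\.\d+)?)[-/]([+-]?\d+(?:\.\d+)?)")

def parse_tail(text):
    """Classify the whitespace-separated tokens that follow the expiry date."""
    tokens = text.split()
    result = {"strikes": None, "cross": None, "bid_offer": None, "words": []}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        pair = PAIR.fullmatch(tok)
        spread = SPREAD.fullmatch(tok)
        if result["strikes"] is None and pair:
            # First "a/b" token is taken as the pair of strike prices.
            result["strikes"] = pair.groups()
        elif tok == "x" and i + 1 < len(tokens) and NUMBER.fullmatch(tokens[i + 1]):
            # A lone "x" followed by a number is the cross price, not a word.
            result["cross"] = tokens[i + 1]
            i += 1
        elif spread:
            result["bid_offer"] = spread.groups()
        else:
            result["words"].append(tok)
        i += 1
    return result

print(parse_tail("110.00/140.00 [1x2] Call Spread x 102.50 350-365"))
```

Deciding token by token makes the "x" rule explicit: it is only treated as a cross-price marker when a number follows, and otherwise falls through to the word list.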