Address Super Regex

Keywords: Regex, address | Last updated: 2023-09-27 18:21:06

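I'm trying to write a regex that can capture addresses in a variety of forms. It all works perfectly until I try to allow for the suburb having more than one word.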

This is what I currently get:

Input:
"Unit 1/61 bob-bob east st. bobville vic 3070"
Output Groups:
PropertyType = "Unit"
Unit = "1"
Number = "61"
Street = "bob-bob east"
Street Type = "st"
Suburb = "bobville"
State = "VIC"
Postcode = "3070"
Input:
"Unit 1/61 bob-bob east st. bobville west vic 3070"
Output Groups:
PropertyType = "Unit"
Unit = "1"
Number = "61"
Street = "bob-bob east"
Street Type = "st"
Suburb = "bobville"
State = ""
Postcode = ""

Here is the regex:

new MyRegex("Address2", @"((?<PropertyType>Unit|Lot|Level|Floor|P.?O.? Box)\b)?" +
@"\s*((?<Unit>\d+)(/|\\|-| ))?" +
@"\s*(?<Number>\d+)" +
@"\s*(?<Street>[a-z]+((\s*|-?)[a-z]+)*?)" +
@"\s*(?<StreetType>st|rd|ave|hwy|cct|ct|cl|gr|street|road|avenue|highway|circuit|court|close|grove)\.?" +
@"\s*(?<Suburb>[a-z]+((\s*|-?)[a-z]+)*?)?" +
@"\s*(?<State>Victoria|Tasmania|Queensland|New South Wales|(South|Western) Australia|(Northern|Australian Capital) Territory|VIC|NSW|SA|WA|NT|TAS|ACT|QLD)?" +
@"\s*(?<Postcode>\d{4})?"
, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture)

Replacing the suburb line with:

\s*(?<Suburb>[a-z]+(((\s*|-?)[a-z]+){1,2}?)?)?

will correctly capture "Unit 1/61 bob-bob east st. bobville-bob west vic 3070", but not "Unit 1/61 bob-bab east st bobville-bob west vic 3070".

Similarly, replacing the suburb line with:

\s*(?<Suburb>[a-z]+(((\s*|-?)[a-z]+){1,2})?)?

will capture "Unit 1/61 bob-bob east st. bobville east west vic 3070", but not "Unit 1/61 bob-bob east st. bobville west vic 3071".

Replacing the suburb line with:

\s*(?<Suburb>[a-z]+((\s*|-?)[a-z]+){0,2}?)?

doesn't like anything except "Unit 1/61 east st. bobville vic 3070". Changing {0,2}? to {0,2} makes the suburb line capture the state as well.

Any ideas that could help me clean this up?
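The lazy and greedy behaviour described in the question can be reproduced in isolation. The following is a minimal C# sketch, not the full address pattern: it keeps only the suburb, state and postcode parts, uses a shortened state list, and the class and method names are made up for illustration. The third call appends `$`, which is one possible way to force the lazy suburb to keep expanding; that is an assumption here, not something the question tries.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuburbQuantifierDemo
{
    // Simplified tail of the address pattern (hypothetical: shortened state list).
    private const string StateAndPostcode = @"\s*(?<State>VIC|NSW)?\s*(?<Postcode>\d{4})?";

    // Matches input against suburb + state + postcode (+ optional suffix)
    // and formats the three named groups.
    public static string Describe(string suburbPattern, string suffix, string input)
    {
        var regex = new Regex(
            "^(?<Suburb>" + suburbPattern + ")" + StateAndPostcode + suffix,
            RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
        Match m = regex.Match(input);
        return string.Format("Suburb='{0}' State='{1}' Postcode='{2}'",
            m.Groups["Suburb"].Value, m.Groups["State"].Value, m.Groups["Postcode"].Value);
    }

    public static void Main()
    {
        const string input = "bobville west vic 3070";
        const string lazy = @"[a-z]+((\s*|-?)[a-z]+)*?";
        const string greedy = @"[a-z]+((\s*|-?)[a-z]+){0,2}";

        // Lazy and unanchored: everything after the first word is optional,
        // so the match stops early.
        // Suburb='bobville' State='' Postcode=''
        Console.WriteLine(Describe(lazy, "", input));

        // Greedy: the suburb swallows the state.
        // Suburb='bobville west vic' State='' Postcode='3070'
        Console.WriteLine(Describe(greedy, "", input));

        // Lazy but anchored with $: the suburb must grow until the rest fits.
        // Suburb='bobville west' State='vic' Postcode='3070'
        Console.WriteLine(Describe(lazy, "$", input));
    }
}
```

A lazy quantifier only takes more text when the rest of the pattern would otherwise fail. With an optional state and postcode and no end anchor, nothing can fail, so the suburb never grows past its first word.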


I have since built a faster address regex, which also fails quickly when the input doesn't match. It is based on a regex from www.regexlib.com.

I thought I'd post it in case anyone needs something similar:

new Regex(@"
^(
    ((?<PropertyType>[a-z\ ,\.']+?)\ *?)?
    ((?<Unit>\d+)(,|/|-|[\ ]*?))?
    (\b(?<Number>\d+[a-z]?)\b)\ *?
    (?<Street>[\w\ \-]+)
    (\b(?<StreetType>STREET|ST|ROAD|RD|GROVE|GR|DRIVE|DR|AVENUE|AVE|CIRCUIT|CCT|CLOSE|CL|COURT|CRT|CT|CRESCENT|CRES|PLACE|PL|PARADE|PDE|BOULEVARD|BLVD|HIGHWAY|HWY|ALLEY|ALLY|APPROACH|APP|ARCADE|ARC|BROW|BYPASS|BYPA|CAUSEWAY|CWAY|CIRCUS|CIRC|COPSE|CPSE|CORNER|CNR|COVE|END|ESPLANADE|ESP|FLAT|FREEWAY|FWAY|FRONTAGE|FRNT|GARDENS|GDNS|GLADE|GLD|GLEN|GREEN|GRN|HEIGHTS|HTS|LANE|LINK|LOOP|MALL|MEWS|PACKET|PCKT|PARK|PARKWAY|PKWY|PROMENADE|PROM|RESERVE|RES|RIDGE|RDGE|RISE|ROW|SQUARE|SQ|STRIP|STRP|TARN|TERRACE|TCE|THOROUGHFARE|TFRE|TRACK|TRAC|TRUNKWAY|TWAY|VIEW|VISTA|VSTA|WALK|WAY|WALKWAY|WWAY|YARD)\b).?,?\ *?
 )
 ((?<Suburb>[a-z\.]+([\-,\ ]+[a-z\.]+)*?),?\ *?)?
 (\b(?<State>New\ South\ Wales|NSW|Victoria|VIC|Queensland|QLD|Australian\ Capital\ Territory|ACT|South\ Australia|SA|West\ Australia|WA|Tasmania|TAS|Northern\ Territory|NT)\b,?\ *?)?
 ((?<Postcode>\d{4}),?\ *?)?
 (Au(s(tralia)?)?)?
 (\s(?=[^$]))*
$"
, RegexOptions.IgnoreCase | 
  RegexOptions.ExplicitCapture |
  RegexOptions.IgnorePatternWhitespace)
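
For anyone wanting to try it, here is a rough C# sketch of reading the named groups from a pattern of this shape. The class name, the `Parse` helper and the `Fields` array are hypothetical, and the street-type and state lists are cut down purely to keep the example short. The values are trimmed because, with the lazy `\ *?` before it, the Street group can pick up surrounding spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AddressRegexDemo
{
    // Same structure as the pattern above, with shortened
    // street-type and state lists (an assumption for brevity).
    private static readonly Regex Address = new Regex(@"
^(
    ((?<PropertyType>[a-z\ ,\.']+?)\ *?)?
    ((?<Unit>\d+)(,|/|-|[\ ]*?))?
    (\b(?<Number>\d+[a-z]?)\b)\ *?
    (?<Street>[\w\ \-]+)
    (\b(?<StreetType>STREET|ST|ROAD|RD|AVENUE|AVE)\b).?,?\ *?
 )
 ((?<Suburb>[a-z\.]+([\-,\ ]+[a-z\.]+)*?),?\ *?)?
 (\b(?<State>Victoria|VIC|New\ South\ Wales|NSW)\b,?\ *?)?
 ((?<Postcode>\d{4}),?\ *?)?
 (Au(s(tralia)?)?)?
 (\s(?=[^$]))*
$",
        RegexOptions.IgnoreCase |
        RegexOptions.ExplicitCapture |
        RegexOptions.IgnorePatternWhitespace);

    private static readonly string[] Fields =
    {
        "PropertyType", "Unit", "Number", "Street",
        "StreetType", "Suburb", "State", "Postcode"
    };

    // Returns the trimmed named groups, or null when the input does not match.
    public static Dictionary<string, string> Parse(string input)
    {
        Match m = Address.Match(input);
        if (!m.Success) return null;

        var result = new Dictionary<string, string>();
        foreach (string field in Fields)
            result[field] = m.Groups[field].Value.Trim();
        return result;
    }

    public static void Main()
    {
        var parts = Parse("Unit 1/61 bob-bob east st. bobville west vic 3070");
        if (parts == null)
        {
            Console.WriteLine("No match");
            return;
        }
        // Print in a fixed order rather than relying on dictionary ordering.
        foreach (string field in Fields)
            Console.WriteLine(field + " = " + parts[field]);
    }
}
```

Because the pattern is anchored with `^` and `$`, the lazy suburb has to expand until the state and postcode line up, so the two-word suburb "bobville west" comes out whole. An input with no street number does not match at all.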