Parsing a span with a C# regular expression
Keywords: span, regular expression | Updated: 2024-10-22 04:46:53
I have a requirement where I only need to extract the link from this HTML:
"<span class=""name""><a href=Details.aspx?entityID=1&hash=20&searchFunctionID=53b&type=Advanced&nameSet=Entities&q=a&textSearchType=ExactPhrase&orgTypes=01%2c02%2c03%2c04%2c05%2c06%2c07%2c08%2c09%2c10%2c11%2c12%2c13%2c14%2c15%2c16%2c90%2c96%2c98%2c99> GOOGLE CORPORATION </a> </span> <br /> <span class=typeDescription> 09 - Analytics Company </span>"
The output I need is:
Details.aspx?entityID=1&hash=20&searchFunctionID=53b&type=Advanced&nameSet=Entities&q=a&textSearchType=ExactPhrase&orgTypes=01%2c02%2c03%2c04%2c05%2c06%2c07%2c08%2c09%2c10%2c11%2c12%2c13%2c14%2c15%2c16%2c90%2c96%2c98%2c99
I am using:
string sPattern = "[<a href=](.*?(99))";
MatchCollection mcMatches = Regex.Matches(input, sPattern);
foreach (Match m in mcMatches)
{
    Console.WriteLine(m.Value);
}
This does not give me the correct output. Can someone point me in the right direction?
As mentioned above, parsing HTML with Regex is not a good idea. I suggest you use HtmlAgilityPack (you can get it from NuGet):
HtmlDocument hdoc = new HtmlDocument();
hdoc.LoadHtml(@"<span class=""name""><a href=Details.aspx?entityID=1&hash=20&searchFunctionID=53b&type=Advanced&nameSet=Entities&q=a&textSearchType=ExactPhrase&orgTypes=01%2c02%2c03%2c04%2c05%2c06%2c07%2c08%2c09%2c10%2c11%2c12%2c13%2c14%2c15%2c16%2c90%2c96%2c98%2c99> GOOGLE CORPORATION </a> </span> <br /> <span class=typeDescription> 09 - Analytics Company </span>");
var href = hdoc.DocumentNode.SelectSingleNode("//a").Attributes["href"].Value;
This gives you the value of the href attribute.
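As Shaamaan said, Regex is not the right way to parse HTML. For your given example, though, a better pattern would be the following, although it is not guaranteed to always work: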
(?:<a href=)([^">]*)
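For completeness, here is a minimal sketch of how that pattern could be applied in C#. The class name HrefExtractor and the method ExtractHref are made up for illustration and are not part of the original answer. The sketch reads capture group 1 rather than m.Value, because m.Value would also include the literal "<a href=" prefix. It assumes the attribute value is unquoted, as in the sample HTML; with a quoted value the character class [^">] stops immediately and the group comes back empty.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class HrefExtractor
{
    // Group 1 captures everything after "<a href=" up to the first '"' or '>'.
    private static readonly Regex HrefPattern = new Regex("(?:<a href=)([^\">]*)");

    // Returns the captured href value, or null when the input contains no match.
    public static string ExtractHref(string input)
    {
        Match m = HrefPattern.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Calling HrefExtractor.ExtractHref(input) on the HTML from the question should return the Details.aspx?... string up to, but not including, the closing '>'.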