HTML Agility Pack and string parsing

Keywords: string, HTML | Last updated: 2023-09-27 17:51:01

I have an HTML string like this (the description element from the Yahoo XML feed):

<img src="http://l.yimg.com/a/i/us/we/52/26.gif"/><br /> 
<b>Current Conditions:</b><br /> Cloudy, 1 C<BR /> <BR />
<b>Forecast:</b><BR /> Mon - Snow. High: -5 Low: -14<br /> Tue - Light Snow. High: -8 Low: -16<br /> <br /> 
....

I only want to get the High and Low values (for example: -5, -14, -8, -16).

I tried to get them with HtmlAgilityPack like this:

HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(rssDescriptionElement);
List<string> elements = new List<string>();
foreach (HtmlNode element in htmlDoc.DocumentNode.SelectNodes("//br"))
{
    elements.Add(element.NextSibling.InnerText);
}

For the HTML string above, the elements list contains:

"\n"
"\nCloudy, 1 C"
"\n"
"Forecast:"
"\nMon - Snow. High: -5 Low: -14"
"\nTue - Light Snow. High: -8 Low: -16"
"\n"
"\n"
""
"\n(provided by "
"\n"

How can I get only the High and Low values (-5, -14, -8, -16) from this list, or is there a different solution?

Use a regular expression:

(?:High|Low)\s*:\s*(?<num>-?\d+)

and read the named group num from each match. Sample code:

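// Requires: using System.Text.RegularExpressions; using HtmlAgilityPack;
List<string> elements = new List<string>();
// Matches "High: -5" or "Low: -14" and captures the signed integer as "num".
var pattern = @"(?:High|Low)\s*:\s*(?<num>-?\d+)";
foreach (HtmlNode element in htmlDoc.DocumentNode.SelectNodes("//br"))
{
    // A trailing <br> may have no following sibling, so skip it to avoid a NullReferenceException.
    if (element.NextSibling == null) continue;
    foreach (Match mc in Regex.Matches(element.NextSibling.InnerText, pattern))
    {
        elements.Add(mc.Groups["num"].Value);
    }
}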
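
Because the High and Low values sit in plain text between the tags, the same regular expression can also be run over the whole description string, which avoids walking the <br> nodes at all. The following is only a minimal sketch of that idea, not part of the original answer: the class name ForecastParser, the method ExtractHighLow, and the shortened sample string are hypothetical and chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ForecastParser
{
    // Same pattern as in the answer: "High" or "Low", a colon, then a signed integer.
    private static readonly Regex HighLow =
        new Regex(@"(?:High|Low)\s*:\s*(?<num>-?\d+)");

    // Hypothetical helper: returns every High/Low value in document order.
    public static List<string> ExtractHighLow(string html)
    {
        var values = new List<string>();
        foreach (Match m in HighLow.Matches(html))
        {
            values.Add(m.Groups["num"].Value);
        }
        return values;
    }

    public static void Main()
    {
        // Shortened sample based on the forecast part of the question's HTML.
        string html = "<b>Forecast:</b><BR /> Mon - Snow. High: -5 Low: -14<br /> " +
                      "Tue - Light Snow. High: -8 Low: -16<br />";
        Console.WriteLine(string.Join(",", ExtractHighLow(html)));
        // prints: -5,-14,-8,-16
    }
}
```

This works only as long as "High:" and "Low:" never appear inside tag attributes or other markup. If that cannot be guaranteed, extracting the text nodes with HtmlAgilityPack first, as in the answer, is the safer choice.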