Html敏捷包从表中获取内容
本文关键字:获取 Html | 更新日期: 2023-09-27 18:10:22
我需要从"http://anytimefitness.com/find-gym/list/AL"获取位置,地址和电话号码。到目前为止,我有这个…
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(stateURLs[0].ToString());
var BlankNode =
htmlDoc.DocumentNode.SelectNodes("/div[@class='segmentwhite']/table[@style='width: 100%;']//tr[@class='']");
var GrayNode =
htmlDoc.DocumentNode.SelectNodes("/div[@class='segmentwhite']/table[@style='width: 100%;']//tr[@class='gray_bk']");
我已经在stackoverflow周围看了一段时间,但是没有一个关于htmlagilitypack的帖子真的有帮助。我也一直在使用http://www.w3schools.com/xpath/xpath_syntax.asp
由于您所追求的<div>
不是根节点的直接子节点,因此您需要使用//
而不是/
。然后,您可以使用or
操作符将XPath用于BlankNode
和GrayNode
,例如:
var htmlweb = new HtmlWeb();
HtmlDocument htmlDoc = htmlweb.Load("http://anytimefitness.com/find-gym/list/AL");
htmlDoc.OptionFixNestedTags = true;
var AllNode =
htmlDoc.DocumentNode.SelectNodes("//div[@class='segmentwhite']/table//tr[@class='' or @class='gray_bk']");
foreach (HtmlNode node in AllNode)
{
var location = node.SelectSingleNode("./td[2]").InnerText;
var address = node.SelectSingleNode("./td[3]").InnerText;
var phone = node.SelectSingleNode("./td[4]").InnerText;
//do something with above informations
}
下面是我在LinqPad中测试的一个例子:
string url = @"http://anytimefitness.com/find-gym/list/AL";
var client = new System.Net.WebClient();
var data = client.DownloadData(url);
var html = Encoding.UTF8.GetString(data);
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(html);
var gyms = htmlDoc.DocumentNode.SelectNodes("//tbody/tr[@class='' or @class='gray_bk']");
foreach (var gym in gyms) {
var city = gym.SelectSingleNode("./td[2]").InnerText;
var address = gym.SelectSingleNode("./td[3]").InnerText;
var phone = gym.SelectSingleNode("./td[4]").InnerText;
}
因为htmllagilitypack也支持Linq,你也可以这样做:
string [] classes = {"", "gray_bk"};
var gyms = htmlDoc
.DocumentNode
.Descendants("tr")
.Where(t => classes.Contains(t.Attributes["class"].Value))
.ToList();
gyms.ForEach(gym => {
var city = gym.SelectSingleNode("./td[2]").InnerText;
var address = gym.SelectSingleNode("./td[3]").InnerText;
var phone = gym.SelectSingleNode("./td[4]").InnerText;
});