Html Agility Pack: getting content from a table

Keywords: get, Html | Updated: 2023-09-27 18:10:22

I need to get the location, address, and phone number from "http://anytimefitness.com/find-gym/list/AL". So far, I have this...

    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.OptionFixNestedTags = true;
    htmlDoc.LoadHtml(stateURLs[0].ToString());
    var BlankNode = 
        htmlDoc.DocumentNode.SelectNodes("/div[@class='segmentwhite']/table[@style='width: 100%;']//tr[@class='']");
    var GrayNode = 
        htmlDoc.DocumentNode.SelectNodes("/div[@class='segmentwhite']/table[@style='width: 100%;']//tr[@class='gray_bk']");

I have been looking around Stack Overflow for a while, but none of the posts about HtmlAgilityPack really helped. I have also been using http://www.w3schools.com/xpath/xpath_syntax.asp as a reference.


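Since the <div> you are after is not a direct child of the root node, you need to use // instead of /. You can then combine the XPath expressions for BlankNode and GrayNode into a single expression with the or operator, for example: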

    var htmlweb = new HtmlWeb();
    HtmlDocument htmlDoc = htmlweb.Load("http://anytimefitness.com/find-gym/list/AL");
    // Note: the document is already parsed by Load(), so setting this afterwards does not re-parse it
    htmlDoc.OptionFixNestedTags = true;
    var AllNode =
            htmlDoc.DocumentNode.SelectNodes("//div[@class='segmentwhite']/table//tr[@class='' or @class='gray_bk']");
    // SelectNodes returns null (not an empty collection) when nothing matches
    if (AllNode != null)
    {
        foreach (HtmlNode node in AllNode)
        {
            var location = node.SelectSingleNode("./td[2]").InnerText;
            var address = node.SelectSingleNode("./td[3]").InnerText;
            var phone = node.SelectSingleNode("./td[4]").InnerText;
            // do something with the information above
        }
    }
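
To make the difference between / and // concrete, here is a minimal sketch that runs without HtmlAgilityPack. It uses System.Xml on a small, well-formed sample as a stand-in, on the assumption that the XPath semantics are the same for these expressions. The markup, the row values, and the extra 'header' row are all hypothetical and are not taken from the real page.

```csharp
using System;
using System.Xml;

// Hypothetical, well-formed stand-in for the real page (not its actual markup).
const string sample = @"<html><body>
  <div class='segmentwhite'>
    <table>
      <tr class=''><td>1</td><td>Alabaster</td><td>100 Main St</td><td>555-0100</td></tr>
      <tr class='gray_bk'><td>2</td><td>Auburn</td><td>200 Oak Ave</td><td>555-0101</td></tr>
      <tr class='header'><td>#</td><td>City</td><td>Address</td><td>Phone</td></tr>
    </table>
  </div>
</body></html>";

var doc = new XmlDocument();
doc.LoadXml(sample);

// "/div" only matches a div that is the root element; here the root is <html>.
int absoluteCount = doc.SelectNodes("/div[@class='segmentwhite']").Count;

// "//div" searches all descendants, so the nested div is found.
int descendantCount = doc.SelectNodes("//div[@class='segmentwhite']").Count;

// A single expression with "or" replaces the two separate row queries.
XmlNodeList rows = doc.SelectNodes(
    "//div[@class='segmentwhite']/table//tr[@class='' or @class='gray_bk']");

Console.WriteLine($"{absoluteCount} {descendantCount} {rows.Count}"); // prints: 0 1 2
foreach (XmlNode row in rows)
{
    // td[2] is the second cell of the row (XPath positions are 1-based).
    Console.WriteLine(row.SelectSingleNode("./td[2]").InnerText);
}
```

One difference to keep in mind: XmlDocument.SelectNodes returns an empty list when nothing matches, whereas HtmlAgilityPack's SelectNodes returns null, so the HtmlAgilityPack version needs a null check before the loop.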

Here is an example that I tested in LINQPad:

    string url = @"http://anytimefitness.com/find-gym/list/AL";
    // Download the raw bytes and decode them as UTF-8
    var client = new System.Net.WebClient();
    var data = client.DownloadData(url);
    var html = Encoding.UTF8.GetString(data);
    var htmlDoc = new HtmlAgilityPack.HtmlDocument();
    // Set parsing options before LoadHtml so that they take effect
    htmlDoc.OptionFixNestedTags = true;
    htmlDoc.LoadHtml(html);
    var gyms = htmlDoc.DocumentNode.SelectNodes("//tbody/tr[@class='' or @class='gray_bk']");
    // SelectNodes returns null when nothing matches
    if (gyms != null) {
        foreach (var gym in gyms) {
            var city = gym.SelectSingleNode("./td[2]").InnerText;
            var address = gym.SelectSingleNode("./td[3]").InnerText;
            var phone = gym.SelectSingleNode("./td[4]").InnerText;
        }
    }

Because HtmlAgilityPack also supports LINQ, you could also do it like this:

    string[] classes = {"", "gray_bk"};
    var gyms = htmlDoc
            .DocumentNode
            .Descendants("tr")
            // Attributes["class"] is null for rows without a class attribute, so check before reading Value
            .Where(t => t.Attributes["class"] != null
                     && classes.Contains(t.Attributes["class"].Value))
            .ToList();
    gyms.ForEach(gym => {
        var city = gym.SelectSingleNode("./td[2]").InnerText;
        var address = gym.SelectSingleNode("./td[3]").InnerText;
        var phone = gym.SelectSingleNode("./td[4]").InnerText;
    });
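
The LINQ approach can be tried out without the live page. The sketch below uses System.Xml.Linq as a stand-in for HtmlAgilityPack so that it runs on its own; the sample rows and the `wanted` array are hypothetical. It shows why a row without a class attribute needs a guard: the attribute lookup returns null for such a row, so reading its value directly would throw.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample rows; the third <tr> deliberately has no class attribute.
var table = XElement.Parse(@"<table>
  <tr class=''><td>1</td><td>Alabaster</td></tr>
  <tr class='gray_bk'><td>2</td><td>Auburn</td></tr>
  <tr><td>#</td><td>City</td></tr>
</table>");

string[] wanted = { "", "gray_bk" };

var cities = table
    .Descendants("tr")
    // Attribute("class") returns null when the attribute is absent,
    // so bind it with a pattern before reading Value.
    .Where(tr => tr.Attribute("class") is XAttribute cls && wanted.Contains(cls.Value))
    // Take the text of the second cell in each matching row.
    .Select(tr => tr.Elements("td").ElementAt(1).Value)
    .ToList();

Console.WriteLine(string.Join(", ", cities)); // prints: Alabaster, Auburn
```

With HtmlAgilityPack itself, the equivalent guard would be a null check on `t.Attributes["class"]` before reading `.Value`.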