Html Agility Pack returns empty values from a table

Keywords: null values, Html | Updated: 2023-09-27 18:20:21

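I'm trying to learn some basic scraping, and thanks to this site I've learned a lot of new things, but now I'm stuck on this problem. This is the code I'm using: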

var web = new HtmlWeb();
var doc = web.Load("url");
var nodes = doc.DocumentNode.SelectNodes("//*[@id='hotellist_inner']/div");
StreamWriter output = new StreamWriter("out.txt");
if (nodes != null)
{
    foreach (HtmlNode item in nodes)
    {
        if (item != null && item.Attributes["data-recommended"] != null)
        {
            string line = "";
            var nome = item.SelectSingleNode(".//h3/a").InnerText;
            var rating = item.SelectSingleNode(".//span[@class='rating']").InnerText;
            var price = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong[1]");
            var discount = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/div[1]");
            line = line + nome + "," + rating + "," + price + "," + discount;
            Console.WriteLine(line);
            output.WriteLine(line);
        }
    }
}

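The first two items (name and rating) work fine, but for price and discount I get empty results. I analyzed the page with the Chrome Scraper extension (here is the link), and the same XPath expressions return the values there without any trouble. I can't figure out what I'm doing wrong. Any help would be appreciated! :D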

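After a quick look at the page you're trying to scrape: not every item has price and discount information. You need to handle that case properly to avoid exceptions, for example by checking for null before reading InnerText. With this small change, your code can get the price and discount information: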

if (item != null && item.Attributes["data-recommended"] != null)
{
    string line = "";
    var nome = item.SelectSingleNode(".//h3/a").InnerText;
    var rating = item.SelectSingleNode(".//span[@class='rating']").InnerText;
    var price = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong[1]");
    var discount = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/div[1]");
    //set priceString to empty string if price is null, else set it to price.InnerText
    var priceString = price == null ? "" : price.InnerText;
    //do similar step for discountString
    var discountString = discount == null ? "" : discount.InnerText;
    line = line + nome + "," + rating + "," + priceString + "," + discountString;
    Console.WriteLine(line);
    output.WriteLine(line);
}
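
If the same null check is needed for many fields, it can be factored into a small helper. The following is only a minimal sketch: `NodeText.TextOrDefault` is a hypothetical name and not part of Html Agility Pack, and the inline HTML and the XPath expressions are made-up stand-ins for the real page. It relies only on `SelectSingleNode` returning null when nothing matches, which is the behaviour the answer above depends on.

```csharp
using System;
using HtmlAgilityPack;

static class NodeText
{
    // Hypothetical helper (not part of Html Agility Pack): returns the trimmed
    // InnerText of the first node matching xpath, or fallback if nothing matches.
    public static string TextOrDefault(HtmlNode parent, string xpath, string fallback = "")
    {
        var node = parent.SelectSingleNode(xpath);
        return node == null ? fallback : node.InnerText.Trim();
    }
}

static class Program
{
    static void Main()
    {
        // Made-up markup standing in for the hotel list: the second item has no price.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div><h3><a>Hotel One</a></h3><strong>EUR 80</strong></div>" +
            "<div><h3><a>Hotel Two</a></h3></div>");

        foreach (var item in doc.DocumentNode.SelectNodes("//div"))
        {
            var name = NodeText.TextOrDefault(item, ".//h3/a");
            var price = NodeText.TextOrDefault(item, ".//strong");
            Console.WriteLine(name + "," + price);
        }
    }
}
```

For the second item the price field simply comes out empty instead of throwing a NullReferenceException. Passing a third argument such as "n/a" would make missing values visible in the output file.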