HTML parsing from a table with Fizzler

Keywords: parse, HTML, Fizzler, table | Updated: 2023-09-27 17:51:03

I have to parse the following HTML page:

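Below is my parsing code using Fizzler. I want to get the title, the rate, the days (sometimes empty) and the price, namely the second price, the one after the span. But when I run my code it only gets 2 objects from the listRoomDetails divs. As shown below, the first table contains Room Type 1 promotion 10% and Room Type 2 promotion 60%, but the code skips Room Type 2 promotion 60% and instead takes the first element of the next listRoomDetails (Room Type 1 promotion 90%).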

I want to get all the room types from both listRoomDetails divs.

Also, is there a way to detect whether the days value exists, getting it if it does and ignoring it otherwise?
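One way to handle the optional days value is to treat a days cell as present only when its trimmed text is non-empty. The sketch below assumes HtmlAgilityPack plus the Fizzler.Systems.HtmlAgilityPack extension methods (the same libraries the source code further down uses); RowParsing and TryGetDays are hypothetical names, not part of either library. It returns the first non-blank td.days text of a table row, or null when every days cell is empty or missing, so the caller can simply skip it.

```csharp
#nullable enable
using System.Linq;
using Fizzler.Systems.HtmlAgilityPack; // adds QuerySelector / QuerySelectorAll to HtmlNode
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack or Fizzler.
public static class RowParsing
{
    // Returns the first non-blank "td.days" text of a table row,
    // or null when the row has no days cell or all of them are empty.
    public static string? TryGetDays(HtmlNode row)
    {
        return row.QuerySelectorAll("td.days")
                  .Select(td => td.InnerText.Trim())
                  .FirstOrDefault(text => text.Length > 0);
    }

    // Convenience overload: parses a single <tr> fragment and delegates to the method above.
    public static string? TryGetDays(string rowHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<table><tbody>" + rowHtml + "</tbody></table>");
        var row = doc.DocumentNode.QuerySelector("tr");
        return row == null ? null : TryGetDays(row);
    }
}
```

Inside a loop over table rows you could then assign something like Days = RowParsing.TryGetDays(row) (Days being a hypothetical property) and ignore null values. If the column position matters, for example one column per day, keep every td.days cell and map the empty ones to null instead of taking the first non-blank one.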

<!-- HTML file -->
<div class="ListItem">
    <div class="ListRoom">
        <span class="title">
            <strong>Super Room</strong>
        </span>
    </div>
    <!-- section to get details of room -->
    <div class="listRoomDetails">
        <table>
            <thead>
                <tr>
                    Days
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td class = "rates">
                        Room Type 1 promotion 10%
                    </td>
                    <td class = "days">
                        261.00
                    </td>
                    <td class = "days">
                    </td>
                    <td class="price">
                        <span>290.00&euro;</span>
                        261.00&euro; <!-- get this money -->
                    </td>
                </tr>
                <tr>
                    <td class = "rates">
                        Room Type 2 promotion 60%
                    </td>
                    <td class = "days">
                    </td>
                    <td class = "days">
                        261.00
                    </td>
                    <td class="price">
                        <span>290.00&euro;</span>
                        261.00&euro; <!-- get this money -->
                    </td>
                </tr>
            </tbody>
        </table>
    </div>
    <div class="listRoomDetails">
        <table>
            <thead>
                <tr>
                    Days
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td class = "rates">
                        Room Type 1 promotion 90%
                    </td>
                    <td class = "days">
                    </td>
                    <td class = "rates">
                        261.00
                    </td>
                    <td class="price">
                        <span>290.00&euro;</span>
                        261.00&euro;
                    </td>
                </tr>
                <tr>
                    <td class = "rates">
                        Room Type 2 promotion 0 % // type of room
                    </td>
                    <td class = "days">
                        261.00
                    </td>
                    <td class="price">
                        <span>290.00&euro;</span>
                        261.00&euro;
                    </td>
                </tr>
            </tbody>
        </table>
    </div>
</div>
Source code:

        using System.IO;
        using System.Linq;
        using Fizzler.Systems.HtmlAgilityPack; // QuerySelector / QuerySelectorAll extension methods
        using HtmlAgilityPack;

        var source = File.ReadAllText("TestHtml/HotelWithAvailability.html");
        var html = new HtmlDocument(); // with HTML Agility Pack
        html.LoadHtml(source);
        var doc = html.DocumentNode;
        var rooms = (from listR in doc.QuerySelectorAll(".ListItem")
                     from listR2 in doc.QuerySelectorAll("tbody")
                     select new HotelAvailability
                     {
                         HotelName = listR.QuerySelector(".title").InnerText.Trim(), // get room name
                         TypeRooms = listR2.QuerySelector("tr td.rates").InnerText.Trim(), // get room type
                         Price = listR2.QuerySelector("tr td.price").InnerText.Trim() // get price
                     }).ToArray();
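The two results can be explained by counting what this query iterates over: the second from clause runs over every tbody in the whole document (doc), not over the tables inside listR, so one ListItem times two tables yields two objects, and QuerySelector then returns only the first matching row of each table. A minimal sketch reproducing this on a trimmed copy of the sample markup, assuming HtmlAgilityPack and Fizzler.Systems.HtmlAgilityPack; QueryDiagnosis and SampleHtml are illustrative names only:

```csharp
using System.Linq;
using Fizzler.Systems.HtmlAgilityPack; // QuerySelector / QuerySelectorAll extension methods
using HtmlAgilityPack;

// Illustrative class, not part of either library.
public static class QueryDiagnosis
{
    // Trimmed copy of the question's markup: one ListItem, two tables, two rows each.
    public const string SampleHtml = @"
<div class=""ListItem"">
  <span class=""title""><strong>Super Room</strong></span>
  <div class=""listRoomDetails""><table><tbody>
    <tr><td class=""rates"">Room Type 1 promotion 10%</td></tr>
    <tr><td class=""rates"">Room Type 2 promotion 60%</td></tr>
  </tbody></table></div>
  <div class=""listRoomDetails""><table><tbody>
    <tr><td class=""rates"">Room Type 1 promotion 90%</td></tr>
    <tr><td class=""rates"">Room Type 2 promotion 0 %</td></tr>
  </tbody></table></div>
</div>";

    // Same shape as the question's query: a cross join of ListItems with ALL tbody
    // elements in the document, keeping only the first rate cell of each tbody.
    public static string[] OriginalShape(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var root = doc.DocumentNode;
        return (from listR in root.QuerySelectorAll(".ListItem")
                from listR2 in root.QuerySelectorAll("tbody")
                select listR2.QuerySelector("tr td.rates").InnerText.Trim()).ToArray();
    }

    // Number of table rows that actually exist, for comparison.
    public static int RowCount(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.QuerySelectorAll(".listRoomDetails tbody tr").Count();
    }
}
```

OriginalShape returns two entries (the first row of each table) while RowCount reports four rows, which matches the behaviour described in the question.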


You should query the room details within the current room (i.e. the current ListItem) instead of the whole document:

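using System.Linq;
using Fizzler.Systems.HtmlAgilityPack; // QuerySelector / QuerySelectorAll extension methods
using HtmlAgilityPack;

// Iterate over each ListItem, then over the table rows inside that item only.
var rooms = from r in doc.QuerySelectorAll(".ListItem")
            from rd in r.QuerySelectorAll(".listRoomDetails tbody tr")
            select new HotelAvailability {
                HotelName = r.QuerySelector(".title").InnerText.Trim(),   // shared by all rows of the item
                TypeRooms = rd.QuerySelector(".rates").InnerText.Trim(), // first .rates cell of the row
                Price = rd.QuerySelector(".price span").InnerText.Trim()  // price inside the <span>
            };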
For your sample HTML, it produces:
[
  {
    HotelName: "Super Room",
    Price: "290.00&euro;",
    TypeRooms: "Room Type 1 promotion 10%"
  },
  {
    HotelName: "Super Room",
    Price: "290.00&euro;",
    TypeRooms: "Room Type 2 promotion 60%"
  },
  {
    HotelName: "Super Room",
    Price: "290.00&euro;",
    TypeRooms: "Room Type 1 promotion 90%"
  },
  {
    HotelName: "Super Room",
    Price: "290.00&euro;",
    TypeRooms: "Room Type 2 promotion 0 % // type of room"
  }
]
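The query above reads the price inside the span (290.00&euro;), whereas the question asks for the second price that follows it, and InnerText leaves the &euro; entity undecoded. A minimal sketch, assuming HtmlAgilityPack and Fizzler.Systems.HtmlAgilityPack: keep only the direct text children of td.price, which skips the span, and decode entities with HtmlEntity.DeEntitize. PriceParsing and GetCurrentPrice are hypothetical names, and the // notes in the question's markup are assumed to be the asker's annotations rather than real page text.

```csharp
#nullable enable
using System.Linq;
using Fizzler.Systems.HtmlAgilityPack; // QuerySelector / QuerySelectorAll extension methods
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack or Fizzler.
public static class PriceParsing
{
    // Returns the text placed directly inside td.price, ignoring child elements
    // such as the <span> that holds the first price. Null if there is no price cell
    // or no direct text.
    public static string? GetCurrentPrice(HtmlNode row)
    {
        var cell = row.QuerySelector("td.price");
        if (cell == null) return null;

        // Only text nodes that are direct children of the cell (skips <span> and comments).
        var text = string.Concat(cell.ChildNodes
            .Where(n => n.NodeType == HtmlNodeType.Text)
            .Select(n => n.InnerText));

        // Decode entities such as &euro; explicitly, then trim the surrounding whitespace.
        var decoded = HtmlEntity.DeEntitize(text).Trim();
        return decoded.Length == 0 ? null : decoded;
    }

    // Convenience overload: parses a single <tr> fragment and delegates to the method above.
    public static string? GetCurrentPrice(string rowHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<table><tbody>" + rowHtml + "</tbody></table>");
        var row = doc.DocumentNode.QuerySelector("tr");
        return row == null ? null : GetCurrentPrice(row);
    }
}
```

In the answer's query the price line would then become Price = PriceParsing.GetCurrentPrice(rd), which returns the text after the span (for example 261.00€) instead of the span's own value.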