Parsing HTML from a table with Fizzler
Updated: 2023-09-27 17:51:03
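I need to parse the HTML page shown below.
Below is my parsing code, which uses Fizzler. I want to get the title, the rates, the days (sometimes empty) and the price, specifically the second price, the one that comes after the span. But when I run my code, it only gets 2 objects from the listRoomDetails divs. As shown below, the first div contains "Room Type 1 promotion 10%" and "Room Type 2 promotion 60%", but the code skips "Room Type 2 promotion 60%" and moves on to the first element of the next listRoomDetails div ("Room Type 1 promotion 90%").
I want to keep all of the room types from both listRoomDetails divs.
Also, is there a way to detect whether a days value exists, so that I read it if it does and ignore it otherwise?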
//HTML File
<div class="ListItem">
<div class="ListRoom">
<span class="title">
<strong>Super Room</strong>
</span>
</div>
//section to get details of room
<div class="listRoomDetails">
<table>
<thead>
<tr>
Days
</tr>
</thead>
<tbody>
<tr>
<td class = "rates">
Room Type 1 promotion 10%
</td>
<td class = "days">
261.00
</td>
<td class = "days">
</td>
<td class="price">
<span>290.00€</span>
261.00€ //get this money
</td>
</tr>
<tr>
<td class = "rates">
Room Type 2 promotion 60%
</td>
<td class = "days">
</td>
<td class = "days">
261.00
</td>
<td class="price">
<span>290.00€</span>
261.00€ // get this money
</td>
</tr>
</tbody>
</table>
</div>
<div class="listRoomDetails">
<table>
<thead>
<tr>
Days
</tr>
</thead>
<tbody>
<tr>
<td class = "rates">
Room Type 1 promotion 90%
</td>
<td class = "days">
</td>
<td class = "rates">
261.00
</td>
<td class="price">
<span>290.00€</span>
261.00€
</td>
</tr>
<tr>
<td class = "rates">
Room Type 2 promotion 0 % // type of room
</td>
<td class = "days">
261.00
</td>
<td class="price">
<span>290.00€</span>
261.00€
</td>
</tr>
</tbody>
</table>
</div>
</div>
Source code:
var source = File.ReadAllText("TestHtml/HotelWithAvailability.html");
var html = new HtmlDocument(); // with HTML Agility Pack
html.LoadHtml(source);
var doc = html.DocumentNode;
var rooms = (from listR in doc.QuerySelectorAll(".ListItem")
             from listR2 in doc.QuerySelectorAll("tbody")
             select new HotelAvailability
             {
                 HotelName = listR.QuerySelector(".title").InnerText.Trim(), // get room name
                 TypeRooms = listR2.QuerySelector("tr td.rates").InnerText.Trim(), // get room type
                 Price = listR2.QuerySelector("tr td.price").InnerText.Trim(), // get price
             }).ToArray();
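You should query the room details of the current room (i.e. the ListItem) rather than the whole document. In your query, doc.QuerySelectorAll("tbody") pairs every ListItem with every tbody, and QuerySelector returns only the first matching element, so only the first row of each table is read. Select the rows relative to the ListItem instead: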
var rooms = from r in doc.QuerySelectorAll(".ListItem")
            from rd in r.QuerySelectorAll(".listRoomDetails tbody tr")
            select new HotelAvailability {
                HotelName = r.QuerySelector(".title").InnerText.Trim(),
                TypeRooms = rd.QuerySelector(".rates").InnerText.Trim(),
                Price = rd.QuerySelector(".price span").InnerText.Trim()
            };
For your sample HTML, it produces:
[
  {
    HotelName: "Super Room",
    Price: "290.00€",
    TypeRooms: "Room Type 1 promotion 10%"
  },
  {
    HotelName: "Super Room",
    Price: "290.00€",
    TypeRooms: "Room Type 2 promotion 60%"
  },
  {
    HotelName: "Super Room",
    Price: "290.00€",
    TypeRooms: "Room Type 1 promotion 90%"
  },
  {
    HotelName: "Super Room",
    Price: "290.00€",
    TypeRooms: "Room Type 2 promotion 0 % // type of room"
  }
]
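The query above reads `.price span`, so Price holds the value inside the span (290.00€), and it does not read the days cells at all. The question asks for the price that follows the span and for an optional days value, so here is a minimal sketch of one way that might be done. It assumes the HtmlAgilityPack and Fizzler.Systems.HtmlAgilityPack packages are referenced. `RoomRate`, `RoomParser` and the `Days` property are hypothetical names introduced for illustration and are not part of the original `HotelAvailability` type.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

// Hypothetical result type (an assumption, not the original HotelAvailability).
public class RoomRate
{
    public string HotelName { get; set; }
    public string TypeRooms { get; set; }
    public string Days { get; set; }   // null when no "days" cell contains text
    public string Price { get; set; }  // text of the price cell outside the <span>
}

public static class RoomParser
{
    public static RoomRate[] Parse(string source)
    {
        var html = new HtmlDocument();
        html.LoadHtml(source);
        var doc = html.DocumentNode;

        return (from r in doc.QuerySelectorAll(".ListItem")
                from rd in r.QuerySelectorAll(".listRoomDetails tbody tr")
                let priceCell = rd.QuerySelector(".price")
                select new RoomRate
                {
                    HotelName = r.QuerySelector(".title").InnerText.Trim(),
                    TypeRooms = rd.QuerySelector(".rates").InnerText.Trim(),
                    // First non-empty "days" cell in this row; null if there is none.
                    Days = rd.QuerySelectorAll(".days")
                             .Select(td => td.InnerText.Trim())
                             .FirstOrDefault(t => t.Length > 0),
                    // Keep only the text nodes directly under the price cell,
                    // which skips the <span> holding the first price.
                    Price = priceCell == null
                        ? null
                        : string.Concat(priceCell.ChildNodes
                              .Where(n => n.NodeType == HtmlNodeType.Text)
                              .Select(n => n.InnerText)).Trim()
                }).ToArray();
    }
}
```

Calling `RoomParser.Parse(source)` with the file contents should return one `RoomRate` per table row. Checking `Days` for null tells you whether a days value was present. Any stray text inside the price cell, such as the "//get this money" notes in the sample HTML, would end up in `Price` too, so the real page may need an extra cleanup step.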