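C#: parsing a specific row in a table with HtmlAgilityPack
Keywords: htmlagilitypack | Updated: 2023-09-27 18:28:57
I am developing a C# application in which I need to parse specific rows of a table and compare their values against a given string.
The URL I want to parse is http://wsrpg.com/clans/57. The relevant table is: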
<table cellspacing="0" cellpadding="0" border="0" class="table">
<thead>
<tr>
<th width="100">Rank</th>
<th width="150">Nick</th>
<th width="100">RCASH</th>
<th width="150">Activity in December</th>
<th width="100">Comportment</th>
<th width="100">Online</th>
<th width="150">Last Login</th>
</tr>
</thead>
<tr><td>Leader</td><td>robbie_william</td><td>351</td><td>1024</td><td>1195</td><td class='offline'><span class="label label-danger">Offline</span></td><td>03/01/2016</td></tr><tr><td>Boss</td><td>Alloy_</td><td>1418</td><td>1043</td><td>354</td><td class='offline'><span class="label label-danger">Offline</span></td><td>26/12/2015</td></tr><tr><td>Boss</td><td>AnonYmous_</td><td>32976</td><td>5142</td><td>937</td><td class='offline'><span class="label label-danger">Offline</span></td><td>04/01/2016</td></tr><tr><td>Boss</td><td>dJones</td><td>2739</td><td>6152</td><td>1044</td><td class='online'><span class="label label-success">Online</span></td><td>04/01/2016</td></tr><tr><td>Boss</td><td>SHARP</td><td>9015</td><td>1216</td><td>32</td><td class='offline'><span class="label label-danger">Offline</span></td><td>04/01/2016</td></tr><tr><td>Boss</td><td>Steffie</td><td>7888</td><td>6043</td><td>887</td><td class='online'><span class="label label-success">Online</span></td><td>04/01/2016</td></tr><tr><td>Boss</td><td>YOLOvsYODO</td><td>10950</td><td>2703</td><td>385</td><td class='offline'><span class="label label-danger">Offline</span></td><td>04/01/2016</td></tr><tr><td>Member</td><td>Angel_</td><td>8629</td><td>3256</td><td>167</td><td class='offline'><span class="label label-danger">Offline</span></td><td>04/01/2016</td></tr><tr><td>Member</td><td>asad</td><td>2452</td><td>3938</td><td>183</td><td class='offline'><span class="label label-danger">Offline</span></td><td>03/01/2016</td></tr><tr><td>Member</td><td>D3nim</td><td>1285</td><td>3217</td><td>31</td><td class='offline'><span class="label label-danger">Offline</span></td><td>03/01/2016</td></tr><tr><td>Member</td><td>Dell</td><td>5025</td><td>3305</td><td>182</td><td class='offline'><span class="label label-danger">Offline</span></td><td>01/01/2016</td></tr><tr><td>Member</td><td>Habib</td><td>1650</td><td>3860</td><td>36</td><td class='offline'><span class="label 
label-danger">Offline</span></td><td>04/01/2016</td></tr><tr><td>Member</td><td>Iron_MiXx</td><td>2569</td><td>485</td><td>525</td><td class='offline'><span class="label label-danger">Offline</span></td><td>03/01/2016</td></tr><tr><td>Member</td><td>MCool</td><td>4960</td><td>12739</td><td>290</td><td class='online'><span class="label label-success">Online</span></td><td>04/01/2016</td></tr><tr><td>Member</td><td>PREXEN</td><td>127</td><td>3873</td><td>1547</td><td class='offline'><span class="label label-danger">Offline</span></td><td>04/01/2016</td></tr><tr><td>Member</td><td>Sensation_</td><td>2733</td><td>1944</td><td>338</td><td class='offline'><span class="label label-danger">Offline</span></td><td>03/01/2016</td></tr><tr><td>Member</td><td>Wizard_</td><td>2081</td><td>2578</td><td>46</td><td class='offline'><span class="label label-danger">Offline</span></td><td>03/01/2016</td></tr>
</table>
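I only want to store the nicks in a string or a string array, so that I can compare them against a given string.
What I want to achieve is to check whether the nick entered by the user exists in that table.
I will use a boolean method for this.
Solution: I used Tim Schmelter's code; here is how I used it: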
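// Requires: using System.Data; using System.Linq; using System.Net;
private bool Authenticate(string nick)
{
    using (WebClient client = new WebClient())
    {
        string html = client.DownloadString("http://wsrpg.com/clans/57");
        DataTable table = GetTable(html, "table", true);
        // GetTable returns null when no matching table is found.
        if (table == null) return false;
        string[] nicks = table.AsEnumerable().Select(r => r.Field<string>("nick")).ToArray();
        // Contains already yields the boolean result, so no if/else is needed.
        return nicks.Contains(nick);
    }
}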
It is then called like this:
bool Authenticated = Authenticate(Player.GetName());
You can use this method to parse the HTML and fill a DataTable with the first table whose class attribute contains the given class name:
// Requires: using System; using System.Data; using System.Linq; using HtmlAgilityPack;
public static DataTable GetTable(string html, string tableClass, bool firstRowContainsHeader = false)
{
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    // Match the first table whose class attribute contains tableClass.
    string xpath = string.Format("//table[contains(@class,'{0}')]", tableClass);
    var firstTable = doc.DocumentNode.SelectSingleNode(xpath);
    if (firstTable == null) return null;

    DataTable table = new DataTable();
    var tableRows = firstTable.Descendants("tr");
    // Data rows: skip the first row when it holds the header.
    var tableData = tableRows.Skip(firstRowContainsHeader ? 1 : 0)
        .Select(row => row.Descendants("td")
            .Select((cell, index) => new { row, cell, index, cell.InnerText })
            .ToList());
    // Header cells may be either td or th elements.
    var headerCells = tableRows.First().Descendants()
        .Where(n => n.Name == "td" || n.Name == "th");

    int columnIndex = 0;
    foreach (HtmlNode cell in headerCells)
    {
        // Use the header text as the column name, or fall back to "Column N".
        string colName = firstRowContainsHeader
            ? cell.InnerText
            : String.Format("Column {0}", ++columnIndex);
        table.Columns.Add(colName, typeof(string));
    }

    foreach (var rowCells in tableData)
    {
        DataRow row = table.Rows.Add();
        // Guard against rows that have more cells than there are columns.
        for (int i = 0; i < Math.Min(rowCells.Count, table.Columns.Count); i++)
        {
            row.SetField(i, rowCells[i].InnerText);
        }
    }
    return table;
}
You can then use LINQ to DataTable to check whether it contains the given nick:
string html = File.ReadAllText(@"C:\Temp\html.txt"); // loading your sample from file
DataTable table = GetTable(html, "table", true);
string nick = "robbie_william"; // input example
bool isContained = table.AsEnumerable()
    .Any(r => nick.Equals(r.Field<string>("nick"), StringComparison.InvariantCultureIgnoreCase));
If you just want to fill a string[] or a List<string>:
string[] nicks = table.AsEnumerable().Select(r => r.Field<string>("nick")).ToArray(); // or ToList()
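Note that the header cell in the sample reads "Nick" while the code asks for "nick"; this works because DataTable looks column names up case-insensitively. If the same table is checked many times, a case-insensitive set may be more convenient than an array. The following is only a minimal sketch under that assumption: NickSet and FromTable are hypothetical names that are not part of the answer above, and the code relies only on System.Data.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class NickSet
{
    // Hypothetical helper: copies one column of a DataTable into a
    // case-insensitive set so that repeated lookups are cheap.
    public static HashSet<string> FromTable(DataTable table, string columnName = "Nick")
    {
        var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // Return an empty set when there is no table or no such column.
        if (table == null || !table.Columns.Contains(columnName)) return set;

        foreach (DataRow row in table.Rows)
        {
            // Empty cells are stored as DBNull, which "as string" turns into null.
            var value = row[columnName] as string;
            if (!string.IsNullOrWhiteSpace(value)) set.Add(value.Trim());
        }
        return set;
    }
}
```

With the table returned by GetTable, the check could then be written as NickSet.FromTable(table).Contains(nick).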
The easiest way to achieve this is to use the element's XPath. This is how I parse a different table in my application:
string tableResult = htmlDocument.DocumentNode.SelectSingleNode("//table[@class='output']/tr[3]/td[3]").InnerText;
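Applied to the clan table from the question, the same XPath idea could look roughly like the sketch below. It assumes that the nick is always the second td of each data row and that the table carries the class "table", as in the sample HTML; NickLookup and GetNicks are hypothetical names. Because the header row uses th cells, td[2] skips it without any special handling.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class NickLookup
{
    // Hypothetical helper: collects the text of the second td of every row
    // in tables whose class attribute contains "table".
    public static HashSet<string> GetNicks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var nicks = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var cells = doc.DocumentNode.SelectNodes("//table[contains(@class,'table')]//tr/td[2]");
        if (cells == null) return nicks;

        foreach (HtmlNode cell in cells)
        {
            // Decode HTML entities and strip surrounding whitespace.
            nicks.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim());
        }
        return nicks;
    }
}
```

A call such as NickLookup.GetNicks(html).Contains("dJones") would then answer the original question without building a DataTable first.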