C#: Parsing specific rows in a table with HtmlAgilityPack

Keywords: using htmlagilitypack | Last updated: 2023-09-27 18:28:57

I am developing a C# application in which I need to parse specific rows of a table and compare their values with a given string.

The URL I want to parse is http://wsrpg.com/clans/57

<table cellspacing="0" cellpadding="0" border="0" class="table">
<thead>
<tr>
<th width="100">Rank</th>
<th width="150">Nick</th>
<th width="100">RCASH</th>
<th width="150">Activity in December</th>
<th width="100">Comportment</th>
<th width="100">Online</th>
<th width="150">Last Login</th> 
</tr>
</thead>
<tr><td>Leader</td><td>robbie_william</td><td>351</td><td>1024</td><td>1195</td><td class='offline'><span class="label label-danger">Offline</span></td><td>03/01/2016</td></tr>
<tr><td>Boss</td><td>Alloy_</td><td>1418</td><td>1043</td><td>354</td><td class='offline'><span class="label label-danger">Offline</span></td><td>26/12/2015</td></tr>
<tr><td>Boss</td><td>AnonYmous_</td><td>32976</td><td>5142</td><td>937</td><td class='offline'><span class="label label-danger">Offline</span></td><td>04/01/2016</td></tr>
<tr><td>Boss</td><td>dJones</td><td>2739</td><td>6152</td><td>1044</td><td class='online'><span class="label label-success">Online</span></td><td>04/01/2016</td></tr>
<tr><td>Boss</td><td>SHARP</td><td>9015</td><td>1216</td><td>32</td><td class='offline'><span class="label label-danger">Offline</span></td><td>04/01/2016</td></tr>
<tr><td>Boss</td><td>Steffie</td><td>7888</td><td>6043</td><td>887</td><td class='online'><span class="label label-success">Online</span></td><td>04/01/2016</td></tr>
<tr><td>Boss</td><td>YOLOvsYODO</td><td>10950</td><td>2703</td><td>385</td><td class='offline'><span class="label label-danger">Offline</span></td><td>04/01/2016</td></tr>
<tr><td>Member</td><td>Angel_</td><td>8629</td><td>3256</td><td>167</td><td class='offline'><span class="label label-danger">Offline</span></td><td>04/01/2016</td></tr>
<tr><td>Member</td><td>asad</td><td>2452</td><td>3938</td><td>183</td><td class='offline'><span class="label label-danger">Offline</span></td><td>03/01/2016</td></tr>
<tr><td>Member</td><td>D3nim</td><td>1285</td><td>3217</td><td>31</td><td class='offline'><span class="label label-danger">Offline</span></td><td>03/01/2016</td></tr>
<tr><td>Member</td><td>Dell</td><td>5025</td><td>3305</td><td>182</td><td class='offline'><span class="label label-danger">Offline</span></td><td>01/01/2016</td></tr>
<tr><td>Member</td><td>Habib</td><td>1650</td><td>3860</td><td>36</td><td class='offline'><span class="label label-danger">Offline</span></td><td>04/01/2016</td></tr>
<tr><td>Member</td><td>Iron_MiXx</td><td>2569</td><td>485</td><td>525</td><td class='offline'><span class="label label-danger">Offline</span></td><td>03/01/2016</td></tr>
<tr><td>Member</td><td>MCool</td><td>4960</td><td>12739</td><td>290</td><td class='online'><span class="label label-success">Online</span></td><td>04/01/2016</td></tr>
<tr><td>Member</td><td>PREXEN</td><td>127</td><td>3873</td><td>1547</td><td class='offline'><span class="label label-danger">Offline</span></td><td>04/01/2016</td></tr>
<tr><td>Member</td><td>Sensation_</td><td>2733</td><td>1944</td><td>338</td><td class='offline'><span class="label label-danger">Offline</span></td><td>03/01/2016</td></tr>
<tr><td>Member</td><td>Wizard_</td><td>2081</td><td>2578</td><td>46</td><td class='offline'><span class="label label-danger">Offline</span></td><td>03/01/2016</td></tr>
</table>

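I only want to store the nicks in a string or a string array so that I can compare them with a string I already have.

What I want to achieve is to check whether the nick entered by the user exists in that table.

I will use a Boolean method to do this.

Solution: I used Tim Schmelter's code, and here is how I used it: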

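// Requires: using System.Data; using System.Linq; using System.Net;
// (AsEnumerable and Field<T> come from the System.Data.DataSetExtensions assembly.)
private bool Authenticate(string nick)
{
    using (WebClient client = new WebClient())
    {
        string html = client.DownloadString("http://wsrpg.com/clans/57");
        DataTable table = GetTable(html, "table", true);
        // GetTable returns null when no table with the given class is found.
        if (table == null) return false;
        // DataTable column lookup by name is case-insensitive, so "nick" matches the "Nick" header.
        string[] nicks = table.AsEnumerable().Select(r => r.Field<string>("nick")).ToArray();
        return nicks.Contains(nick);
    }
}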

It is then called like this:

bool Authenticated = Authenticate(Player.GetName());


You can use this method to parse the HTML and fill a DataTable from the first table whose class attribute contains the given class name:

public static DataTable GetTable(string html, string tableClass, bool firstRowContainsHeader = false)
{
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    string xpath = string.Format("//table[contains(@class,'{0}')]", tableClass);
    var firstTable = doc.DocumentNode.SelectSingleNode(xpath);
    if (firstTable == null) return null;
    DataTable table = new DataTable();
    var tableRows = firstTable.Descendants("tr");
    var tableData = tableRows.Skip(firstRowContainsHeader ? 1 : 0)
        .Select(row => row.Descendants("td")
            .Select((cell, index) => new { row, cell, index, cell.InnerText })
            .ToList());
    var headerCells = tableRows.First().Descendants()
        .Where(n => n.Name == "td" || n.Name == "th");
    int columnIndex = 0;
    foreach (HtmlNode cell in headerCells)
    {
        string colName = firstRowContainsHeader
            ? cell.InnerText
            : String.Format("Column {0}", (++columnIndex).ToString());
        table.Columns.Add(colName, typeof(string));
    }
    foreach (var rowCells in tableData)
    {
        DataRow row = table.Rows.Add();
        for (int i = 0; i < Math.Min(rowCells.Count, table.Columns.Count); i++)
        {
            row.SetField(i, rowCells[i].InnerText);
        }
    }
    return table;
}

You can then use LINQ to DataTable to check whether it contains the given nick:

string html = File.ReadAllText("C:\\Temp\\html.txt");  // loading your sample from file
DataTable table = GetTable(html, "table", true);
string nick = "robbie_william";  // input example
bool isContained = table.AsEnumerable()
    .Any(r => nick.Equals(r.Field<string>("nick"), StringComparison.InvariantCultureIgnoreCase));

If you just want to fill a string[] or a List<string>:

string[] nicks = table.AsEnumerable().Select(r => r.Field<string>("nick")).ToArray(); // or ToList()
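
If the same list of nicks is checked many times, a case-insensitive set can stand in for the array. The following is only a minimal sketch: the class name NickLookup and the method Build are hypothetical names introduced for illustration, and ordinal case-insensitive matching is an assumption about how nicks should be compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NickLookup
{
    // Hypothetical helper (not part of the original answer):
    // builds a set that ignores case when testing membership.
    public static HashSet<string> Build(IEnumerable<string> nicks)
    {
        // Skip null entries, which can appear when a table cell is empty.
        return new HashSet<string>(
            nicks.Where(n => n != null),
            StringComparer.OrdinalIgnoreCase);
    }
}
```

With the nicks array from above, var known = NickLookup.Build(nicks); followed by known.Contains(input) answers each lookup without rescanning the array.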

The simplest way to do this is to use the element's XPath. For example, this is how I parse a different table in my application:

string tableResult = htmlDocument.DocumentNode.SelectSingleNode("//table[@class='output']/tr[3]/td[3]").InnerText;
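
Applied to the table from the question, the same XPath idea could select the whole nick column in one query. This is a sketch under assumptions rather than a tested solution for the live page: the class name NickXPath and the method GetNicks are hypothetical, and it assumes the nick is always the second td of each row, as in the sample HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class NickXPath
{
    // Hypothetical helper: returns the text of the second cell of every data row.
    public static List<string> GetNicks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // td[2] is the "Nick" column in the sample table. The header row
        // uses <th> cells, so it produces no match for this expression.
        var cells = doc.DocumentNode.SelectNodes(
            "//table[contains(@class,'table')]//tr/td[2]");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (cells == null) return new List<string>();

        // Decode HTML entities and trim whitespace around each nick.
        return cells.Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim()).ToList();
    }
}
```

A positional expression like td[2] breaks silently if the site reorders its columns, so looking the column up by header name, as the DataTable approach does, is the more robust choice when the layout may change.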