How to scrape content with HTML Agility Pack
Updated: 2024-10-30 14:03:34
I'm new to HTML Agility Pack. How can I use it in C# to get this content (the proxies)?
My code:
string url = "http://www.proxybase.de/";
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(url);
var nodes = doc.DocumentNode.SelectNodes("//table[@border='0' and @cellspacing='0' and @cellpadding='0']");
if (nodes != null)
{
    foreach (HtmlNode item in nodes)
    {
        if (item != null)
        {
            string s = item.InnerText;
            listView1.Items.Add(s);
        }
    }
}
else
{
    MessageBox.Show("Nothing found");
}
The HTML looks like this:
<table border="0" cellpadding="0" cellspacing="0">
  <tbody>
    <tr>...</tr> <!-- ignore the first row -->
    <tr>
      <td>...</td>
      <td style="padding-left:5px;border-left;1px solid #999;"> 123.45.678.90:80 </td>
      <td style="padding-left:5px;border-left;1px solid #999;">...</td>
    </tr>
  </tbody>
</table>
Update:
How can I use SelectSingleNode with an index to select individual table cells?
I assume you want to store the site's data (IP addresses and so on) in a file or a database.
If so, you're almost there. This should do it:
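string url = "http://www.proxybase.de/";
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(url);
// The style value must match the page's attribute text exactly, including its "border-left;" typo.
var cells = doc.DocumentNode.SelectNodes("//td[@style='padding-left:5px;border-left;1px solid #999;']");
if (cells != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode node in cells)
    {
        string s = node.InnerText.Trim();
        // Now the cell text (for example the IP address and port) is stored in s.
        // You can either put it in a file/database or a webpage :)
    }
}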
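On the update about SelectSingleNode and indexes: XPath positions are 1-based, so a row/column pair can be written straight into the expression. Below is a minimal sketch of that idea, not a drop-in solution. It parses an inline copy of the table from the question instead of the live page; the class name ProxyTableSketch, the helpers GetCell and GetProxies, and the 192.0.2.x addresses are all made up for illustration, and it assumes the proxy always sits in the second column.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ProxyTableSketch
{
    // Hypothetical sample markup modeled on the table in the question.
    public const string SampleHtml = @"
<table border=""0"" cellpadding=""0"" cellspacing=""0"">
  <tbody>
    <tr><td>No.</td><td>Proxy</td><td>Type</td></tr>
    <tr><td>1</td><td> 192.0.2.10:80 </td><td>HTTP</td></tr>
    <tr><td>2</td><td> 192.0.2.11:8080 </td><td>HTTP</td></tr>
  </tbody>
</table>";

    // Returns the trimmed text of one cell, or null if no such cell exists.
    // row and column are 1-based, as in XPath.
    public static string GetCell(HtmlDocument doc, int row, int column)
    {
        string xpath = $"//table[@border='0']//tr[{row}]/td[{column}]";
        HtmlNode cell = doc.DocumentNode.SelectSingleNode(xpath);
        return cell?.InnerText.Trim();
    }

    // Collects the second column of every row after the first (header) row.
    public static List<string> GetProxies(HtmlDocument doc)
    {
        var result = new List<string>();
        HtmlNodeCollection cells =
            doc.DocumentNode.SelectNodes("//table[@border='0']//tr[position() > 1]/td[2]");
        if (cells == null) return result; // SelectNodes returns null when nothing matches
        foreach (HtmlNode cell in cells)
            result.Add(cell.InnerText.Trim());
        return result;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);
        Console.WriteLine(GetCell(doc, 2, 2)); // second row, second column
        foreach (string proxy in GetProxies(doc))
            Console.WriteLine(proxy);
    }
}
```

Selecting by position rather than by the exact style string means the code keeps working if the site changes its inline CSS, at the cost of breaking if the column order changes.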
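To send the request through a proxy, set a PreRequest handler on HtmlWeb:
HtmlWeb hw = new HtmlWeb();
hw.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
// p is an instance of the class that declares ProxyOnPreRequest.
hw.PreRequest = new HtmlAgilityPack.HtmlWeb.PreRequestHandler(p.ProxyOnPreRequest); // route the request through the proxy
HtmlAgilityPack.HtmlDocument doc = hw.Load(openUrl); // openUrl is the address of the page to fetch
// Called before each request is sent; attaches the proxy to it.
public bool ProxyOnPreRequest(HttpWebRequest request)
{
    WebProxy myProxy = new WebProxy("203.189.134.17:80");
    request.Proxy = myProxy;
    return true; // ok, go on
}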
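The proxy snippet leaves p and openUrl undefined, so here is one way it could be put together as a single program. Treat it as a sketch under assumptions: the class name ProxyLoaderSketch and the helper CreateWeb are hypothetical, the handler is written as a lambda rather than a named method, and the proxy address is just the one from the snippet, which may well no longer be reachable.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

public static class ProxyLoaderSketch
{
    // Builds an HtmlWeb whose requests are sent through the given proxy.
    public static HtmlWeb CreateWeb(string proxyHost, int proxyPort)
    {
        var web = new HtmlWeb();
        web.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
        // PreRequest runs before each request is sent; returning true lets it proceed.
        web.PreRequest = request =>
        {
            request.Proxy = new WebProxy(proxyHost, proxyPort);
            return true;
        };
        return web;
    }

    public static void Main()
    {
        // Proxy address taken from the snippet above; replace it with a working one.
        HtmlWeb web = CreateWeb("203.189.134.17", 80);
        HtmlDocument doc = web.Load("http://www.proxybase.de/");
        Console.WriteLine(doc.DocumentNode.OuterHtml.Length);
    }
}
```

Passing host and port as parameters makes it easy to feed in the addresses scraped from the table and try them one by one.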