XPath web scrape
Keywords: scrape Web Xpath | Last updated: 2023-09-27 18:32:03
<a class="support" style="letter-spacing: -1px" href="/support/index.php?/Knowledgebase/List/updates" data-executing="0">I'm random</a>
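I'm trying to use XPath to scrape the link above. The link text "I'm random"
is always changing; everything else stays the same. The "I'm random"
text is what I want to scrape.
I don't really understand XPath. How would I pull just the inner text? I tried: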
string html = Web.ExecuteJavascriptWithResult("document.getElementsByTagName('html')[0].innerHTML");
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(html);
var Attributes = new List<string>();
var Randomtxt = htmlDoc.DocumentNode.SelectNodes("//a[@href]");
if (Randomtxt != null)
{
    foreach (var contents in Randomtxt)
    {
        string href = contents.InnerHtml;
        var parts = href.Split(new char[] { '=' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 1)
        {
            Attributes.Add(parts[1]);
        }
    }
    Attribute.DataSource = Attributes;
}
But it returns nothing. How would I get just the inner text?
Not XPath, but this works for what I wanted to do, so the problem is solved.
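// Requires: using System.Text.RegularExpressions;
List<string> Attributes = new List<string>();
string html = Web.ExecuteJavascriptWithResult("document.getElementsByTagName('html')[0].innerHTML");
// Capture the text between the end of the opening tag (right after data-executing="...") and </a>.
MatchCollection m1 = Regex.Matches(html, @"data-executing=""[^""]*""\s*>\s*(.+?)\s*</a>", RegexOptions.Singleline);
foreach (Match m in m1)
{
    // "new" is a reserved word in C#, so the captured text needs a different variable name.
    string linkText = m.Groups[1].Value;
    Attributes.Add(linkText);
}
Attribute.DataSource = Attributes;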
First, find the single node:
var Randomtxt = htmlDoc.DocumentNode.SelectSingleNode("//*[@class='support']");
Then pull the inner text:
string value = Randomtxt.InnerText;
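The two steps above can be put together into a small, self-contained sketch. It assumes the HtmlAgilityPack NuGet package is referenced and that the links of interest can always be identified by class="support". The class name LinkTextScraper and the method ExtractLinkTexts are made up for illustration, and the HTML string is the sample from the question, standing in for whatever Web.ExecuteJavascriptWithResult returns.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack.
public static class LinkTextScraper
{
    // Returns the visible text of every <a class="support"> element in the HTML.
    public static List<string> ExtractLinkTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var texts = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//a[@class='support']");
        if (nodes == null)
        {
            return texts;
        }

        foreach (var node in nodes)
        {
            // InnerText drops any child tags; DeEntitize turns entities
            // such as &amp; back into plain characters.
            texts.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }

        return texts;
    }

    public static void Main()
    {
        // Sample markup copied from the question.
        const string html =
            "<a class=\"support\" style=\"letter-spacing: -1px\" " +
            "href=\"/support/index.php?/Knowledgebase/List/updates\" " +
            "data-executing=\"0\">I'm random</a>";

        foreach (var text in ExtractLinkTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

The returned list can be assigned to Attribute.DataSource just as in the question. Checking for null before reading InnerText also applies to SelectSingleNode, which returns null when the class is not found.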