Posting an HTTP form and getting the target HTML using web scraping

Updated: 2023-09-27 18:35:42

I am currently building a website that posts to a third-party site, and I use the following approach to extract details from it:

String htmlCode = "<html>" +
"<head>" +
"<title>Form</title>" +
"</head>" +
"<body onload=\"javascript:document.forms[0].submit()\">" +
"<form method=\"post\" action=\"%verylongactionurl%\">" +
"<input type=\"hidden\" name=\"key\" value=\"value\">" +
"</form>" +
"</body>" +
"</html>";

In my C# code I replace all the required values in the HTML string above, and then write the content to my page as shown here.

This works fine:
Response.Write(httpForm);

Is there a way to capture the HTML of the target page that the form submission in the step above leads to?

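This is for a new requirement: perform web scraping, extract the required details from the target site, and display the required values in our application.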

I tried the following code, but it does not work. The response I get back is the target site's error page.

HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(sourceUrl);
request.AllowAutoRedirect = true;
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
string postData = HttpUtility.UrlEncode(String.Format("key={0}&", value));
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = postData.Length;
// This is sent to the Post
byte[] bytes = Encoding.UTF8.GetBytes(postData);

//request.ContentLength = bytes.Length;
using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(bytes, 0, postData.Length);
    requestStream.Flush();
    requestStream.Close();
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
}

Try something like this:

// Requires: using System.Linq; using System.Net; and the HtmlAgilityPack package
// Create a WebClient instance
WebClient wc = new WebClient();
// Download the HTML content of the WhatsApp for Android page
string htmlString = wc.DownloadString("http://www.whatsapp.com/android");
// Load the HTML content into an HtmlAgilityPack HTML document
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(htmlString);
// Find the first <p> whose class attribute contains "version"
// and read its inner text
_currentVersion = htmlDoc.DocumentNode.Descendants("p")
    .First(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("version"))
    .InnerText;
// Remove the "Version" keyword so that only the version number remains
_currentVersion = _currentVersion.Replace("Version", "").Trim();

In this example, I am extracting the latest version number string of the WhatsApp application from their website.

So the only thing you need to supply is the URL of the web page you want to extract data from.
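
The example above only downloads a page with GET, while the question needs to POST form fields and read back the resulting HTML. Below is a minimal sketch of that, under some assumptions: the class and method names (FormPoster, BuildFormBody, PostForm) are hypothetical, the target site is assumed to accept a plain application/x-www-form-urlencoded POST without cookies or extra hidden fields, and the response is assumed to be UTF-8. Two details differ from the attempt in the question and may be contributing to the error page, though that is not certain: each name and value is escaped separately (escaping the whole "key=value&" string also escapes the '=' and '&' separators), and the content length is taken from the byte array rather than from the string length.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper, not part of any library.
public static class FormPoster
{
    // Builds a form-urlencoded body. Each name and value is escaped on its own,
    // so the '=' and '&' separators are left intact.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", parts);
    }

    // Posts the fields to targetUrl and returns the response body (the target HTML).
    public static string PostForm(string targetUrl, IEnumerable<KeyValuePair<string, string>> fields)
    {
        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody(fields));

        var request = (HttpWebRequest)WebRequest.Create(targetUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        // The length must be the number of bytes, not the number of characters.
        request.ContentLength = body.Length;

        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }

        // Read the response only after the request stream has been closed.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call could look like `string html = FormPoster.PostForm(actionUrl, new Dictionary<string, string> { { "key", value } });`, where actionUrl stands for the form's action URL. The returned string can then be passed to htmlDoc.LoadHtml(...) and queried with HtmlAgilityPack as in the example above.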