Looping through pages that are navigated with JavaScript
Keywords: JavaScript, browsing, looping | Updated: 2023-09-27 18:31:10
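I'm trying to parse this web page with C#, but I'm not sure how to loop through its different pages. I was hoping to simply use each page's URL, but the site appears to use a series of JavaScript calls to move to the next page, so the URL doesn't actually change from page to page. Does anyone know how I can do this?
Web page: http://www.roads.maryland.gov/pages/cic.aspx?PageId=857&Type=tab
Thanks in advance.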
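You need to POST back to the same URL with all of the view state, changing the __EVENTARGUMENT post value to Page$1, Page$2, Page$3, and so on. The pager links on that page appear to call ASP.NET's __doPostBack, which submits the whole form (including hidden fields such as __VIEWSTATE and __EVENTVALIDATION) with __EVENTTARGET set to the grid and __EVENTARGUMENT set to the page to show. Replaying that request with a different page number is what moves you through the results. In the form-encoded body, $ is written as %24.
Something like this: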
using System.IO;
using System.Net;

// Page of the grid to request; illustrative value, normally the loop variable.
int pageNumber = 2;

//Create request to URL.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.roads.maryland.gov/pages/cic.aspx?PageId=857&Type=tab");
//Set request headers.
request.KeepAlive = true;
request.Headers.Add("Origin", @"http://www.roads.maryland.gov");
// Identify the request as an ASP.NET AJAX (UpdatePanel) partial postback, as the browser does.
request.Headers.Add("X-MicrosoftAjax", "Delta=true");
request.UserAgent = "Mozilla/1.0";
request.ContentType = "application/x-www-form-urlencoded; charset=UTF-8";
request.Accept = "*/*";
request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-GB,en;q=0.8,en-US;q=0.6");
//Set request method
request.Method = "POST";
// Disable 'Expect: 100-continue' behavior. More info: http://haacked.com/archive/2004/05/15/http-web-request-expect-100-continue.aspx
request.ServicePoint.Expect100Continue = false;
//Set request body. __EVENTARGUMENT is "Page$<n>", with '$' URL-encoded as %24.
string body = "__EVENTARGUMENT=Page%24";
body += pageNumber.ToString();
body += @"&ctl00%24PlaceHolderMain%24ScriptManager1=ctl00%24PlaceHolderMain%24CICContract1%24displayPanel1%7Cctl00%24PlaceHolderMain%24CICContract1%24gvSSP&MSO_PageHashCode=687&MSOWebPartPage_PostbackSource=&MSOTlPn_SelectedWpId=&MSOTlPn_View=0&MSOTlPn_ShowSettings=False&MSOGallery_SelectedLibrary=&MSOGallery_FilterString=&MSOTlPn_Button=none&__REQUESTDIGEST=0x8DBC9CD602B61A910B12C83C03A47B6755DE07797515E34AEA8EE0BDC93813FC6A208251090430008C51E50BF4A3D0EBEAF4A9F2CC60072EC4B6B5FED3F48D31%2C20%20Nov%202014%2017%3A15%3A59%20-0000&MSOSPWebPartManager_DisplayModeName=Browse&MSOWebPartPage_Shared=&MSOLayout_LayoutChanges=&MSOLayout_InDesignMode=&MSOSPWebPartManager_OldDisplayModeName=Browse&MSOSPWebPartManager_StartWebPartEditingName=false&ctl00%24PlaceHolderSearchArea%24q=Search&ctl00%24PlaceHolderMain%24CICContract1%24ddlPulldown=tab&__EVENTTARGET=ctl00%24PlaceHolderMain%24CICContract1%24gvSSP&__LASTFOCUS=&__VIEWSTATE=%2F
...massive viewstate...
%3D&__VIEWSTATEGENERATOR=E6DD55AA&__EVENTVALIDATION=%2FwEWKwK7ur20CQKpuunwCgLbzK2CBQKBio7OBALDpsvcBQK6mIPlBQLa0ZnPCgLmidHRCQKu%2FbCaDQKvg%2BHxDwKl%2FYSaDQK87ov7BQKV8JfxAQKdo6DlBgLSppvcBQLFutGoAwK3suC5CgK3spS5CgK3sui5CgK3svC5CgK3suS5CgK3svi5CgK3ssy5CgK3ssC5CgLZuKSJCwKykYb0DgKrq%2FH8DAK3zM7sDALA%2BfaWBAKYyYezCwK3suC5CgK3spS5CgK3sui5CgK3svC5CgK3suS5CgK3svi5CgK3ssy5CgK3ssC5CgLZuKSJCwKykYb0DgKrq%2FH8DALl54b0BgK%2FjseDBA0f8JbX2FIE%2F7%2FP1ojwVriOGXif&__ASYNCPOST=true&";
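byte[] postBytes = System.Text.Encoding.UTF8.GetBytes(body);
request.ContentLength = postBytes.Length;
// Write the form body; 'using' closes the request stream even if Write throws.
using (var stream = request.GetRequestStream())
{
    stream.Write(postBytes, 0, postBytes.Length);
}
//Get response to request. Because of the X-MicrosoftAjax header, the server should reply
//in the ASP.NET AJAX partial-rendering ("delta") format rather than as a full HTML page.
using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new System.IO.StreamReader(response.GetResponseStream()))
{
    string responseText = reader.ReadToEnd();
    // Parse responseText here to extract the grid rows for this page.
}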
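The snippet above hardcodes the __VIEWSTATE, __EVENTVALIDATION and __REQUESTDIGEST values copied from one earlier browser session (note the 2014 timestamp inside __REQUESTDIGEST). Below is a minimal sketch, assuming C# 9 or later (top-level statements) and .NET's HttpClient, that reads the hidden fields from the page itself and keeps posting back to the lowest page number the current pager links to that has not been fetched yet, until none is left. The helper names (ExtractHiddenFields, BuildPostbackBody, FindPagerPageNumbers, FetchAllPagesAsync) are hypothetical, the regex-based HTML handling is a simplification, and it assumes the pager links look like __doPostBack('<grid id>','Page$N'). It deliberately leaves out the ScriptManager1 and __ASYNCPOST fields, so the server should return an ordinary full page rather than a delta response. Visible form fields, such as the ddlPulldown value in the body above, are not collected and may need to be added by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Collect every <input type="hidden"> name/value pair (e.g. __VIEWSTATE,
// __VIEWSTATEGENERATOR, __EVENTVALIDATION, __REQUESTDIGEST).
// Regex-based for brevity; an HTML parser would be more robust.
static Dictionary<string, string> ExtractHiddenFields(string html)
{
    var fields = new Dictionary<string, string>();
    foreach (Match tag in Regex.Matches(html, @"<input\b[^>]*>", RegexOptions.IgnoreCase))
    {
        string t = tag.Value;
        if (!Regex.IsMatch(t, @"\btype\s*=\s*[""']hidden[""']", RegexOptions.IgnoreCase)) continue;
        Match name = Regex.Match(t, @"\bname\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase);
        Match value = Regex.Match(t, @"\bvalue\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase);
        if (name.Success)
            fields[WebUtility.HtmlDecode(name.Groups[1].Value)] =
                value.Success ? WebUtility.HtmlDecode(value.Groups[1].Value) : "";
    }
    return fields;
}

// Form body for one pager postback: the hidden fields plus __EVENTTARGET
// (the grid's UniqueID) and __EVENTARGUMENT ("Page$N"), all URL-encoded.
static string BuildPostbackBody(IDictionary<string, string> hiddenFields, string eventTarget, int pageNumber)
{
    var form = new Dictionary<string, string>(hiddenFields)
    {
        ["__EVENTTARGET"] = eventTarget,
        ["__EVENTARGUMENT"] = "Page$" + pageNumber,
    };
    return string.Join("&", form.Select(kv =>
        Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value)));
}

// Page numbers the pager links to, read from the 'Page$N' postback arguments.
// The HTML is decoded first because apostrophes may be rendered as &#39;.
static SortedSet<int> FindPagerPageNumbers(string html)
{
    var numbers = new SortedSet<int>();
    foreach (Match m in Regex.Matches(WebUtility.HtmlDecode(html), @"'Page\$(\d+)'"))
        numbers.Add(int.Parse(m.Groups[1].Value));
    return numbers;
}

// Hypothetical driver (needs network access, not invoked here): GET the first page,
// then post back once per unvisited page number, re-reading the hidden fields
// from the latest response because they change after every postback.
static async Task<List<string>> FetchAllPagesAsync(string url, string eventTarget)
{
    using var client = new HttpClient(); // the default handler keeps session cookies
    string html = await client.GetStringAsync(url);
    var pages = new List<string> { html };
    var visited = new HashSet<int> { 1 };
    while (true)
    {
        // Lowest page number linked from the current pager that has not been fetched yet (0 = none).
        int next = FindPagerPageNumbers(html).FirstOrDefault(n => !visited.Contains(n));
        if (next == 0) break;
        visited.Add(next);
        string body = BuildPostbackBody(ExtractHiddenFields(html), eventTarget, next);
        using var content = new StringContent(body, Encoding.UTF8, "application/x-www-form-urlencoded");
        using HttpResponseMessage response = await client.PostAsync(url, content);
        response.EnsureSuccessStatusCode();
        html = await response.Content.ReadAsStringAsync();
        pages.Add(html);
    }
    return pages;
}

// No-network demonstration on a hand-written input tag.
var demoFields = ExtractHiddenFields("<input type=\"hidden\" name=\"__VIEWSTATE\" value=\"abc/+=\" />");
Console.WriteLine(BuildPostbackBody(demoFields, "ctl00$PlaceHolderMain$CICContract1$gvSSP", 2));
// Against the real site one would call, for example:
// var pages = await FetchAllPagesAsync(
//     "http://www.roads.maryland.gov/pages/cic.aspx?PageId=857&Type=tab",
//     "ctl00$PlaceHolderMain$CICContract1$gvSSP");
```

Re-reading the hidden fields from each response, rather than reusing the first page's, is a precaution: event validation may only accept postback arguments for pager links that were actually rendered, so a page the current pager does not show could be rejected.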
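Another option is to drive a real browser and simulate clicking the pager link, so that the page's own JavaScript runs. Note that HTML Agility Pack by itself only parses HTML and cannot execute JavaScript or click anything; the snippet below uses WatiN (its IE and Link types) to control the page instead.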
Something like this might work (assuming your form has a WebBrowser control named webBrowser1):
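// Requires WatiN: using System.Linq; using WatiN.Core;
var browser = new IE(webBrowser1.ActiveXInstance);
// "Page$2" is in the link's href (javascript:__doPostBack(...)), not in its inner HTML ("2").
var page2Link = browser.Links.First(l => (l.GetAttributeValue("href") ?? "").Contains("Page$2"));
page2Link.Click();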