Screen scraping an ASP.NET page is not working
Keywords: work, NET, scraping, ASP, screen | Last updated: 2023-09-27 18:04:37
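I am trying to bring back the calendar events on the page at the following site: http://www.wphospital.org/News-Events/Calendar-of-Events.aspx
Note that this site has a link called "Month". I need to be able to POST a request for the calendar events of a particular month, and I cannot get it to work. Here is the code: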
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Web; // HttpUtility

private static void GetData(ref string buf)
{
    try
    {
        // First, request the search form to get the viewstate value
        HttpWebRequest webRequest = default(HttpWebRequest);
        webRequest = (HttpWebRequest)System.Net.WebRequest.Create("http://www.wphospital.org/News-Events/Calendar-of-Events.aspx");
        StreamReader responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());
        string responseData = responseReader.ReadToEnd();
        responseReader.Close();

        // Extract the viewstate value and build out POST data
        // (ExtractEventValidation is not shown in this post)
        string viewState = ExtractViewState(responseData);
        string eventValidation = ExtractEventValidation(responseData);
        string postData = null;
        postData = String.Format("ctl00$manScript={0}&__EVENTTARGET=&__EVENTARGUMENT=&__LASTFOCUS=&__VIEWSTATE={1}&lng={2}&__EVENTVALIDATION={3}&ctl00$searchbox1$txtWord={4}&textfield2={5}&ctl00$plcMain$lstbxCategory={6}&ctl00$plcMain$lstbxSubCategory={7}", "ctl00$plcMain$updMonthNav|ctl00$plcMain$btnNextMonth", viewState, "en-US", eventValidation, "Search", "your search here", 0, 0);
        var encoding = new ASCIIEncoding();
        byte[] data = encoding.GetBytes(postData);

        // Now post to the search form
        webRequest = (HttpWebRequest)System.Net.WebRequest.Create("http://www.wphospital.org/News-Events/Calendar-of-Events.aspx");
        webRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
        webRequest.Method = "POST";
        webRequest.ContentType = "application/x-www-form-urlencoded";
        webRequest.ContentLength = data.Length;
        var newStream = webRequest.GetRequestStream();
        newStream.Write(data, 0, data.Length);
        newStream.Close();
        responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());

        // And read the response
        responseData = responseReader.ReadToEnd();
        responseReader.Close();
        buf = responseData;
    }
    catch (WebException ex)
    {
        if (ex.Status == WebExceptionStatus.ProtocolError)
        {
            Console.Write("The server returned protocol error ");
            // Get HttpWebResponse so that you can check the HTTP status code.
            HttpWebResponse httpResponse = (HttpWebResponse)ex.Response;
            int sc = (int)httpResponse.StatusCode;
            string strsc = httpResponse.StatusCode.ToString();
        }
    }
}

private static string ExtractViewState(string s)
{
    string viewStateNameDelimiter = "__VIEWSTATE";
    string valueDelimiter = "value=\"";
    // Find the field name first, then the value attribute that follows it
    int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
    int viewStateValuePosition = s.IndexOf(valueDelimiter, viewStateNamePosition);
    int viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
    int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);
    return HttpUtility.UrlEncodeUnicode(s.Substring(viewStateStartPosition, viewStateEndPosition - viewStateStartPosition));
}
Can anyone point me in the right direction?
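This may or may not solve your problem, since I don't know exactly what goes wrong when you say it isn't working. But as "Al W" pointed out, the response from an async postback does not look like a straight HTML stream. So if your problem is parsing it afterwards, this may help.
I recently had the "opportunity" to discover this because I needed to rewrite that output. I am working on a C# port of jQuery, and I found that I was breaking WebForms pages when I tried to re-render the output stream during an async postback. I stepped through the client script that parses the response and figured out its format.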
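Each panel that is updated returns a block of data formatted like this:
"Length|Type|ID|Content"
There can be any number of these strung together. For UpdatePanels, Type is "updatePanel". ID is the UniqueID of the control, and Content is the actual HTML data. Length equals the number of bytes in Content, and you need it to parse each block because the delimiter may appear inside Content itself. So if you decide to rewrite this data before it is sent back to the ASP.NET page (as I did), you need to update Length to reflect the final length of your content.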
The code I used to parse and rewrite it is in Server/CsQueryHttpContext.
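As a rough illustration of the block format described above, here is a minimal C# sketch of a parser. `DeltaBlock` and `DeltaParser` are hypothetical names, not ASP.NET types, and this is not the CsQuery code. The sketch assumes that each block is followed by a single `|` and that Length counts characters of the decoded string (the same as the byte count for ASCII content); both assumptions should be checked against a real response.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for one "Length|Type|ID|Content" block.
public sealed class DeltaBlock
{
    public string Type;
    public string Id;
    public string Content;
}

public static class DeltaParser
{
    // Splits a delta response into blocks. Content is read by its declared
    // length, never by searching for '|', because '|' may occur inside it.
    public static List<DeltaBlock> Parse(string delta)
    {
        var blocks = new List<DeltaBlock>();
        int pos = 0;
        while (pos < delta.Length)
        {
            int lengthEnd = delta.IndexOf('|', pos);
            if (lengthEnd < 0) break; // no further block header
            int length = int.Parse(delta.Substring(pos, lengthEnd - pos));

            int typeEnd = delta.IndexOf('|', lengthEnd + 1);
            int idEnd = typeEnd < 0 ? -1 : delta.IndexOf('|', typeEnd + 1);
            if (idEnd < 0 || idEnd + 1 + length > delta.Length)
                throw new FormatException("Truncated block at offset " + pos);

            blocks.Add(new DeltaBlock
            {
                Type = delta.Substring(lengthEnd + 1, typeEnd - lengthEnd - 1),
                Id = delta.Substring(typeEnd + 1, idEnd - typeEnd - 1),
                Content = delta.Substring(idEnd + 1, length)
            });

            pos = idEnd + 1 + length;
            // Assumption: one '|' separates this block from the next.
            if (pos < delta.Length && delta[pos] == '|') pos++;
        }
        return blocks;
    }
}
```

For scraping, the blocks of interest would be those whose Type is "updatePanel"; their Content holds the HTML fragment for the panel identified by ID.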
For the POST you want UTF-8 encoding, so just redo one line:
//var encoding = new ASCIIEncoding();
//byte[] data = encoding.GetBytes(postData);
//do this instead.....
byte[] data = Encoding.UTF8.GetBytes(postData);
and see whether that helps.
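Related to the encoding step, field names such as `ctl00$manScript` and Base64 values such as `__VIEWSTATE` contain characters (`$`, `|`, `+`, `/`, `=`) that have to be percent-encoded in a form body. A minimal sketch of building the body before converting it to UTF-8 bytes follows; `FormEncoder` is a hypothetical helper name, not something from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only.
public static class FormEncoder
{
    // Joins name=value pairs with '&', percent-encoding both sides.
    public static string BuildBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            // Treat a null value as an empty field.
            sb.Append(Uri.EscapeDataString(field.Value ?? string.Empty));
        }
        return sb.ToString();
    }

    // The encoded body is plain ASCII, so UTF-8 bytes are safe to send.
    public static byte[] BuildBodyBytes(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return Encoding.UTF8.GetBytes(BuildBody(fields));
    }
}
```

With a builder like this, pass the raw field values: a value that has already been URL-encoded (as `ExtractViewState` in the question does) would be encoded twice. On older .NET Framework versions `Uri.EscapeDataString` rejects very long strings, so a large view state may need a different encoder.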
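Below is the network trace I got in Chrome when I clicked the Month button. Note __EVENTTARGET:ctl00$plcMain$monthBtn. ASP.NET includes a JavaScript framework that calls the javascript:postback() method when that link is clicked, and that method sets the event target. That is how ASP.NET WebForms knows which event to fire on postback. One tricky thing is that this page uses an update panel, so you may not get a true HTML response. If you can make your request look like this one, you should get a successful response. Hope that helps.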
Request URL:http://www.wphospital.org/News-Events/Calendar-of-Events.aspx
Request Method:POST
Status Code:200 OK
Request Headers
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:no-cache
Content-Length:9718
Content-Type:application/x-www-form-urlencoded
Cookie:CMSPreferredCulture=en-US; ASP.NET_SessionId=h2nval45vq0q5yb0cp233huc; __utma=101137351.234148951.1312486481.1312486481.1312486481.1; __utmb=101137351.1.10.1312486481; __utmc=101137351; __utmz=101137351.1312486481.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __unam=ef169fe-131964a5f2a-24ec879b-1
Host:www.wphospital.org
Origin:http://www.wphospital.org
Proxy-Connection:keep-alive
Referer:http://www.wphospital.org/News-Events/Calendar-of-Events.aspx
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.124 Safari/534.30
X-MicrosoftAjax:Delta=true
Form Data
ctl00$manScript:ctl00$plcMain$updTab|ctl00$plcMain$monthBtn
__EVENTTARGET:ctl00$plcMain$monthBtn
__EVENTARGUMENT:
__LASTFOCUS:
__VIEWSTATE:<removed for brevity>
lng:en-US
__EVENTVALIDATION:/wEWEgLbj/nSDgKt983zDgKWlOLbAQKr3LqFAwKL3uqpBwK9kfRnArDHltMCAuTk0eAHAsfniK0DAteIosMPAsiIosMPAsmIosMPAsuIosMPAoD0ookDApCbiOcPAo biOcPAombiOcPAoubiOcPyfqRx8FdqYzlnnkXcJEJZzzopJY=
ctl00$searchbox1$txtWord:Search
textfield2:Enter your search here
ctl00$plcMain$lstbxCategory:0
ctl00$plcMain$lstbxSubCategory:0
ctl00$plcMain$hdnEventCount:2
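To make a request look like the trace above, one possible approach is sketched below in C#. The class `MonthPostback` and its methods are hypothetical names for illustration. The field names and fixed values are copied from the trace; the hidden-field regex assumes the usual `<input type="hidden" name="..." id="..." value="..." />` rendering with double quotes, and sharing one `CookieContainer` between the GET and the POST is an assumption about what the site needs. It has not been run against the live site.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class MonthPostback
{
    public const string PageUrl = "http://www.wphospital.org/News-Events/Calendar-of-Events.aspx";

    // Returns the decoded value of a hidden input, or "" if it is not found.
    public static string ExtractHiddenField(string html, string name)
    {
        Match m = Regex.Match(html, "name=\"" + Regex.Escape(name) + "\"[^>]*?\\svalue=\"([^\"]*)\"");
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : string.Empty;
    }

    // Builds the form fields in the order shown in the trace.
    public static List<KeyValuePair<string, string>> BuildFields(string html)
    {
        const string target = "ctl00$plcMain$monthBtn";
        var fields = new List<KeyValuePair<string, string>>();
        Action<string, string> add = (k, v) => fields.Add(new KeyValuePair<string, string>(k, v));
        add("ctl00$manScript", "ctl00$plcMain$updTab|" + target);
        add("__EVENTTARGET", target); // tells WebForms which control posted back
        add("__EVENTARGUMENT", "");
        add("__LASTFOCUS", "");
        add("__VIEWSTATE", ExtractHiddenField(html, "__VIEWSTATE"));
        add("lng", "en-US");
        add("__EVENTVALIDATION", ExtractHiddenField(html, "__EVENTVALIDATION"));
        add("ctl00$searchbox1$txtWord", "Search");
        add("textfield2", "Enter your search here");
        add("ctl00$plcMain$lstbxCategory", "0");
        add("ctl00$plcMain$lstbxSubCategory", "0");
        // Assumption: this hidden field is present in the page HTML.
        add("ctl00$plcMain$hdnEventCount", ExtractHiddenField(html, "ctl00$plcMain$hdnEventCount"));
        return fields;
    }

    // Fetches the page so the hidden fields and session cookie can be reused.
    public static string Get(CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(PageUrl);
        request.CookieContainer = cookies;
        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }

    // Sends the async postback and returns the raw response text.
    public static string Post(CookieContainer cookies, List<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var kv in fields)
            parts.Add(Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value));
        byte[] body = Encoding.UTF8.GetBytes(string.Join("&", parts));

        var request = (HttpWebRequest)WebRequest.Create(PageUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.Referer = PageUrl;
        request.CookieContainer = cookies;
        request.Headers["X-MicrosoftAjax"] = "Delta=true"; // as in the trace
        request.ContentLength = body.Length;
        using (Stream stream = request.GetRequestStream())
            stream.Write(body, 0, body.Length);
        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }
}
```

A caller would create one `CookieContainer`, pass it to `Get`, feed the returned HTML to `BuildFields`, and pass the result to `Post` with the same container. Because the `X-MicrosoftAjax` header is sent, the response is expected to be in the update panel format rather than a full HTML page.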