Screen scraping an ASP.NET page not working

Updated: 2023-09-27 18:04:37

I'm trying to pull back the calendar events on the page at the following site: http://www.wphospital.org/News-Events/Calendar-of-Events.aspx

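Note that the site has a link called "Month". I need to be able to POST a request for the calendar event data of a specific month, but I can't get it to work. Here is the code: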

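using System;
using System.IO;
using System.Net;
using System.Text;
using System.Web; // HttpUtility

private static void GetData(ref string buf)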
{
    try
    {
        //First, request the search form to get the viewstate value 
        HttpWebRequest webRequest = default(HttpWebRequest);
        webRequest = (HttpWebRequest)System.Net.WebRequest.Create("http://www.wphospital.org/News-Events/Calendar-of-Events.aspx");
        StreamReader responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());
        string responseData = responseReader.ReadToEnd();
        responseReader.Close();
        //Extract the viewstate value and build out POST data 
        string viewState = ExtractViewState(responseData);
        string eventValidation = ExtractEventValidation(responseData);
        string postData = null;
        postData = String.Format("ctl00$manScript={0}&__EVENTTARGET=&__EVENTARGUMENT=&__LASTFOCUS=&__VIEWSTATE={1}&lng={2}&__EVENTVALIDATION={3}&ctl00$searchbox1$txtWord={4}&textfield2={5}&ctl00$plcMain$lstbxCategory={6}&ctl00$plcMain$lstbxSubCategory={7}", "ctl00$plcMain$updMonthNav|ctl00$plcMain$btnNextMonth", viewState, "en-US", eventValidation, "Search", "your search here", 0, 0);
        var encoding = new ASCIIEncoding();
        byte[] data = encoding.GetBytes(postData);
        //Now post to the search form 
        webRequest = (HttpWebRequest)System.Net.WebRequest.Create("http://www.wphospital.org/News-Events/Calendar-of-Events.aspx");
        webRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
        webRequest.Method = "POST";
        webRequest.ContentType = "application/x-www-form-urlencoded";
        webRequest.ContentLength = data.Length;
        var newStream = webRequest.GetRequestStream();
        newStream.Write(data, 0, data.Length);
        newStream.Close();
        responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());
        //And read the response 
        responseData = responseReader.ReadToEnd();
        responseReader.Close();
        buf = responseData;
    }
    catch (WebException ex)
    {
        if (ex.Status == WebExceptionStatus.ProtocolError)
        {
            // Get the HttpWebResponse so that you can check the HTTP status code.
            HttpWebResponse httpResponse = (HttpWebResponse)ex.Response;
            int sc = (int)httpResponse.StatusCode;
            Console.WriteLine("The server returned protocol error {0} ({1})", sc, httpResponse.StatusCode);
        }
    }
}
private static string ExtractViewState(string s)
{
    string viewStateNameDelimiter = "__VIEWSTATE";
    string valueDelimiter = "value=\"";
    int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
    int viewStateValuePosition = s.IndexOf(valueDelimiter, viewStateNamePosition);
    int viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
    int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);
    // ViewState is Base64 ('+', '/', '='), so it must be URL-encoded for the form body
    return HttpUtility.UrlEncode(s.Substring(viewStateStartPosition, viewStateEndPosition - viewStateStartPosition));
}
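The question calls ExtractEventValidation without showing it. One way to cover both hidden fields is a single helper. The sketch below is hypothetical (HiddenFields.Extract is not from the original code) and assumes standard WebForms markup, where the name or id attribute comes before value inside the input tag. It returns the raw value, so URL-encode it exactly once when building the POST body.

```csharp
using System.Text.RegularExpressions;

public static class HiddenFields
{
    // Hypothetical helper: returns the raw value of a WebForms hidden field such as
    // __VIEWSTATE or __EVENTVALIDATION, or null if it is not found.
    // Assumes the name or id attribute appears before value inside the <input> tag.
    public static string Extract(string html, string fieldName)
    {
        // The closing quote after the name keeps "__VIEWSTATE" from matching "__VIEWSTATEGENERATOR".
        string pattern = "<input[^>]*?\\b(?:id|name)=\"" + Regex.Escape(fieldName) +
                         "\"[^>]*?\\bvalue=\"([^\"]*)\"";
        Match match = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

If you keep building the body by hand with String.Format, the question's two helpers would then reduce to something like HttpUtility.UrlEncode(HiddenFields.Extract(responseData, "__EVENTVALIDATION")).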
Can anyone point me in the right direction?


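This may or may not solve your problem, since I don't know exactly what the problem is when you say it doesn't work. But as "Al W" pointed out, the response from an async postback won't look like a straight HTML stream. So if your problem is parsing it afterwards, this might help.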

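I recently had the "opportunity" to discover this because I needed to rewrite that output. I'm working on a C# jQuery port, and I found that I was breaking WebForms pages when I tried to re-render the output stream during an async postback. I walked through the client script that parses the response and worked out the format of the response.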

Each panel that gets updated returns a block of data in this format:

"Length|Type|ID|Content|"

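Any number of these can be strung together. UpdatePanels have the type "updatePanel". ID is the control's UniqueID, and Content is the actual HTML data. Length is the number of characters in Content, and you need it to parse each block, because the delimiter can appear inside Content itself. So if you decide to rewrite this data before it is sent back from the ASP.NET page (as I did), you need to update Length to reflect the final length of your content.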
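A minimal sketch of parsing and re-emitting that format in C#, assuming the whole response is already read into a string. DeltaBlock and DeltaParser are hypothetical names, and Length is treated here as a count of .NET string characters. The key point is that Content is read by its declared length rather than by splitting on '|'.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// One "Length|Type|ID|Content|" block from an async-postback response (hypothetical type).
public sealed class DeltaBlock
{
    public DeltaBlock(string type, string id, string content)
    {
        Type = type;
        Id = id;
        Content = content;
    }

    public string Type { get; }
    public string Id { get; }
    public string Content { get; }
}

public static class DeltaParser
{
    // Reads blocks in order. Content is taken by its declared length, so a '|'
    // inside the HTML does not confuse the parser.
    public static List<DeltaBlock> Parse(string delta)
    {
        var blocks = new List<DeltaBlock>();
        int pos = 0;
        while (pos < delta.Length)
        {
            int length = int.Parse(ReadField(delta, ref pos), CultureInfo.InvariantCulture);
            string type = ReadField(delta, ref pos);
            string id = ReadField(delta, ref pos);
            // The content must be followed by its '|' terminator.
            if (length < 0 || pos + length >= delta.Length || delta[pos + length] != '|')
                throw new FormatException("Bad content length at offset " + pos);
            blocks.Add(new DeltaBlock(type, id, delta.Substring(pos, length)));
            pos += length + 1; // skip the content and its trailing '|'
        }
        return blocks;
    }

    // Writes blocks back out, recomputing each Length from the current Content.
    public static string Serialize(IEnumerable<DeltaBlock> blocks)
    {
        var sb = new StringBuilder();
        foreach (DeltaBlock b in blocks)
        {
            sb.Append(b.Content.Length.ToString(CultureInfo.InvariantCulture)).Append('|')
              .Append(b.Type).Append('|')
              .Append(b.Id).Append('|')
              .Append(b.Content).Append('|');
        }
        return sb.ToString();
    }

    // Returns the text up to the next '|' and moves pos past that delimiter.
    private static string ReadField(string s, ref int pos)
    {
        int bar = s.IndexOf('|', pos);
        if (bar < 0)
            throw new FormatException("Missing '|' after offset " + pos);
        string field = s.Substring(pos, bar - pos);
        pos = bar + 1;
        return field;
    }
}
```

Serialize recomputes every Length from the current Content, which is the step that matters if you rewrite a panel's HTML before passing the response on.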

The code I use to parse and rewrite it is in Server/CsQueryHttpContext.

For the POST you want to use UTF-8 encoding, so just change one line:

        //var encoding = new ASCIIEncoding();
        //byte[] data = encoding.GetBytes(postData);
        //do this instead.....
        byte[] data = Encoding.UTF8.GetBytes(postData);

and see if that helps.

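Below is the network trace I got in Chrome when I clicked the Month button. Note __EVENTTARGET:ctl00$plcMain$monthBtn. ASP.NET WebForms has a JavaScript framework that calls javascript:__doPostBack() when the link is clicked, and that method sets the event target. This is how ASP.NET WebForms knows which event to fire on postback. One tricky thing is that this page uses an UpdatePanel, so you may not get a real HTML response. If you can make your request look like this one, you should get a successful response. Hope this helps.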

Request URL:http://www.wphospital.org/News-Events/Calendar-of-Events.aspx
Request Method:POST
Status Code:200 OK
Request Headers
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:no-cache
Content-Length:9718
Content-Type:application/x-www-form-urlencoded
Cookie:CMSPreferredCulture=en-US; ASP.NET_SessionId=h2nval45vq0q5yb0cp233huc; __utma=101137351.234148951.1312486481.1312486481.1312486481.1; __utmb=101137351.1.10.1312486481; __utmc=101137351; __utmz=101137351.1312486481.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __unam=ef169fe-131964a5f2a-24ec879b-1
Host:www.wphospital.org
Origin:http://www.wphospital.org
Proxy-Connection:keep-alive
Referer:http://www.wphospital.org/News-Events/Calendar-of-Events.aspx
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.124 Safari/534.30
X-MicrosoftAjax:Delta=true
Form Data
ctl00$manScript:ctl00$plcMain$updTab|ctl00$plcMain$monthBtn
__EVENTTARGET:ctl00$plcMain$monthBtn
__EVENTARGUMENT:
__LASTFOCUS:
__VIEWSTATE:<removed for brevity>
lng:en-US
__EVENTVALIDATION:/wEWEgLbj/nSDgKt983zDgKWlOLbAQKr3LqFAwKL3uqpBwK9kfRnArDHltMCAuTk0eAHAsfniK0DAteIosMPAsiIosMPAsmIosMPAsuIosMPAoD0ookDApCbiOcPAo biOcPAombiOcPAoubiOcPyfqRx8FdqYzlnnkXcJEJZzzopJY=
ctl00$searchbox1$txtWord:Search
textfield2:Enter your search here
ctl00$plcMain$lstbxCategory:0
ctl00$plcMain$lstbxSubCategory:0
ctl00$plcMain$hdnEventCount:2
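To make the request look like the trace, a sketch using HttpClient might build the form body and headers like this. MonthPostback, BuildFields and BuildRequest are hypothetical names. The field names and values are copied from the trace (the hdnEventCount value in particular may differ on the live page), and viewState and eventValidation are assumed to be the raw values from a fresh GET of the same page.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

// Hypothetical helper that rebuilds the "Month" async postback seen in the Chrome trace.
public static class MonthPostback
{
    private const string PageUrl = "http://www.wphospital.org/News-Events/Calendar-of-Events.aspx";

    // viewState and eventValidation are raw (not URL-encoded) hidden-field values;
    // FormUrlEncodedContent does the encoding.
    public static List<KeyValuePair<string, string>> BuildFields(string viewState, string eventValidation)
    {
        return new List<KeyValuePair<string, string>>
        {
            // ScriptManager field: "<UpdatePanel>|<control that triggered the postback>"
            Pair("ctl00$manScript", "ctl00$plcMain$updTab|ctl00$plcMain$monthBtn"),
            // The value __doPostBack() puts here when the Month link is clicked
            Pair("__EVENTTARGET", "ctl00$plcMain$monthBtn"),
            Pair("__EVENTARGUMENT", ""),
            Pair("__LASTFOCUS", ""),
            Pair("__VIEWSTATE", viewState),
            Pair("lng", "en-US"),
            Pair("__EVENTVALIDATION", eventValidation),
            Pair("ctl00$searchbox1$txtWord", "Search"),
            Pair("textfield2", "Enter your search here"),
            Pair("ctl00$plcMain$lstbxCategory", "0"),
            Pair("ctl00$plcMain$lstbxSubCategory", "0"),
            // Copied from the trace; the page may render a different count
            Pair("ctl00$plcMain$hdnEventCount", "2"),
        };
    }

    // Builds, but does not send, a POST shaped like the browser's request.
    public static HttpRequestMessage BuildRequest(string viewState, string eventValidation)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, PageUrl)
        {
            // Sets Content-Type: application/x-www-form-urlencoded and escapes '$', '|', '/', '+', '='
            Content = new FormUrlEncodedContent(BuildFields(viewState, eventValidation))
        };
        // Same header the browser sent to ask for a partial (UpdatePanel) response
        request.Headers.Add("X-MicrosoftAjax", "Delta=true");
        request.Headers.Referrer = new Uri(PageUrl);
        return request;
    }

    private static KeyValuePair<string, string> Pair(string key, string value)
    {
        return new KeyValuePair<string, string>(key, value);
    }
}
```

Send it with the same HttpClient instance used for the initial GET: HttpClientHandler keeps cookies by default, so the ASP.NET_SessionId cookie seen in the trace goes back with the POST. Expect the response in the async-postback format rather than a plain HTML page.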