Download CSV from Google Insight?

Keywords: Insight google from CSV download | Updated: 2023-09-27 17:49:45

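I did this successfully 4-5, maybe 6 months ago, but the site has changed since then. I can still get the search results I need with HttpWebRequest; the problem is downloading the CSV file.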

The download fails. I reproduced the request with WebClient and copied over all the cookies, but it still doesn't work.

When I do this, the downloaded file contains the following:

…meta http-equiv="refresh" content="0;url='http://www.google.com/trends?content=1&geo=US-AL&q=snooker&cmpt=q&hl=en-AU'"

location.replace("http://www.google.com/trends?content\x3d1\x26geo\x3dUS-AL\x26q\x3dsnooker\x26cmpt\x3dq\x26hl\x3den-AU")

The file download code is as follows:

    public void downloadsheet(string url, string path)
    {
        try
        {
            using (WebClient client = new WebClient())
            {

                string tmpCookieString = string.Empty;
                string[] array = webBrowser1.Document.Cookie.Split(new char[]
                        {
                            ';'
                        });
                for (int i = 0; i < array.Length; i++)
                {
                    string cookie = array[i];
                    string name = cookie.Split(new char[]
                            {
                                '='
                            })[0];
                    string value = cookie.Substring(name.Length + 1);
                    //client.Headers.Add(name, value);
                    if (i < array.Length - 1)
                    {
                        tmpCookieString = tmpCookieString + name + "=" + value + ";";
                    }
                    else
                    {
                        tmpCookieString = tmpCookieString + name + "=" + value;
                    }
                }
                client.Headers.Add(HttpRequestHeader.Cookie, tmpCookieString);
                client.Headers.Add("Accept", "text/html, application/xhtml+xml, */*");
                client.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.2)");
                client.Headers.Add("Accept-Language", "en-US");
                using (FileStream file = File.Create(path))
                {
                    byte[] bytes = client.DownloadData(url);
                    file.Write(bytes, 0, bytes.Length);
                }
            }
        }
        catch (Exception exp_DE)
        {
        }
    }

The URL used is:

http://www.google.com/trends/trendsReport?hl=en-AU&q=snooker&geo=US-AL&cmpt=q&content=1&export=2

If I navigate to that link with the WebBrowser control instead, it opens a download dialog.


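The problem is that the HttpOnly cookies (i.e. SID and HSID) are deliberately left out of WebBrowser.Document.Cookie for security reasons, so the request sent by WebClient is not authenticated.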
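One way to confirm this diagnosis is to check whether the cookie string being sent actually contains SID and HSID. The following is only a minimal sketch: the class and method names (`CookieCheck`, `ParseCookieHeader`, `HasSessionCookies`) are hypothetical helpers, not part of any library, and the sketch assumes the cookie string has the usual `name=value; name2=value2` form.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original code.
static class CookieCheck
{
    // Splits a "name=value; name2=value2" cookie string into a dictionary.
    public static Dictionary<string, string> ParseCookieHeader(string header)
    {
        var cookies = new Dictionary<string, string>(StringComparer.Ordinal);
        if (string.IsNullOrEmpty(header))
        {
            return cookies;
        }
        foreach (string part in header.Split(';'))
        {
            string trimmed = part.Trim();
            int eq = trimmed.IndexOf('=');
            if (eq <= 0)
            {
                continue; // skip fragments that have no cookie name
            }
            // Only the first '=' separates name and value; the value may contain more.
            cookies[trimmed.Substring(0, eq)] = trimmed.Substring(eq + 1);
        }
        return cookies;
    }

    // True when both cookies named in the answer (SID and HSID) are present.
    public static bool HasSessionCookies(string header)
    {
        var cookies = ParseCookieHeader(header);
        return cookies.ContainsKey("SID") && cookies.ContainsKey("HSID");
    }
}
```

If `CookieCheck.HasSessionCookies(webBrowser1.Document.Cookie)` returns false while you are logged in, the HttpOnly cookies are probably the ones missing.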

The solution is to read the cookies through WinINet instead:

// Requires: using System; using System.IO; using System.Net;
//           using System.Runtime.InteropServices; using System.Text;

[DllImport("wininet.dll", CharSet = CharSet.Auto, SetLastError = true)]
static extern bool InternetGetCookieEx(string pchURL, string pchCookieName, StringBuilder pchCookieData, ref uint pcchCookieData, int dwFlags, IntPtr lpReserved);

// Flag that makes InternetGetCookieEx include HttpOnly cookies.
const int INTERNET_COOKIE_HTTPONLY = 0x00002000;

// Returns the cookies WinINet holds for the given URL, including HttpOnly ones,
// or null if the call fails or no cookies are stored.
private static string GetGlobalCookies(string uri)
{
    // Fixed-size buffer: the call fails (and null is returned) if the cookies do not fit.
    uint datasize = 2048;
    StringBuilder cookieData = new StringBuilder((int)datasize);
    if (InternetGetCookieEx(uri, null, cookieData, ref datasize, INTERNET_COOKIE_HTTPONLY, IntPtr.Zero)
        && cookieData.Length > 0)
    {
        return cookieData.ToString();
    }
    return null;
}

public void downloadsheet(string url, string path)
{
    try
    {
        using (WebClient client = new WebClient())
        {
            // Cookies (including HttpOnly) for the page currently shown in the browser control.
            string tmpCookieString = GetGlobalCookies(webBrowser1.Url.AbsoluteUri);
            if (tmpCookieString != null)
            {
                client.Headers.Add(HttpRequestHeader.Cookie, tmpCookieString);
            }
            client.Headers.Add("Accept", "text/html, application/xhtml+xml, */*");
            client.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.2)");
            client.Headers.Add("Accept-Language", "en-US");

            // Download first so that a failed request does not leave an empty file behind.
            byte[] bytes = client.DownloadData(url);
            File.WriteAllBytes(path, bytes);
        }
    }
    catch (Exception ex)
    {
        // Report the failure instead of silently swallowing it.
        System.Diagnostics.Debug.WriteLine("Download failed: " + ex.Message);
    }
}

Of course, you need to be logged in to your account in the WebBrowser control before calling InternetGetCookieEx.
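If the cookies are still missing or the login has expired, the server is likely to answer with the same HTML redirect page shown in the question rather than CSV data. As a rough sketch, the downloaded bytes could be checked before they are saved. The class and method names here are hypothetical, and the two markers are simply the ones visible in the question's output (`http-equiv` and `location.replace`); this is a heuristic, not a documented behaviour of the site.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original answer.
static class DownloadCheck
{
    // Returns true when the payload looks like the HTML redirect page
    // from the question instead of CSV data.
    public static bool LooksLikeHtmlRedirect(byte[] bytes)
    {
        if (bytes == null || bytes.Length == 0)
        {
            return false;
        }
        // Inspecting the start of the payload is enough for this heuristic.
        int count = Math.Min(bytes.Length, 4096);
        string head = Encoding.UTF8.GetString(bytes, 0, count);
        return head.IndexOf("http-equiv", StringComparison.OrdinalIgnoreCase) >= 0
            || head.IndexOf("location.replace", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In downloadsheet, calling `DownloadCheck.LooksLikeHtmlRedirect(bytes)` right after `client.DownloadData(url)` would make it possible to report a failed login instead of writing the redirect page to the .csv file.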