Can't download UTF-8 web content

Keywords: utf-8, web, download | Updated: 2023-09-27 18:00:44

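I have simple code that fetches a response from a Vietnamese website, http://vnexpress.net, but there is a small problem. The first download works fine, but after that the content contains unknown symbols like this: �''b''0''0''0''0''0�''a`I�%&m... What's wrong?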

    string address = "http://vnexpress.net";
    WebClient webClient = new WebClient();
    webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
    webClient.Encoding = System.Text.Encoding.UTF8;
    return webClient.DownloadString(address);


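You'll find that the response is GZipped. There doesn't seem to be a way to download it with WebClient unless you create a derived class and modify the underlying HttpWebRequest to allow automatic decompression.

Here's how you'd do that: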

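    using System;
    using System.Net;

    public class MyWebClient : WebClient
    {
        protected override WebRequest GetWebRequest(Uri address)
        {
            var req = base.GetWebRequest(address);
            // Only HTTP(S) requests support automatic decompression;
            // the cast yields null for other schemes (e.g. file://).
            var httpReq = req as HttpWebRequest;
            if (httpReq != null)
            {
                httpReq.AutomaticDecompression = DecompressionMethods.GZip;
            }
            return req;
        }
    }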

To use it:

    string address = "http://vnexpress.net";
    MyWebClient webClient = new MyWebClient();
    webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
    webClient.Encoding = System.Text.Encoding.UTF8;
    return webClient.DownloadString(address);

Try this code and you should be fine:

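    string address = "http://vnexpress.net";
    WebClient webClient = new WebClient();
    webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
    // Converts the downloaded string back to bytes with the system default encoding,
    // then re-decodes those bytes as UTF-8. Lossy if the default code page cannot round-trip the text.
    return Encoding.UTF8.GetString(Encoding.Default.GetBytes(webClient.DownloadString(address)));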

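DownloadString requires the server to correctly indicate the charset in the Content-Type response header. If you watch the traffic in Fiddler, you'll see that this server instead sends the charset in a META tag in the HTML response body: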

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

If you need to handle responses like this, you'll need to parse the HTML yourself, or use a library like FiddlerCore to do it for you.
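
For the "parse it yourself" route, here is a minimal sketch of one way it might be done. The class name HtmlCharset, the method Decode, the 4096-byte probe window, and the UTF-8 fallback are all assumptions made for illustration; they are not part of WebClient or FiddlerCore. A regular expression is a rough stand-in for a real HTML parser and will miss unusual markup.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class HtmlCharset
{
    // Matches charset=... inside a META tag, covering both
    // <meta charset="utf-8"> and <meta http-equiv="Content-Type" content="...; charset=UTF-8">.
    private static readonly Regex CharsetPattern = new Regex(
        @"<meta[^>]*charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: decodes raw HTML bytes using the charset declared
    // in a META tag, falling back to UTF-8 when none is declared or the name is unknown.
    public static string Decode(byte[] body)
    {
        // ISO-8859-1 maps every byte to exactly one char, so the ASCII-only
        // META tag can be read before the real encoding is known.
        int probeLength = Math.Min(body.Length, 4096);
        string head = Encoding.GetEncoding("ISO-8859-1").GetString(body, 0, probeLength);

        Encoding encoding = Encoding.UTF8;
        Match match = CharsetPattern.Match(head);
        if (match.Success)
        {
            try
            {
                encoding = Encoding.GetEncoding(match.Groups[1].Value);
            }
            catch (ArgumentException)
            {
                // Unknown or unsupported charset name: keep the UTF-8 fallback.
            }
        }
        return encoding.GetString(body);
    }
}
```

To use it, fetch the raw bytes with webClient.DownloadData(address) instead of DownloadString and pass them to HtmlCharset.Decode. If the server gzips the response, the bytes still need to be decompressed first, for example by downloading through a WebClient subclass that enables AutomaticDecompression.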