Some websites reject HttpClient requests even when headers are set

Keywords: set, headers, request, website, reject, HttpClient | Updated: 2023-09-27 18:09:08

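I have written some code to check whether all the websites in my database are still hosted and online.

The problem is that some of these sites appear to have bot protection: whenever I request them through HttpClient, they return an error instead of the page.

I have seen similar questions that suggest adding browser headers, so I did that, but it did not help. The same sites still reject the HttpClient connection, yet they load perfectly well when I view them in a browser.

Is something wrong with my code, or do I need some extra steps?

Here is my code: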
public static async Task CheckSite(string url, int id)
{
    try
    {
        using(var db = new PlaceDBContext())
        using (HttpClient client = new HttpClient(new HttpClientHandler()
        {
            AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip
        }))
        using (HttpResponseMessage response = await client.GetAsync(url))
        using (HttpContent content = response.Content)
        {
            client.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml");
            client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate");
            client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
            client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");
            var rd = db.RootDomains.Find(id);
            string result = await content.ReadAsStringAsync();

            if (result != null && result.Length >= 50)
            {
                Console.WriteLine("fine");
                rd.LastCheckOnline = true;
            }
            else
            {
                Console.WriteLine("There was empty or short result");
                rd.LastCheckOnline = false;
            }
            db.SaveChanges();
            semaphore.Release();
        }
    }
    catch(Exception ex)
    {
        Console.WriteLine(ex.Message);
        using(var db = new PlaceDBContext())
        {
            var rd = db.RootDomains.Find(id);
            rd.LastCheckOnline = false;
            db.SaveChanges();
            semaphore.Release();
        }
    }
}

Set the headers before sending the request. You are adding them after the response has already been received.

public static async Task CheckSite(string url, int id) {
    try {
        using (var db = new PlaceDBContext())
        using (var client = new HttpClient(new HttpClientHandler() {
            AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip
        })) {
            // All request headers must be added before GetAsync sends the request.
            client.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml");
            client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate");
            client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
            client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");
            using (var response = await client.GetAsync(url))
            using (var content = response.Content) {
                var rd = db.RootDomains.Find(id);
                string result = await content.ReadAsStringAsync();

                if (result != null && result.Length >= 50) {
                    Console.WriteLine("fine");
                    rd.LastCheckOnline = true;
                } else {
                    Console.WriteLine("There was empty or short result");
                    rd.LastCheckOnline = false;
                }
                db.SaveChanges();
                semaphore.Release();
            }
        }
    } catch (Exception ex) {
        Console.WriteLine(ex.Message);
        using (var db = new PlaceDBContext()) {
            var rd = db.RootDomains.Find(id);
            rd.LastCheckOnline = false;
            db.SaveChanges();
            semaphore.Release();
        }
    }
}
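
If many sites are checked in a loop, another way to guarantee the headers go out with the request is to attach them to the request itself rather than to the client. The following is only a minimal sketch, not the code from the question: `SiteChecker`, `BuildRequest`, and `IsOnlineAsync` are hypothetical names, the database update is left out, and treating a non-success status code as "offline" is an assumption that goes beyond the original length check.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class SiteChecker
{
    // A single shared client, reused for every check (assumption: the
    // checks run in one process, so one client can serve all of them).
    private static readonly HttpClient Client = new HttpClient(new HttpClientHandler
    {
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
    });

    // Hypothetical helper: builds a GET request with the browser-like
    // headers already attached, so they exist before anything is sent.
    public static HttpRequestMessage BuildRequest(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml");
        request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
        return request;
    }

    // Returns true when the site answers with a success status and a body
    // of at least 50 characters (the same threshold as in the question).
    public static async Task<bool> IsOnlineAsync(string url)
    {
        try
        {
            using (var request = BuildRequest(url))
            using (var response = await Client.SendAsync(request))
            {
                string body = await response.Content.ReadAsStringAsync();
                return response.IsSuccessStatusCode && body.Length >= 50;
            }
        }
        catch (Exception ex) when (ex is HttpRequestException || ex is TaskCanceledException)
        {
            // Connection failures and timeouts are treated as "offline".
            Console.WriteLine(ex.Message);
            return false;
        }
    }
}
```

A caller would then do something like `rd.LastCheckOnline = await SiteChecker.IsOnlineAsync(url);` and release the semaphore in a `finally` block, so it is released exactly once whether or not the check throws. Note that some bot-protection services look at more than headers, so a site may still refuse the request even when the headers are sent correctly.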