I'm trying to retrieve all http and https links from a website, but sometimes I get a null exception

Keywords: links, exception, https, http, retrieve, website | Updated: 2023-09-27 18:34:24

public partial class Form1 : Form
{
   int y = 0;
   string url = @"http://www.google.co.il";
   string urls = @"http://www.bing.com/images/search?q=cat&go=&form=QB&qs=n";
   public Form1()
   {
       InitializeComponent();
       //webCrawler(urls, 3);
       List<string> a = webCrawler(urls, 1);
       //GetAllImages();
   }
   private int factorial(int n)
   {
      if (n == 0) return 1;
      else y = n * factorial(n - 1);
      listBox1.Items.Add(y);
      return y;
   }
   private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
   {
       List<string> mainLinks = new List<string>();
       if (document.DocumentNode.SelectNodes("//a[@href]") == null)
       { }
       foreach (HtmlNode link in document.DocumentNode.SelectNodes("//a[@href]"))
       {
           var href = link.Attributes["href"].Value;
           mainLinks.Add(href);
       }
       return mainLinks;
   }
   private List<string> webCrawler(string url, int levels)
   {
      HtmlAgilityPack.HtmlDocument doc;
      HtmlWeb hw = new HtmlWeb(); 
      List<string> webSites;// = new List<string>();
      List<string> csFiles = new List<string>();
      csFiles.Add("temp string to know that something is happening in level = " + levels.ToString());
      csFiles.Add("current site name in this level is : "+url);
      /* later should be replaced with real cs files .. cs files links..*/
      doc = hw.Load(url);
      webSites = getLinks(doc);
      if (levels == 0)
      {
         return csFiles;
      }
      else
      {
         int actual_sites = 0;
         for (int i = 0; i < webSites.Count() && i< 100000; i++) // limiting ourseleves for 20 sites for each level for now..
         //or it will take forever.
         {
             string t = webSites[i];
             /*
                    if (!webSites.Contains(t))
                    {
                        webCrawler(t, levels - 1);
                    }
             */
             if ( (t.StartsWith("http://")==true) || (t.StartsWith("https://")==true) ) // replace this with future FilterJunkLinks function
             {
                actual_sites++;
                csFiles.AddRange(webCrawler(t, levels - 1));
                richTextBox1.Text += t + Environment.NewLine;
             }
          }
          // report to a message box only at high levels..
          if (levels==1)
             MessageBox.Show(actual_sites.ToString());
          return csFiles;
       }                
    }

After a few sites have been passed to the getLinks function, an exception is thrown.

The exception occurs in the getLinks function, on this line:

foreach (HtmlNode link in document.DocumentNode.SelectNodes("//a[@href]"))

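Object reference not set to an instance of an object

I tried adding an IF there to check whether the result is null, and in that case I did return mainLinks; which is a List.

But if I do that, I don't get all the links from the website.

Right now I'm passing urls in the constructor. If I pass url (www.google.co.il) instead, I get the same exception after a few seconds.

I don't know why this exception is thrown. Is there a reason for it?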

System.NullReferenceException was unhandled
Message=Object reference not set to an instance of an object.
Source=GatherLinks
StackTrace:
at GatherLinks.Form1.getLinks(HtmlDocument document) in D:\C-Sharp\GatherLinks\GatherLinks\Form1.cs:line 55
at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 76
at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 104
at GatherLinks.Form1..ctor() in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 29
at GatherLinks.Program.Main() in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Program.cs:line 18
at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()

The problem seems to be that you test for null but then do nothing about it, here:

            if (document.DocumentNode.SelectNodes("//a[@href]") == null)
            {
            }

I suspect you meant to handle the null case but haven't written the code to do so yet. You probably want something like the following:

    private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
    {
        List<string> mainLinks = new List<string>();
        if (document.DocumentNode.SelectNodes("//a[@href]") != null)
        {
            foreach (HtmlNode link in document.DocumentNode.SelectNodes("//a[@href]"))
            {
                var href = link.Attributes["href"].Value;
                mainLinks.Add(href);
            }
        }
        return mainLinks;
    }

You might want to tidy that up into something more like this:

    private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
    {
        List<string> mainLinks = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches,
        // so run the query once and check the result before iterating.
        var linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
        if (linkNodes != null)
        {
            foreach (HtmlNode link in linkNodes)
            {
                var href = link.Attributes["href"].Value;
                mainLinks.Add(href);
            }
        }
        return mainLinks;
    }
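
On the follow-up worry about not getting all the links: the null check only skips pages that contain no anchors at all. A more likely reason links go missing is that the crawler keeps only hrefs starting with "http://" or "https://", so relative links such as "/about" are dropped. Below is a minimal sketch of one way to fill in the planned FilterJunkLinks step using only System.Uri. The class and method names (LinkFilter, FilterHttpLinks) and the example.com URLs are placeholders of mine, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class LinkFilter
{
    // Hypothetical helper (not in the original code): resolves each href
    // against the page URL and keeps only absolute http/https links.
    public static List<string> FilterHttpLinks(string pageUrl, IEnumerable<string> hrefs)
    {
        var result = new List<string>();
        var baseUri = new Uri(pageUrl);

        foreach (string href in hrefs)
        {
            Uri resolved;
            // TryCreate returns false instead of throwing on malformed hrefs.
            if (string.IsNullOrWhiteSpace(href) ||
                !Uri.TryCreate(baseUri, href, out resolved))
            {
                continue;
            }

            // Drops other schemes such as mailto: and javascript:.
            if (resolved.Scheme == Uri.UriSchemeHttp ||
                resolved.Scheme == Uri.UriSchemeHttps)
            {
                result.Add(resolved.AbsoluteUri);
            }
        }
        return result;
    }
}
```

In webCrawler, something like `webSites = LinkFilter.FilterHttpLinks(url, getLinks(doc));` could then stand in for the StartsWith test, assuming you want relative links followed as well.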