Why do I get an empty string when getting the inner text of a p tag with HTML Agility Pack?

Last updated: 2023-09-27 18:37:07

I'm trying to build a web API for this site: http://apod.nasa.gov/apod/astropix.html

The description of the image is inside a "p" tag, so in my C# code I try to get everything inside that tag as a string. My code looks like this:

Description = item.SelectSingleNode("p").InnerText;

This only returns a string with the value "". It isn't null, just empty. Why doesn't it show the text inside the tag?

If I do this instead:

Description = item.SelectSingleNode("p").NextSibling.NextSibling.InnerText;

then it does show the first words of the paragraph, "Explanation:", which sit between the "b" tags.

Here is the full HTML:

<!doctype html>
<html>
<head>
<title>Astronomy Picture of the Day
</title>
<!-- gsfc meta tags -->
<meta name="orgcode" content="661">
<meta name="rno" content="phillip.a.newman">
<meta name="content-owner" content="Jerry.T.Bonnell.1">
<meta name="webmaster" content="Stephen.F.Fantasia.1">
<meta name="description" content="A different astronomy and space science
related image is featured each day, along with a brief explanation.">
<!-- -->
<meta name="keywords" content="full moon, lunar phase">
<script id="_fed_an_js_tag" type="text/javascript"
src="js/federated-analytics.all.min.js?agency=NASA"></script>
</head>
<body BGCOLOR="#F4F4FF" text="#000000" link="#0000FF" vlink="#7F0F9F"
alink="#FF0000">
<center>
<h1> Astronomy Picture of the Day </h1>
<p>
<a href="archivepix.html">Discover the cosmos!</a>
Each day a different image or photograph of our fascinating universe is
featured, along with a brief explanation written by a professional astronomer.
<p>
2015 January 10
<br>
<a href="image/1501/_MG_4115sTafreshi.jpg">
<IMG SRC="image/1501/_MG_4115sTafreshi1024.jpg"
alt="See Explanation.  Clicking on the picture will download
 the highest resolution version available."></a>
</center>
<center>
<b> The Windmill's Moon </b> <br>
<b> Image Credit &
<a href="lib/about_apod.html#srapply">Copyright</a>: </b>
<a href="http://www.twanight.org/tafreshi">Babak Tafreshi</a>
(<a href="http://www.twanight.org/">TWAN</a>)
</center> 
<p> 
<b> Explanation: </b>
Seen from the
<a href="http://earthobservatory.nasa.gov/IOTD/view.php?id=81421">Canary
Island</a>
of
<a href="http://earthobservatory.nasa.gov/NaturalHazards/
view.php?id=77372">Fuerteventura</a>, this bright
Full Moon rose at sunset.
Reaching its full phase
<a href="http://earthsky.org/tonight/
january-full-moon-mimics-path-of-july-sun">on the night</a>
of January 4/5, it was
the first Full Moon of the new year and the first to follow
December's solstice.
Of course, in North America the first Full Moon of January
has been known as
<a href="ap120120.html">the Wolf's Moon</a>.
But this Full Moon, posed in the twilight above an island of strong
winds and traditional windmills,
suggests another name.
<a href="http://www.dreamview.net/dv/new/photos.asp?ID=104104">The
telephoto image</a>, taken at a distance from the foreground
windmill, creates the
<a href="http://home.hiwaay.net/~krcool/Astro/moon/moonwords/
moonpoems.htm">dramatic</a> comparison in
<a href="ap080801.html">apparent</a> size for windmill and Full Moon.
<p><center>
<b> Tomorrow's picture: </b>Cataclysmic Dawn
<p> <hr>
<a href="ap150109.html">&lt;</a>
| <a href="archivepix.html">Archive</a>
| <a href="lib/apsubmit2015.html">Submissions</a>
| <a href="http://antwrp.gsfc.nasa.gov/cgi-bin/apod/apod_search">Search</a>
| <a href="calendar/allyears.html">Calendar</a>
| <a href="/apod.rss">RSS</a>
| <a href="lib/edlinks.html">Education</a>
| <a href="lib/about_apod.html">About APOD</a>
| <a href=
"http://asterisk.apod.com/discuss_apod.php?date=150110">Discuss</a>
| <a href="ap150111.html">&gt;</a>
<hr><p>
<b> Authors & editors: </b>
<a href="http://www.phy.mtu.edu/faculty/Nemiroff.html">Robert Nemiroff</a>
(<a href="http://www.phy.mtu.edu/">MTU</a>) &
<a href="http://antwrp.gsfc.nasa.gov/htmltest/jbonnell/www/bonnell.html"
>Jerry Bonnell</a> (<a href="http://www.astro.umd.edu/">UMCP</a>)<br>
<b>NASA Official: </b> Phillip Newman
<a href="lib/about_apod.html#srapply">Specific rights apply</a>.<br>
<a href="http://www.nasa.gov/about/highlights/HP_Privacy.html">NASA Web
Privacy Policy and Important Notices</a><br>
<b>A service of:</b>
<a href="http://astrophysics.gsfc.nasa.gov/">ASD</a> at
<a href="http://www.nasa.gov/">NASA</a> /
<a href="http://www.nasa.gov/centers/goddard/">GSFC</a>
<br><b>&</b> <a href="http://www.mtu.edu/">Michigan Tech. U.</a><br>
</center>
</body>
</html>


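I think this is because the page has no closing </p> tags, so the p element contains nothing and InnerText returns an empty string.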
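The question's observations fit this explanation. `InnerText` is empty, yet `NextSibling.NextSibling` reaches `<b> Explanation: </b>`. That is what you would get if the parser ended the unclosed `<p>` immediately and left the text and tags after it as its siblings.

The sketch below is a toy model, not HTML Agility Pack's real parser. It builds a tree with LINQ to XML, where `XElement` stands in for `HtmlNode`, `NextNode` for `NextSibling`, and `Value` for `InnerText`. The hypothetical `BuildTree` helper and its `pIsVoid` flag switch between two rules: treating `<p>` as an empty element, or letting it run until the next `<p>`. The sample string is abridged from the page above.

Whether your HTML Agility Pack version treats `p` as empty is an assumption you should check. The library keeps its per-tag parsing rules in the static `HtmlNode.ElementsFlags` dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Toy tree builder, NOT HTML Agility Pack's parser: it only knows plain
// open tags, close tags and text runs, which is enough to compare two rules.
//   pIsVoid = true : <p> is treated like <br>, so it never gets children.
//   pIsVoid = false: <p> stays open until </p> or the next <p> closes it.
static XElement BuildTree(string html, bool pIsVoid)
{
    var root = new XElement("root");
    var open = new Stack<XElement>();
    open.Push(root);

    // Pops open elements until one named `name` has been popped (never pops root).
    void CloseUpTo(string name)
    {
        if (open.Any(e => e != root && e.Name.LocalName == name))
            while (open.Pop().Name.LocalName != name) { }
    }

    foreach (Match m in Regex.Matches(html, @"<(/?)(\w+)[^>]*>|[^<]+"))
    {
        if (!m.Groups[2].Success)            // a run of text
        {
            open.Peek().Add(new XText(m.Value));
            continue;
        }
        string name = m.Groups[2].Value.ToLowerInvariant();
        if (m.Groups[1].Value == "/")        // a close tag
        {
            CloseUpTo(name);
            continue;
        }
        if (name == "p" && !pIsVoid)
            CloseUpTo("p");                  // implied </p> before a new <p>
        var element = new XElement(name);
        open.Peek().Add(element);
        bool isVoid = name == "br" || (name == "p" && pIsVoid);
        if (!isVoid)
            open.Push(element);              // later nodes become its children
    }
    return root;
}

string html = "<p> <b> Explanation: </b> this bright Full Moon rose at sunset. <p><b>Next</b>";

XElement asVoid = BuildTree(html, pIsVoid: true);
XElement voidP = asVoid.Elements("p").First();
Console.WriteLine($"void <p> text: [{voidP.Value}]");
// The whitespace text node comes first, then the <b> element.
var afterP = (XElement)voidP.NextNode!.NextNode!;
Console.WriteLine($"p.NextNode.NextNode: <{afterP.Name}>{afterP.Value}");

XElement asContainer = BuildTree(html, pIsVoid: false);
XElement containerP = asContainer.Elements("p").First();
Console.WriteLine($"container <p> text: [{containerP.Value.Trim()}]");
```

With `pIsVoid: true`, the first `p` has an empty `Value`, and its second following sibling is the `<b>` element, mirroring the question. With `pIsVoid: false`, the same `p` contains the whole explanation.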

I fixed it with this code:

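// Start at <b> Explanation: </b>: skip the <p> and the whitespace text node after it
var description = item.SelectSingleNode("p").NextSibling.NextSibling;
// The next <p> marks the end of the explanation; look it up once, not on every iteration
var end = item.SelectNodes("p")[1];
var text = new System.Text.StringBuilder();
// Stop at the end marker, or at the last sibling if the marker is never reached
while (description != null && description != end)
{
    text.Append(description.InnerText).AppendLine().AppendLine();
    description = description.NextSibling;
}
apod.Description = text.ToString();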
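The loop above walks sibling nodes from the `<b> Explanation: </b>` node until it reaches the next `<p>`. Here is a self-contained sketch of the same walk that runs without HTML Agility Pack.

- LINQ to XML's `XNode.NextNode` and `XElement.Value` stand in for `NextSibling` and `InnerText`.
- The helper `CollectText` is hypothetical.
- The sample is an abridged, well-formed XML version of the page. The unclosed `<p>` tags are written as `<p/>`, and whitespace is preserved so that the whitespace node after `<p>` exists, just as in the answer.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Concatenates the text of every sibling from `start` up to, but not including,
// `end`. If `end` is never reached, the walk stops after the last sibling.
static string CollectText(XNode? start, XNode? end)
{
    var sb = new StringBuilder();
    for (XNode? node = start; node != null && node != end; node = node.NextNode)
    {
        sb.Append(node switch
        {
            XText t => t.Value,              // plain text (and CDATA)
            XElement e => e.Value,           // all text inside the element
            _ => string.Empty                // comments, processing instructions
        });
    }
    // Collapse whitespace runs (spaces, line breaks) into single spaces.
    return Regex.Replace(sb.ToString(), @"\s+", " ").Trim();
}

// Abridged from the APOD page; unclosed <p> tags written as <p/> so it parses as XML.
var body = XElement.Parse(
    "<body><p/> <b> Explanation: </b> Seen from the <a>Canary\nIsland</a> of " +
    "<a>Fuerteventura</a>, this bright Full Moon rose at sunset.\n<p/><b>Next</b></body>",
    LoadOptions.PreserveWhitespace);

XElement firstP = body.Elements("p").First();
XElement secondP = body.Elements("p").ElementAt(1);

// Same starting point as the answer: skip <p> and the whitespace node after it.
XNode? start = firstP.NextNode?.NextNode;
Console.WriteLine(CollectText(start, secondP));
```

This sketch makes two choices that differ from the answer's loop:

- The walk stops at a null sibling, so a missing end marker cannot cause a `NullReferenceException`.
- Whitespace runs are collapsed into single spaces. The original loop puts a blank line after every node, which splits the text at each link.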