WebBrowser DocumentText encoding

Keywords: encoding, text, document, browser, web | Updated: 2023-09-27 18:32:06

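I've run into something strange and would like your opinion.

There is a web page containing a span element with some Greek text in its InnerText and InnerHtml properties.

The page's encoding is Greek (Windows).

My if statement is: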

if (mySpan != null && mySpan.InnerText.Contains(greekText))

This line works 100% of the time, but my earlier, non-working code was:

if (mySpan != null && browser.DocumentText.Contains(greekText))

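That line doesn't work, and when I clicked the preview in the debugger I noticed that the Greek text was unreadable (strange symbols instead of Greek characters). However, the application reads every other element containing Greek text successfully; that is, I can store their properties in variables and use them. Is there an explanation for why DocumentText fails while InnerText succeeds?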


Looking at the source code for WebBrowser.DocumentText, it appears to use UTF-8 encoding by default:

public string DocumentText
{
  get
  {
    Stream documentStream = this.DocumentStream;
    if (documentStream == null)
      return "";
    // No Encoding argument is passed here, so the reader defaults to UTF-8.
    StreamReader streamReader = new StreamReader(documentStream);
    documentStream.Position = 0L;
    return streamReader.ReadToEnd();
  }
  // (setter omitted)
}

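That is, a StreamReader constructed without an explicit encoding assumes UTF-8 (unless the stream starts with a byte order mark). The bytes of a page saved as Greek (Windows) are not valid UTF-8, so the Greek characters come out as garbage.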
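To see the effect in isolation, here is a small console sketch that does not involve the WebBrowser control at all. It assumes that "Greek (Windows)" means code page 1253 (windows-1253); the class name and the sample string are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

class EncodingMismatchDemo
{
    static void Main()
    {
        // Assumption: the page is encoded as windows-1253 (code page 1253).
        // On .NET Core / .NET 5+ this code page is only available after calling
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding greek = Encoding.GetEncoding(1253);

        string greekText = "Καλημέρα";
        byte[] pageBytes = greek.GetBytes(greekText);

        // Same pattern as DocumentText: no encoding given, so UTF-8 is assumed.
        string asUtf8 = new StreamReader(new MemoryStream(pageBytes)).ReadToEnd();

        // Explicit encoding that matches the bytes.
        string asGreek = new StreamReader(new MemoryStream(pageBytes), greek).ReadToEnd();

        Console.WriteLine(asUtf8.Contains(greekText));   // False
        Console.WriteLine(asGreek.Contains(greekText));  // True
    }
}
```

The first read produces replacement characters because none of the windows-1253 bytes for these letters form valid UTF-8 sequences, which matches the "strange symbols" seen in the debugger.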

See this link for a way to fix this.

I can only assume that browser.Document.GetElementById(mySpanId) respects the page's declared encoding, which is why you see the text correctly when you use that call.
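If you still need the whole document as a string, one possible workaround is to read WebBrowser.DocumentStream yourself with an explicit encoding instead of going through DocumentText. The following is only a sketch: GetDocumentText is a hypothetical extension method, not part of Windows Forms, and it assumes that HtmlDocument.Encoding returns a charset name (such as "windows-1253") that Encoding.GetEncoding accepts.

```csharp
using System;
using System.IO;
using System.Text;
using System.Windows.Forms;

// Hypothetical helper, not part of Windows Forms.
static class WebBrowserTextExtensions
{
    public static string GetDocumentText(this WebBrowser browser, Encoding fallback = null)
    {
        Stream stream = browser.DocumentStream;
        if (stream == null)
            return "";

        // Start from the fallback (UTF-8 by default, like DocumentText).
        Encoding encoding = fallback ?? Encoding.UTF8;

        // Assumption: Document.Encoding holds the charset the page was loaded with.
        string charset = browser.Document?.Encoding;
        if (!string.IsNullOrEmpty(charset))
        {
            try
            {
                encoding = Encoding.GetEncoding(charset);
            }
            catch (ArgumentException)
            {
                // Unknown or unregistered charset name: keep the fallback.
            }
        }

        stream.Position = 0;
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With that in place, the original check could be written as browser.GetDocumentText().Contains(greekText). If the charset cannot be resolved, you could also pass Encoding.GetEncoding(1253) as the fallback when you know the page is Greek (Windows).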