C# file stream cannot read Word HTML correctly

Keywords: read, Word, HTML, file | Updated: 2023-09-27 18:21:52

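I'm trying to publish articles written in Word as HTML on a website. I have a Windows client that converts each article to HTML, sends the HTML to a folder on the website, and then the article is displayed in an IFrame. However, in IE9 the images don't show up, because IE9 tries to render them as vector graphics (VML). I decided to strip the markup responsible for this from the HTML, and that's where my problem starts: after I modify and save the file, I get garbage characters, and they also show up on the web page. If I edit the file by hand in Notepad++, I don't get the same problem. How can I use C# to read a file that Word saved as HTML without getting these garbage characters? Here is my code: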

    private bool AdjustHtmlPageForIE9Images(FileInfo file)
    {
        bool success = true;
        try
        {
            // No encoding argument: ReadAllText (and WriteAllText below) assume UTF-8
            string content = File.ReadAllText(file.FullName);
            //replace [if gte vml 1] with [if gte iesucksopd 1]
            content = content.Replace("[if gte vml 1]", "[if gte iesucksopd 1]");
            //replace [if !vml] with [if !iesucksopd]
            content = content.Replace("[if !vml]", "[if !iesucksopd]");
            //now write the file over
            File.WriteAllText(file.FullName, content);
        }
        catch (Exception)
        {
            // "throw;" rethrows without resetting the stack trace ("throw ex;" would)
            throw;
        }
        return success;
    }

This results in garbage characters being displayed.
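To see where the garbage comes from, here is a minimal sketch, assuming the file was saved as Windows-1252 without a byte order mark (the encoding the asker eventually found). `File.ReadAllText` without an encoding then decodes the bytes as UTF-8, and every byte that is not valid UTF-8 (such as Word's curly quotes, 0x93 and 0x94 in Windows-1252) becomes U+FFFD. The class and method names and the sample bytes are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Decodes raw file bytes as UTF-8, which is what File.ReadAllText falls back to
    // when the file has no byte order mark.
    public static string MisreadAsUtf8(byte[] fileBytes)
    {
        return Encoding.UTF8.GetString(fileBytes);
    }

    static void Main()
    {
        // Hypothetical sample: the text "“Hi”" saved as Windows-1252, where the curly
        // quotes are the single bytes 0x93 and 0x94 (invalid on their own in UTF-8).
        byte[] cp1252Bytes = { 0x93, 0x48, 0x69, 0x94 };

        string misread = MisreadAsUtf8(cp1252Bytes);
        // Each invalid byte became U+FFFD, the Unicode replacement character.
        Console.WriteLine(misread.IndexOf('\uFFFD') >= 0); // prints True

        // Writing the string back out as UTF-8 stores each U+FFFD as EF BF BD,
        // so the original quote characters are lost for good.
        byte[] rewritten = Encoding.UTF8.GetBytes(misread);
        Console.WriteLine(BitConverter.ToString(rewritten)); // prints EF-BF-BD-48-69-EF-BF-BD
    }
}
```

When a browser later interprets those EF BF BD bytes as Windows-1252, they render as "ï¿½", which is the typical garbage seen on the page.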

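Update: thanks for all the replies, everyone. In the end, I had to open the file in Firefox and check its encoding, which turned out to be Western (Windows-1252). Then, as SLaks suggested, I passed Encoding.GetEncoding(1252) to both the read and the write call. Here is the modified code: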

    // Requires: using System.IO; using System.Text;
    private bool AdjustHtmlPageForIE9Images(FileInfo file)
    {
        bool success = true;
        try
        {
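            // Word saved this file as Western (Windows-1252). On .NET Core 3.0+/.NET 5+ this code page
            // is only available after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
            Encoding encoding = Encoding.GetEncoding(1252);
            string content = File.ReadAllText(file.FullName, encoding);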
            //replace [if gte vml 1] with [if gte iesucksopd 1]
            content = content.Replace("[if gte vml 1]", "[if gte iesucksopd 1]");
            //replace [if !vml] with [if !iesucksopd]
            content = content.Replace("[if !vml]", "[if !iesucksopd]");
            //now write the file over
            File.WriteAllText(file.FullName, content, encoding);
        }
        catch (Exception)
        {
            // "throw;" rethrows without resetting the stack trace ("throw ex;" would)
            throw;
        }
        return success;
    }
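For newer runtimes, a trimmed-down sketch of the same fix might look like the following. It assumes .NET Core 3.0+ or .NET 5+, where code page 1252 has to be registered through `CodePagesEncodingProvider` before `Encoding.GetEncoding(1252)` works (on .NET Framework that step is unnecessary). The class name `WordHtmlFixer` and method name `AdjustForIE9Images` are hypothetical; the replacement strings are the asker's.

```csharp
using System;
using System.IO;
using System.Text;

static class WordHtmlFixer
{
    // Assumption: running on .NET Core 3.0+ or .NET 5+, where legacy code pages such
    // as 1252 must be registered before Encoding.GetEncoding(1252) succeeds.
    static WordHtmlFixer()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Rewrites the VML conditional comments in place, reading and writing with the
    // same Windows-1252 encoding so that non-ASCII bytes survive unchanged.
    public static void AdjustForIE9Images(string path)
    {
        Encoding encoding = Encoding.GetEncoding(1252);
        string content = File.ReadAllText(path, encoding);
        content = content
            .Replace("[if gte vml 1]", "[if gte iesucksopd 1]")
            .Replace("[if !vml]", "[if !iesucksopd]");
        File.WriteAllText(path, content, encoding);
    }
}
```

The try/catch that only rethrew and the always-true return value are dropped here, since exceptions already propagate to the caller. Usage would be something like `WordHtmlFixer.AdjustForIE9Images(file.FullName);`.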

Isn't it ridiculous that IE9 can't do something as simple as display HTML exported from Word in an IFrame? No wonder its popularity keeps falling.


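You need to pass the encoding explicitly to ReadAllText and WriteAllText; otherwise they default to UTF-8 (ReadAllText only deviates from that when the file starts with a byte order mark).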

Pass Encoding.GetEncoding(1252).
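A small sketch of the "defaults to UTF-8" point: the `File.WriteAllText` overload without an encoding writes UTF-8 with no byte order mark, so a character that Windows-1252 stores as one byte (0x93) is written as three bytes. The helper name and the temporary-file approach are hypothetical, for illustration only.

```csharp
using System;
using System.IO;
using System.Text;

static class DefaultEncodingDemo
{
    // Writes text with the File.WriteAllText overload that takes no encoding and
    // returns the raw bytes that ended up on disk.
    public static byte[] WriteWithDefaultEncoding(string text)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text);
            return File.ReadAllBytes(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    static void Main()
    {
        // U+201C LEFT DOUBLE QUOTATION MARK is the single byte 0x93 in Windows-1252,
        // but the default overload stores it as three UTF-8 bytes and no BOM.
        Console.WriteLine(BitConverter.ToString(WriteWithDefaultEncoding("\u201C"))); // prints E2-80-9C
    }
}
```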

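Make sure the converted HTML file is encoded as UTF-8 or UTF-32; ReadAllText will then detect the encoding correctly (it looks for a byte order mark and otherwise assumes UTF-8). Otherwise, use the ReadAllText overload that takes an encoding parameter and pass the encoding that was used during conversion.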
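To illustrate the detection this answer relies on, here is a minimal sketch: text saved with `Encoding.UTF32` (which writes a byte order mark) or `Encoding.UTF8` reads back intact through the `File.ReadAllText` overload that takes no encoding. The helper name and sample text are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

static class BomDetectionDemo
{
    // Saves text with the given encoding, then reads it back with the ReadAllText
    // overload that takes no encoding and relies on byte order mark detection.
    public static string RoundTrip(string text, Encoding saveAs)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, saveAs);
            return File.ReadAllText(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    static void Main()
    {
        string text = "\u201CHi\u201D";
        // Encoding.UTF32 writes a BOM, so ReadAllText detects UTF-32.
        Console.WriteLine(RoundTrip(text, Encoding.UTF32) == text); // prints True
        // UTF-8 matches ReadAllText's default, so it also round-trips.
        Console.WriteLine(RoundTrip(text, Encoding.UTF8) == text); // prints True
    }
}
```

A Windows-1252 file carries no byte order mark, which is why it needs the explicit-encoding overload instead.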