XML serialization using XmlWriter via StringBuilder is utf-16, while via Stream it is utf-8

Updated: 2023-09-27 17:56:17

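I was surprised when I ran into this, so I wrote a console application to reproduce it and make sure I wasn't doing anything else wrong.

Can anyone explain it?

Here is the code: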

using System;    
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Serialization;
namespace ConsoleApplication1
{
    public class Program
    {
        static void Main(string[] args)
        {
            var o = new SomeObject { Field1 = "string value", Field2 = 8 };
            Console.WriteLine("ObjectToXmlViaStringBuilder");
            Console.Write(ObjectToXmlViaStringBuilder(o));
            Console.WriteLine();
            Console.WriteLine();
            Console.WriteLine("ObjectToXmlViaStream");
            Console.Write(StreamToString(ObjectToXmlViaStream(o)));
            Console.ReadKey();
        }
        public static string ObjectToXmlViaStringBuilder(SomeObject someObject)
        {
            var output = new StringBuilder();
            var settings = new XmlWriterSettings { Encoding = Encoding.UTF8, Indent = true };
            using (var xmlWriter = XmlWriter.Create(output, settings))
            {
                var serializer = new XmlSerializer(typeof(SomeObject));
                var namespaces = new XmlSerializerNamespaces();
                xmlWriter.WriteStartDocument();
                xmlWriter.WriteDocType("Field1", null, "someObject.dtd", null);
                namespaces.Add(string.Empty, string.Empty);
                serializer.Serialize(xmlWriter, someObject, namespaces);
            }
            return output.ToString();
        }
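        private static string StreamToString(Stream stream)
        {
            // StreamReader defaults to UTF-8 and detects a byte order mark;
            // disposing the reader also closes the underlying stream.
            using (var reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }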
        public static Stream ObjectToXmlViaStream(SomeObject someObject)
        {
            var output = new MemoryStream();
            var settings = new XmlWriterSettings { Encoding = Encoding.UTF8, Indent = true };
            using (var xmlWriter = XmlWriter.Create(output, settings))
            {
                var serializer = new XmlSerializer(typeof(SomeObject));
                var namespaces = new XmlSerializerNamespaces();
                xmlWriter.WriteStartDocument();
                xmlWriter.WriteDocType("Field1", null, "someObject.dtd", null);
                namespaces.Add(string.Empty, string.Empty);
                serializer.Serialize(xmlWriter, someObject, namespaces);
            }
            output.Seek(0L, SeekOrigin.Begin);
            return output;
        }
        public class SomeObject
        {
            public string Field1 { get; set; }
            public int Field2 { get; set; }
        }
    }
}

The results are as follows:

ObjectToXmlViaStringBuilder

<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE Field1 SYSTEM "someObject.dtd">
<SomeObject>
  <Field1>string value</Field1>
  <Field2>8</Field2>
</SomeObject>

ObjectToXmlViaStream

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Field1 SYSTEM "someObject.dtd">
<SomeObject>
  <Field1>string value</Field1>
  <Field2>8</Field2>
</SomeObject>


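When you create an XmlWriter around a TextWriter, the XmlWriter always uses the encoding of the underlying TextWriter. XmlWriter.Create(StringBuilder, ...) wraps the builder in a StringWriter, and the encoding of a StringWriter is always UTF-16, because that is how .NET strings are encoded internally.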

When you create an XmlWriter around a Stream, the Stream itself has no encoding defined, so the writer uses the encoding specified in XmlWriterSettings.
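The rule above can be checked with a short sketch. It is not from the original post: `Utf8StringWriter` and `EncodingDemo` are hypothetical names introduced here for illustration. The sketch relies on `StringWriter.Encoding` being virtual, so a subclass can report UTF-8 and the XmlWriter will put that name in the declaration.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper (not part of the original post): a StringWriter
// that reports UTF-8 instead of the default UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class EncodingDemo
{
    // Writes a tiny document to the given TextWriter and returns the text.
    // settings.Encoding is set to UTF-8, but for a TextWriter target the
    // XmlWriter takes the encoding from the TextWriter itself.
    public static string WriteDocument(TextWriter target)
    {
        var settings = new XmlWriterSettings { Encoding = Encoding.UTF8 };
        using (var writer = XmlWriter.Create(target, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("root");
            writer.WriteEndElement();
        }
        return target.ToString();
    }
}
```

Calling `EncodingDemo.WriteDocument(new StringWriter())` should produce a declaration with encoding="utf-16", while `EncodingDemo.WriteDocument(new Utf8StringWriter())` should produce encoding="utf-8". The string held in memory is still UTF-16 either way; only the declared name changes, so the declaration is accurate only if the text is later written out as UTF-8 bytes.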

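For me, the most elegant solution was to write to a MemoryStream and then use an Encoding to turn the stream's bytes into a string in whatever encoding is required, like so: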

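        // Assumes xmlSerializer, objectToSerialize, encoding, OmitXmlHeader and
        // retString (a StringBuilder) are defined by the surrounding code.
        using (MemoryStream memS = new MemoryStream())
        {
            // Set up the XML settings. The writer must use the same encoding
            // that is later used to decode the bytes.
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Encoding = encoding;
            settings.OmitXmlDeclaration = OmitXmlHeader;
            using (XmlWriter writer = XmlWriter.Create(memS, settings))
            {
                // Write the XML to the stream; disposing the writer flushes it
                xmlSerializer.Serialize(writer, objectToSerialize);
            }
            // Decode the stream's bytes into a string
            retString.Append(encoding.GetString(memS.ToArray()));
        }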

The conversion happens in the call to encoding.GetString(memS.ToArray()).
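A self-contained version of the MemoryStream approach might look like the sketch below. `Item` and `XmlStringSerializer` are hypothetical names that do not appear in the original answer, and the sketch assumes the same Encoding instance is used both for the writer settings and for decoding.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type, used only to have something to serialize.
public class Item
{
    public string Name { get; set; }
}

public static class XmlStringSerializer
{
    // Serializes value through a MemoryStream so that the XML declaration
    // carries the requested encoding, then decodes the bytes to a string.
    public static string Serialize<T>(T value, Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding, Indent = true };
        using (var buffer = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(buffer, settings))
            {
                new XmlSerializer(typeof(T)).Serialize(writer, value);
            }
            // The writer has been disposed, so everything is flushed to buffer.
            return encoding.GetString(buffer.ToArray());
        }
    }
}
```

A call such as `XmlStringSerializer.Serialize(new Item { Name = "a" }, new UTF8Encoding(false))` should return a string whose declaration says utf-8. Passing `new UTF8Encoding(false)` rather than `Encoding.UTF8` is a deliberate choice here: `Encoding.UTF8` emits a byte order mark, which would show up as a U+FEFF character at the start of the decoded string.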

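Where possible, XmlWriter uses the encoding of the underlying stream. If it wrote UTF-8 data to a stream that it knows is UTF-16, you would end up with a mess. Writing UTF-16 data to a UTF-8 stream would also cause problems, especially in environments that use null-terminated strings (such as C/C++).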

StringBuilder/StringWriter presents a UTF-16 stream to the XmlWriter, so the XmlWriter ignores the encoding you requested in the settings and uses UTF-16 instead.

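In practice, I usually don't emit the header at all. That way I can use a StringBuilder underneath and save the few lines of code that would otherwise be spent messing about with switching encodings.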
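Skipping the header can be sketched as follows, using the `OmitXmlDeclaration` setting that already appears in the earlier snippet. `Note` and `HeaderlessXml` are hypothetical names chosen for this example.

```csharp
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type, used only to have something to serialize.
public class Note
{
    public string Text { get; set; }
}

public static class HeaderlessXml
{
    // Serializes straight into a StringBuilder and suppresses the
    // <?xml ...?> declaration, so no encoding name appears in the output.
    public static string Serialize<T>(T value)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Indent = true };
        using (var writer = XmlWriter.Create(output, settings))
        {
            new XmlSerializer(typeof(T)).Serialize(writer, value);
        }
        return output.ToString();
    }
}
```

With no declaration in the text, whoever finally writes the string to a file or socket is free to pick the byte encoding, and nothing in the document contradicts that choice.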