XDocument Text Node New Line

Keywords: new line, node, text, XDocument | Updated: 2023-09-27 18:09:11

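I am trying to get a newline into a text node using XText from the LINQ to XML namespace.

I have a string that contains newline characters, but I need to work out how to convert them to entity characters (i.e. &#10;) rather than have them appear as literal new lines in the XML.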

XElement element = new XElement( "NodeName" );
...
string example = "This is a string\nWith new lines in it\n";
element.Add( new XText( example ) );

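The XElement is then written out using an XmlTextWriter, which results in the file containing literal newlines rather than the entity replacement.

Has anyone come across this problem and found a solution?

EDIT:

The problem shows up when I load the XML into Excel, which does not seem to like the newline character but accepts the entity replacement. As a result, the new lines do not show up in Excel unless I replace them with &#10;.

Nick.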

Cheating:

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.CheckCharacters = false;
        settings.NewLineChars = "&#10;";
        XmlWriter writer = XmlWriter.Create(..., settings);
        element.WriteTo(writer);
        writer.Flush();

UPDATE:

Complete program:

using System;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
class Program
{
    static void Main(string[] args)
    {
        XElement element = new XElement( "NodeName" );
        string example = "This is a string\nWith new lines in it\n";
        element.Add( new XText( example ) );
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.CheckCharacters = false;
        settings.NewLineChars = "&#10;";
        XmlWriter writer = XmlWriter.Create(Console.Out, settings);
        element.WriteTo(writer);
        writer.Flush();
    }
}
}

Output:

C:\Users\...\\ConsoleApplication1\bin\Release>ConsoleApplication1.exe
<?xml version="1.0" encoding="ibm850"?>&#10;<NodeName>This is a string&#10;With new lines in it&#10;</NodeName>

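To any standard XML parser there is no difference between the entity &#10; and a newline character in element text; they are the same thing.

To illustrate this, the following code shows that both strings parse to the same content: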

string s1 = "<root>Test&#10;Test2</root>";
string s2 = "<root>Test\nTest2</root>";
XDocument doc1 = XDocument.Parse(s1);
XDocument doc2 = XDocument.Parse(s2);
Console.WriteLine(doc1.ToString());
Console.WriteLine(doc2.ToString());
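
Printing the two documents only shows that they look alike. As a minimal sketch (the class name EntityDemo and the helper ParseToSameTree are hypothetical and not part of the original answer), the parsed trees can also be compared directly with XNode.DeepEquals:

```csharp
using System;
using System.Xml.Linq;

public static class EntityDemo
{
    // Hypothetical helper: parses two XML strings and reports whether
    // the resulting trees are deeply equal (same names, same text values).
    public static bool ParseToSameTree(string xml1, string xml2)
    {
        XDocument doc1 = XDocument.Parse(xml1);
        XDocument doc2 = XDocument.Parse(xml2);
        return XNode.DeepEquals(doc1, doc2);
    }

    public static void Main()
    {
        string withEntity = "<root>Test&#10;Test2</root>";
        string withNewline = "<root>Test\nTest2</root>";

        // The parser resolves &#10; to a line feed, so both text nodes
        // hold the same string and the trees compare equal.
        Console.WriteLine(ParseToSameTree(withEntity, withNewline)); // True
        Console.WriteLine(XDocument.Parse(withEntity).Root.Value == "Test\nTest2"); // True
    }
}
```

Because the two forms cannot be told apart after parsing, any fix has to happen at the point where the XML is written out.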

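It is the XmlTextWriter that is responsible for escaping entities on output. If you do this, for example:

        using (XmlTextWriter w = new XmlTextWriter("test.xml", Encoding.UTF8))
        {
            w.WriteString("&#10;");
        }

you will get an escaped ampersand in test.xml, &amp;#10;, which is not what you want. You want the &#10; sequence to be kept as is.

The solution I suggest is to create a new StreamWriter implementation that is able to detect escaped strings like "&amp;#10;":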

    // A StreamWriter that does not escape &#10; characters
    public class NonXmlEscapingStreamWriter : StreamWriter
    {
        private const string AmpToken = "amp";
        private int _bufferState = 0; // used to keep state
        // add other ctors overloads if needed
        public NonXmlEscapingStreamWriter(string path)
            : base(path)
        {
        }
        // NOTE this code is based on the assumption that StreamWriter
        // only overrides these 4 Write functions, which is true today but could change in the future
        // and also on the assumption that the XmlTextWriter writes escaped values in a specific sequence of WriteXX calls
        public override void Write(char value)
        {
            if (value == '&')
            {
                if (_bufferState == 0)
                {
                    _bufferState++;
                    return; // hold it
                }
                else
                {
                    _bufferState = 0;
                }
            }
            else if (value == ';')
            {
                if (_bufferState > 1)
                {
                    _bufferState++;
                    return;
                }
                else
                {
                    Write('&'); // release what's been held
                    Write(AmpToken);
                    _bufferState = 0;
                }
            }
            else if (value == '\n') // detect a non-escaped \n
            {
                base.Write("&#10;");
                return;
            }
            base.Write(value);
        }
        public override void Write(string value)
        {
            if (_bufferState > 0)
            {
                if (value == AmpToken)
                {
                    _bufferState++;
                    return; // hold it
                }
                else
                {
                    Write('&'); // release what's been held
                    _bufferState = 0;
                }
            }
            base.Write(value);
        }
        public override void Write(char[] buffer, int index, int count)
        {
            if (_bufferState > 2)
            {
                _bufferState = 0;
                base.Write('&'); // release this anyway
                string replace;
                if ((buffer != null) && ((replace = GetReplaceLength(buffer, index, count)) != null))
                {
                    base.Write(replace);
                    base.Write(buffer, index + replace.Length, count - replace.Length);
                    return;
                }
                else
                {
                    base.Write(AmpToken); // release this
                    base.Write(';'); // release this
                }
            }
            base.Write(buffer, index, count);
        }
        public override void Write(char[] buffer)
        {
            Write(buffer, 0, buffer != null ? buffer.Length : 0);
        }
        private string GetReplaceLength(char[] buffer, int index, int count)
        {
            // this is specific to the 10 character but could be adapted
            const string token = "#10;";
            if (count < token.Length) // not enough characters left to hold the token
                return null;
            // we test the char array to avoid string allocations
            for(int i = 0; i < token.Length; i++)
            {
                if (buffer[index + i] != token[i])
                    return null;
            }
            return token;
        }
    }

You can use it like this:

    using (XmlTextWriter w = new XmlTextWriter(new NonXmlEscapingStreamWriter("test.xml")))
    {
        element.WriteTo(w);
    }
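
Note: although it is able to detect lone \n characters, I suggest you make sure every \n is actually escaped in the original text, so you need to replace \n with &#10; before actually outputting the XML, like this:

string example = "This is a string&#10;With new lines in it&#10;";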