XDocument text node new line
Keywords: newline, node, text, XDocument | Updated: 2023-09-27 18:09:11
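I'm trying to get a newline into a text node using XText from the Linq XML namespace.
I have a string that contains newline characters, but I need to work out how to convert them to character entities (i.e. &#10;) rather than having them appear in the XML as literal new lines.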
XElement element = new XElement( "NodeName" );
...
string example = "This is a string\nWith new lines in it\n";
element.Add( new XText( example ) );
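The XElement is then written out using an XmlTextWriter, which results in the file containing the literal newline rather than an entity replacement.
Edit:
The problem shows up when I load the XML into Excel, which doesn't seem to like the newline character but does accept the entity replacement. The result is that newlines don't show up in Excel unless I use &#10;.
Cheating: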
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.CheckCharacters = false;
settings.NewLineChars = "&#10;";
XmlWriter writer = XmlWriter.Create(..., settings);
element.WriteTo(writer);
writer.Flush();
Update: complete program
using System;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            XElement element = new XElement( "NodeName" );
            string example = "This is a string\nWith new lines in it\n";
            element.Add( new XText( example ) );

            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Indent = true;
            settings.CheckCharacters = false;
            // The writer emits this string wherever it would write a newline
            settings.NewLineChars = "&#10;";

            XmlWriter writer = XmlWriter.Create(Console.Out, settings);
            element.WriteTo(writer);
            writer.Flush();
        }
    }
}
Output: C:\Users\...\ConsoleApplication1\bin\Release>ConsoleApplication1.exe
<?xml version="1.0" encoding="ibm850"?>&#10;<NodeName>This is a string&#10;With new lines in it&#10;</NodeName>
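To any standard XML parser there is no difference between the &#10; entity and a newline character, because they are the same thing.
To illustrate this, the following code shows that they are identical:
string s1 = "<root>Test&#10;Test2</root>";
string s2 = "<root>Test\nTest2</root>";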
XDocument doc1 = XDocument.Parse(s1);
XDocument doc2 = XDocument.Parse(s2);
Console.WriteLine(doc1.ToString());
Console.WriteLine(doc2.ToString());
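Printing the two documents only shows that they look alike. As a minimal sketch of a more direct check (the class and method names here are hypothetical, not part of the original answer), the parsed text values and the node trees can be compared programmatically; both comparisons should report True:

```csharp
using System;
using System.Xml.Linq;

public static class EntityVsNewlineDemo
{
    // Hypothetical helper: true when both documents carry the same text value.
    public static bool SameValue(string xml1, string xml2)
    {
        return XDocument.Parse(xml1).Root.Value == XDocument.Parse(xml2).Root.Value;
    }

    // Hypothetical helper: true when the parsed node trees are structurally equal.
    public static bool SameTree(string xml1, string xml2)
    {
        return XNode.DeepEquals(XDocument.Parse(xml1), XDocument.Parse(xml2));
    }

    public static void Main()
    {
        string s1 = "<root>Test&#10;Test2</root>"; // character reference
        string s2 = "<root>Test\nTest2</root>";    // literal line feed

        Console.WriteLine(SameValue(s1, s2));
        Console.WriteLine(SameTree(s1, s2));
    }
}
```

Once parsed, the distinction between the two spellings is gone, which is why the fix has to happen at the writer level rather than in the XText node.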
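It is the XmlTextWriter that is responsible for escaping entities on output. If you do this, for example:
using (XmlTextWriter w = new XmlTextWriter("test.xml", Encoding.UTF8))
{
    w.WriteString("&#10;");
}
you will also get an escaped ampersand in test.xml (&amp;#10;), which is not what you want. You want the &#10; sequence to stay raw.
The solution I propose is to create a new StreamWriter implementation that can detect escaped strings such as "&amp;#10;":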
using System.IO;

// A StreamWriter that undoes the "&amp;" escaping of "&#10;" sequences
public class NonXmlEscapingStreamWriter : StreamWriter
{
    private const string AmpToken = "amp";
    private int _bufferState = 0; // used to keep state: 0 = idle, 1 = '&' held, 2 = "&amp" held, 3 = "&amp;" held

    // add other ctor overloads if needed
    public NonXmlEscapingStreamWriter(string path)
        : base(path)
    {
    }

    // NOTE this code is based on the assumption that StreamWriter
    // only overrides these 4 Write functions, which is true today but could change in the future,
    // and also on the assumption that XmlTextWriter writes escaped values in a specific sequence of WriteXX calls
    public override void Write(char value)
    {
        if (value == '&')
        {
            if (_bufferState == 0)
            {
                _bufferState++;
                return; // hold it
            }
            else
            {
                _bufferState = 0;
            }
        }
        else if (value == ';')
        {
            if (_bufferState > 1)
            {
                _bufferState++;
                return;
            }
            else
            {
                Write('&'); // release what's been held
                Write(AmpToken);
                _bufferState = 0;
            }
        }
        else if (value == '\n') // detect a non-escaped \n
        {
            base.Write("&#10;");
            return;
        }
        base.Write(value);
    }

    public override void Write(string value)
    {
        if (_bufferState > 0)
        {
            if (value == AmpToken)
            {
                _bufferState++;
                return; // hold it
            }
            else
            {
                Write('&'); // release what's been held
                _bufferState = 0;
            }
        }
        base.Write(value);
    }

    public override void Write(char[] buffer, int index, int count)
    {
        if (_bufferState > 2)
        {
            _bufferState = 0;
            base.Write('&'); // release this anyway
            string replace;
            if ((buffer != null) && ((replace = GetReplaceLength(buffer, index, count)) != null))
            {
                // "&amp;" was followed by "#10;": emit "&#10;" and drop the held "amp;"
                base.Write(replace);
                base.Write(buffer, index + replace.Length, count - replace.Length);
                return;
            }
            else
            {
                base.Write(AmpToken); // release this
                base.Write(';'); // release this
            }
        }
        base.Write(buffer, index, count);
    }

    public override void Write(char[] buffer)
    {
        Write(buffer, 0, buffer != null ? buffer.Length : 0);
    }

    // Returns the token if the buffer segment starts with it, otherwise null
    private string GetReplaceLength(char[] buffer, int index, int count)
    {
        // this is specific to the 10 character but could be adapted
        const string token = "#10;";
        // the segment must hold at least token.Length characters
        if (count < token.Length)
            return null;

        // we test the char array to avoid string allocations
        for (int i = 0; i < token.Length; i++)
        {
            if (buffer[index + i] != token[i])
                return null;
        }
        return token;
    }
}
You can use it like this:
using (XmlTextWriter w = new XmlTextWriter(new NonXmlEscapingStreamWriter("test.xml")))
{
    element.WriteTo(w);
}
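Note: although it can detect lone \n sequences, I suggest you make sure every \n is actually escaped in the original text. That means replacing \n with &#10; before the XML is written out, like this:
string example = "This is a string&#10;With new lines in it&#10;";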
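When the text arrives at run time instead of as a literal, the same replacement can be done with string.Replace. The following is a minimal sketch under that assumption; the helper name EscapeLineFeeds is hypothetical, and it only handles bare line feeds (any carriage return in the input is left untouched):

```csharp
using System;

public static class NewlineEscaping
{
    // Hypothetical helper: replaces each line feed with the literal text "&#10;".
    public static string EscapeLineFeeds(string text)
    {
        return text.Replace("\n", "&#10;");
    }

    public static void Main()
    {
        string example = "This is a string\nWith new lines in it\n";
        // prints: This is a string&#10;With new lines in it&#10;
        Console.WriteLine(EscapeLineFeeds(example));
    }
}
```

The escaped string can then be passed to new XText(...) as before. XmlTextWriter will still turn each "&" into "&amp;", and that is the sequence the custom stream writer is meant to undo.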