How do I prevent the .NET XML parser from expanding parameter entities in XML?
Keywords: XML, parameter, entity, .NET | Last updated: 2023-09-27 17:52:34
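When I parse the XML below with the code below, <sgml>&question;&signature;</sgml> is always expanded to
<sgml>Why couldn’t I publish my books directly in standard SGML? — William Shakespeare.</sgml>
or
<sgml></sgml>
Since I am working on an XML 3-way merge algorithm, I want to retrieve the unexpanded <sgml>&question;&signature;</sgml>.
I have tried:
- Parsing the XML normally (this expands the contents of the sgml tag)
- Removing the DOCTYPE from the beginning of the XML (this results in an empty sgml tag)
- Various XmlReader DTD settings
I have the following XML file: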
<!DOCTYPE sgml [
<!ELEMENT sgml ANY>
<!ENTITY std "standard SGML">
<!ENTITY signature " — &author;.">
<!ENTITY question "Why couldn’t I publish my books directly in &std;?">
<!ENTITY author "William Shakespeare">
]>
<sgml>&question;&signature;</sgml>
Here is the code I have tried:
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Reflection;

class Program
{
    static void Main(string[] args)
    {
        string xml = @"C:\src\Apps\Wit\MergingAlgorithmTest\MergingAlgorithmTest\Tests\XMLMerge-DocTypeExpansion\DocTypeExpansion.0.xml";
        var xmlSettingsIgnore = new XmlReaderSettings
        {
            CheckCharacters = false,
            DtdProcessing = DtdProcessing.Ignore
        };
        var xmlSettingsParse = new XmlReaderSettings
        {
            CheckCharacters = false,
            DtdProcessing = DtdProcessing.Parse
        };
        using (var fs = File.Open(xml, FileMode.Open, FileAccess.Read))
        {
            // Attempt 1: ignore the DTD entirely.
            using (var xmkReaderIgnore = XmlReader.Create(fs, xmlSettingsIgnore))
            {
                // Prevents Exception "Reference to undeclared entity 'question'"
                PropertyInfo propertyInfo = xmkReaderIgnore.GetType().GetProperty("DisableUndeclaredEntityCheck", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
                propertyInfo.SetValue(xmkReaderIgnore, true, null);
                var doc = XDocument.Load(xmkReaderIgnore);
                Console.WriteLine(doc.Root.ToString()); // outputs <sgml></sgml> not <sgml>&question;&signature;</sgml>
            } // using xml ignore

            // Attempt 2: parse the DTD.
            fs.Position = 0;
            using (var xmkReaderIgnore = XmlReader.Create(fs, xmlSettingsParse))
            {
                var doc = XDocument.Load(xmkReaderIgnore);
                Console.WriteLine(doc.Root.ToString()); // outputs <sgml>Why couldn't I publish my books directly in standard SGML? - William Shakespeare.</sgml> not <sgml>&question;&signature;</sgml>
            }

            // Attempt 3: strip the DOCTYPE and parse only the root element.
            fs.Position = 0;
            string parseXmlString = String.Empty;
            using (StreamReader sr = new StreamReader(fs))
            {
                for (int i = 0; i < 7; ++i) // Skip DocType
                    sr.ReadLine();
                parseXmlString = sr.ReadLine();
            }
            using (XmlReader xmlReaderSkip = XmlReader.Create(new StringReader(parseXmlString), xmlSettingsParse))
            {
                // Prevents Exception "Reference to undeclared entity 'question'"
                PropertyInfo propertyInfo = xmlReaderSkip.GetType().GetProperty("DisableUndeclaredEntityCheck", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
                propertyInfo.SetValue(xmlReaderSkip, true, null);
                var doc2 = XDocument.Load(xmlReaderSkip); // Empty sgml tag
            }
        } // using FileStream
    }
}
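LINQ to XML does not support modeling entity references; they are automatically expanded to their values (source 1, source 2). There is simply no subclass of XObject defined for general entity references.
However, assuming your XML is valid (i.e. the entity references are declared in the DTD, as they are in your example), you can use the older XML Document Object Model to parse your XML and insert XmlEntityReference nodes into the XML DOM tree, instead of expanding the entity references to plain text: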
// Requires using System.Diagnostics, System.IO and System.Xml; `xml` is the file path from the question.
using (var sr = new StreamReader(xml))
using (var xtr = new XmlTextReader(sr))
{
    // Expands character entities and returns general entities as System.Xml.XmlNodeType.EntityReference
    xtr.EntityHandling = EntityHandling.ExpandCharEntities;
    var oldDoc = new XmlDocument();
    oldDoc.Load(xtr);
    Debug.WriteLine(oldDoc.DocumentElement.OuterXml); // Outputs <sgml>&question;&signature;</sgml>
    Debug.Assert(oldDoc.DocumentElement.OuterXml.Contains("&question;")); // Verify that the entity references are still there - no assert
    Debug.Assert(oldDoc.DocumentElement.OuterXml.Contains("&signature;")); // Verify that the entity references are still there - no assert
}
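The ChildNodes of each XmlEntityReference will contain the text value of the general entity. If a general entity refers to other general entities, as in your case, the corresponding inner XmlEntityReference nodes will be nested in the ChildNodes of the outer one. You can then compare the old and new XML using the old XmlDocument API.
Note that you also need to use the old XmlTextReader and set EntityHandling = EntityHandling.ExpandCharEntities.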
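To see the nesting of XmlEntityReference nodes described above, something like the following sketch may help. It is only an illustration: the class EntityReferenceDemo, the helper ListEntityNames, and the inlined SampleXml string (adapted from the question's file, with plain ASCII punctuation) are hypothetical names introduced here, not part of the original question or answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class EntityReferenceDemo
{
    // Sample document adapted from the question, inlined so the sketch is self-contained.
    public const string SampleXml = @"<!DOCTYPE sgml [
<!ELEMENT sgml ANY>
<!ENTITY std ""standard SGML"">
<!ENTITY signature "" - &author;."">
<!ENTITY question ""Why couldn't I publish my books directly in &std;?"">
<!ENTITY author ""William Shakespeare"">
]>
<sgml>&question;&signature;</sgml>";

    // Hypothetical helper: loads the XML with entity references preserved and
    // returns the names of all XmlEntityReference nodes under the root element.
    public static List<string> ListEntityNames(string xml)
    {
        var doc = new XmlDocument();
        using (var sr = new StringReader(xml))
        using (var xtr = new XmlTextReader(sr))
        {
            // Keep general entities as EntityReference nodes instead of expanding them.
            xtr.EntityHandling = EntityHandling.ExpandCharEntities;
            doc.Load(xtr);
        }

        var names = new List<string>();
        Collect(doc.DocumentElement, names);
        return names;
    }

    // Depth-first walk; nested references appear in the ChildNodes of the outer reference.
    private static void Collect(XmlNode node, List<string> names)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.EntityReference)
                names.Add(child.Name);
            Collect(child, names);
        }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ListEntityNames(SampleXml)));
    }
}
```

For a merge algorithm, the same walk could be used to compare entity reference names between document versions rather than their expanded text, since the child nodes of an XmlEntityReference are read-only and only reflect the DTD declarations.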