How do I prevent .NET XML parsers from expanding parameter entities in XML?

Keywords: XML, parameter, entity, .NET | Updated: 2023-09-27 17:52:34

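When I try to parse the XML below (using the code below), <sgml>&question;&signature;</sgml> always ends up either expanded to

<sgml>Why couldn’t I publish my books directly in standard SGML? — William Shakespeare.</sgml>

or emptied to

<sgml></sgml>

Since I am working on an XML 3-way merge algorithm, I would like to retrieve the unexpanded <sgml>&question;&signature;</sgml>.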

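I have tried:

  • Parsing the XML normally (this results in the expanded sgml tag)
  • Removing the DOCTYPE from the beginning of the XML (this results in an empty sgml tag)
  • Various XmlReader DTD settings

I have the following XML file: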

<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY  std       "standard SGML">
  <!ENTITY  signature " &#x2014; &author;.">
  <!ENTITY  question  "Why couldn&#x2019;t I publish my books directly in &std;?">
  <!ENTITY  author    "William Shakespeare">
]>
<sgml>&question;&signature;</sgml>

Here is the code I have tried:

using System; // needed for Console
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Reflection;
class Program
{
    static void Main(string[] args)
    {
        string xml = @"C:\src\Apps\Wit\MergingAlgorithmTest\MergingAlgorithmTest\Tests\XMLMerge-DocTypeExpansion\DocTypeExpansion.0.xml";
        var xmlSettingsIgnore = new XmlReaderSettings 
            {
                CheckCharacters = false,
                DtdProcessing = DtdProcessing.Ignore
            };
        var xmlSettingsParse = new XmlReaderSettings
        {
            CheckCharacters = false,
            DtdProcessing = DtdProcessing.Parse
        };
        using (var fs = File.Open(xml, FileMode.Open, FileAccess.Read))
        {
            using (var xmlReaderIgnore = XmlReader.Create(fs, xmlSettingsIgnore))
            {
                // Prevents Exception "Reference to undeclared entity 'question'"
                PropertyInfo propertyInfo = xmlReaderIgnore.GetType().GetProperty("DisableUndeclaredEntityCheck", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
                propertyInfo.SetValue(xmlReaderIgnore, true, null);
                var doc = XDocument.Load(xmlReaderIgnore);
                Console.WriteLine(doc.Root.ToString()); // outputs <sgml></sgml> not <sgml>&question;&signature;</sgml>
            }// using xml ignore
            fs.Position = 0;
            using (var xmlReaderParse = XmlReader.Create(fs, xmlSettingsParse))
            {
                var doc = XDocument.Load(xmlReaderParse);
                Console.WriteLine(doc.Root.ToString()); // outputs <sgml>Why couldn't I publish my books directly in standard SGML? - William Shakespeare.</sgml> not <sgml>&question;&signature;</sgml>
            }
            fs.Position = 0;
            string parseXmlString = String.Empty;
            using (StreamReader sr = new StreamReader(fs))
            {
                for (int i = 0; i < 7; ++i) // Skip DocType
                    sr.ReadLine();
                parseXmlString = sr.ReadLine();
            }
            using (XmlReader xmlReaderSkip = XmlReader.Create(new StringReader(parseXmlString),xmlSettingsParse))
            {
                // Prevents Exception "Reference to undeclared entity 'question'"
                PropertyInfo propertyInfo = xmlReaderSkip.GetType().GetProperty("DisableUndeclaredEntityCheck", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
                propertyInfo.SetValue(xmlReaderSkip, true, null);
                var doc2 = XDocument.Load(xmlReaderSkip); // Empty sgml tag
            }
        }//using FileStream
    }
}

How do I prevent .NET XML parsers from expanding parameter entities in XML?

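LINQ to XML does not support modeling entity references; they are automatically expanded to their values (source 1, source 2). There is simply no subclass of XObject defined for general entity references.

However, assuming your XML is valid (i.e. the entity references are declared in the DTD, as they are in your example), you can use the old XML Document Object Model to parse your XML. It inserts XmlEntityReference nodes into the XML DOM tree instead of expanding the entity references into plain text: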

        // Requires: using System.Diagnostics; using System.IO; using System.Xml;
        // "xml" is the file path from the question.
        using (var sr = new StreamReader(xml))
        using (var xtr = new XmlTextReader(sr))
        {
            // Expand character entities only; general entities are returned as XmlNodeType.EntityReference nodes
            xtr.EntityHandling = EntityHandling.ExpandCharEntities;
            var oldDoc = new XmlDocument();
            oldDoc.Load(xtr);
            Debug.WriteLine(oldDoc.DocumentElement.OuterXml); // Outputs <sgml>&question;&signature;</sgml>
            // Verify that the entity references are still there (neither assertion fires)
            Debug.Assert(oldDoc.DocumentElement.OuterXml.Contains("&question;"));
            Debug.Assert(oldDoc.DocumentElement.OuterXml.Contains("&signature;"));
        }

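The ChildNodes of each XmlEntityReference will contain the text value of the general entity. If a general entity refers to other general entities, as in your case, the corresponding inner XmlEntityReference nodes will be nested in the ChildNodes of the outer one. You can then compare the old and new XML using the old XmlDocument API.

Note that you also need to use the old XmlTextReader and set EntityHandling = EntityHandling.ExpandCharEntities.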
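To make the nesting described above easier to inspect, here is a minimal sketch that loads the question's document from a string and lists every entity reference found under the root, including nested ones. The class and member names (EntityReferenceDump, SampleXml, LoadPreservingEntities, CollectEntityNames) are hypothetical and made up for illustration. The DTD is inlined with single-quoted entity values only to avoid escaping in the C# string.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class EntityReferenceDump
{
    // The document from the question, inlined so the sketch is self-contained.
    public const string SampleXml =
        "<!DOCTYPE sgml [" +
        "<!ELEMENT sgml ANY>" +
        "<!ENTITY std 'standard SGML'>" +
        "<!ENTITY signature ' &#x2014; &author;.'>" +
        "<!ENTITY question 'Why couldn&#x2019;t I publish my books directly in &std;?'>" +
        "<!ENTITY author 'William Shakespeare'>" +
        "]>" +
        "<sgml>&question;&signature;</sgml>";

    // Loads XML text into an XmlDocument, keeping general entity
    // references as XmlEntityReference nodes instead of expanding them.
    public static XmlDocument LoadPreservingEntities(string xmlText)
    {
        using (var sr = new StringReader(xmlText))
        using (var xtr = new XmlTextReader(sr))
        {
            xtr.EntityHandling = EntityHandling.ExpandCharEntities;
            var doc = new XmlDocument();
            doc.Load(xtr);
            return doc;
        }
    }

    // Collects the names of all entity references below a node,
    // depth-first, descending into the children of each reference.
    public static List<string> CollectEntityNames(XmlNode node)
    {
        var names = new List<string>();
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.EntityReference)
                names.Add(child.Name);
            names.AddRange(CollectEntityNames(child));
        }
        return names;
    }

    static void Main()
    {
        XmlDocument doc = LoadPreservingEntities(SampleXml);
        foreach (string name in CollectEntityNames(doc.DocumentElement))
            Console.WriteLine("&" + name + ";");
    }
}
```

For a 3-way merge, a walk like this could serve as a starting point for comparing documents at the entity-reference level rather than on the expanded text. How the merge treats the read-only children of each reference is a design decision left open here.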