How to extract specific data from XML data

Keywords: data, extract, XML | Updated: 2023-09-27 17:55:36

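I use the following code snippet to parse some XML data and convert it to CSV. I can convert the whole XML data set and dump it to a file, but my requirements have changed and now I am confused.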

public void xmlToCSVfiltered(string p, int e)
        {                 
            string all_lines1 = File.ReadAllText(p);
            all_lines1 = "<Root>" + all_lines1 + "</Root>";
            XmlDocument doc_all = new XmlDocument();
            doc_all.LoadXml(all_lines1);
            StreamWriter write_all = new StreamWriter(FILENAME2);
            XmlNodeList rows_all = doc_all.GetElementsByTagName("XML");
            List<string[]> filtered = new List<string[]>();
            foreach (XmlNode rowtemp in rows_all)
            {
                List<string> children_all = new List<string>();
                foreach (XmlNode childtemp in rowtemp.ChildNodes)
                {
                    children_all.Add(Regex.Replace(childtemp.InnerText, @"\s+", " "));     // <------- Fixed the Bug , Advisories dont span
                }
                string line = string.Join(",", children_all);
                //write_all.WriteLine(line);
                if (children_all.Contains(e.ToString()))
                {
                    filtered.Add(children_all.ToArray());
                    write_all.WriteLine(line);
                }
                }
            }
            write_all.Flush();
            write_all.Close();
            foreach (var res in filtered)
            {
                Console.WriteLine(string.Join(",", res));
            }
        }

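My input looks like the following. My goal now is to convert only certain "events" and compile them into a CSV, selected by a given number. For example, suppose I only want to convert to CSV the events whose second value under the <EVENT> element is 4627. Only those events would be converted; in the input below, both events qualify.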

<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
        tham ALL out. For some reason 
        that is not the case
        please press the on button 
        when trying to activate
        device codes also available on
    list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML> 
<XML><HEADER>2.0,773162,20121009133435,3,</HEADER>20121004133435,761,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,18735166156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
        tham ALL out. For some reason 
        that is not the case
        please press the on button 
        when trying to activate
        device codes also available on
    list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML> 
... and so on

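My approach so far has been to convert everything to CSV and store it in some data structure, then query that data structure row by row to see whether the number is present, and if so, write that row to the file. My function takes the path of the XML file and the number we are looking for in the XML data as parameters. I am new to C# and I don't understand how to change the function above. Any help would be greatly appreciated!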
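One reason the row-by-row check finds nothing: List<string>.Contains compares whole strings, so a child text such as "00032134826064957,4627," never equals "4627". A minimal sketch of a check that splits the <EVENT> text and compares only the second value might look like this; the EventFilter class and HasValueAt method are hypothetical names, not part of the original code.

```csharp
using System;

// Hypothetical helper (not from the original code): checks one field of a
// delimited string such as the text of an <EVENT> element.
public static class EventFilter
{
    // True when the field at 0-based `position` of `delimited`, split on
    // `delimiter`, equals `number`; false for null input or a position
    // that is out of range.
    public static bool HasValueAt(string delimited, int position, int number, char delimiter = ',')
    {
        if (delimited == null || position < 0)
        {
            return false;
        }
        string[] parts = delimited.Split(delimiter);
        return position < parts.Length && parts[position] == number.ToString();
    }
}
```

For example, EventFilter.HasValueAt("00032134826064957,4627,", 1, 4627) is true, while a list containing "00032134826064957,4627," still reports Contains("4627") as false.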

Edit:

Sample input:

<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
    tham ALL out. For some reason 
    that is not the case
    please press the on button 
    when trying to activate
    device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a- 
    <XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4623,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
        tham ALL out. For some reason 
        that is not the case
        please press the on button 
        when trying to activate
        device codes also available on
    list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a- 

Desired output:

1.0,770162,20121009133435,3,,20121009133435,721,5,1,0,0,0,00:00,00:00,,00032134826064957,4627,1,,1872161156,7,0,10000,1,0,5000000,0,10000000,0,1 ,,Keep it simple or spell
    tham ALL out. For some reason 
    that is not the case
    please press the on button 
    when trying to activate
    device codes also available on
list,,,20121009133435,00-1d-71-0a-71-80,-66,,,0,50 

That is the result if I call xmlToCSVfiltered(file, 4627);. Also note that the output would be a single horizontal line, as in a CSV file, but I can't really format it that way here.
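The desired line also contains the bare text that sits between </HEADER> and <EVENT> (20121009133435,721,...). That text is not an element: iterating XElement.Elements() skips it, while XElement.Nodes() returns it as an XText node. A minimal sketch, assuming each record is already available as an XElement; RecordToCsv and ToCsvLine are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper (not from the original code).
public static class RecordToCsv
{
    // Joins the text of every child node of one <XML> record with commas.
    // Nodes() yields child elements and the bare XText between them, so the
    // unwrapped "20121009133435,721,..." run is included. Runs of whitespace
    // (such as the line breaks inside <ADVISORY>) collapse to one space.
    public static string ToCsvLine(XElement record)
    {
        var fields = record.Nodes()
            .Where(n => n is XElement || n is XText) // skip comments and processing instructions
            .Select(n => n is XElement el ? el.Value : ((XText)n).Value)
            .Select(text => Regex.Replace(text, @"\s+", " "));
        return string.Join(",", fields);
    }
}
```

In an XDocument-based loop over the filtered records, each one could then be written with write_all.WriteLine(RecordToCsv.ToCsvLine(record));, giving a line in the same shape as the desired output.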


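I changed XmlDocument to XDocument so that I could use XML LINQ. For testing, I also used a StringReader to read from a string instead of from a file. You can change the code back to the original File.ReadAllText.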

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME2 = @"c:\temp\test.txt";
        static void Main(string[] args)
        {
            string input = 
            "<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell\n" +
                    "tham ALL out. For some reason \n" +
                    "that is not the case\n" +
                    "please press the on button\n" +
                    "when trying to activate\n" +
                    "device codes also available on\n" +
                "list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>\n" +
            "<XML><HEADER>2.0,773162,20121009133435,3,</HEADER>20121004133435,761,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,18735166156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell\n" +
                    "tham ALL out. For some reason\n" +
                    "that is not the case\n" +
                    "please press the on button\n" +
                    "when trying to activate\n" +
                    "device codes also available on\n" +
                "list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>\n";
            xmlToCSVfiltered(input, 4627); 
        }
        static public void xmlToCSVfiltered(string p, int e)
        {
            //string all_lines1 = File.ReadAllText(p);
            StringReader reader = new StringReader(p);
            string all_lines1 = reader.ReadToEnd();
            all_lines1 = "<Root>" + all_lines1 + "</Root>";
            XDocument doc_all = XDocument.Parse(all_lines1);
            // Keep only the <XML> records whose <EVENT> has e as its second comma-separated value
            // (a missing <EVENT> is treated as an empty string instead of throwing).
            List<XElement> rows_all = doc_all.Descendants("XML").Where(x => ((string)x.Element("EVENT") ?? "").Split(new char[] {','}).Skip(1).FirstOrDefault() == e.ToString()).ToList();
            List<string[]> filtered = new List<string[]>();
            // "using" flushes and closes the writer even if an exception is thrown.
            using (StreamWriter write_all = new StreamWriter(FILENAME2))
            {
                foreach (XElement rowtemp in rows_all)
                {
                    List<string> children_all = new List<string>();
                    foreach (XElement childtemp in rowtemp.Elements())
                    {
                        children_all.Add(Regex.Replace(childtemp.Value, @"\s+", " "));     // <------- Fixed the Bug , Advisories dont span
                    }
                    // rows_all is already filtered, so every row is written. A second
                    // children_all.Contains(e.ToString()) check would never match,
                    // because the EVENT text is "00032134826064957,4627,", not "4627".
                    filtered.Add(children_all.ToArray());
                    write_all.WriteLine(string.Join(",", children_all));
                }
            }
            foreach (var res in filtered)
            {
                Console.WriteLine(string.Join(",", res));
            }
        }
    }
}
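Since the answer suggests switching back to File.ReadAllText, here is a hedged sketch of a file-based variant. It swaps in two changes of its own: Nodes() instead of Elements(), so the bare text between </HEADER> and <EVENT> is kept, and File.WriteAllLines instead of a manually closed StreamWriter. The class name XmlToCsv, the method name, and the parameter names are assumptions, not from the original.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical class and method names; a sketch, not the answer's code.
public static class XmlToCsv
{
    // Reads a file of concatenated <XML> records, keeps the records whose
    // <EVENT> has `number` as its second comma-separated value, writes each
    // kept record as one CSV line to `outputPath`, and returns those lines.
    public static List<string> XmlToCsvFiltered(string inputPath, string outputPath, int number)
    {
        // The records share no root element, so wrap them in one before parsing.
        XDocument doc = XDocument.Parse("<Root>" + File.ReadAllText(inputPath) + "</Root>");
        string wanted = number.ToString();

        List<string> lines = doc.Root.Elements("XML")
            .Where(rec =>
            {
                // A missing <EVENT> converts to null, treated here as "".
                string[] parts = ((string)rec.Element("EVENT") ?? "").Split(',');
                return parts.Length > 1 && parts[1] == wanted;
            })
            .Select(rec => string.Join(",",
                rec.Nodes()
                   .Where(n => n is XElement || n is XText) // elements plus bare text
                   .Select(n => Regex.Replace(n is XElement el ? el.Value : ((XText)n).Value, @"\s+", " "))))
            .ToList();

        // Creates or overwrites outputPath and closes it when done.
        File.WriteAllLines(outputPath, lines);
        return lines;
    }
}
```

A call such as XmlToCsv.XmlToCsvFiltered(inputFile, FILENAME2, 4627) writes the matching records and returns the same lines, which can then be printed with Console.WriteLine as in the answer; inputFile stands for whatever path holds the XML.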

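I made some assumptions, because a few things were not clear to me from the question.
Assumptions:

1. I assume you know which node needs to be checked (for example EVENT), and that the value to compare is in the second position within it.
2. You know the delimiter between the values in the node, for example "," in EVENT.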

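    public void xmlToCSVfiltered(string p, int e, string nodeName, char delimiter)
    {
        //The file holds several <XML> records without a single root element,
        //so wrap it in <Root> before parsing; XDocument.Load(p) would throw on it.
        XDocument xml = XDocument.Parse("<Root>" + File.ReadAllText(p) + "</Root>");
        //get the required nodes. I am assuming you would know which. For eg. Event Node
        var requiredNodes = xml.Descendants(nodeName);
        foreach (var node in requiredNodes)
        {
            //Also here, I am assuming you have the delimiter knowledge.
            var valueSplit = node.Value.Split(delimiter);
            //Compare only the second value (assumption 1), not every value in the node.
            if (valueSplit.Length > 1 && valueSplit[1] == e.ToString())
            {
                //AddToCSV is a placeholder (not shown): write the enclosing
                //<XML> record, node.Parent, to the CSV here.
                AddToCSV();
            }
        }
    }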