Reading XML into a Dictionary gets slower and slower over time

Keywords: XML, dictionary, time, slower | Last updated: 2023-09-27 18:27:16

My C# application reads an XML file with the following structure. The file is 150 megabytes and contains about 250,000 words.

<word>
     <name>kick</name>
     <id>485</id>
     <rels>12:4;4256:3;754:3;1452:2;86:2;125:2;</rels>
</word>

I want to read the XML file into a dictionary. These are some of the members of my reader class.

private XmlReader Reader;
public string CurrentWordName;
public int CurrentWordId;
public Dictionary<KeyValuePair<int, int>, int> CurrentRelations;

This is the main method of my reader class. It reads the next word from the file and extracts its name and id; the relations are stored in a Dictionary.

public bool ReadNextWord()
{
    // Reset the state left over from the previous word.
    CurrentWordId = -1;
    CurrentWordName = "";
    CurrentRelations = new Dictionary<KeyValuePair<int, int>, int>();
    // Advance to the next <word> element.
    while (Reader.Read())
        if (Reader.NodeType == XmlNodeType.Element && Reader.Name == "word")
        {
            // Read <name>.
            while (Reader.Read())
                if (Reader.NodeType == XmlNodeType.Element && Reader.Name == "name")
                {
                    XElement Title = XElement.ReadFrom(Reader) as XElement;
                    CurrentWordName = Title.Value;
                    break;
                }
            // Read <id>.
            while (Reader.Read())
                if (Reader.NodeType == XmlNodeType.Element && Reader.Name == "id")
                {
                    XElement Identifier = XElement.ReadFrom(Reader) as XElement;
                    CurrentWordId = Convert.ToInt32(Identifier.Value);
                    break;
                }
            // Read <rels>: "target:weight" pairs separated by ';'.
            while (Reader.Read())
                if (Reader.NodeType == XmlNodeType.Element && Reader.Name == "rels")
                {
                    XElement Text = XElement.ReadFrom(Reader) as XElement;
                    string[] RelationStrings = Text.Value.Split(';');
                    foreach (string RelationString in RelationStrings)
                    {
                        string[] RelationsStringSplit = RelationString.Split(':');
                        if (RelationsStringSplit.Length == 2)
                            CurrentRelations.Add(new KeyValuePair<int, int>(CurrentWordId, Convert.ToInt32(RelationsStringSplit[0])), Convert.ToInt32(RelationsStringSplit[1]));
                    }
                    break;
                }
            break;
        }
    // A word counts as read only if it has a name, an id and at least one relation.
    return CurrentRelations.Count >= 1 && CurrentWordId != -1 && CurrentWordName != "";
}

My Windows Form has a BackgroundWorker that reads all the words.

private void bgReader_DoWork(object sender, DoWorkEventArgs e)
{
    ReadXML Reader = new ReadXML(tBOpenFile.Text);
    Words = new Dictionary<int, string>();
    Dictionary<KeyValuePair<int, int>, int> ReadedRelations = new Dictionary<KeyValuePair<int, int>, int>();
    // reading
    while(Reader.ReadNextWord())
    {
        Words.Add(Reader.CurrentWordId, Reader.CurrentWordName);
        foreach (KeyValuePair<KeyValuePair<int, int>, int> CurrentRelation in Reader.CurrentRelations)
        {
            ReadedRelations.Add(new KeyValuePair<int, int>(CurrentRelation.Key.Key, CurrentRelation.Key.Value), CurrentRelation.Value);
        }
    }
}

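While debugging, I noticed that the application starts out very fast and gets slower over time.

The first 10,000 words take 7 seconds
The first 200,000 words take 30 minutes
The first 220,000 words take 35 minutes

I can't explain this behavior! I am sure, though, that the words in the XML file are about the same size on average. Perhaps the Add() method gets slower as the dictionary grows.

How can I speed up my application?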

EDIT: Okay, now that I've run the code, I believe this is the problem:

foreach (KeyValuePair<KeyValuePair<int, int>, int> CurrentRelation in 
         Reader.CurrentRelations)
{
    ReadedRelations.Add(new KeyValuePair<int, int>(CurrentRelation.Key.Key, 
        CurrentRelation.Key.Value), CurrentRelation.Value);
}

Without this loop, it runs much faster... which makes me suspect that the fact that you're reading from XML is actually a red herring.
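One way to check this suspicion is to time the dictionary inserts on their own, with no XML involved. The following is only a rough sketch: the `InsertTiming` class, the `TimeInserts` method and the synthetic key layout are made up for illustration, and the numbers you get will depend heavily on the .NET runtime version and on the real key distribution in your file.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class InsertTiming
{
    // Hypothetical helper: inserts words * relsPerWord synthetic keys into a
    // dictionary keyed by KeyValuePair<int, int> and returns the elapsed
    // time in milliseconds. No XML parsing is involved.
    public static long TimeInserts(int words, int relsPerWord)
    {
        var relations = new Dictionary<KeyValuePair<int, int>, int>();
        Stopwatch watch = Stopwatch.StartNew();
        for (int id = 0; id < words; id++)
        {
            for (int target = 0; target < relsPerWord; target++)
            {
                relations.Add(new KeyValuePair<int, int>(id, target), 1);
            }
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // If the time grows much faster than the number of inserts,
        // the dictionary key is the bottleneck rather than the XML reader.
        foreach (int words in new[] { 1000, 2000, 4000 })
        {
            Console.WriteLine("{0} words: {1} ms", words, TimeInserts(words, 6));
        }
    }
}
```

Doubling the word count should roughly double the elapsed time for a well-behaved key; a much steeper growth points at hash collisions.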

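I suspect the problem is that KeyValuePair<,> doesn't override Equals and GetHashCode. I believe that if you create your own RelationKey value type containing the two int values, override GetHashCode and Equals, and implement IEquatable<RelationKey>, it will be much faster.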

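Alternatively, you could always use a long to store the two int values. It's a bit of a pain, but it would work fine. I can't test this right now, but I'll give it a try when I have more time.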
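As a minimal sketch of the long-based approach, one int can go into the high 32 bits and the other into the low 32 bits. The `PackedRelationKey` class and its `Pack`, `Source` and `Target` methods are hypothetical names chosen for this example; the sample values come from the XML snippet in the question.

```csharp
using System;
using System.Collections.Generic;

public static class PackedRelationKey
{
    // Source goes into the high 32 bits, target into the low 32 bits.
    // Casting target to uint first avoids sign extension for negative values.
    public static long Pack(int source, int target)
    {
        return unchecked(((long)source << 32) | (long)(uint)target);
    }

    // Recover the high 32 bits.
    public static int Source(long key)
    {
        return (int)(key >> 32);
    }

    // Recover the low 32 bits (truncating cast).
    public static int Target(long key)
    {
        return unchecked((int)key);
    }
}

public static class Program
{
    public static void Main()
    {
        var relations = new Dictionary<long, int>();
        relations.Add(PackedRelationKey.Pack(485, 12), 4);
        relations.Add(PackedRelationKey.Pack(485, 4256), 3);

        long key = PackedRelationKey.Pack(485, 4256);
        Console.WriteLine(relations[key]);                  // 3
        Console.WriteLine(PackedRelationKey.Source(key));   // 485
        Console.WriteLine(PackedRelationKey.Target(key));   // 4256
    }
}
```

The trade-off is readability: every lookup has to go through the packing helper, which is what makes this option a little awkward compared with a dedicated key type.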

Even just changing your loop to:

foreach (var relation in Reader.CurrentRelations)
{
    ReadedRelations.Add(relation.Key, relation.Value);
}

would be simpler and slightly more efficient...

EDIT: Here is an example of a RelationKey struct. Just replace every occurrence of KeyValuePair<int, int> with RelationKey, and use the Source and Target properties instead of Key and Value:

public struct RelationKey : IEquatable<RelationKey>
{
    private readonly int source;
    private readonly int target;
    public int Source { get { return source; } }
    public int Target { get { return target; } }
    public RelationKey(int source, int target)
    {
        this.source = source;
        this.target = target;
    }
    public override bool Equals(object obj)
    {
        if (!(obj is RelationKey))
        {
            return false;
        }
        return Equals((RelationKey)obj);
    }
    public override int GetHashCode()
    {
        return source * 31 + target;
    }
    public bool Equals(RelationKey other)
    {
        return source == other.source && target == other.target;
    }
}
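To illustrate how the struct might slot into the parsing code, here is a small self-contained sketch. It repeats a compact version of RelationKey so that it compiles on its own; the `RelationParser` class and its `Parse` method are hypothetical names, not part of the original code, and the input string is the `<rels>` value from the sample XML.

```csharp
using System;
using System.Collections.Generic;

public struct RelationKey : IEquatable<RelationKey>
{
    private readonly int source;
    private readonly int target;
    public int Source { get { return source; } }
    public int Target { get { return target; } }
    public RelationKey(int source, int target)
    {
        this.source = source;
        this.target = target;
    }
    public bool Equals(RelationKey other)
    {
        return source == other.source && target == other.target;
    }
    public override bool Equals(object obj)
    {
        return obj is RelationKey && Equals((RelationKey)obj);
    }
    public override int GetHashCode()
    {
        // unchecked: overflow simply wraps around, which is fine for a hash.
        return unchecked(source * 31 + target);
    }
}

public static class RelationParser
{
    // Hypothetical helper: turns "12:4;4256:3;" into (wordId, target) -> weight.
    // Segments without exactly one ':' (such as the trailing empty one) are skipped.
    public static Dictionary<RelationKey, int> Parse(int wordId, string rels)
    {
        var result = new Dictionary<RelationKey, int>();
        foreach (string part in rels.Split(';'))
        {
            string[] pair = part.Split(':');
            if (pair.Length == 2)
            {
                result.Add(new RelationKey(wordId, int.Parse(pair[0])), int.Parse(pair[1]));
            }
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var relations = RelationParser.Parse(485, "12:4;4256:3;754:3;1452:2;86:2;125:2;");
        Console.WriteLine(relations.Count);                      // 6
        Console.WriteLine(relations[new RelationKey(485, 754)]); // 3
    }
}
```

Note that Dictionary.Add throws an ArgumentException on a duplicate key, so this sketch assumes each target id appears at most once per word.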