数据结构具有独特的元素和快速添加和删除

本文关键字:添加 删除 元素 数据结构 | 更新日期: 2023-09-27 18:07:14

我需要一个具有以下属性的数据结构:

  • 结构的每个元素必须是唯一的。
  • Add:在数据结构中添加一个元素,除非该元素已经存在存在。
  • Pop:从数据结构中删除一个元素并返回该元素移除。删除哪个元素并不重要。

该结构不需要其他操作。使用列表的简单实现将需要O(1)时间执行Pop, O(N)时间执行Add(因为必须检查整个列表以确保)唯一性)。我目前正在使用红黑树来满足这个数据结构的需求,但是我想知道我是否可以使用一些不那么复杂的东西来实现几乎相同的性能。

我更喜欢c#的答案,但是Java、Javascript和c++也是可以接受的。

我的问题类似于这个问题,但是我不需要查找或删除最大值或最小值(或任何特定类型的值),所以我希望在这方面会有所改进。但是,如果问题中的任何结构在这里是合适的,请告诉我。

那么,什么数据结构只允许唯一元素,支持快速添加和删除,并且比红黑树更简单?

数据结构具有独特的元素和快速添加和删除

内置的HashSet<T>呢?

它只包含唯一的元素。Remove (pop)为0 (1),Add为0(1),除非内部数组必须调整大小

正如Meta-Knight所说,HashSet是最快的数据结构。查找和删除需要恒定的O(1)时间(除非在极少数情况下,当您的哈希很糟糕,然后需要多次重新哈希或使用bucket哈希集)。对哈希集的所有操作都需要O(1)时间,唯一的缺点是它需要更多的内存,因为哈希被用作数组(或其他分配的内存块)的索引。所以除非你对内存要求很严格,否则就用HashSet吧。我只是在解释为什么你应该采用这种方法,你应该接受元骑士的答案,因为他是第一个。

使用散列是可以的,因为通常您会覆盖HashCode()和Equals()函数。HashSet在内部所做的是生成哈希值,然后在它相等的情况下检查是否相等(只是在哈希冲突的情况下)。如果它们不是,它必须调用一个方法来做一些叫做rehashing的事情,它生成一个新的哈希,通常是在原始哈希的奇数素数偏移(不确定是否。net这样做,但其他语言做),并在必要时重复这个过程。

从散列集或字典中删除随机元素非常容易。所有的都是O(1)的平均值,在现实世界中就是O(1)例子:

public class MyNode
{
    ...
}
public class MyDataStructure
{
    private HashSet<MyNode> nodes = new HashSet<MyNode>();
    /// <summary>
    /// Inserts an element to this data structure. 
    /// If the element already exists, returns false.
    /// Complexity is averaged O(1).
    /// </summary>
    public bool Add(MyNode node)
    {
        return node != null && this.nodes.Add(node);
    }
    /// <summary>
    /// Removes a random element from the data structure.
    /// Returns the element if an element was found.
    /// Returns null if the data structure is empty.
    /// Complexity is averaged O(1).
    /// </summary>
    public MyNode Pop()
    {
        // This loop can execute 1 or 0 times.
        foreach (MyNode node in nodes)
        {
            this.nodes.Remove(node);
            return node;
        }
        return null;
    }
}

根据我的经验,几乎所有可以比较的东西都可以散列:)。我想知道是否有人知道不能散列的东西。

根据我的经验,这也适用于一些使用特殊技术的带有公差的浮点数比较。

哈希表的哈希函数不需要是完美的,它只需要足够好。如果你的数据非常复杂通常哈希函数没有红黑树或avl树那么复杂。它们是有用的,因为它们使事情有序,但你不需要这个。

为了演示如何创建一个简单的哈希集,我将考虑一个具有整数键的简单字典。这个实现非常快,对于稀疏数组来说非常好。我没有编写增长桶表的代码,因为这很烦人,而且通常是大bug的来源,但由于这是一个概念证明,它应该足够了。我也没写iterator

我是从头开始写的,可能有bug

public class FixedIntDictionary<T>
{
    // Our internal node structure.
    // We use structs instead of objects to not add pressure to the garbage collector.
    // We mantains our own way to manage garbage through the use of a free list.
    private struct Entry
    {
        // The key of the node
        internal int Key;
        // Next index in pEntries array.
        // This field is both used in the free list, if node was removed
        // or in the table, if node was inserted.
        // -1 means null.
        internal int Next;
        // The value of the node.
        internal T Value;
    }
    // The actual hash table. Contains indices to pEntries array.
    // The hash table can be seen as an array of singlt linked list.
    // We store indices to pEntries array instead of objects for performance
    // and to avoid pressure to the garbage collector.
    // An index -1 means null.
    private int[] pBuckets;
    // This array contains the memory for the nodes of the dictionary.
    private Entry[] pEntries;
    // This is the first node of a singly linked list of free nodes.
    // This data structure is called the FreeList and we use it to
    // reuse removed nodes instead of allocating new ones.
    private int pFirstFreeEntry;
    // Contains simply the number of items in this dictionary.
    private int pCount;
    // Contains the number of used entries (both in the dictionary or in the free list) in pEntries array.
    // This field is going only to grow with insertions.
    private int pEntriesCount;
    ///<summary>
    /// Creates a new FixedIntDictionary. 
    /// tableBucketsCount should be a prime number
    /// greater than the number of items that this
    /// dictionary should store.
    /// The performance of this hash table will be very bad
    /// if you don't follow this rule!
    /// </summary>
    public FixedIntDictionary<T>(int tableBucketsCount)
    {
        // Our free list is initially empty.
        this.pFirstFreeEntry = -1;
        // Initializes the entries array with a minimal amount of items.
        this.pEntries = new Entry[8];
        // Allocate buckets and initialize every linked list as empty.
        int[] buckets = new int[capacity];
        for (int i = 0; i < buckets.Length; ++i)
            buckets[i] = -1;
        this.pBuckets = buckets;
    }
    ///<summary>Gets the number of items in this dictionary. Complexity is O(1).</summary>
    public int Count
    {
        get { return this.pCount; }
    }
    ///<summary>
    /// Adds a key value pair to the dictionary.
    /// Complexity is averaged O(1).
    /// Returns false if the key already exists.
    /// </summary>
    public bool Add(int key, T value)
    {
        // The hash table can be seen as an array of linked list.
        // We find the right linked list using hash codes.
        // Since the hash code of an integer is the integer itself, we have a perfect hash.
        // After we get the hash code we need to remove the sign of it.
        // To do that in a fast way we and it with 0x7FFFFFFF, that means, we remove the sign bit.
        // Then we have to do the modulus of the found hash code with the size of our buckets array.
        // For this reason the size of our bucket array should be a prime number,
        // this because the more big is the prime number, the less is the chance to find an
        // hash code that is divisible for that number. This reduces collisions.
        // This implementation will not grow the buckets table when needed, this is the major
        // problem with this implementation.
        // Growing requires a little more code that i don't want to write now
        // (we need a function that finds prime numbers, and it should be fast and we
        // need to rehash everything using the new buckets array).
        int bucketIndex = (key & 0x7FFFFFFF) % this.pBuckets.Length;
        int bucket = this.pBuckets[bucketIndex];
        // Now we iterate in the linked list of nodes.
        // Since this is an hash table we hope these lists are very small.
        // If the number of buckets is good and the hash function is good this will translate usually 
        // in a O(1) operation.
        Entry[] entries = this.pEntries;
        for (int current = entries[bucket]; current != -1; current = entries[current].Next)
        {
            if (entries[current].Key == key)
            {
                // Entry already exists.
                return false;
            }
        }
        // Ok, key not found, we can add the new key and value pair.
        int entry = this.pFirstFreeEntry;
        if (entry != -1)
        {
            // We found a deleted node in the free list.
            // We can use that node without "allocating" another one.
            this.pFirstFreeEntry = entries[entry].Next;
        }
        else
        {
            // Mhhh ok, the free list is empty, we need to allocate a new node.
            // First we try to use an unused node from the array.
            entry = this.pEntriesCount++;
            if (entry >= this.pEntries)
            {
                // Mhhh ok, the entries array is full, we need to make it bigger.
                // Here should go also the code for growing the bucket table, but i'm not writing it here.
                Array.Resize(ref this.pEntries, this.pEntriesCount * 2);
                entries = this.pEntries;
            }
        }
        // Ok now we can add our item.
        // We just overwrite key and value in the struct stored in entries array.
        entries[entry].Key = key;
        entries[entry].Value = value;
        // Now we add the entry in the right linked list of the table.
        entries[entry].Next = this.pBuckets[bucketIndex];
        this.pBuckets[bucketIndex] = entry;
        // Increments total number of items.
        ++this.pCount;
        return true;
    }
    /// <summary>
    /// Gets a value that indicates wether the specified key exists or not in this table.
    /// Complexity is averaged O(1).
    /// </summary>
    public bool Contains(int key)
    {
        // This translate in a simple linear search in the linked list for the right bucket.
        // The operation, if array size is well balanced and hash function is good, will be almost O(1).
        int bucket = this.pBuckets[(key & 0x7FFFFFFF) % this.pBuckets.Length];
        Entry[] entries = this.pEntries;
        for (int current = entries[bucket]; current != -1; current = entries[current].Next)
        {
            if (entries[current].Key == key)
            {
                return true;
            }
        }
        return false;
    }
    /// <summary>
    /// Removes the specified item from the dictionary.
    /// Returns true if item was found and removed, false if item doesn't exists.
    /// Complexity is averaged O(1).
    /// </summary>
    public bool Remove(int key)
    {
        // Removal translate in a simple contains and removal from a singly linked list.
        // Quite simple.
        int bucketIndex = (key & 0x7FFFFFFF) % this.pBuckets.Length;
        int bucket = this.pBuckets[bucketIndex];
        Entry[] entries = this.pEntries;
        int next;
        int prev = -1;
        int current = entries[bucket];
        while (current != -1)
        {
            next = entries[current].Next;
            if (entries[current].Key == key)
            {
                // Found! Remove from linked list.
                if (prev != -1)
                    entries[prev].Next = next;
                else
                    this.pBuckets[bucketIndex] = next;
                // We now add the removed node to the free list,
                // so we can use it later if we add new elements.
                entries[current].Next = this.pFirstFreeEntry;
                this.pFirstFreeEntry = current;
                // Decrements total number of items.
                --this.pCount;
                return true;
            }
            prev = current;
            current = next;
        }
        return false;
    }
}

如果你不知道这个实现是好是坏,这是一个非常类似于。net框架对Dictionary类的实现:)

要使它成为哈希集,只需去掉T,你就得到了一个整数哈希集。如果你需要获得通用对象的哈希码,只需使用x.GetHashCode或提供你的哈希码函数。

要编写迭代器,你需要修改一些东西,但不想在这篇文章中添加太多其他东西:)