为什么排序集<；T>；.GetViewBetween'；t O(对数N)

本文关键字：对数 gt 排序 lt 为什么 GetViewBetween | 更新日期: 2023-09-27 18:28:57

在.NET 4.0+中，类SortedSet<T>有一个名为GetViewBetween(l, r)的方法，该方法返回树部分上的接口视图，该视图包含两个指定值之间的所有值。假设SortedSet<T>被实现为红黑树，我自然希望它在O(log N)时间内运行。C++中类似的方法是std::set::lower_bound/upper_bound，Java中是TreeSet.headSet/tailSet，它们是对数的。

然而，事实并非如此。以下代码运行32秒，而GetViewBetween的等效O(log N)版本将使此代码运行1-2秒

var s = new SortedSet<int>();
int n = 100000;
var rand = new Random(1000000007);
int sum = 0;
for (int i = 0; i < n; ++i) {
    s.Add(rand.Next());
    if (rand.Next() % 2 == 0) {
        int l = rand.Next(int.MaxValue / 2 - 10);
        int r = l + rand.Next(int.MaxValue / 2 - 10);
        var t = s.GetViewBetween(l, r);
        sum += t.Min;
    }
}
Console.WriteLine(sum);

我使用dotPeek反编译了System.dll，结果如下：

public TreeSubSet(SortedSet<T> Underlying, T Min, T Max, bool lowerBoundActive, bool upperBoundActive)
    : base(Underlying.Comparer)
{
    this.underlying = Underlying;
    this.min = Min;
    this.max = Max;
    this.lBoundActive = lowerBoundActive;
    this.uBoundActive = upperBoundActive;
    this.root = this.underlying.FindRange(this.min, this.max, this.lBoundActive, this.uBoundActive);
    this.count = 0;
    this.version = -1;
    this.VersionCheckImpl();
}
internal SortedSet<T>.Node FindRange(T from, T to, bool lowerBoundActive, bool upperBoundActive)
{
  SortedSet<T>.Node node = this.root;
  while (node != null)
  {
    if (lowerBoundActive && this.comparer.Compare(from, node.Item) > 0)
    {
      node = node.Right;
    }
    else
    {
      if (!upperBoundActive || this.comparer.Compare(to, node.Item) >= 0)
        return node;
      node = node.Left;
    }
  }
  return (SortedSet<T>.Node) null;
}
private void VersionCheckImpl()
{
    if (this.version == this.underlying.version)
      return;
    this.root = this.underlying.FindRange(this.min, this.max, this.lBoundActive, this.uBoundActive);
    this.version = this.underlying.version;
    this.count = 0;
    base.InOrderTreeWalk((TreeWalkPredicate<T>) (n =>
    {
      SortedSet<T>.TreeSubSet temp_31 = this;
      int temp_34 = temp_31.count + 1;
      temp_31.count = temp_34;
      return true;
    }));
}

所以，FindRange显然是O(log N)，但之后我们称之为VersionCheckImpl。。。它对找到的子树进行线性时间遍历，只是为了重新计算它的节点！

为什么你需要一直进行遍历
为什么.NET不包含O(log N)方法来根据关键字（如C++或Java）拆分树？它在很多情况下都非常有用

为什么排序集<；T>；.GetViewBetween'；t O(对数N)

关于version字段

更新1：

在我的记忆中，BCL中的许多（也许所有？）集合都有字段version。

首先，关于`foreach`：

根据这个msdn链接

foreach语句为数组或对象集合中的每个元素重复一组嵌入语句。foreach语句用于遍历集合以获得所需信息，但不应用于更改集合的内容以避免不可预测的副作用。

在许多其他集合中，version受到保护—在foreach 期间不修改数据

例如，HashTable的MoveNext():

public virtual bool MoveNext()
{
    if (this.version != this.hashtable.version)
    {
        throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_EnumFailedVersion"));
    }
    //..........
}

但在SortedSet<T>的MoveNext()方法中：

public bool MoveNext()
{
    this.tree.VersionCheck();
    if (this.version != this.tree.version)
    {
        ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
    }       
    //....
}

更新2：

但O（N）循环可能不仅适用于version，而且适用于Count性质。

因为GetViewBetween的MSDN说：

此方法返回由比较器…定义的介于lowerValue和upperValue之间的元素范围的视图您可以在视图和底层的SortedSet（Of T）中进行更改。

因此，每次更新都应该同步count字段（键和值已经相同）。确保Count是正确的

有两种政策可以达到目标：

Microsoft
单声道

首先，MS在其代码中牺牲了GetViewBetween()的性能，赢得了Count属性的性能。

CCD_ 28是同步CCD_ 29属性的一种方式。

第二，单声道。在mono的代码中，GetViewBetween()更快，但在他们的GetCount()方法中：

internal override int GetCount ()
{
    int count = 0;
    using (var e = set.tree.GetSuffixEnumerator (lower)) {
        while (e.MoveNext () && set.helper.Compare (upper, e.Current) >= 0)
            ++count;
    }
    return count;
}

它总是一个O（N）运算！

以防其他像我这样的人在被问到这个问题10年后回来。https://github.com/dotnet/runtime/blob/fae7ee8e7e3aa7f86836318a10ed676641e813ad/src/libraries/System.Collections/src/System/Collections/Generic/SortedSet.TreeSubSet.cs#L38这是指向TreeSubSet实现的链接，并且似乎已经删除了对VersionCheckImpl（）的调用。

所以我想现在你可以做：

SortedSet<int> ss = new();
ss.Add(1);
ss.Add(2);
//ss.Add(3);
ss.Add(4);
ss.Add(5);
ss.Add(6);
var four = ss.GetViewBetween(3, ss.Max()).First();

在O（logn）中

为什么排序集<；T>；.GetViewBetween'；t O(对数N)

更新1：

首先，关于foreach：

更新2：

首先，关于`foreach`：