Func<T> performance and inheritance

Keywords: inheritance, performance, Func | Updated: 2023-09-27 18:06:25

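I've been having trouble understanding the performance characteristics of using Func<...> throughout my code in combination with inheritance and generics, a combination I find myself using all the time.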

Let me start with a minimal test case so we all know what we're talking about, then I'll post the results, and then I'll explain what I expected and why...

Minimal test case

using System;

public class GenericsTest2 : GenericsTest<int>
{
    static void Main(string[] args)
    {
        GenericsTest2 at = new GenericsTest2();
        at.test(at.func);
        at.test(at.Check);
        at.test(at.func2);
        at.test(at.Check2);
        at.test((a) => a.Equals(default(int)));
        Console.ReadLine();
    }
    public GenericsTest2()
    {
        func = func2 = (a) => Check(a);
    }
    protected Func<int, bool> func2;
    public bool Check2(int value)
    {
        return value.Equals(default(int));
    }
    public void test(Func<int, bool> func)
    {
        using (Stopwatch sw = new Stopwatch((ts) => { Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds); }))
        {
            for (int i = 0; i < 100000000; ++i)
            {
                func(i);
            }
        }
    }
}
public class GenericsTest<T>
{
    public bool Check(T value)
    {
        return value.Equals(default(T));
    }
    protected Func<T, bool> func;
}
public class Stopwatch : IDisposable
{
    public Stopwatch(Action<TimeSpan> act)
    {
        this.act = act;
        this.start = DateTime.UtcNow;
    }
    private Action<TimeSpan> act;
    private DateTime start;
    public void Dispose()
    {
        act(DateTime.UtcNow.Subtract(start));
    }
}
Results

Took 2.50s  -> at.test(at.func);
Took 1.97s  -> at.test(at.Check);
Took 2.48s  -> at.test(at.func2);
Took 0.72s  -> at.test(at.Check2);
Took 0.81s  -> at.test((a) => a.Equals(default(int)));

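What I expected and why

I expected this code to run at exactly the same speed for all 5 methods. More precisely, I expected it to be even faster than any of them, namely as fast as: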

using (Stopwatch sw = new Stopwatch((ts) => { Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds); }))
{
    for (int i = 0; i < 100000000; ++i)
    {
        bool b = i.Equals(default(int));
    }
}
// this takes 0.32s ?!?

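I expected it to take 0.32s, because I don't see any reason for the JIT compiler not to inline the code in this particular case.

Looking closer, I don't understand these performance numbers at all: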

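  • at.func is passed to the function and cannot change during execution. Why isn't it inlined?
  • at.Check is noticeably slower than at.Check2, even though neither can be overridden and, for class GenericsTest2, Check is fixed like a rock.
  • I see no reason for Func<int, bool> to be slower when passing an inline Func instead of a method converted to a Func.
  • Why is the difference between test cases 2 and 3 a whopping 0.5s, while the difference between cases 4 and 5 is only 0.1s? Shouldn't they be the same?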

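I'd really like to understand this... why is using a generic base class 10 times slower than inlining the whole thing? So basically the question is: why does this happen, and how can I fix it?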

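Based on all the comments so far (thanks!) I did some more digging.

First, here is a new set of results from repeating the tests with the loops made 5 times larger and executing them 4 times. I've switched to the diagnostics Stopwatch and added more tests (with descriptions).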
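A minimal sketch of a repeated-run timing helper in that spirit, assuming a modern C# compiler. The class name Bench and its parameters are hypothetical, not taken from the question. System.Diagnostics.Stopwatch is fully qualified so it cannot clash with the custom Stopwatch class from the test case; run 0 also pays the JIT cost, so the later runs are the ones to compare.

```csharp
using System;

public static class Bench
{
    // Times `runs` passes of `iterations` delegate calls and returns seconds per pass.
    public static double[] Time(string label, Func<int, bool> f, int iterations, int runs)
    {
        var seconds = new double[runs];
        for (int r = 0; r < runs; r++)
        {
            var sw = System.Diagnostics.Stopwatch.StartNew();
            for (int i = 0; i < iterations; ++i)
            {
                f(i); // result ignored; we only measure the call
            }
            sw.Stop();
            seconds[r] = sw.Elapsed.TotalSeconds;
            Console.WriteLine("Run {0}: took {1:0.00}s for {2}", r, seconds[r], label);
        }
        return seconds;
    }

    public static void Main()
    {
        Time("(a) => a.Equals(default(int))", a => a.Equals(default(int)), 100000000, 4);
    }
}
```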

(Baseline implementation took 2.61s)
--- Run 0 ---
Took 3.00s for (a) => at.Check2(a)
Took 12.04s for Check3<int>
Took 12.51s for (a) => GenericsTest2.Check(a)
Took 13.74s for at.func
Took 16.07s for GenericsTest2.Check
Took 12.99s for at.func2
Took 1.47s for at.Check2
Took 2.31s for (a) => a.Equals(default(int))
--- Run 1 ---
Took 3.18s for (a) => at.Check2(a)
Took 13.29s for Check3<int>
Took 14.10s for (a) => GenericsTest2.Check(a)
Took 13.54s for at.func
Took 13.48s for GenericsTest2.Check
Took 13.89s for at.func2
Took 1.94s for at.Check2
Took 2.61s for (a) => a.Equals(default(int))
--- Run 2 ---
Took 3.18s for (a) => at.Check2(a)
Took 12.91s for Check3<int>
Took 15.20s for (a) => GenericsTest2.Check(a)
Took 12.90s for at.func
Took 13.79s for GenericsTest2.Check
Took 14.52s for at.func2
Took 2.02s for at.Check2
Took 2.67s for (a) => a.Equals(default(int))
--- Run 3 ---
Took 3.17s for (a) => at.Check2(a)
Took 12.69s for Check3<int>
Took 13.58s for (a) => GenericsTest2.Check(a)
Took 14.27s for at.func
Took 12.82s for GenericsTest2.Check
Took 14.03s for at.func2
Took 1.32s for at.Check2
Took 1.70s for (a) => a.Equals(default(int))

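What I notice from these results is that things get much slower as soon as you start using generics. Digging into the IL of the non-generic implementation: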

L_0000: ldarga.s 'value'
L_0002: ldc.i4.0 
L_0003: call instance bool [mscorlib]System.Int32::Equals(int32)
L_0008: ret 

And for all the generic implementations:

L_0000: ldarga.s 'value'
L_0002: ldloca.s CS$0$0000
L_0004: initobj !T
L_000a: ldloc.0 
L_000b: box !T
L_0010: constrained. !T
L_0016: callvirt instance bool [mscorlib]System.Object::Equals(object)
L_001b: ret 

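While most of this can be optimized away, I think the callvirt may be the problem here.

To make it faster, I added a 'T : IEquatable<T>' constraint to the method definition. The result is: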
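To map the two IL listings back to C#, here is a minimal sketch with both shapes side by side. The names CheckVariants, CheckInt and CheckGeneric are hypothetical; the methods are static for brevity (the question's versions are instance methods), and the exact IL can differ by compiler version and build configuration, so the comments describe it only roughly.

```csharp
using System;

public static class CheckVariants
{
    // Non-generic: the compiler knows the exact overload and emits a direct
    // call to System.Int32::Equals(int32), with no boxing.
    public static bool CheckInt(int value)
    {
        return value.Equals(default(int));
    }

    // Generic, no constraint: the only Equals known for every T is
    // Object.Equals(object), so default(T) is created in a local, boxed,
    // and passed through a constrained. callvirt to System.Object::Equals(object).
    public static bool CheckGeneric<T>(T value)
    {
        return value.Equals(default(T));
    }

    public static void Main()
    {
        Console.WriteLine(CheckInt(0));      // True
        Console.WriteLine(CheckGeneric(0));  // True
        Console.WriteLine(CheckGeneric(42)); // False
    }
}
```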

L_0011: callvirt instance bool [mscorlib]System.IEquatable`1<!T>::Equals(!0)

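While I now understand more about the performance (it probably can't be inlined because it creates a vtable lookup), I'm still confused: why doesn't it simply call T::Equals? After all, I did specify that it would be there...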

Always run micro benchmarks 3 times. The first run triggers the JIT, so discard it, and check whether the 2nd and 3rd runs agree. This gives:

... run ...
Took 0.79s
Took 0.63s
Took 0.74s
Took 0.24s
Took 0.32s
... run ...
Took 0.73s
Took 0.63s
Took 0.73s
Took 0.24s
Took 0.33s
... run ...
Took 0.74s
Took 0.63s
Took 0.74s
Took 0.25s
Took 0.33s

func = func2 = (a) => Check(a);

adds an extra function call. Removing it with

func = func2 = this.Check;

gives:

... 1. run ...
Took 0.64s
Took 0.63s
Took 0.63s
Took 0.24s
Took 0.32s
... 2. run ...
Took 0.63s
Took 0.63s
Took 0.63s
Took 0.24s
Took 0.32s
... 3. run ...
Took 0.63s
Took 0.63s
Took 0.63s
Took 0.24s
Took 0.32s

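This shows that the (JIT?) difference between the 1st and 2nd run disappears once the extra function call is removed. The first 3 tests are now equal.

In tests 4 and 5 the compiler can inline the function argument into void test(Func<>), while in tests 1 to 3 the compiler would have to go a long way to figure out that they are constant. Sometimes the compiler has constraints that are not easy to see from our coder's perspective, such as .NET and JIT constraints that stem from the dynamic nature of .NET programs, as opposed to a binary produced by C++. In any case, it is the inlining of the function argument that makes the difference here.

The difference between 4 and 5? In test 5 it looks like the compiler could easily inline the function as well. Maybe it builds a context for the closure and resolves it in a slightly more complex way than needed.

The results above were measured with .NET 4.5. Here is the same with .NET 3.5, which shows that the compiler has gotten better at inlining: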
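A small sketch of what the two initializations above produce, using a hypothetical class DelegateShapes. The lambda (a) => Check(a) yields a delegate to a compiler-generated method whose body calls Check, i.e. one extra call per invocation unless the JIT inlines it; the method group this.Check yields a delegate that targets Check directly. The generated method's name is compiler-specific, so the code only checks that it is not "Check".

```csharp
using System;

public class DelegateShapes
{
    public Func<int, bool> ViaLambda;
    public Func<int, bool> ViaMethodGroup;

    public DelegateShapes()
    {
        ViaLambda = (a) => Check(a);   // delegate -> generated method -> Check
        ViaMethodGroup = this.Check;   // delegate -> Check
    }

    public bool Check(int value)
    {
        return value.Equals(default(int));
    }

    public static void Main()
    {
        var at = new DelegateShapes();
        Console.WriteLine(at.ViaLambda.Method.Name);      // compiler-generated name
        Console.WriteLine(at.ViaMethodGroup.Method.Name); // Check
    }
}
```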

... 1. run ...
Took 1.06s
Took 1.06s
Took 1.06s
Took 0.24s
Took 0.27s
... 2. run ...
Took 1.06s
Took 1.08s
Took 1.06s
Took 0.25s
Took 0.27s
... 3. run ...
Took 1.05s
Took 1.06s
Took 1.05s
Took 0.24s
Took 0.27s

And with .NET 4:

... 1. run ...
Took 0.97s
Took 0.97s
Took 0.96s
Took 0.22s
Took 0.30s
... 2. run ...
Took 0.96s
Took 0.96s
Took 0.96s
Took 0.22s
Took 0.30s
... 3. run ...
Took 0.97s
Took 0.96s
Took 0.96s
Took 0.22s
Took 0.30s

Now change GenericsTest<T> into a non-generic GenericsTest!!
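A sketch of that change, under my own assumption that the type parameter T is simply replaced by int (the class names come from the question; the rest of the test harness is omitted):

```csharp
using System;

// Hypothetical non-generic base class: T replaced by int.
public class GenericsTest
{
    public bool Check(int value)
    {
        // Binds directly to Int32.Equals(int): no boxing, no Object.Equals.
        return value.Equals(default(int));
    }
    protected Func<int, bool> func;
}

public class GenericsTest2 : GenericsTest
{
    public GenericsTest2()
    {
        func = this.Check;
    }

    public static void Main()
    {
        var at = new GenericsTest2();
        Console.WriteLine(at.func(0)); // True
        Console.WriteLine(at.func(9)); // False
    }
}
```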

... 1. run ...
Took 0.28s
Took 0.24s
Took 0.24s
Took 0.24s
Took 0.27s
... 2. run ...
Took 0.24s
Took 0.24s
Took 0.24s
Took 0.24s
Took 0.27s
... 3. run ...
Took 0.25s
Took 0.25s
Took 0.25s
Took 0.24s
Took 0.27s

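This is a surprise from the C# compiler, similar to what I ran into when sealing classes to avoid virtual function calls. Maybe Eric Lippert has a word on this?

Replacing inheritance with aggregation restores the performance. I've learned to never use inheritance, or only very, very rarely, and I strongly advise you to avoid it, at least in this case. (This is my pragmatic solution to the question; no flame war intended.) I use interfaces throughout, and they carry no performance penalty.

I'll explain what I think is happening here, and with generics in general. I needed some space to write it out, so I'm posting it as an answer. Thanks everyone for the comments and help; I'll be sure to hand out upvotes.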
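One possible reading of "aggregation instead of inheritance", as a structural sketch only: AggregatedTest is a hypothetical name, and the sketch makes no claim about timings. Instead of deriving from GenericsTest<int>, the class owns an instance and exposes a delegate to it.

```csharp
using System;

public class GenericsTest<T>
{
    public bool Check(T value)
    {
        return value.Equals(default(T));
    }
}

// Hypothetical aggregation variant: has-a GenericsTest<int> instead of is-a.
public class AggregatedTest
{
    private readonly GenericsTest<int> inner = new GenericsTest<int>();
    public readonly Func<int, bool> func;

    public AggregatedTest()
    {
        // Method group, so the delegate targets inner.Check directly.
        func = inner.Check;
    }

    public static void Main()
    {
        var at = new AggregatedTest();
        Console.WriteLine(at.func(0)); // True
        Console.WriteLine(at.func(7)); // False
    }
}
```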

To get started…

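Compiling generics

As we all know, generics are "template" types whose type information is filled in at runtime. The compiler can make assumptions based on the constraints, but that doesn't change the IL code... (more on this later).

One of the methods from my question: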

public class Foo<T>
{
    public bool Handle(T foo)
    {
        return foo.Equals(default(T));
    }
}

The implicit constraint here is T : Object, which means the call to Equals will be Object.Equals. Since T inherits Object.Equals, this looks like:

L_0016: callvirt instance bool [mscorlib]System.Object::Equals(object)

We can improve this by making it explicit that T implements Equals, adding the constraint T : IEquatable<T>. This changes the call to:

L_0011: callvirt instance bool [mscorlib]System.IEquatable`1<!T>::Equals(!0)

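However, since T hasn't been filled in yet, IL apparently doesn't support calling T::Equals(!0) directly, even though it's certainly there. The compiler can apparently only assume that the constraint is satisfied, so it has to emit a call to IEquatable`1, which is where the method is defined.

Apparently hints like sealed make no difference, even though they should.

Conclusion: because T::Equals(!0) is not supported, a vtable lookup is required to make this work. Once it has become a callvirt, it's hard for the JIT compiler to figure out that it should just use a call.

What should happen: basically, Microsoft should support T::Equals(!0) when this method clearly exists. That would turn the call into a normal call in IL, making it faster.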

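But it gets worse

So what about calling Foo::Handle?

To my surprise, the call to Foo<T>::Handle is also a callvirt rather than a call. The same behavior can be found for e.g. List<T>::Add and so on. My observation is that only calls through this become a normal call; everything else compiles to callvirt.

Conclusion: it behaves as if you get a class hierarchy like Foo<int>:Foo<T>:[the rest], which doesn't really make sense. Apparently every call to a generic class from outside that class compiles to a vtable lookup.

What should happen: Microsoft should change the callvirt into a call if the method is non-virtual. There's really no reason for the callvirt.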
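For comparison, a small sketch of the null-check side of callvirt. As far as I know, the C# compiler emits callvirt for instance calls whenever it cannot prove the receiver is non-null, for generic and non-generic classes alike, because callvirt throws NullReferenceException on a null receiver even when the method never touches this; for a non-virtual target the JIT then compiles it as a direct call plus a null check. The class names are hypothetical.

```csharp
using System;

public class Foo
{
    // Non-virtual, non-generic, and it never uses `this`.
    public int Answer()
    {
        return 42;
    }
}

public static class CallvirtDemo
{
    public static bool ThrowsOnNullReceiver()
    {
        Foo foo = null;
        try
        {
            foo.Answer(); // emitted as callvirt, so the null receiver is detected
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsOnNullReceiver()); // True
    }
}
```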

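Conclusion

If you use generics with other types, be prepared to get a callvirt instead of a call, even when it isn't necessary. The resulting performance is basically what you can expect from such a call...

IMHO this is a real shame. Type safety should help developers and at the same time make your code faster, because the compiler can make assumptions about what's going on. The lesson I've learned from all this: don't use generics unless you don't care about the extra vtable lookup (until Microsoft fixes this).

Future work

First, I'm going to post this on Microsoft Connect. I think this is a serious bug in .NET that degrades performance for no reason. (https://connect.microsoft.com/VisualStudio/feedback/details/782346/using-generics-will-always-compile-to-callvirt-even-if-this-is-not-necessary)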


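Microsoft Connect results

Yes, we have results, and I want to thank Mike Danes!

The method call foo.Equals(default(T)) compiles to Object.Equals(boxed[new !0]), because the only Equals that every T has in common is Object.Equals. This results in a boxing operation and a vtable lookup.

If we want this to use the proper Equals, we have to give the compiler a hint that the type implements bool Equals(T). This can be done by telling the compiler that type T implements IEquatable<T>.

In other words: change the class signature as follows: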
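A minimal sketch that makes the difference observable, assuming C# 7 or later. Probe and EqualsBinding are hypothetical names: Probe implements both Equals overloads and records which one ran. Without a constraint, value.Equals(default(T)) can only bind to Object.Equals(object), so the argument is boxed; with T : IEquatable<T>, the compiler binds to Equals(T) instead.

```csharp
using System;

// Hypothetical struct that records which Equals overload was invoked.
public struct Probe : IEquatable<Probe>
{
    public int Value;
    public static string LastCall = "";

    public bool Equals(Probe other) // IEquatable<Probe>.Equals, no boxing
    {
        LastCall = "IEquatable<T>.Equals(T)";
        return Value == other.Value;
    }

    public override bool Equals(object obj) // receives a boxed copy
    {
        LastCall = "Object.Equals(object)";
        return obj is Probe p && Value == p.Value;
    }

    public override int GetHashCode() => Value;
}

public static class EqualsBinding
{
    // No constraint: only Object.Equals(object) is known for T.
    public static bool IsDefaultUnconstrained<T>(T value) => value.Equals(default(T));

    // IEquatable<T> constraint: overload resolution picks Equals(T).
    public static bool IsDefaultConstrained<T>(T value) where T : IEquatable<T> => value.Equals(default(T));

    public static void Main()
    {
        IsDefaultUnconstrained(new Probe());
        Console.WriteLine(Probe.LastCall); // Object.Equals(object)
        IsDefaultConstrained(new Probe());
        Console.WriteLine(Probe.LastCall); // IEquatable<T>.Equals(T)
    }
}
```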

using System;

public class GenericsTest<T> where T : IEquatable<T>
{
    public bool Check(T value)
    {
        return value.Equals(default(T));
    }
    protected Func<T, bool> func;
}

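When you do this, the runtime finds the correct Equals method. Phew...

One more piece is needed to fully solve the puzzle: .NET 4.5. The .NET 4.5 runtime is able to inline this method, bringing it back to the speed it should have. In .NET 4.0 (which is what I'm currently using), this capability doesn't seem to exist. In IL the call is still a callvirt, but the runtime solves the puzzle.

If you test this code, it should be as fast as the fastest test case. Can someone confirm this?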