What optimization hints can I give the compiler/JIT?

Keywords: optimization, hints, compiler, JIT | Updated: 2023-09-27 18:11:38

I've already profiled, and now I want to squeeze every last bit of performance out of my hotspots.

I know about MethodImplOptions and the ProfileOptimization class. Are there any others?


[Edit] I just found out about [TargetedPatchingOptOut]. Never mind, apparently it isn't needed.
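For reference, a minimal sketch of the two hints named in the question (both available since .NET 4.5). `AggressiveInlining` is only a request that the JIT may still decline. `ProfileOptimization` enables multicore JIT: it records which methods get compiled during startup and compiles them on background threads on later runs, so it mainly shortens startup rather than speeding up steady-state hot loops. The class name, directory and profile file name below are hypothetical.

```csharp
using System.IO;
using System.Runtime;
using System.Runtime.CompilerServices;

public static class JitHints
{
    // Asks the JIT to inline this method even where its heuristics would not.
    // This is a hint, not a guarantee.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Square(int x) => x * x;

    // Enables multicore JIT; call once, as early as possible in Main.
    // profileDir and "startup.profile" are hypothetical placeholders.
    public static void EnableStartupProfile(string profileDir)
    {
        Directory.CreateDirectory(profileDir);
        ProfileOptimization.SetProfileRoot(profileDir);
        ProfileOptimization.StartProfile("startup.profile");
    }
}
```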


Yes, there are more tricks :-)

I've actually done quite a bit of research on optimizing C# code. So far, these are the most important results:

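  1. Funcs and Actions that are passed directly are often inlined by the JIT. Note that you shouldn't store them in variables, because then they are invoked as delegates.
  2. Be careful with overloads. Calling Equals without implementing IEquatable<T> is usually a bad plan, so if you use e.g. hashing, make sure you implement the right overloads and interfaces; it will save you a lot of performance.
  3. Generics called from other classes are never inlined. The reason lies in the "magic" described here.
  4. If you use a data structure, make sure to try using an array instead :-) Really, these things are fast as hell compared to... well, just about anything, I suppose. I've optimized quite a lot of things by using my own hash tables and by using arrays instead of lists.
  5. In many cases, table lookups are faster than computing things or using constructs such as vtable lookups, switches, multiple if statements, or even plain calculations. This is also a good trick if you have branches: failed branch prediction can often become a big pain. See this article; it's a trick I use a lot in C# and it works well in many cases. Oh, and lookup tables are arrays, of course.
  6. Experiment with turning (small) classes into structs. Because of the nature of value types, some optimizations differ between structs and classes. For example, method calls are simpler, because the compiler knows exactly which method will be called. Also, arrays of structs are usually faster than arrays of classes, because each array access needs one memory operation less (there is no reference to follow to reach the element).
  7. Don't use multidimensional arrays. While I prefer Foo[], even Foo[][] is usually faster than Foo[,].
  8. If you're copying data, prefer Buffer.BlockCopy over Array.Copy any day of the week. Also be careful with strings: string operations can drain performance.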
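A minimal sketch of three of the points in the list: a small struct key that implements IEquatable<T>, a lookup table in place of a per-bit loop, and a jagged array in place of a rectangular one. The names (`GridKey`, `ListTips`, `PopCount`, `MakeJagged`) are hypothetical, and the code only shows the shapes; whether they pay off in a particular hot path still has to be measured.

```csharp
using System;

// A small value type used as a hash key. Implementing IEquatable<GridKey> lets
// EqualityComparer<GridKey>.Default call Equals(GridKey) directly instead of
// going through the boxing Equals(object) override.
public readonly struct GridKey : IEquatable<GridKey>
{
    public readonly int X;
    public readonly int Y;

    public GridKey(int x, int y) { X = x; Y = y; }

    public bool Equals(GridKey other) => X == other.X && Y == other.Y;
    public override bool Equals(object obj) => obj is GridKey k && Equals(k);
    public override int GetHashCode() => unchecked(X * 397 ^ Y);
}

public static class ListTips
{
    // Lookup table: number of set bits for every byte value 0..255.
    private static readonly byte[] BitsInByte = BuildBitTable();

    private static byte[] BuildBitTable()
    {
        var table = new byte[256];
        for (int i = 1; i < 256; i++)
            table[i] = (byte)((i & 1) + table[i >> 1]); // table[i >> 1] is already filled
        return table;
    }

    // Four array reads instead of testing 32 bits one at a time.
    public static int PopCount(uint v) =>
        BitsInByte[v & 0xFF] + BitsInByte[(v >> 8) & 0xFF] +
        BitsInByte[(v >> 16) & 0xFF] + BitsInByte[v >> 24];

    // Jagged array (int[][]) rather than a rectangular int[,].
    public static int[][] MakeJagged(int rows, int cols)
    {
        var a = new int[rows][];
        for (int r = 0; r < rows; r++) a[r] = new int[cols];
        return a;
    }
}
```

Used as the key of a `Dictionary<GridKey, T>` or `HashSet<GridKey>`, `GridKey` is compared through its `IEquatable<GridKey>` implementation, so lookups do not box the key.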

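There also used to be a guide called "Optimizing for the Intel Pentium processor" with a large number of tricks (like shifting or multiplying instead of dividing). While compilers do a good job nowadays, these tricks still help a bit sometimes.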
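A minimal sketch of the divide-versus-shift trick, with hypothetical names. For a constant power-of-two divisor the JIT typically performs this rewrite itself, but for a signed `int` it has to add a correction step, because `/` truncates toward zero while `>>` rounds toward negative infinity. When a value can never be negative, declaring it `uint` lets a plain shift suffice.

```csharp
public static class DivideVsShift
{
    // Signed division truncates toward zero: -9 / 8 == -1.
    public static int Divide(int x) => x / 8;

    // Arithmetic right shift rounds toward negative infinity: -9 >> 3 == -2.
    public static int Shift(int x) => x >> 3;

    // For unsigned values, division by 8 and a shift by 3 are the same operation.
    public static uint DivideUnsigned(uint x) => x / 8;
}
```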

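Of course these are only optimizations; the biggest performance gains usually come from changing the algorithm and/or the data structure. Be sure to look at which options you have, and don't let the .NET Framework limit you too much... I also have a natural tendency to distrust the .NET implementation until I've checked the decompiled code myself... a lot of things could have been implemented faster (most of the time there are good reasons why they weren't).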
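As an illustration of rolling your own data structure on top of arrays, like the hand-written hash tables mentioned in the list, here is a minimal, hypothetical open-addressing set of `int` keys stored in two flat arrays. It is a sketch under simplifying assumptions (fixed capacity, no removal, linear probing), not a drop-in replacement for `HashSet<int>`.

```csharp
using System;

// Hypothetical example: fixed-capacity hash set of ints backed by flat arrays.
public sealed class IntSet
{
    private readonly int[] _keys;
    private readonly bool[] _used;
    private readonly int _mask;

    public int Count { get; private set; }

    public IntSet(int capacity)
    {
        // Power-of-two size with load factor <= 0.5, so (hash & _mask) replaces a modulo
        // and there is always an empty slot to stop probing.
        int size = 1;
        while (size < capacity * 2) size <<= 1;
        _keys = new int[size];
        _used = new bool[size];
        _mask = size - 1;
    }

    private int Slot(int key)
    {
        int h = unchecked(key * (int)0x9E3779B9); // multiplicative hashing
        h ^= h >> 16;                              // mix high bits into the low bits used by the mask
        return h & _mask;
    }

    public bool Add(int key)
    {
        int i = Slot(key);
        while (_used[i])
        {
            if (_keys[i] == key) return false; // already present
            i = (i + 1) & _mask;               // linear probing
        }
        if (Count >= _keys.Length / 2) throw new InvalidOperationException("IntSet is full");
        _used[i] = true;
        _keys[i] = key;
        Count++;
        return true;
    }

    public bool Contains(int key)
    {
        for (int i = Slot(key); _used[i]; i = (i + 1) & _mask)
            if (_keys[i] == key) return true;
        return false;
    }
}
```

Keeping the table at most half full keeps probe sequences short, and the power-of-two size turns the bucket computation into a single AND.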

HTH


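Alex pointed out to me that, according to some people, Array.Copy is actually faster. Since I don't really know what has changed over the years, I decided the only proper thing to do was to write a new benchmark and test it.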

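If you're only interested in the results, scroll down. In most cases Buffer.BlockCopy outperforms Array.Copy; the gap is largest for small non-overlapping blocks and shrinks as the block size grows, and for some larger overlapping copies Array.Copy is marginally faster. Tested on .NET 4.5.2 on an Intel Skylake with 16 GB of memory (> 10 GB free).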

Code:

using System;
using System.Diagnostics;

static class Program
{
    // Array.Copy between two distinct buffers.
    static void TestNonOverlapped1(int K)
    {
        long total = 1000000000; // ~1e9 bytes copied per test in total
        long iter = total / K;
        byte[] tmp = new byte[K];
        byte[] tmp2 = new byte[K];
        for (long i = 0; i < iter; ++i)
        {
            Array.Copy(tmp, tmp2, K);
        }
    }

    // Buffer.BlockCopy between two distinct buffers (offsets and count are in bytes).
    static void TestNonOverlapped2(int K)
    {
        long total = 1000000000;
        long iter = total / K;
        byte[] tmp = new byte[K];
        byte[] tmp2 = new byte[K];
        for (long i = 0; i < iter; ++i)
        {
            Buffer.BlockCopy(tmp, 0, tmp2, 0, K);
        }
    }

    // Array.Copy within one buffer, shifting K bytes forward by 16 (the ranges overlap when K > 16).
    static void TestOverlapped1(int K)
    {
        long total = 1000000000;
        long iter = total / K;
        byte[] tmp = new byte[K + 16];
        for (long i = 0; i < iter; ++i)
        {
            Array.Copy(tmp, 0, tmp, 16, K);
        }
    }

    // Same overlapping copy with Buffer.BlockCopy.
    static void TestOverlapped2(int K)
    {
        long total = 1000000000;
        long iter = total / K;
        byte[] tmp = new byte[K + 16];
        for (long i = 0; i < iter; ++i)
        {
            Buffer.BlockCopy(tmp, 0, tmp, 16, K);
        }
    }

    static void Main(string[] args)
    {
        // Block sizes 16, 32, ..., 8192 bytes.
        for (int i = 0; i < 10; ++i)
        {
            int N = 16 << i;
            Console.WriteLine("Block size: {0} bytes", N);
            Stopwatch sw = Stopwatch.StartNew();
            {
                sw.Restart();
                TestNonOverlapped1(N);
                Console.WriteLine("Non-overlapped Array.Copy: {0:0.00} ms", sw.Elapsed.TotalMilliseconds);
                // Clean up between runs so one test's garbage does not affect the next.
                GC.Collect(GC.MaxGeneration);
                GC.WaitForPendingFinalizers();
            }
            {
                sw.Restart();
                TestNonOverlapped2(N);
                Console.WriteLine("Non-overlapped Buffer.BlockCopy: {0:0.00} ms", sw.Elapsed.TotalMilliseconds);
                GC.Collect(GC.MaxGeneration);
                GC.WaitForPendingFinalizers();
            }
            {
                sw.Restart();
                TestOverlapped1(N);
                Console.WriteLine("Overlapped Array.Copy: {0:0.00} ms", sw.Elapsed.TotalMilliseconds);
                GC.Collect(GC.MaxGeneration);
                GC.WaitForPendingFinalizers();
            }
            {
                sw.Restart();
                TestOverlapped2(N);
                Console.WriteLine("Overlapped Buffer.BlockCopy: {0:0.00} ms", sw.Elapsed.TotalMilliseconds);
                GC.Collect(GC.MaxGeneration);
                GC.WaitForPendingFinalizers();
            }
            Console.WriteLine("-------------------------");
        }
        Console.ReadLine();
    }
}

Results with the x86 JIT:

Block size: 16 bytes
Non-overlapped Array.Copy: 4267.52 ms
Non-overlapped Buffer.BlockCopy: 2887.05 ms
Overlapped Array.Copy: 3305.01 ms
Overlapped Buffer.BlockCopy: 2670.18 ms
-------------------------
Block size: 32 bytes
Non-overlapped Array.Copy: 1327.55 ms
Non-overlapped Buffer.BlockCopy: 763.89 ms
Overlapped Array.Copy: 2334.91 ms
Overlapped Buffer.BlockCopy: 2158.49 ms
-------------------------
Block size: 64 bytes
Non-overlapped Array.Copy: 705.76 ms
Non-overlapped Buffer.BlockCopy: 390.63 ms
Overlapped Array.Copy: 1303.00 ms
Overlapped Buffer.BlockCopy: 1103.89 ms
-------------------------
Block size: 128 bytes
Non-overlapped Array.Copy: 361.18 ms
Non-overlapped Buffer.BlockCopy: 219.77 ms
Overlapped Array.Copy: 620.21 ms
Overlapped Buffer.BlockCopy: 577.20 ms
-------------------------
Block size: 256 bytes
Non-overlapped Array.Copy: 192.92 ms
Non-overlapped Buffer.BlockCopy: 108.71 ms
Overlapped Array.Copy: 347.63 ms
Overlapped Buffer.BlockCopy: 353.40 ms
-------------------------
Block size: 512 bytes
Non-overlapped Array.Copy: 104.69 ms
Non-overlapped Buffer.BlockCopy: 65.65 ms
Overlapped Array.Copy: 211.77 ms
Overlapped Buffer.BlockCopy: 202.94 ms
-------------------------
Block size: 1024 bytes
Non-overlapped Array.Copy: 52.93 ms
Non-overlapped Buffer.BlockCopy: 38.84 ms
Overlapped Array.Copy: 144.39 ms
Overlapped Buffer.BlockCopy: 154.09 ms
-------------------------
Block size: 2048 bytes
Non-overlapped Array.Copy: 45.64 ms
Non-overlapped Buffer.BlockCopy: 30.11 ms
Overlapped Array.Copy: 118.33 ms
Overlapped Buffer.BlockCopy: 109.16 ms
-------------------------
Block size: 4096 bytes
Non-overlapped Array.Copy: 30.93 ms
Non-overlapped Buffer.BlockCopy: 30.72 ms
Overlapped Array.Copy: 119.73 ms
Overlapped Buffer.BlockCopy: 104.66 ms
-------------------------
Block size: 8192 bytes
Non-overlapped Array.Copy: 30.37 ms
Non-overlapped Buffer.BlockCopy: 26.63 ms
Overlapped Array.Copy: 90.46 ms
Overlapped Buffer.BlockCopy: 87.40 ms
-------------------------

Results with the x64 JIT:

Block size: 16 bytes
Non-overlapped Array.Copy: 1252.71 ms
Non-overlapped Buffer.BlockCopy: 694.34 ms
Overlapped Array.Copy: 701.27 ms
Overlapped Buffer.BlockCopy: 573.34 ms
-------------------------
Block size: 32 bytes
Non-overlapped Array.Copy: 995.47 ms
Non-overlapped Buffer.BlockCopy: 654.70 ms
Overlapped Array.Copy: 398.48 ms
Overlapped Buffer.BlockCopy: 336.86 ms
-------------------------
Block size: 64 bytes
Non-overlapped Array.Copy: 498.86 ms
Non-overlapped Buffer.BlockCopy: 329.15 ms
Overlapped Array.Copy: 218.43 ms
Overlapped Buffer.BlockCopy: 179.95 ms
-------------------------
Block size: 128 bytes
Non-overlapped Array.Copy: 263.00 ms
Non-overlapped Buffer.BlockCopy: 196.71 ms
Overlapped Array.Copy: 137.21 ms
Overlapped Buffer.BlockCopy: 107.02 ms
-------------------------
Block size: 256 bytes
Non-overlapped Array.Copy: 144.31 ms
Non-overlapped Buffer.BlockCopy: 101.23 ms
Overlapped Array.Copy: 85.49 ms
Overlapped Buffer.BlockCopy: 69.30 ms
-------------------------
Block size: 512 bytes
Non-overlapped Array.Copy: 76.76 ms
Non-overlapped Buffer.BlockCopy: 55.31 ms
Overlapped Array.Copy: 61.99 ms
Overlapped Buffer.BlockCopy: 54.06 ms
-------------------------
Block size: 1024 bytes
Non-overlapped Array.Copy: 44.01 ms
Non-overlapped Buffer.BlockCopy: 33.30 ms
Overlapped Array.Copy: 53.13 ms
Overlapped Buffer.BlockCopy: 51.36 ms
-------------------------
Block size: 2048 bytes
Non-overlapped Array.Copy: 27.05 ms
Non-overlapped Buffer.BlockCopy: 25.57 ms
Overlapped Array.Copy: 46.86 ms
Overlapped Buffer.BlockCopy: 47.83 ms
-------------------------
Block size: 4096 bytes
Non-overlapped Array.Copy: 29.11 ms
Non-overlapped Buffer.BlockCopy: 25.12 ms
Overlapped Array.Copy: 45.05 ms
Overlapped Buffer.BlockCopy: 47.84 ms
-------------------------
Block size: 8192 bytes
Non-overlapped Array.Copy: 24.95 ms
Non-overlapped Buffer.BlockCopy: 21.52 ms
Overlapped Array.Copy: 43.81 ms
Overlapped Buffer.BlockCopy: 43.22 ms
-------------------------
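The benchmark only copies byte[], where both APIs take the same length argument. For other element types they differ: Array.Copy counts elements, while Buffer.BlockCopy counts bytes (and accepts only arrays of primitive types). A minimal sketch copying the first `count` ints of an int[]; the class and method names are hypothetical.

```csharp
using System;

public static class CopyDemo
{
    // Array.Copy: the length argument is a number of elements.
    public static int[] WithArrayCopy(int[] src, int count)
    {
        var dst = new int[src.Length];
        Array.Copy(src, dst, count);
        return dst;
    }

    // Buffer.BlockCopy: offsets and length are in bytes, so scale by sizeof(int).
    public static int[] WithBlockCopy(int[] src, int count)
    {
        var dst = new int[src.Length];
        Buffer.BlockCopy(src, 0, dst, 0, count * sizeof(int));
        return dst;
    }
}
```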

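You have exhausted the options added in .NET 4.5 that directly affect the jitted code. The next step is to look at the generated machine code to spot any obvious inefficiencies. Do this with the debugger, but first stop it from disabling the optimizer: Tools + Options, Debugging, General, untick the "Suppress JIT optimization on module load" option. Set a breakpoint on the hot code and use Debug + Disassembly to look at it.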

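There isn't that much to look for; the JIT optimizer generally does a very good job. One thing to look for is a failed attempt at eliminating an array bounds check; the fixed keyword is an unsafe workaround for that. A corner case is a failed attempt at inlining a method combined with the JIT not using the CPU registers effectively, an issue with the x86 JIT that is fixed with MethodImplOptions.NoInlining. The optimizer isn't terribly effective at hoisting invariant code out of a loop, but that's something you'd almost always consider first anyway when staring at the C# code looking for ways to optimize it.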
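A minimal sketch of the loop shapes described above; the names are hypothetical. The first loop uses the array's own `Length` as its bound, the pattern the JIT recognizes for removing range checks. The second is bounded by an unrelated `n`, so the checks may stay (newer JITs can sometimes clone the loop to remove them). The `fixed`/pointer workaround is left out because it requires compiling with unsafe code enabled. The last method hoists a loop-invariant product by hand.

```csharp
public static class LoopShapes
{
    // Bound is a.Length: the JIT can prove 0 <= i < a.Length and typically drops the range check.
    public static long SumAll(int[] a)
    {
        long sum = 0;
        for (int i = 0; i < a.Length; i++) sum += a[i];
        return sum;
    }

    // Bound is an unrelated parameter: nothing guarantees n <= a.Length,
    // so each a[i] may keep its range check (and throws if n > a.Length).
    public static long SumFirst(int[] a, int n)
    {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += a[i];
        return sum;
    }

    // scale * offset does not change inside the loop, so it is computed once up front.
    public static void Transform(double[] data, double scale, double offset)
    {
        double bias = scale * offset;
        for (int i = 0; i < data.Length; i++) data[i] = data[i] * scale + bias;
    }
}
```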

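The most important thing is to know when you're done and can't hope to make it any faster. You can only really get there by comparing apples and oranges: write the hot code as native code using C++/CLI. Make sure that code is compiled with #pragma unmanaged in effect so it gets the full attention of the optimizer. Switching from managed to native code execution has a cost, so make sure the native code runs long enough to pay for it. Otherwise this isn't easy to do, and you certainly have no guarantee of success. Still, knowing that you're done can save you a lot of time stumbling into dead ends.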