为什么操作符比方法调用慢这么多?(结构体只在较旧的jit上较慢)

本文关键字:jit 结构体 方法 操作符 调用 为什么 | 更新日期: 2023-09-27 18:10:14

简介:我用c#编写高性能代码。是的,我知道c++会给我更好的优化,但我仍然选择使用c#。我不想就这一选择进行辩论。相反,我更希望听到那些像我一样,正在尝试在。net框架上编写高性能代码的人的声音。

问题:

  • 为什么下面代码中的操作符比等效操作符慢方法调用?
  • 为什么在下面的代码中方法传递两个双精度值比传递具有两个结构体的等效方法快双打在里面?(A:旧的jit优化结构很差)
  • 有没有办法让。net JIT编译器处理简单的结构和结构的成员一样有效?(A:更新JIT)

我想我知道的:最初的。net JIT编译器不会内联任何涉及结构体的东西。奇异的给定结构应该只在需要小值类型时使用,应该像内置结构一样进行优化,但这是正确的。幸运的是,在。net 3.5SP1和。net 2.0SP2中,他们对JIT优化器做了一些改进,包括对内联的改进,特别是对结构体的改进。(我猜他们这么做是因为否则他们引入的新Complex结构会表现得很糟糕……所以Complex团队可能是在打击JIT优化器团队。)因此,. net 3.5 SP1之前的任何文档都可能与此问题不太相关。

我的测试显示:我已经通过检查C:'Windows'Microsoft.NET'Framework'v2.0.50727'mscorwks.dll文件的版本>= 3053来验证我确实有较新的JIT优化器,因此应该对JIT优化器进行这些改进。然而,即使这样,我的计时和查看反汇编都显示:

jit生成的用于传递两个双精度类型的结构体的代码远不如直接传递两个双精度类型的代码效率高。

jit生成的struct方法代码传递'this'要比传递struct作为参数有效得多。

如果传递两个双精度体而不是传递一个有两个双精度体的结构体,JIT仍然可以更好地内联,即使是在循环中使用乘数也是如此。

时间:实际上,查看反汇编,我意识到循环中的大部分时间只是从List中访问测试数据。如果将循环的开销代码和对数据的访问排除在外,那么进行相同调用的四种方法之间的差异就会大不相同。我得到从5倍到20倍的加速做PlusEqual(double, double)而不是PlusEqual(Element)。执行PlusEqual(double, double)而不是操作符+=则是执行PlusEqual(double, double)的10倍到40倍。哇。伤心。

下面是一组计时:

Populating List<Element> took 320ms.
The PlusEqual() method took 105ms.
The 'same' += operator took 131ms.
The 'same' -= operator took 139ms.
The PlusEqual(double, double) method took 68ms.
The do nothing loop took 66ms.
The ratio of operator with constructor to method is 124%.
The ratio of operator without constructor to method is 132%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 64%.
If we remove the overhead time for the loop accessing the elements from the List...
The ratio of operator with constructor to method is 166%.
The ratio of operator without constructor to method is 187%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 5%.

代码:

namespace OperatorVsMethod
{
  public struct Element
  {
    public double Left;
    public double Right;
    public Element(double left, double right)
    {
      this.Left = left;
      this.Right = right;
    }
    public static Element operator +(Element x, Element y)
    {
      return new Element(x.Left + y.Left, x.Right + y.Right);
    }
    public static Element operator -(Element x, Element y)
    {
      x.Left += y.Left;
      x.Right += y.Right;
      return x;
    }    
    /// <summary>
    /// Like the += operator; but faster.
    /// </summary>
    public void PlusEqual(Element that)
    {
      this.Left += that.Left;
      this.Right += that.Right;
    }    
    /// <summary>
    /// Like the += operator; but faster.
    /// </summary>
    public void PlusEqual(double thatLeft, double thatRight)
    {
      this.Left += thatLeft;
      this.Right += thatRight;
    }    
  }    
  [TestClass]
  public class UnitTest1
  {
    [TestMethod]
    public void TestMethod1()
    {
      Stopwatch stopwatch = new Stopwatch();
      // Populate a List of Elements to multiply together
      int seedSize = 4;
      List<double> doubles = new List<double>(seedSize);
      doubles.Add(2.5d);
      doubles.Add(100000d);
      doubles.Add(-0.5d);
      doubles.Add(-100002d);
      int size = 2500000 * seedSize;
      List<Element> elts = new List<Element>(size);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        int di = ii % seedSize;
        double d = doubles[di];
        elts.Add(new Element(d, d));
      }
      stopwatch.Stop();
      long populateMS = stopwatch.ElapsedMilliseconds;
      // Measure speed of += operator (calls ctor)
      Element operatorCtorResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        operatorCtorResult += elts[ii];
      }
      stopwatch.Stop();
      long operatorCtorMS = stopwatch.ElapsedMilliseconds;
      // Measure speed of -= operator (+= without ctor)
      Element operatorNoCtorResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        operatorNoCtorResult -= elts[ii];
      }
      stopwatch.Stop();
      long operatorNoCtorMS = stopwatch.ElapsedMilliseconds;
      // Measure speed of PlusEqual(Element) method
      Element plusEqualResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        plusEqualResult.PlusEqual(elts[ii]);
      }
      stopwatch.Stop();
      long plusEqualMS = stopwatch.ElapsedMilliseconds;
      // Measure speed of PlusEqual(double, double) method
      Element plusEqualDDResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        Element elt = elts[ii];
        plusEqualDDResult.PlusEqual(elt.Left, elt.Right);
      }
      stopwatch.Stop();
      long plusEqualDDMS = stopwatch.ElapsedMilliseconds;
      // Measure speed of doing nothing but accessing the Element
      Element doNothingResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        Element elt = elts[ii];
        double left = elt.Left;
        double right = elt.Right;
      }
      stopwatch.Stop();
      long doNothingMS = stopwatch.ElapsedMilliseconds;
      // Report results
      Assert.AreEqual(1d, operatorCtorResult.Left, "The operator += did not compute the right result!");
      Assert.AreEqual(1d, operatorNoCtorResult.Left, "The operator += did not compute the right result!");
      Assert.AreEqual(1d, plusEqualResult.Left, "The operator += did not compute the right result!");
      Assert.AreEqual(1d, plusEqualDDResult.Left, "The operator += did not compute the right result!");
      Assert.AreEqual(1d, doNothingResult.Left, "The operator += did not compute the right result!");
      // Report speeds
      Console.WriteLine("Populating List<Element> took {0}ms.", populateMS);
      Console.WriteLine("The PlusEqual() method took {0}ms.", plusEqualMS);
      Console.WriteLine("The 'same' += operator took {0}ms.", operatorCtorMS);
      Console.WriteLine("The 'same' -= operator took {0}ms.", operatorNoCtorMS);
      Console.WriteLine("The PlusEqual(double, double) method took {0}ms.", plusEqualDDMS);
      Console.WriteLine("The do nothing loop took {0}ms.", doNothingMS);
      // Compare speeds
      long percentageRatio = 100L * operatorCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
      Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);
      operatorCtorMS -= doNothingMS;
      operatorNoCtorMS -= doNothingMS;
      plusEqualMS -= doNothingMS;
      plusEqualDDMS -= doNothingMS;
      Console.WriteLine("If we remove the overhead time for the loop accessing the elements from the List...");
      percentageRatio = 100L * operatorCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
      Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);
    }
  }
}

IL:上面的一些被编译成)

public void PlusEqual(Element that)
    {
00000000 push    ebp 
00000001 mov     ebp,esp 
00000003 push    edi 
00000004 push    esi 
00000005 push    ebx 
00000006 sub     esp,30h 
00000009 xor     eax,eax 
0000000b mov     dword ptr [ebp-10h],eax 
0000000e xor     eax,eax 
00000010 mov     dword ptr [ebp-1Ch],eax 
00000013 mov     dword ptr [ebp-3Ch],ecx 
00000016 cmp     dword ptr ds:[04C87B7Ch],0 
0000001d je     00000024 
0000001f call    753081B1 
00000024 nop       
      this.Left += that.Left;
00000025 mov     eax,dword ptr [ebp-3Ch] 
00000028 fld     qword ptr [ebp+8] 
0000002b fadd    qword ptr [eax] 
0000002d fstp    qword ptr [eax] 
      this.Right += that.Right;
0000002f mov     eax,dword ptr [ebp-3Ch] 
00000032 fld     qword ptr [ebp+10h] 
00000035 fadd    qword ptr [eax+8] 
00000038 fstp    qword ptr [eax+8] 
    }
0000003b nop       
0000003c lea     esp,[ebp-0Ch] 
0000003f pop     ebx 
00000040 pop     esi 
00000041 pop     edi 
00000042 pop     ebp 
00000043 ret     10h 
 public void PlusEqual(double thatLeft, double thatRight)
    {
00000000 push    ebp 
00000001 mov     ebp,esp 
00000003 push    edi 
00000004 push    esi 
00000005 push    ebx 
00000006 sub     esp,30h 
00000009 xor     eax,eax 
0000000b mov     dword ptr [ebp-10h],eax 
0000000e xor     eax,eax 
00000010 mov     dword ptr [ebp-1Ch],eax 
00000013 mov     dword ptr [ebp-3Ch],ecx 
00000016 cmp     dword ptr ds:[04C87B7Ch],0 
0000001d je     00000024 
0000001f call    75308159 
00000024 nop       
      this.Left += thatLeft;
00000025 mov     eax,dword ptr [ebp-3Ch] 
00000028 fld     qword ptr [ebp+10h] 
0000002b fadd    qword ptr [eax] 
0000002d fstp    qword ptr [eax] 
      this.Right += thatRight;
0000002f mov     eax,dword ptr [ebp-3Ch] 
00000032 fld     qword ptr [ebp+8] 
00000035 fadd    qword ptr [eax+8] 
00000038 fstp    qword ptr [eax+8] 
    }
0000003b nop       
0000003c lea     esp,[ebp-0Ch] 
0000003f pop     ebx 
00000040 pop     esi 
00000041 pop     edi 
00000042 pop     ebp 
00000043 ret     10h 

为什么操作符比方法调用慢这么多?(结构体只在较旧的jit上较慢)

我得到了非常不同的结果,少了很多戏剧性。但没有使用测试运行器,我将代码粘贴到控制台模式应用程序中。当我尝试它时,5%的结果是~87%在32位模式下,~100%在64位模式下。

对齐对于双精度是至关重要的,. net运行时只能承诺在32位机器上对齐4。在我看来,测试运行器正在启动测试方法,其堆栈地址对齐为4而不是8。当双子星越过缓存线边界时,不对齐的惩罚会变得非常大。

我复制你的结果有困难。

我拿了你的代码:

  • 使它成为一个独立的控制台应用程序
  • 构建了一个优化(发布)构建
  • 将"尺寸"因子从2.5M增加到10M
  • 从命令行(IDE外)运行

当我这样做的时候,我得到的时间与你的大不相同。为避免疑义,我将准确地发布我使用的代码。

这是我的时间

Populating List<Element> took 527ms.
The PlusEqual() method took 450ms.
The 'same' += operator took 386ms.
The 'same' -= operator took 446ms.
The PlusEqual(double, double) method took 413ms.
The do nothing loop took 229ms.
The ratio of operator with constructor to method is 85%.
The ratio of operator without constructor to method is 99%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 91%.
If we remove the overhead time for the loop accessing the elements from the List...
The ratio of operator with constructor to method is 71%.
The ratio of operator without constructor to method is 98%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 83%.

这些是我对你的代码的编辑:

namespace OperatorVsMethod
{
  public struct Element
  {
    public double Left;
    public double Right;
    public Element(double left, double right)
    {
      this.Left = left;
      this.Right = right;
    }    
    public static Element operator +(Element x, Element y)
    {
      return new Element(x.Left + y.Left, x.Right + y.Right);
    }
    public static Element operator -(Element x, Element y)
    {
      x.Left += y.Left;
      x.Right += y.Right;
      return x;
    }    
    /// <summary>
    /// Like the += operator; but faster.
    /// </summary>
    public void PlusEqual(Element that)
    {
      this.Left += that.Left;
      this.Right += that.Right;
    }    
    /// <summary>
    /// Like the += operator; but faster.
    /// </summary>
    public void PlusEqual(double thatLeft, double thatRight)
    {
      this.Left += thatLeft;
      this.Right += thatRight;
    }    
  }    
  public class UnitTest1
  {
    public static void Main()
    {
      Stopwatch stopwatch = new Stopwatch();
      // Populate a List of Elements to multiply together
      int seedSize = 4;
      List<double> doubles = new List<double>(seedSize);
      doubles.Add(2.5d);
      doubles.Add(100000d);
      doubles.Add(-0.5d);
      doubles.Add(-100002d);
      int size = 10000000 * seedSize;
      List<Element> elts = new List<Element>(size);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        int di = ii % seedSize;
        double d = doubles[di];
        elts.Add(new Element(d, d));
      }
      stopwatch.Stop();
      long populateMS = stopwatch.ElapsedMilliseconds;
      // Measure speed of += operator (calls ctor)
      Element operatorCtorResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        operatorCtorResult += elts[ii];
      }
      stopwatch.Stop();
      long operatorCtorMS = stopwatch.ElapsedMilliseconds;
      // Measure speed of -= operator (+= without ctor)
      Element operatorNoCtorResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        operatorNoCtorResult -= elts[ii];
      }
      stopwatch.Stop();
      long operatorNoCtorMS = stopwatch.ElapsedMilliseconds;
      // Measure speed of PlusEqual(Element) method
      Element plusEqualResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        plusEqualResult.PlusEqual(elts[ii]);
      }
      stopwatch.Stop();
      long plusEqualMS = stopwatch.ElapsedMilliseconds;
      // Measure speed of PlusEqual(double, double) method
      Element plusEqualDDResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        Element elt = elts[ii];
        plusEqualDDResult.PlusEqual(elt.Left, elt.Right);
      }
      stopwatch.Stop();
      long plusEqualDDMS = stopwatch.ElapsedMilliseconds;
      // Measure speed of doing nothing but accessing the Element
      Element doNothingResult = new Element(1d, 1d);
      stopwatch.Reset();
      stopwatch.Start();
      for (int ii = 0; ii < size; ++ii)
      {
        Element elt = elts[ii];
        double left = elt.Left;
        double right = elt.Right;
      }
      stopwatch.Stop();
      long doNothingMS = stopwatch.ElapsedMilliseconds;
      // Report speeds
      Console.WriteLine("Populating List<Element> took {0}ms.", populateMS);
      Console.WriteLine("The PlusEqual() method took {0}ms.", plusEqualMS);
      Console.WriteLine("The 'same' += operator took {0}ms.", operatorCtorMS);
      Console.WriteLine("The 'same' -= operator took {0}ms.", operatorNoCtorMS);
      Console.WriteLine("The PlusEqual(double, double) method took {0}ms.", plusEqualDDMS);
      Console.WriteLine("The do nothing loop took {0}ms.", doNothingMS);
      // Compare speeds
      long percentageRatio = 100L * operatorCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
      Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);
      operatorCtorMS -= doNothingMS;
      operatorNoCtorMS -= doNothingMS;
      plusEqualMS -= doNothingMS;
      plusEqualDDMS -= doNothingMS;
      Console.WriteLine("If we remove the overhead time for the loop accessing the elements from the List...");
      percentageRatio = 100L * operatorCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
      Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
      percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
      Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);
    }
  }
}

这里运行。net 4.0。我用"任意CPU"进行编译,目标是。net 4.0的发布模式。从命令行执行。它在64位模式下运行。我的时间安排有点不同。

Populating List<Element> took 442ms.
The PlusEqual() method took 115ms.
The 'same' += operator took 201ms.
The 'same' -= operator took 200ms.
The PlusEqual(double, double) method took 129ms.
The do nothing loop took 93ms.
The ratio of operator with constructor to method is 174%.
The ratio of operator without constructor to method is 173%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 112%.
If we remove the overhead time for the loop accessing the elements from the List
...
The ratio of operator with constructor to method is 490%.
The ratio of operator without constructor to method is 486%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 163%.

特别是PlusEqual(Element)PlusEqual(double, double)稍快。

无论。net 3.5中的问题是什么,在。net 4.0中似乎都不存在。

像@Corey Kosak一样,我只是在vs2010 Express中运行这段代码,作为发布模式下的简单控制台应用程序。我得到了非常不同的数字。但我也有Fx4.5,所以这些可能不是一个干净的Fx4.0的结果。

Populating List<Element> took 435ms.
The PlusEqual() method took 109ms.
The 'same' += operator took 217ms.
The 'same' -= operator took 157ms.
The PlusEqual(double, double) method took 118ms.
The do nothing loop took 79ms.
The ratio of operator with constructor to method is 199%.
The ratio of operator without constructor to method is 144%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 108%.
If we remove the overhead time for the loop accessing the elements from the List
...
The ratio of operator with constructor to method is 460%.
The ratio of operator without constructor to method is 260%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 130%.

编辑:现在从cmd行运行。这确实有影响,而且数字上的变化更小。

除了其他答案中提到的JIT编译器的差异之外,结构方法调用和结构操作符的另一个区别是结构方法调用将this作为ref参数传递(也可以将其他参数写为接受ref参数),而结构操作符将按值传递所有操作数。传递任何大小的结构体作为ref参数的代价是固定的,无论结构体有多大,而传递更大的结构体的代价与结构体大小成正比。如果可以避免不必要的复制,那么使用大型结构体(甚至数百字节)并没有什么错;虽然在使用方法时通常可以防止不必要的复制,但在使用操作符时却无法防止不必要的复制。

不确定这是否相关,但这里是。net 4.0 64位在Windows 7 64位上的数字。我的mscorwks.dll版本是2.0.50727.5446。我只是将代码粘贴到LINQPad并从那里运行它。结果如下:

Populating List<Element> took 496ms.
The PlusEqual() method took 189ms.
The 'same' += operator took 295ms.
The 'same' -= operator took 358ms.
The PlusEqual(double, double) method took 148ms.
The do nothing loop took 103ms.
The ratio of operator with constructor to method is 156%.
The ratio of operator without constructor to method is 189%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 78%.
If we remove the overhead time for the loop accessing the elements from the List
...
The ratio of operator with constructor to method is 223%.
The ratio of operator without constructor to method is 296%.
The ratio of PlusEqual(double,double) to PlusEqual(Element) is 52%.

我想当你访问结构体的成员时,它实际上是在做一个额外的操作来访问成员,THIS指针+偏移量。

也许你应该使用double[]与"众所周知"的偏移量和索引增量来代替List ?