Array bounds check optimization in a for loop
Keywords: optimization, loop, for, bounds, check, array | Updated: 2023-09-27 18:02:22
var ar = new int[500000000];
var sw = new Stopwatch();
sw.Start();
var length = ar.Length;
for (var i = 0; i < length; i++)
{
    if (ar[i] == 0);
}
sw.Stop();
sw.ElapsedMilliseconds: ~2930 ms
var ar = new int[500000000];
var sw = new Stopwatch();
sw.Start();
for (var i = 0; i < ar.Length; i++)
{
    if (ar[i] == 0);
}
sw.Stop();
sw.ElapsedMilliseconds: ~3520 ms
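Win8 x64, VS12, .NET 4.5, Release build, "Optimize code" on.
As far as I know, the second approach should be faster because of the array bounds check optimization. Am I missing something?
I'm also on Win8 x64, .NET 4.5, Release build, running outside the debugger (that part is important); I get: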
0: 813ms vs 421ms
1: 439ms vs 420ms
2: 440ms vs 420ms
3: 431ms vs 429ms
4: 433ms vs 427ms
5: 424ms vs 437ms
6: 427ms vs 434ms
7: 430ms vs 432ms
8: 432ms vs 435ms
9: 430ms vs 430ms
10: 427ms vs 418ms
11: 422ms vs 421ms
12: 434ms vs 420ms
13: 439ms vs 425ms
14: 426ms vs 429ms
15: 426ms vs 426ms
16: 417ms vs 432ms
17: 442ms vs 425ms
18: 420ms vs 429ms
19: 420ms vs 422ms
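The first loop pays a JIT/"fusion" cost on the first iteration, but overall the two are about the same (some entries in each column look faster, but overall there is not much to speak of).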
using System;
using System.Diagnostics;
static class Program
{
    static void Main()
    {
        var ar = new int[500000000];
        for (int j = 0; j < 20; j++)
        {
            var sw = Stopwatch.StartNew();
            var length = ar.Length;
            for (var i = 0; i < length; i++)
            {
                if (ar[i] == 0) ;
            }
            sw.Stop();
            long hoisted = sw.ElapsedMilliseconds;
            sw = Stopwatch.StartNew();
            for (var i = 0; i < ar.Length; i++)
            {
                if (ar[i] == 0) ;
            }
            sw.Stop();
            long direct = sw.ElapsedMilliseconds;
            Console.WriteLine("{0}: {1}ms vs {2}ms", j, hoisted, direct);
        }
    }
}
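I investigated this some more and found that it is difficult to write a benchmark that actually shows the effect of the bounds check elimination optimization.
Some problems with the old benchmark:
- The disassembly shows that the JIT compiler was able to optimize the first version as well. That was a surprise to me, but the disassembly doesn't lie. Of course, this completely defeats the purpose of the benchmark. Fix: pass the length in as a function parameter.
- The array is too large, which means cache misses, and those add a lot of noise to our signal. Fix: use a short array, but loop over it many times.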
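The two fixes listed above could be combined roughly as follows. This is only a minimal sketch, not the answerer's actual benchmark: the class name `BoundsCheckBench`, the method names, the array size, and the repetition count are all assumptions, and the loops sum the elements (instead of using the empty `if`) so that each method has a result to return.

```csharp
using System;
using System.Diagnostics;

static class BoundsCheckBench
{
    // Length arrives as a parameter, so it is not visibly tied to ar.Length.
    public static long SumWithLengthParam(int[] ar, int length, int repetitions)
    {
        long sum = 0;
        for (int rep = 0; rep < repetitions; rep++)
        {
            for (int i = 0; i < length; i++)
            {
                sum += ar[i];
            }
        }
        return sum;
    }

    // Loop bound is ar.Length itself, the pattern the question expected to be faster.
    public static long SumWithArrayLength(int[] ar, int repetitions)
    {
        long sum = 0;
        for (int rep = 0; rep < repetitions; rep++)
        {
            for (int i = 0; i < ar.Length; i++)
            {
                sum += ar[i];
            }
        }
        return sum;
    }

    static void Main()
    {
        // Assumed sizes: a 4 KB array that stays in cache, walked many times.
        var ar = new int[1024];
        const int repetitions = 200000;

        // Warm-up calls so JIT compilation is not part of the timed region.
        SumWithLengthParam(ar, ar.Length, 1);
        SumWithArrayLength(ar, 1);

        var sw = Stopwatch.StartNew();
        long a = SumWithLengthParam(ar, ar.Length, repetitions);
        sw.Stop();
        Console.WriteLine("length parameter: {0} ms (sum {1})", sw.ElapsedMilliseconds, a);

        sw = Stopwatch.StartNew();
        long b = SumWithArrayLength(ar, repetitions);
        sw.Stop();
        Console.WriteLine("ar.Length:        {0} ms (sum {1})", sw.ElapsedMilliseconds, b);
    }
}
```

Whether the two timings differ at all depends on the JIT version and platform, so any numbers from a sketch like this should be checked against the disassembly rather than taken at face value.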
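But now the real problem: the JIT compiler does something very clever. There is no array bounds check in the inner loop, even when the loop length comes from a function parameter. The generated code differs, but the inner loop is essentially the same. Not exactly (different registers and so on), but it follows the same pattern: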
_loop: mov eax, [somewhere + index]
       add index, 4
       cmp index, end
       jl _loop
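There is no significant difference in execution time because there is no significant difference in the most important part of the generated code.
I think the answer is that the garbage collector is running and altering your timings.
Disclaimer: I can't see the whole context of the OP's code because you didn't post a compilable example; I'm assuming that you are reallocating the array rather than reusing it. If not, then this is not the correct answer!
Consider the following code: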
using System;
using System.Diagnostics;
namespace Demo
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            var ar = new int[500000000];
            test1(ar);
            //ar = new int[500000000]; // Uncomment this line.
            test2(ar);
        }
        private static void test1(int[] ar)
        {
            var sw = new Stopwatch();
            sw.Start();
            var length = ar.Length;
            for (var i = 0; i < length; i++)
            {
                if (ar[i] == 0);
            }
            sw.Stop();
            Console.WriteLine("test1 took " + sw.Elapsed);
        }
        private static void test2(int[] ar)
        {
            var sw = new Stopwatch();
            sw.Start();
            for (var i = 0; i < ar.Length; i++)
            {
                if (ar[i] == 0);
            }
            sw.Stop();
            Console.WriteLine("test2 took " + sw.Elapsed);
        }
    }
}
On my system it prints:
test1 took 00:00:00.6643788
test2 took 00:00:00.3516378
If you uncomment the line marked "// Uncomment this line.", the timings change to:
test1 took 00:00:00.6615819
test2 took 00:00:00.6806489
This is because the GC is collecting the previous array.
[EDIT] To avoid the JIT start-up cost, I put the whole test into a loop:
for (int i = 0; i < 8; ++i)
{
    test1(ar);
    ar = new int[500000000]; // Uncomment this line.
    test2(ar);
}
With the second array allocation commented out, my results are:
test1 took 00:00:00.6437912
test2 took 00:00:00.3534027
test1 took 00:00:00.3401437
test2 took 00:00:00.3486296
test1 took 00:00:00.3470775
test2 took 00:00:00.3675475
test1 took 00:00:00.3501221
test2 took 00:00:00.3549338
test1 took 00:00:00.3427057
test2 took 00:00:00.3574063
test1 took 00:00:00.3566458
test2 took 00:00:00.3462722
test1 took 00:00:00.3430952
test2 took 00:00:00.3464017
test1 took 00:00:00.3449196
test2 took 00:00:00.3438316
And with the second array allocation enabled:
test1 took 00:00:00.6572665
test2 took 00:00:00.6565778
test1 took 00:00:00.3576911
test2 took 00:00:00.6910897
test1 took 00:00:00.3464013
test2 took 00:00:00.6638542
test1 took 00:00:00.3548638
test2 took 00:00:00.6897472
test1 took 00:00:00.4464020
test2 took 00:00:00.7739877
test1 took 00:00:00.3835624
test2 took 00:00:00.8432918
test1 took 00:00:00.3496910
test2 took 00:00:00.6471341
test1 took 00:00:00.3486505
test2 took 00:00:00.6527160
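Note that test2 consistently takes longer because of the GC.
Unfortunately, the GC makes the timing results meaningless.
For example, if I change the test code to:
for (int i = 0; i < 8; ++i)
{
    var ar = new int[500000000];
    GC.Collect();
    test1(ar);
    //ar = new int[500000000]; // Uncomment this line.
    test2(ar);
}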
With the line commented out, I get:
test1 took 00:00:00.6354278
test2 took 00:00:00.3464486
test1 took 00:00:00.6672933
test2 took 00:00:00.3413958
test1 took 00:00:00.6724916
test2 took 00:00:00.3530412
test1 took 00:00:00.6606178
test2 took 00:00:00.3413083
test1 took 00:00:00.6439316
test2 took 00:00:00.3404499
test1 took 00:00:00.6559153
test2 took 00:00:00.3413563
test1 took 00:00:00.6955377
test2 took 00:00:00.3364670
test1 took 00:00:00.6580798
test2 took 00:00:00.3378203
And with it uncommented:
test1 took 00:00:00.6340203
test2 took 00:00:00.6276153
test1 took 00:00:00.6813719
test2 took 00:00:00.6264782
test1 took 00:00:00.6927222
test2 took 00:00:00.6269447
test1 took 00:00:00.7010559
test2 took 00:00:00.6262000
test1 took 00:00:00.6975080
test2 took 00:00:00.6457846
test1 took 00:00:00.6796235
test2 took 00:00:00.6341214
test1 took 00:00:00.6823508
test2 took 00:00:00.6455403
test1 took 00:00:00.6856985
test2 took 00:00:00.6430923
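I think the moral of this test is that the GC overhead for this particular test is so large compared to the rest of the code that it completely distorts the timing results, and they can't be trusted to mean anything.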
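One way to at least see when the GC interferes is to settle the heap before timing and count collections across the timed region. The following is a rough sketch under assumptions, not code from this thread: `GcAwareTiming`, `Measure`, and `CountZeros` are hypothetical names, and the array is deliberately smaller than the 500,000,000 elements used above. It relies on `GC.CollectionCount(0)`, which as far as I know increases whenever a collection of any generation runs.

```csharp
using System;
using System.Diagnostics;

static class GcAwareTiming
{
    // Hypothetical helper: times `action` and reports how many garbage
    // collections were observed while it ran.
    public static TimeSpan Measure(Action action, out int collections)
    {
        // Collect leftovers from earlier work so they are less likely
        // to be collected inside the timed region.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int before = GC.CollectionCount(0);
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        collections = GC.CollectionCount(0) - before;
        return sw.Elapsed;
    }

    // Reads every element so the loop has an observable result.
    public static int CountZeros(int[] ar)
    {
        int zeros = 0;
        for (int i = 0; i < ar.Length; i++)
        {
            if (ar[i] == 0) zeros++;
        }
        return zeros;
    }

    static void Main()
    {
        // Allocate once, outside the timed region (assumed size).
        var ar = new int[50000000];
        int zeros = 0;

        for (int run = 0; run < 5; run++)
        {
            int collections;
            TimeSpan elapsed = Measure(() => { zeros = CountZeros(ar); }, out collections);
            Console.WriteLine("run {0}: {1} ({2} collections, {3} zeros)",
                run, elapsed, collections, zeros);
        }
    }
}
```

A run that reports a nonzero collection count is a candidate for being discarded or repeated. This does not account for other first-use costs of a freshly allocated array, so it narrows the problem rather than removing it.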
In the second version you are calling the ar.Length property on every iteration, so it will be slower.