C#慢速搜索

本文关键字：搜索 | 更新日期: 2023-09-27 18:28:14

我制作了一个简单的数组，其中包含2000000 int用于保存所有rgb，而第二个数组则包含2000000 INT用于检测到rgb的次数。然后我让它循环通过一张图片的所有6000000字节，就像这样：

        uint[] colors = new uint[rawImageBytes.Length / 3];
        int[] hits    = new int[rawImageBytes.Length / 3];
        int colorAmount     = 0;
        int totalRepeats    = 0;
        int lastTime        = Environment.TickCount;
        int lastCount       = 0;
        uint currentColor   = 0;
        bool found;
        for (int i = 0; i < rawImageBytes.Length - 3; i += 3)
        {
            if (Environment.TickCount - lastTime > 10000)
            {
                setStatus(((i - lastCount)/10) + " checks per second");
                lastTime = Environment.TickCount;
                lastCount = i;
            }

            currentColor = (uint)((rawImageBytes[i] << 0) | (rawImageBytes[i + 1] << 8) | (rawImageBytes[i + 2] << 16));
            //set it to false to see if pattern exists
            found = false;
            //check all patterns
            for (int k = 0; k < colorAmount; k++)
            {
                //if pattern exists
                if (colors[k] == currentColor)
                {
                    //dont add it and increase the hit instead
                    found = true;
                    hits[k]++;
                }
            }
            //if pattern does not exist, set it
            if (found == false)
            {
                colors[colorAmount] = currentColor;
                hits[colorAmount] = 0;
                colorAmount++;
            }
        }

我的日志显示，由于搜索范围的增加，的速度明显减慢

每秒5724次检查
每秒5847次检查
每秒5959次检查
每秒6044次检查
每秒6318次检查
每秒7096次检查
每秒8530次检查
每秒10680次检查
每秒16233次检查
每秒11469次检查

如何提高搜索效率，使其不需要20分钟？

C#慢速搜索

我看到的第一个问题是，您的hits数组非常大。如果你假设一种颜色可能被多次命中，那么你的命中数数组必须比你的颜色数组短。

我看到的第二个问题是，在颜色数组中找到颜色后，您不会停止迭代。您应该放置break在找到=true之后语句。

为什么你不喜欢Dictionary<uint，int>为您的点击量集合键入？你的颜色应该是一个键，点击次数应该是一值：

    uint[] colors = new uint[rawImageBytes.Length / 3];
    Dictionary<uint,int> hits    = new Dictionary<uint,int>();
    int colorAmount     = 0;
    int totalRepeats    = 0;
    int lastTime        = Environment.TickCount;
    int lastCount       = 0;
    uint currentColor   = 0;

    for (int i = 0; i < rawImageBytes.Length - 3; i += 3)
    {
        if (Environment.TickCount - lastTime > 10000)
        {
            setStatus(((i - lastCount)/10) + " checks per second");
            lastTime = Environment.TickCount;
            lastCount = i;
        }

        currentColor = (uint)((rawImageBytes[i] << 0) | (rawImageBytes[i + 1] << 8) | (rawImageBytes[i + 2] << 16));

        if (hits.ContainsKey(currentColor))
        {
            hits[currentColor]++;
        }
        else
        {
            hits.Add(currentColor,1);
        }
    }

并尝试为编译器启用优化指令。

我认为这里没有应用一个原则，它会提高性能。据我所见，您的数组没有排序，搜索是线性的。因此，对于第一个数组中的每一行，都会搜索第二个数组的所有行。

以下是一些需要测试的内容：-对第二个数组（在其上执行搜索）进行排序-Array.find（）而不是自己循环

您可以尝试使用颜色计数对列表（而不是两个数组），并根据颜色索引对其进行排序。然后使用二进制搜索来查找重复的颜色。我怀疑这会比使用字典更快，但这可能也值得一试（？）。

排序和二进制搜索：http://msdn.microsoft.com/en-us/library/b0zbh7b6.aspx#Y700

词典：http://msdn.microsoft.com/en-us/library/xfhwa508.aspx

这看起来是一个很好的并行化用例与其自己做所有的计数等事情，不如让LINQ来处理

首先，让我们将迭代器逻辑放入它自己的方法中，这样您就可以单独调整它：

IEnumerable<uint> GetColors(byte[] rawImageBytes) 
{ 
    int lastTime        = Environment.TickCount;
    for (int i = 0; i < rawImageBytes.Length - 3; i += 3)
    {
        if (Environment.TickCount - lastTime > 10000)
        {
           setStatus(((i - lastCount)/10) + " checks per second");
           lastTime = Environment.TickCount;
           lastCount = i;
        }

        currentColor = (uint)((rawImageBytes[i] << 0) | (rawImageBytes[i + 1] << 8) | (rawImageBytes[i + 2] << 16));
        yield return currentColor; 
   }
}

现在让我们使用PLINQ:稍微重写一下您的方法

var results = (from color in GetColors(rawImageBytes).AsParallel()
              group by color into g
              select new { Color = g.Key,  Count = g.Count()}).ToList(); 
var uniqueColours = results.Count(); 
var totalHits = results.Select(r=>r.Count()).Sum();

（在没有编译器的情况下编写，所以您可能需要对其进行调整）

看看进展如何。