将数组元素的出现次数置于限制之间的最快方式

本文关键字：之间于限制方式数组元素 | 更新日期: 2023-09-27 18:12:25

例如…

我有一个整数数组，它由1到1000之间的随机值初始化
数组有1M个元素(它可能会有更多的元素，但这只是一个例子)
每个元素的出现次数必须在10到1010之间

调整这个数组的元素，使它们符合上述标准的最快方法是什么?

我的第一个解决方案只是太慢，如果最大次数出现接近数组。长度(1M)/valuesSpan (1000)

我尝试了这样的东西(这只是为了对齐出现的最大值，下限的解决方案几乎是相同的):

Int64[] DistinctArrayElements = distinctArrayElements;
Dictionary<Int64, Int32> occurrences = new Dictionary<Int64, Int32>();
foreach (Int64 DistinctElement in DistinctArrayElements)
{
    occurrences.Add(DistinctElement, 0);
}
foreach (Int64 ArrayElement in Arr)
{
    occurrences[ArrayElement] += 1;
}
//I know this initialization can be done more nicely, so don't bother with this.
for (int j = 0; j < Arr.Length; j++)
{
    if (occurrences[Arr[j]] > upperNoOfOccurrences)
    {
        for (int i = 0; i < Arr.Length; i++)        
        {
            if (occurrences[Arr[i]] < upperNoOfOccurrences)
            {
                Arr[j] = Arr[i];
                occurrences[Arr[i]] += 1;
                occurrences[Arr[j]] -= 1;
            }
        }
    }
}

将数组元素的出现次数置于限制之间的最快方式

我不太明白你想做什么。然而，处理这个数组这么多次似乎是一种浪费。您可以只使用一个循环(当用完"空闲"唯一值时，还可以进行一点点前向窥视)。下面的代码当然不是我写的最好的代码，但我认为它可以解决你的问题。

HashSet<long> forbidden = new HashSet<long>(); // maximum size of 1000, contains values that exceeded the limit
Queue<long> remaining = new Queue<long>(1000); // stores found unique values within the limit in a queue, that will be used if we bounce into the limit
Dictionary<long, int> frequencies = new Dictionary<long, int>(1000);
int lastPeekIndex = 0;
for (int i = 0; i < Arr.Length; i++) {
  if (!frequencies.ContainsKey(Arr[i])) {
    frequencies[Arr[i]] = 0;
    remaining.Add(Arr[i]);
  }
  if (frequencies[Arr[i]] == upperLimit) {
    if (!forbidden.Contains(Arr[i])) forbidden.Add(Arr[i]);
    var next = Int64.MinValue;
    try {
      next = remaining.Dequeue();
      while (forbidden.Contains(next)) next = remaining.Dequeue();
    } catch (InvalidOperationException) { // Arrr! we have not yet observed enough unique values
      for (int j = Math.Max(i, lastPeekIndex) + 1; j < Arr.Length; j++)
        if (!frequencies.ContainsKey(Arr[j])) {
          frequencies[Arr[j]] = 0;
          next = Arr[j];
          lastPeekIndex = j;
        }
    }
    Arr[i] = next;
    frequencies[next]++;
    if (frequencies[next] < upperLimit) remaining.Enqueue(next);
  } else frequencies[Arr[i]]++;
}

注意，这不会检查下限，因为您也没有检查这个。我认为你必须关心那些在第二遍中不经常出现的值。您可以在第一次传递后将它们放入另一个队列中，然后一次又一次地遍历数组，直到队列为空(甚至可能在第二次传递中需要少于一次完整的迭代)。

我会对你的字典进行排序，使出现较少的数字排在前面。这样你就不用每次都找一个合适的数字，只要把它替换成出现次数少的那个。下面是一个伪代码来解释:

struct dict {
    key, value
}
linkedList<dict> occurrences;
initialize occurrences
sort it (smallest values first)
// start from the one with greatest number of occurrences
n = occurrences.last;
// keep going until occurrences of n is greater than upperNoOfOccurrences
while n.value.value > upperNoOfOccurrences and didn't reach first element
    repeat = true
    do:
        // required occurrences to subtract to be within the limit
        required = upperNoOfOccurrences - n.value.value
        // maximum occurrences we can add to the first
        maxAllowed = upperNoOfOccurrences - occurrences.first.value.value
        // if we can do that
        if required < maxAllowed:
            occurrences.first.value.value += required
            n.value.value -= required
            repeat = false
        else:    // n.value.value is still greater than upperNoOfOccurrences
            occurrences.first.value.value += maxAllowed 
            n.value.value -= maxAllowed 
            repeat = true
        end if
        // keep occurrences sorted
        newPos = occurrences.first.next
        while occurrences.first.value > newPos.value.value:
            newPos = newPos.next
        move occurrences.first before newPos
    while repeat
end while
now rebuild your array with occurrences. it will
be sorted  but it doesn't matter does it? ;)

这里有一个简单而精确的方法，可以统一采样符合您的标准的数字集。

设M =不同值的个数;N =数组元素数;L =每个值的实例数的下限;U =计数上限;D = u-l。例如，M=1000, N=1000000, L=10, U=1010, D=1000。
Shuffle S通过Fisher-Yates Shuffle。(见链接)
为i M,集合T[我]= L + S[我 D] +[我 D + 1] +……+ S[我* D + D 1]。
创建数组V，大小为n，用第0个值的T[0]个实例填充，以此类推，即。有T[i]个i的实例值，对应每个i。因为S包含N个1，所以V将被完全准确地填充。
Shuffle V通过Fisher-Yates Shuffle。则数组V满足原条件。

注意步骤2-5是O(MD)，而6-7是O(N+M)，后者是可以达到的最好的，而前者可能同样，因为MD在你的问题陈述中是O(N)。