以给定的概率从列表中选择一个项目的最有效方法
本文关键字:一个 项目 方法 有效 选择 概率 列表 | 更新日期: 2023-09-27 18:09:02
我正试图解决一个相当简单的问题,我有一个项目列表,其中找到项目的概率也取决于项目本身(我想象在干草堆中找到一把铲子比找到一根针更容易)。
我想要一个方法,考虑到找到它们每一个的概率,随机返回这些项中的一个。
所以项目可以这样列出:
A - 100
B - 50
C - 10
数字表示找到物品的难易程度,数值越大越容易找到。
运行下面的方法10000次,结果发现物品的数量如下:
A - 6249 (100 / 160 = 0,625)
B - 3139 (50 / 160 = 0,3125)
C - 612 (10 / 160 = 0,0625)
这基本上证明了下面的代码是有效的。
所以现在我的问题是,考虑到列表本身可以包含数千个项目,如何改进这一点。目前,该方法将在最坏情况下对列表中的每个项运行至少一次,即O(n)。
可以写入LINQ/LAMBDA语句,以便SQL服务器可以处理它,而不是将所有项都提升到c#中吗?
public long GetRandomItem()
{
var allItems = _db.AllItems
.Where(x => x.CanBeFound == true)
.OrderByDescending(x => x.Rarity)
.Select(x => new
{
x.Id, // id of item
x.Rarity, // rarity between 1 and 100
}).ToList();
int totalRarity = allItems.Sum(x => x.Rarity);
var random = new Random(DateTime.Now.Millisecond);
var randomNumber = random.NextDouble() * totalRarity;
double totalSoFar = 0;
long chosenId = -1;
foreach (var i in allItems)
{
totalSoFar += i.Rarity;
if (totalSoFar > randomNumber)
{
chosenId = i.Id;
break;
}
}
return chosenId;
}
----- EDIT ------
将LINQ重新制作为只对数据库进行两次查询的版本,并且不需要循环。不完全确定这是否更好,因为这将迫使SQL执行更多的连接和数据选择。
public long GetRandomGamePiece()
{
int totalRarity = _db.GamePieceTemplates.Sum(x => x.Rarity);
var randomNumber = 1 + Math.Round(_Random.NextDouble() * (totalRarity - 1));
var randomItem = _db.GamePieceTemplates
.Where(x => x.CanBeFound == true)
.OrderBy(x => x.Id)
.Select((x) => new
{
x.Id, // id of item
x.Rarity, // rarity between 1 and 100
// +1 so that it dosent overlap previous level
MinRarity = _db.GamePieceTemplates.Where(y => y.Id <= x.Id).Sum(y => y.Rarity) - x.Rarity + 1,
MaxRarity = _db.GamePieceTemplates.Where(y => y.Id <= x.Id).Sum(y => y.Rarity)
})
.Single(x => x.MinRarity <= randomNumber && x.MaxRarity >= randomNumber);
long chosenId = -1;
return randomItem.Id;
}
这被转换成这个TSQL:
SELECT TOP (2)
[Project6].[Rarity] AS [Rarity],
[Project6].[Id] AS [Id],
[Project6].[C1] AS [C1],
[Project6].[C2] AS [C2]
FROM ( SELECT
[Project5].[Id] AS [Id],
[Project5].[Rarity] AS [Rarity],
([Project5].[C1] - [Project5].[Rarity]) + 1 AS [C1],
[Project5].[C2] AS [C2]
FROM ( SELECT
[Project4].[Id] AS [Id],
[Project4].[Rarity] AS [Rarity],
[Project4].[C1] AS [C1],
(SELECT
SUM([Extent5].[Rarity]) AS [A1]
FROM [dbo].[GamePieceTemplates] AS [Extent5]
WHERE [Extent5].[Id] <= [Project4].[Id]) AS [C2]
FROM ( SELECT
[Project3].[Id] AS [Id],
[Project3].[Rarity] AS [Rarity],
(SELECT
SUM([Extent4].[Rarity]) AS [A1]
FROM [dbo].[GamePieceTemplates] AS [Extent4]
WHERE [Extent4].[Id] <= [Project3].[Id]) AS [C1]
FROM ( SELECT
[Project2].[Id] AS [Id],
[Project2].[Rarity] AS [Rarity]
FROM ( SELECT
[Project1].[Id] AS [Id],
[Project1].[Rarity] AS [Rarity],
[Project1].[C1] AS [C1],
(SELECT
SUM([Extent3].[Rarity]) AS [A1]
FROM [dbo].[GamePieceTemplates] AS [Extent3]
WHERE [Extent3].[Id] <= [Project1].[Id]) AS [C2]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Rarity] AS [Rarity],
(SELECT
SUM([Extent2].[Rarity]) AS [A1]
FROM [dbo].[GamePieceTemplates] AS [Extent2]
WHERE [Extent2].[Id] <= [Extent1].[Id]) AS [C1]
FROM [dbo].[GamePieceTemplates] AS [Extent1]
WHERE 1 = [Extent1].[CanBeFound]
) AS [Project1]
) AS [Project2]
WHERE ( CAST( ([Project2].[C1] - [Project2].[Rarity]) + 1 AS float) <= 130) AND ( CAST( [Project2].[C2] AS float) >= 130)
) AS [Project3]
) AS [Project4]
) AS [Project5]
) AS [Project6]
ORDER BY [Project6].[Id] ASC
如果可以在数据中添加新列,则可以在SQL中执行此操作。这个新栏目将包括迄今为止"可能性"的总和。按列排序,您将看到样例值如下所示:
Id AccumP
A 100
B 150
C 160
如果你保持这个属性,你可以找到一个加权随机项目:
- 查找
AccumP
订购的最后一件商品。 - 从0到最后一个项目的
AccumP
之间选择一个随机数。 - 找到
AccumP
值大于随机AccumP
但最接近它的项目。这是加权随机结果。
如果你索引AccumP
,这应该很快!
我的方法是根据选项的总数做一个简单的计算。不需要循环-随机值本身决定结果。
伪代码将是:
int maxValueA = 100;
int maxValueB = 50;
int maxValueC = 10;
int total = maxValueA + maxValueB + maxValueC;
int x = random number between zero and total;
if (x <= maxValueA) return A;
else if (x <= maxValueA + maxValueB) return B;
else return C;
因此,如果您有一个有序的结果列表,您真正需要做的就是在结果集中选择与随机数对应的项。
它的实际用途是根据ID出现的概率%来填充数组(还是伪代码):
int[] IDsList = { A, A, A, A, B, B, C }; // ID's populated based on % chance being chosen
x = random int between 0 and IDsList.Count;
return IDsList[x];
另一种方法-创建一个列表,每个数字重复的次数。
10出现了10次,50出现了50次-然后得到1和列表项数量之间的随机数,这给出了一个索引,然后使用该索引获取列表项。
void Main()
{
var items = new int [] {100,50,10};
var dict = new Dictionary<int,int>();
var test = Enumerable.Range(1,10000);
foreach (var t in test)
{
var result = SelectItem(items);
if (!dict.ContainsKey(result))
{
dict.Add(result,0);
}
dict[result]++;
}
foreach (var d in dict.Keys)
{
Console.WriteLine("{0} - {1}",d,dict[d]);
}
}
private static Random rand = new Random(DateTime.Now.Millisecond);
private int SelectItem(IEnumerable<int> numbers)
{
var num = rand.Next(1,numbers.Sum());
var list = numbers.OrderBy(n=>n)
.SelectMany(n=> Enumerable.Range(1,n).Select(rr=>n)).ToList();
//list.GroupBy(x=>x).Dump();
//Console.WriteLine("Rand num = {0}, selected num = {1}",num,ret);
return list[num-1];;
}