Managing memory for a long-running IO-bound process
Updated: 2023-09-27 18:12:33
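I have a method that runs over a set of records and makes third-party API calls, so it is IO-bound rather than CPU-bound and runs slowly in the background. Because the dataset is so large, it is causing memory problems. I can't load the whole list at once (that throws an exception), so I page through it in batches. That works, but every batch increases RAM usage, which I assume is because the entities are being tracked by the context.
I thought I could solve this by detaching each batch once it had been processed. I tried this: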
using (var db = new PlaceDBContext())
{
    int count = 0;
    while (count < total)
    {
        var toCheck = db.Companies
            .Include(x => x.CompaniesHouseRecords)
            .Where(x => x.CheckedCompaniesHouse == false)
            .OrderBy(x => x.ID)
            .Skip(count)
            .Take(1000)
            .ToList();
        foreach (var company in toCheck)
        {
            // do all the stuff that needs to be done
            // removed for brevity but it makes API calls
            // and creates/updates records
            company.CheckedCompaniesHouse = true;
            db.SaveChanges();
            count++;
        }
        // attempted to detach to free up RAM but doesn't work
        db.Entry(toCheck).State = EntityState.Detached;
    }
}
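This throws the following exception once a batch completes:
The entity type List`1 is not part of the model for the current context
I'm guessing this is because I enumerated the query into a list, and what the context actually tracks is the records inside the list, not the list itself.
What is the correct way to detach the records and their nested records so that RAM doesn't fill up? Should I be approaching this in a different way?
Edit: I also tried detaching each company and its records as I loop over them, but RAM still goes up: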
foreach (var company in toCheck)
{
    // do all the stuff that needs to be done
    // removed for brevity but it makes API calls
    // and creates/updates records
    company.CheckedCompaniesHouse = true;
    db.SaveChanges();
    count++;
    foreach (var chr in company.CompaniesHouseRecords.ToList())
    {
        db.Entry(chr).State = EntityState.Detached;
    }
    db.Entry(company).State = EntityState.Detached;
}
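Contexts are meant to be short-lived and cheap to create. If you are worried about the context holding on to memory, which is a real concern here because your while loop is long-running, you could try this: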
int count = 0;
while (count < total)
{
    using (var db = new PlaceDBContext()) // create a new context each time
    {
        var toCheck = db.Companies
            .Include(x => x.CompaniesHouseRecords)
            .Where(x => x.CheckedCompaniesHouse == false)
            .OrderBy(x => x.ID)
            .Skip(count)
            .Take(1000)
            .ToList();
        foreach (var company in toCheck)
        {
            // do all the stuff that needs to be done
            // removed for brevity but it makes API calls
            // and creates/updates records
            company.CheckedCompaniesHouse = true;
            db.SaveChanges();
            count++;
        }
    }
}
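When loading, use .AsNoTracking() in the var toCheck = ... query.
In your foreach loop, save each row's ID, then use a second db context to load those companies, but without the Include(), which pulls in a lot of extra linked objects.
Then perform the updates on those in the loop, but issue only a single db.SaveChanges() after the loop. Otherwise you make a db round trip for every row, 1000 per batch.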
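using (var db = new PlaceDBContext())
{
    int count = 0;
    while (count < total)
    {
        var toCheck = db.Companies
            .AsNoTracking() // read-only load: the context does not track these entities
            .Include(x => x.CompaniesHouseRecords)
            .Where(x => x.CheckedCompaniesHouse == false)
            .OrderBy(x => x.ID)
            .Skip(count)
            .Take(1000)
            .ToList();

        // db2 was not declared in the original snippet. Creating the second
        // context once per batch is an assumption, so that whatever it tracks
        // is released when the batch ends.
        using (var db2 = new PlaceDBContext())
        {
            foreach (var company in toCheck)
            {
                int tempID = company.ID; // use whatever field is the ID
                // do all the stuff that needs to be done
                // removed for brevity but it makes API calls
                // and creates/updates records
                var companyUpdate = db2.Companies.Where(c => c.ID == tempID).FirstOrDefault();
                companyUpdate.CheckedCompaniesHouse = true;
                count++;
            }
            db2.SaveChanges(); // one round trip per batch instead of one per row
        }
    }
}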
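As a side note, every snippet above filters on CheckedCompaniesHouse == false and also calls Skip(count). Once a batch has been saved, its rows no longer match the filter, so skipping count rows on the next query may step over rows that were never processed. The sketch below simulates that interaction with an in-memory list rather than a database. The Company class here and the PagingDemo, MakeTable and Process names are hypothetical stand-ins, not the asker's types, and the flag assignment only stands in for the real update and SaveChanges call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Company entity (assumption: only these two fields matter here).
class Company
{
    public int ID { get; set; }
    public bool CheckedCompaniesHouse { get; set; }
}

static class PagingDemo
{
    // Builds an in-memory "table" of unchecked rows with IDs 1..rows.
    static List<Company> MakeTable(int rows) =>
        Enumerable.Range(1, rows).Select(i => new Company { ID = i }).ToList();

    // Mirrors the query shape used above; useSkip toggles the Skip(count) call.
    static int Process(List<Company> table, int batchSize, bool useSkip)
    {
        int count = 0;
        while (true)
        {
            var batch = table
                .Where(c => !c.CheckedCompaniesHouse)
                .OrderBy(c => c.ID)
                .Skip(useSkip ? count : 0)
                .Take(batchSize)
                .ToList();
            if (batch.Count == 0) break; // nothing left that the query can see

            foreach (var company in batch)
            {
                company.CheckedCompaniesHouse = true; // stands in for the update + SaveChanges
                count++;
            }
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(Process(MakeTable(10), 2, useSkip: true));  // prints 6
        Console.WriteLine(Process(MakeTable(10), 2, useSkip: false)); // prints 10
    }
}
```

If the real table behaves the same way, one option would be to drop Skip and always take the first batch of unchecked rows, ending the loop when a batch comes back empty. That only works if every processed row is actually marked as checked and saved before the next query runs.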