为匹配两个表的id寻找成本更低的解决方案
本文关键字:寻找 id 解决方案 两个 | 更新日期: 2023-09-27 18:05:23
我正在构建的应用程序允许用户上传包含多行和多列数据的.csv文件。每行包含一个惟一的varchar Id。这将最终填充现有SQL表中存在匹配Id的字段。
步骤1:我使用LinqToCsv和foreach循环将.csv完全导入临时表。
第2步:然后我有另一个foreach循环,我试图将行从临时表循环到现有表,只有,其中id匹配。
控制器完成此过程的动作:
[HttpPost]
public ActionResult UploadValidationTable(HttpPostedFileBase csvFile)
{
var inputFileDescription = new CsvFileDescription
{
SeparatorChar = ',',
FirstLineHasColumnNames = true
};
var cc = new CsvContext();
var filePath = uploadFile(csvFile.InputStream);
var model = cc.Read<Credit>(filePath, inputFileDescription);
try
{
var entity = new TestEntities();
var tc = new TemporaryCsvUpload();
foreach (var item in model)
{
tc.Id = item.Id;
tc.CreditInvoiceAmount = item.CreditInvoiceAmount;
tc.CreditInvoiceDate = item.CreditInvoiceDate;
tc.CreditInvoiceNumber = item.CreditInvoiceNumber;
tc.CreditDeniedDate = item.CreditDeniedDate;
tc.CreditDeniedReasonId = item.CreditDeniedReasonId;
tc.CreditDeniedNotes = item.CreditDeniedNotes;
entity.TemporaryCsvUploads.Add(tc);
}
var idMatches = entity.PreexistingTable.Where(x => x.Id == tc.Id);
foreach (var number in idMatches)
{
number.CreditInvoiceDate = tc.CreditInvoiceDate;
number.CreditInvoiceNumber = tc.CreditInvoiceNumber;
number.CreditInvoiceAmount = tc.CreditInvoiceAmount;
number.CreditDeniedDate = tc.CreditDeniedDate;
number.CreditDeniedReasonId = tc.CreditDeniedReasonId;
number.CreditDeniedNotes = tc.CreditDeniedNotes;
}
entity.SaveChanges();
entity.Database.ExecuteSqlCommand("TRUNCATE TABLE TemporaryCsvUpload");
TempData["Success"] = "Updated Successfully";
}
catch (LINQtoCSVException)
{
TempData["Error"] = "Upload Error: Ensure you have the correct header fields and that the file is of .csv format.";
}
return View("Upload");
}
上面代码中的问题是tc
在第一个循环中,但是匹配是在循环之后与var idMatches = entity.PreexistingTable.Where(x => x.Id == tc.Id);
定义的,所以我只得到第一个循环的最后一项。
如果我嵌套第二个循环,那么它就会变慢(在10分钟后停止),因为.csv中大约有1000行,而先前存在的表中有7000行。
找一个更好的方法来做这件事让我很困扰。假设临时表甚至不是来自.csv,只考虑从表1中填充该行id匹配的表2中的行的最有效方法。谢谢你的帮助!在编写代码时,大部分工作都是由应用程序完成的,而SQL Server可以更有效地完成这些工作。您正在对数据库进行数百次不必要的往返调用。当您大量导入数据时,您需要这样的解决方案:
- 批量导入数据。有关使用EF批量导入效率的有用指导,请参阅此答案。
- 连接并更新目标表。
处理导入应该只需要一个批量更新查询:
update PT set
CreditInvoiceDate = CSV.CreditInvoiceDate
,CreditInvoiceNumber = CSV.CreditInvoiceNumber
,CreditInvoiceAmount = CSV.CreditInvoiceAmount
,CreditDeniedDate = CSV.CreditDeniedDate
,CreditDeniedReasonId = CSV.CreditDeniedReasonId
,CreditDeniedNotes = CSV.CreditDeniedNotes
from PreexistingTable PT
join TemporaryCsvUploads CSV on PT.Id = CSV.Id
该查询将替换整个嵌套循环,并在单个数据库调用中应用相同的更新。只要你的表索引正确,这应该运行得非常快。
将CSV记录保存到与主表具有相同字段的第二个表中后,在sqlserver
中执行以下步骤create proc [dbo].[excel_updation]
as
set xact_abort on
begin transaction
-- First update records
update first_table
set [ExamDate] = source.[ExamDate],
[marks] = source.[marks],
[result] = source.[result],
[dob] = source.[dob],
[spdate] = source.[spdate],
[agentName] = source.[agentName],
[companycode] = source.[companycode],
[dp] = source.[dp],
[state] = source.[state],
[district] = source.[district],
[phone] = source.[phone],
[examcentre] = source.[examcentre],
[examtime] = source.[examtime],
[dateGiven] = source.[dateGiven],
[smName] = source.[smName],
[smNo] = source.[smNo],
[bmName] = source.[bmName],
[bmNo] = source.[bmNo]
from tbUser
inner join second_table source
on tbUser.[UserId] = source.[UserId]
-- And then insert
insert into first_table (exprdate, marks, result, dob, spdate, agentName, companycode, dp, state, district, phone, examcentre, examtime, dateGiven, smName, smNo, bmName, bmNo)
select [ExamDate], [marks], [result], [dob], [spdate], [agentName], [companycode], [dp], [state], [district], [phone], [examcentre], [examtime], [dateGiven], [smName], [smNo], [bmName], [bmNo]
from second_table source
where not exists
(
select *
from first_table
where first_table.[UserId] = source.[UserId]
)
commit transaction
delete from second_table
这段代码的条件是两个表必须有相同的id匹配数据。如果id在两个表中都匹配,则该特定行的数据将在第一个表中更新。
只要匹配的概率很高,就应该尝试更新CSV中的每一行,条件是id匹配,
UPDATE table SET ... WHERE id = @id