I'm working on an application where I need to get large amount of data from a database, do some data manipulation and then insert the data in another database.
However, I am struggling to find the best way to check if a record from the source db is already present in the destination db. If it is not present, I need to add a new record. If it is present, I need to update the existing record.
My code looks something like this (a simplified version):
List<Data> data_from_db = new();
var existingData = await _context.Data.ToListAsync();
var sourceData = await _dbAccess.LoadData<Data, dynamic>(storedProcedure: "sp_ApiGetAllData", new { }, "Db");
data_from_db = sourceData.ToList();
//loop through data. If the datapoint is already present in db, it will be updated. If not, it will be added as a new datapoint.
foreach (var datapoint in data_from_db)
{
//check if price is already present in db.
var existingDatapoint = existingData.Find(x => x.ItemId== datapoint.ItemId);
if (existingDatapoint != null)
{
//Update datapoint
_context.Entry(existingDatapoint).State = EntityState.Modified;
processedIds.Add(existingDatapoint.Id);
}
else
{
//Create new datapoint
_context.Data.Add(newDatapoint);
}
}
await _context.SaveChangesAsync();
This works fine. However, when the DB has +400K rows, the process gets painfully slow. Specifically, it is the "Find" function which takes a lot of time. And of course this makes sense as it is going to do 400k * 400k searches in the list.
Is there a better way to handle this issue?
Update: The application has to do some complicated price calculations (which is why i simplified it in my original post). But to sum it up: I get prices, discount-info and min. amount from the source db, calculate the pricing, then insert the calculated prices in to the destination db.
The tricky part comes when i want to check if the price-info is already present, because a individual price is a combination of Itemnumber, discount group and min. amount.
So in reality the find statement looks like this:
var existingPrice = existingPrices.Find(x => x.ItemNumber == priceEntry.ItemNumber && x.DiscountGroupId == priceEntry.DiscountGroupId && x.MinAmount == priceEntry.MinAmount);
None of the above 3 paramenters is enough by it self, to identify a price. An item can have alot of different prices base on discount groups, but each discountgroup can also several prices base on how many of a given product is ordered (min amount).
SELECT * FROM TableNameHere WHERE ItemId = x;
which shouldn't have a problem with the read. Convert that to the dialect of your DBMS, make it as a stored procedure, and then run it using something like ADO.NET or Dapper maybe, passing inx
as a parameter..Find()
is likely an O(n) operation as it has to iterate through the entire table, and could be loading it into memory as well (so maybe profile that as well to see)Dictionary
fromexistingData
. (var existingDataMap = existingData.ToDictionary(d => d.ItemId);
outside the loop. Test for existence using theDictionary
inside the loop:if (existingDataMap.TryGetValue(datapoint.ItemId, out var existingDatapoint))
ValueTuple
of the three values as the key. Showing your actual code can be important if you want valid answers :)