I'm working on an application where I need to get a large amount of data from a database, do some data manipulation, and then insert the data into another database.

However, I am struggling to find the best way to check if a record from the source db is already present in the destination db. If it is not present, I need to add a new record. If it is present, I need to update the existing record.

My code looks something like this (a simplified version):

List<Data> data_from_db = new();

var existingData = await _context.Data.ToListAsync();

var sourceData = await _dbAccess.LoadData<Data, dynamic>(storedProcedure: "sp_ApiGetAllData", new { }, "Db");

data_from_db = sourceData.ToList();

//loop through data. If the datapoint is already present in db, it will be updated. If not, it will be added as a new datapoint.
foreach (var datapoint in data_from_db)
{
    //check if price is already present in db.
    var existingDatapoint = existingData.Find(x => x.ItemId == datapoint.ItemId);

    if (existingDatapoint != null)
    {
        //Update datapoint

        _context.Entry(existingDatapoint).State = EntityState.Modified;
        
        processedIds.Add(existingDatapoint.Id);
    }

    else
    {
        //Create new datapoint
        _context.Data.Add(newDatapoint);
    }
}

await _context.SaveChangesAsync();

This works fine. However, when the DB has 400K+ rows, the process gets painfully slow. Specifically, it is the "Find" call that takes most of the time. That makes sense: Find is a linear scan of the list, so with 400K source rows and 400K existing rows it can end up doing on the order of 400K * 400K comparisons.

Is there a better way to handle this issue?

Update: The application has to do some complicated price calculations (which is why I simplified it in my original post). But to sum it up: I get prices, discount info and min. amount from the source db, calculate the pricing, then insert the calculated prices into the destination db.

The tricky part comes when I want to check if the price info is already present, because an individual price is a combination of item number, discount group and min. amount.

So in reality the find statement looks like this:

var existingPrice = existingPrices.Find(x => x.ItemNumber == priceEntry.ItemNumber && x.DiscountGroupId == priceEntry.DiscountGroupId && x.MinAmount == priceEntry.MinAmount);

None of the above 3 parameters is enough by itself to identify a price. An item can have a lot of different prices based on discount groups, and each discount group can also have several prices based on how many of a given product are ordered (min. amount).

  • I don't work with entity framework but the pure SQL would be SELECT * FROM TableNameHere WHERE ItemId = x; which shouldn't have a problem with the read. Convert that to the dialect of your DBMS, make it as a stored procedure, and then run it using something like ADO.NET or Dapper maybe, passing in x as a parameter. .Find() is likely an O(n) operation as it has to iterate through the entire table, and could be loading it into memory as well (so maybe profile that as well to see)
    – Narish
    Commented Apr 28, 2023 at 18:58
  • Use LINQ to create a Dictionary from existingData outside the loop: var existingDataMap = existingData.ToDictionary(d => d.ItemId); Then test for existence using the Dictionary inside the loop: if (existingDataMap.TryGetValue(datapoint.ItemId, out var existingDatapoint))
    – NetMage
    Commented Apr 28, 2023 at 18:59
  • @NetMage Hm, that's a great idea. I simplified the code; in reality I check 3 different parameters in the Find function. Will the dictionary approach still be a good solution?
    – Glacierdk
    Commented Apr 28, 2023 at 19:07
  • A Dictionary has O(1) lookups for keys, but the LINQ call that loads this entire table into memory may be costly in terms of space. To also check for 3 things at once would defeat the purpose, because you are no longer checking for just the key, and any LINQ will have to iterate the table to find your specific row
    – Narish
    Commented Apr 28, 2023 at 19:27
  • You could use a ValueTuple of the three values as the key. Showing your actual code can be important if you want valid answers :)
    – NetMage
    Commented Apr 28, 2023 at 19:44

2 Answers

0

Using a ValueTuple, you can create a Dictionary that maps the three identifiers of a price to the existing price in the database you are updating. ValueTuple implements equality and hash codes based on the values of all of its items, so it works as a composite key. Then you can just look up the existing price, or create a new one if it isn't found:

var existingPrices = await _context.Prices.ToListAsync();

var sourcePrices = await _dbAccess.LoadData<Prices, dynamic>(storedProcedure: "sp_ApiGetAllData", new { }, "Db");

var existingPriceMap = existingPrices.ToDictionary(p => (p.ItemNumber, p.DiscountGroupId, p.MinAmount));
//loop through data. If the datapoint is already present in db, it will be updated. If not, it will be added as a new datapoint.
foreach (var priceEntry in sourcePrices)
{
    //check if price is already present in db.
    if (existingPriceMap.TryGetValue((priceEntry.ItemNumber, priceEntry.DiscountGroupId, priceEntry.MinAmount), out var existingPrice))
    {
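        //Update the existing price (copy the newly calculated values onto existingPrice here)
        _context.Entry(existingPrice).State = EntityState.Modified;

        processedIds.Add(existingPrice.Id);
    }
    else
    {
        //Create new price (assumes priceEntry already holds the calculated values)
        _context.Prices.Add(priceEntry);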
    }
}

await _context.SaveChangesAsync();
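
If you want to try the lookup without a database, here is a minimal console sketch of the same idea. The rows are plain tuples with made-up values, and the names idsToUpdate and keysToInsert are only for illustration; none of this is the asker's real schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory rows standing in for the destination table.
var existingPrices = new List<(int Id, string ItemNumber, int DiscountGroupId, int MinAmount, decimal Price)>
{
    (1, "A100", 1, 1, 10.00m),
    (2, "A100", 1, 10, 9.00m),
    (3, "A100", 2, 1, 9.50m),
};

// Hypothetical rows standing in for the calculated source prices.
var sourcePrices = new List<(string ItemNumber, int DiscountGroupId, int MinAmount, decimal Price)>
{
    ("A100", 1, 10, 8.75m), // same three-part key as Id 2, so this is an update
    ("B200", 1, 1, 20.00m), // key not present, so this is an insert
};

// Build the lookup once, keyed by the three identifying values.
var existingPriceMap = existingPrices.ToDictionary(p => (p.ItemNumber, p.DiscountGroupId, p.MinAmount));

var idsToUpdate = new List<int>();
var keysToInsert = new List<(string, int, int)>();

foreach (var entry in sourcePrices)
{
    var key = (entry.ItemNumber, entry.DiscountGroupId, entry.MinAmount);

    // Hash lookup instead of scanning the whole list for every source row.
    if (existingPriceMap.TryGetValue(key, out var existing))
        idsToUpdate.Add(existing.Id);
    else
        keysToInsert.Add(key);
}

Console.WriteLine($"update: {string.Join(",", idsToUpdate)}; insert: {keysToInsert.Count}");
```

One thing to keep in mind: ToDictionary throws an ArgumentException if two existing rows share the same key, so this assumes the three values really are unique in the destination table.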
0

I don't know if the compiler optimises anything here, but my idea for the find operation would be to ensure that existing_data is sorted by id and then perform a binary search.
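
A rough sketch of what that could look like, using List<T>.Sort and List<T>.BinarySearch with a comparer over the three key values from the question's update. The tuple rows and the FindId helper are hypothetical, and I have not measured this against the real data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rows: (ItemNumber, DiscountGroupId, MinAmount, Id), deliberately unsorted.
var existing = new List<(string ItemNumber, int DiscountGroupId, int MinAmount, int Id)>
{
    ("B200", 1, 1, 4),
    ("A100", 2, 1, 3),
    ("A100", 1, 10, 2),
    ("A100", 1, 1, 1),
};

// Compare rows by the three key fields only; Id is ignored.
var byKey = Comparer<(string ItemNumber, int DiscountGroupId, int MinAmount, int Id)>.Create((a, b) =>
{
    int c = string.CompareOrdinal(a.ItemNumber, b.ItemNumber);
    if (c != 0) return c;
    c = a.DiscountGroupId.CompareTo(b.DiscountGroupId);
    if (c != 0) return c;
    return a.MinAmount.CompareTo(b.MinAmount);
});

// Sort once up front (O(n log n)) with the same comparer used for searching.
existing.Sort(byKey);

// Each lookup is O(log n). BinarySearch returns a negative number when there is no match.
int FindId(string itemNumber, int discountGroupId, int minAmount)
{
    // The Id in the probe tuple is a dummy value; the comparer never looks at it.
    int index = existing.BinarySearch((itemNumber, discountGroupId, minAmount, 0), byKey);
    return index >= 0 ? existing[index].Id : -1;
}

Console.WriteLine(FindId("A100", 1, 10)); // prints 2
Console.WriteLine(FindId("C300", 1, 1));  // prints -1
```

This brings each lookup down from a full scan to O(log n), at the cost of sorting first. A Dictionary lookup is O(1) on average, so it would normally be the simpler choice unless you need the list in sorted order anyway.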

  • Have you got some data to back your answer? It's better to have a strong indication that a solution works before posting it as an answer.
    – XouDo
    Commented May 2, 2023 at 9:45
