I've got a Cosmos DB (SQL API) database with several large collections (over 500k documents) and I need to expose OData endpoints for them. I found a Microsoft blog on the topic and initially had a lot of success when I tested against a much smaller collection (~1,000 documents). Once I connected to the large collections, I found that the OData server runs its queries against an in-memory copy of the collection retrieved from Cosmos, not directly against Cosmos itself.
For example, if I want ?$top=10 and I've got 850k documents, it pulls all 850k into memory and then the OData server plucks out the top 10. If I immediately request ?$skip=10&$top=10 to get the next page, it'll fetch all 850k again and then pass them off to the OData server to apply $skip and $top. This has dismal performance implications (~5 minutes for a collection that size on my dev machine).
I tried adding logic so my OData controller passed paging information along to my Cosmos data layer, which then used OFFSET and LIMIT in the query. The result was the data layer fetching the page in question, but then the OData server applied its paging logic again to the already-paged results.
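To make the double-paging problem concrete, here is a minimal simulation. It is written in Python only to keep it short and is not my actual .NET code; fetch_page and apply_odata_paging are hypothetical stand-ins for the data layer and the OData layer respectively.

```python
def fetch_page(docs, offset, limit):
    """Hypothetical data layer: mimics OFFSET/LIMIT being applied in the Cosmos query."""
    return docs[offset:offset + limit]


def apply_odata_paging(items, skip, top):
    """Hypothetical OData layer: applies $skip/$top to whatever list it is handed."""
    return items[skip:skip + top]


docs = list(range(100))   # stand-in for the documents in a collection
skip, top = 10, 10        # values taken from ?$skip=10&$top=10

# The data layer already returns the requested page (items 10 through 19).
page = fetch_page(docs, skip, top)

# The OData layer then skips 10 items of a 10-item page, leaving nothing.
double_paged = apply_odata_paging(page, skip, top)

# What I actually want: paging applied once, at the data layer only.
paged_once = apply_odata_paging(page, 0, top)
```

With these values, double_paged ends up empty while paged_once still holds the page the data layer fetched, which matches what I'm seeing.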
I need to enforce pagination between Cosmos and my OData server, not just between the OData server and my end users. I'm almost to the point where I'm going to cut out the OData package entirely and roll my own logic for parsing OData query strings.
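For reference, the roll-my-own fallback would look roughly like the sketch below, again in Python for brevity. The function names, the default and maximum page sizes, and the ORDER BY c.id clause are all assumptions for illustration, and it handles only $skip and $top, none of the other OData query options.

```python
from urllib.parse import parse_qs


def paging_from_query(query_string, default_top=100, max_top=1000):
    """Extract $skip and $top from a raw query string (hypothetical helper)."""
    params = parse_qs(query_string.lstrip("?"))
    skip = int(params.get("$skip", ["0"])[0])
    top = int(params.get("$top", [str(default_top)])[0])
    if skip < 0 or top < 0:
        raise ValueError("$skip and $top must be non-negative")
    # Cap the page size so a client cannot request the whole collection.
    return skip, min(top, max_top)


def cosmos_paged_query(skip, top, order_by="c.id"):
    """Build a Cosmos SQL query that pages server-side with OFFSET/LIMIT.

    skip and top are validated integers, so inlining them is safe here;
    order_by is assumed to come from trusted code, not from the request.
    """
    return f"SELECT * FROM c ORDER BY {order_by} OFFSET {int(skip)} LIMIT {int(top)}"


skip, top = paging_from_query("?$skip=10&$top=10")
query = cosmos_paged_query(skip, top)
```

The resulting query text would go straight to Cosmos, and the page it returns would go back to the client without any further $skip or $top applied.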