I am looking for a way to run batched inference instead of running inference one request at a time in C++.

For example, assume each request in requests is a single prediction containing n features, each of which is converted to a tensor:

for (const auto& request : requests) {
    // Convert one request into its n feature tensors.
    std::vector<torch::Tensor> feature_tensors = convert_request_to_tensor(request);
    std::vector<torch::jit::IValue> inputs;
    for (const auto& feature_tensor : feature_tensors) {
        inputs.push_back(feature_tensor);
    }

    // One forward call per request.
    torch::Tensor output = model.forward(inputs).toTensor();
}

Now I want to run batched inference. How would I do that? Can I simply stack the inputs, like this?

std::vector<torch::Tensor> inputs = {input_1, input_2, ..., input_N};
torch::Tensor batch = torch::stack(inputs, 0);

But if you have multiple features per request, i.e. input_1 is a std::vector<torch::Tensor>, how does that work?
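What I have in mind, assuming every request yields the same n feature tensors and that feature j has the same shape in every request, is to stack per feature slot rather than per request, roughly like this (reusing model, requests and convert_request_to_tensor from above):

std::vector<std::vector<torch::Tensor>> per_feature;  // per_feature[j] holds feature j from every request

for (const auto& request : requests) {
    std::vector<torch::Tensor> feature_tensors = convert_request_to_tensor(request);
    if (per_feature.empty()) {
        per_feature.resize(feature_tensors.size());  // n feature slots
    }
    for (size_t j = 0; j < feature_tensors.size(); ++j) {
        per_feature[j].push_back(feature_tensors[j]);
    }
}

std::vector<torch::jit::IValue> inputs;
for (const auto& tensors : per_feature) {
    // Stack along a new dim 0, giving [batch_size, ...original feature shape].
    inputs.push_back(torch::stack(tensors, 0));
}

// A single forward call for the whole batch; row i of the output should
// correspond to request i if the model treats dim 0 as the batch dimension.
torch::Tensor batched_output = model.forward(inputs).toTensor();

So the model still receives n inputs, but each one now carries a leading batch dimension. Is that the right way to think about it?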

And how would you know what the optimal batch size is? Is there a formula you can derive from the number of CPU cores?
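I haven't found such a formula, so the only thing I can think of is measuring throughput for a few candidate batch sizes and picking one empirically, roughly like this (make_dummy_batch is a hypothetical helper that builds batched inputs of the right shapes for the model; timing uses <chrono>, printing uses <iostream>):

for (int batch_size : {1, 2, 4, 8, 16, 32, 64}) {
    // make_dummy_batch is a placeholder for building a batched input
    // (e.g. via the torch::stack approach above) of the given size.
    std::vector<torch::jit::IValue> inputs = make_dummy_batch(batch_size);

    model.forward(inputs);  // warm-up so one-time initialization doesn't skew timing

    const int kIters = 50;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i) {
        model.forward(inputs);
    }
    double seconds = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();

    std::cout << "batch " << batch_size << ": "
              << (kIters * batch_size) / seconds << " requests/s, "
              << seconds / kIters * 1000.0 << " ms/batch\n";
}

Is benchmarking like this the usual approach, or is there something smarter?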
