I want to use a pretrained Faster R-CNN to extract RoI features for the objects it detects in images.
I couldn't find an easy way of doing that, so I started reading the torchvision source code. This is the solution I came up with:
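```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# placeholder batch: torch.Tensor of shape (batch_size, channels, height, width), values in [0, 1]
images = torch.rand(2, 3, 480, 640)
batch_size, _, height, width = images.shape

with torch.no_grad():
    # FPN feature maps computed directly on `images` (no model.transform preprocessing)
    features = model.backbone(images)
    # full detector: preprocessing, RPN, RoI heads, boxes mapped back to the input size
    output = model(images)
    list_of_boxes = [out["boxes"] for out in output]
    list_of_image_sizes = [(height, width) for _ in range(batch_size)]
    # RoIAlign on the FPN levels, then the two-layer MLP head -> (total_boxes, 1024)
    features_of_predicted_boxes = model.roi_heads.box_roi_pool(features, list_of_boxes, list_of_image_sizes)
    features_of_predicted_boxes = model.roi_heads.box_head(features_of_predicted_boxes)
```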
So, my questions for anyone who understands Faster R-CNN better than I do:
- Does the solution I came up with actually do what I want it to do?
- Is there an easier way to do it?