Is there documentation for YOLOv5's output format?
I'm trying to use YOLOv5 (for now) to detect faces in real time from an HDMI input. Cartoon faces, video game faces, natural faces: all are in scope for our application. I've followed the instructions to retrain it, but the results seem nonsensical.
I've read the embedding/export instructions, but I don't understand the details of how to interpret the output tensor.
TL;DR: I'm trying to interpret the YOLOv5 output, but the examples don't make it clear to me what's going on.