I'm trying to use Ffmpeg for creating a hevc realtime stream from a Decklink input. The goal is high quality HDR stream usage with 10 bits. The Decklink SDI input is fed RGB 10 bits, which is well handled by ffmpeg with the decklink option -raw_format rgb10, which gets recognized by ffmpeg as 'gbrp10le'.
I have a Nvidia pascal-based card, which supports yuv444 10 bit (as 'yuv444p16le') and when when using '-c:v hevc_nvenc' the auto_scaler kicks in and converts to 'yuv444p16le', which I guess is the same conversion as giving '-pix_fmt yuv444p16le'.
This is working very well in 1920x1080 resolution, but in 4096x2160 resolution ffmpeg can't keep up realtime 24 or 25 fps, and I get input buffer overruns. The culprit seems to be the RGB->YUV conversion in ffmpeg swscale because;
- When piping the Decklink 4K RGB input with '-c:v copy' straight to /dev/null, there's is no problems with buffer underruns,
- And when feeding the Decklink YUV and giving '-raw_format yuv422p10’ (no YUV444 input for decklink seems available for decklink in ffmpeg) I get no underrun and everything works well in 4K. Even if I set '-pix_fmt yuv444p16le'.
Any ideas how I could accomplish a 4K hevc in NVENC with the 10-bit RGB signal from the Decklink? Is there a way to make NVENC accept and use the RGB data without first converting to YUV? Or is there maybe a way to convert gbrp10le->yuv444p16le with cuda or scale_npp filter? I have compiled ffmpeg with npp and cuda, but I cannot figure out if I can get it to work with RGB? Whenever I try to do '-vf "hwupload_cuda"', auto_scaler kicks in and tries to convert to yuv on the cpu, which again creates underruns.
Another thing I guess could help is if there was a way to make the swscale cpu filter(or if there is another suitable filter?) use multiple threads? Right now it seems to only use one thread at a time, maxing out at 99% on my Ryzen 3950x (3,5GHz, 32 threads).
Example ffmpeg output:
$ ffmpeg -loglevel verbose -f decklink -raw_format rgb10 -i "Blackmagic Card 1" -c:v hevc_nvenc -preset medium -profile:v main10 -cbr 1 -b:v 20M -f nut - > /dev/null
--
Stream #0:1: Video: r210, 1 reference frame, gbrp10le(progressive), 4096x2160, 6635520 kb/s, 25 tbr, 1000k tbn, 1000k tbc
--
[graph 0 input from stream 0:1 @ 0x4166180] w:4096 h:2160 pixfmt:gbrp10le tb:1/1000000 fr:25000/1000 sar:0/1
[auto_scaler_0 @ 0x4168480] w:iw h:ih flags:'bicubic' interl:0
[format @ 0x4166080] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_null_0' and the filter 'format'
[auto_scaler_0 @ 0x4168480] w:4096 h:2160 fmt:gbrp10le sar:0/1 -> w:4096 h:2160 fmt:yuv444p16le sar:0/1 flags:0x4
[hevc_nvenc @ 0x4139640] Loaded Nvenc version 11.0
--
Stream #0:0: Video: hevc (Rext), 1 reference frame (HEVC / 0x43564548), yuv444p16le(tv, progressive), 4096x2160 (0x0), q=2-31, 2000 kb/s, 25 fps, 51200 tbn
--
[decklink @ 0x40f0900] Decklink input buffer overrun!:02.52 bitrate= 30471.3kbits/s speed=0.627x