Background
Ideally, headless Chromium rendering and PDF post-processing would run in a single service, rather than the post-processing steps having their own Python service.
In T175853 ([Spike 16hr] Investigate the ability of Python wrapped headless Chromium to render large books), we found that it's best to interact with headless Chromium using the official JS library. We would therefore like to investigate Node.js libraries that can manipulate PDFs.
We are specifically looking for libraries written purely in JS. The ops and services teams pushed back when we wanted to use wkhtmltopdf to render PDFs, partly because it is written in C++. The reasons given included: (a) it is hard to maintain, (b) it is a security risk, and (c) if something goes wrong, we don't have C++ developers on hand to fix the underlying library.
A/C
Find a Node.js library that is written purely in JS, does not offload the work to external programs, and can do the following:
- Add page numbers to PDF pages;
- Add pages to a PDF;
- Remove pages from a PDF;
- Given a table of contents whose links point to headings in the PDF, find the page number of each heading;
- Add an outline;
- Add metadata such as the author and title.
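The heading-lookup and outline criteria above can be reasoned about independently of any particular PDF library. The TypeScript below is a minimal, library-agnostic sketch that assumes the chosen library can already report, for each TOC link target, the zero-based index of the page it lands on; the `TocEntry`, `OutlineItem` and `buildOutline` names, the `dest` key and the `level` field are all hypothetical. It resolves every TOC link to a 1-based page number and nests the entries by heading level, producing a tree that the library would then need to write out as the PDF outline.

```typescript
// Hypothetical, library-agnostic types. The chosen PDF library would have to
// supply `destinations`: link target name -> zero-based page index.
interface TocEntry {
  title: string;
  level: number; // 1 = top-level heading, 2 = subheading, ...
  dest: string; // name of the target the TOC link points to
}

interface OutlineItem {
  title: string;
  page: number; // 1-based page number as a reader would see it
  children: OutlineItem[];
}

// Resolve each TOC link to a page number and nest entries by heading level.
function buildOutline(
  toc: TocEntry[],
  destinations: Map<string, number>,
): OutlineItem[] {
  const roots: OutlineItem[] = [];
  // Currently open ancestors, ordered by strictly increasing level.
  const stack: { level: number; item: OutlineItem }[] = [];

  for (const entry of toc) {
    const pageIndex = destinations.get(entry.dest);
    if (pageIndex === undefined) {
      throw new Error(`Unresolved TOC link target: ${entry.dest}`);
    }
    const item: OutlineItem = { title: entry.title, page: pageIndex + 1, children: [] };

    // Close any open entries at the same or a deeper level.
    while (stack.length > 0 && stack[stack.length - 1].level >= entry.level) {
      stack.pop();
    }
    if (stack.length === 0) {
      roots.push(item); // top-level outline entry
    } else {
      stack[stack.length - 1].item.children.push(item); // child of nearest ancestor
    }
    stack.push({ level: entry.level, item });
  }
  return roots;
}

// Usage with made-up link targets and page indices.
const destinations = new Map<string, number>([
  ["intro", 2],
  ["history", 3],
  ["usage", 7],
]);
const toc: TocEntry[] = [
  { title: "Introduction", level: 1, dest: "intro" },
  { title: "History", level: 2, dest: "history" },
  { title: "Usage", level: 1, dest: "usage" },
];
const outline = buildOutline(toc, destinations);
// "History" (page 4) nests under "Introduction" (page 3); "Usage" is on page 8.
console.log(JSON.stringify(outline));
```

The sketch deliberately keeps page resolution separate from PDF I/O, so the same logic applies whichever library passes the other criteria. Front matter that is excluded from the visible page numbering would need an offset, which this sketch does not handle.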