
I want to convert a PDF in C#, entirely in-process (no Process.Start()), and for free. The output could be RTF, HTML, or whatever else, as long as I can open it in Word, save it as RTF, and then load that into a RichTextBox.

I'm aware similar questions have flooded this forum over the years, but none of them seems to address what I am asking.

EDIT:

Looks like it can be done here: http://www.itextpdf.com/examples/iia.php?id=275

  • The link is not working.
    – Code Pope
    Commented Jul 18, 2019 at 12:39

2 Answers

3

Use a PDF library, such as iTextSharp, to parse the PDF. You will be able to access the text and images in the PDF and convert them to whatever representation you want.
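As a rough sketch of the text side of this, assuming iTextSharp 5.x is referenced (the `iTextSharp.text.pdf.parser` namespace): the class name `PdfTextDump` and method name `ExtractText` are made up for illustration. This only pulls plain text, page by page. It does not preserve formatting or extract images.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper, not part of iTextSharp itself.
public static class PdfTextDump
{
    // Returns the plain text of every page, in page order.
    public static string ExtractText(string pdfPath)
    {
        var text = new StringBuilder();
        var reader = new PdfReader(pdfPath);
        try
        {
            // iTextSharp page numbers start at 1.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Use a fresh strategy per page so text does not carry over between pages.
                string pageText = PdfTextExtractor.GetTextFromPage(
                    reader, page, new SimpleTextExtractionStrategy());
                text.AppendLine(pageText);
            }
        }
        finally
        {
            reader.Close();
        }
        return text.ToString();
    }
}
```

The resulting string could then be assigned to a RichTextBox's `Text` property, or wrapped in whatever markup you need.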

There are other solutions, such as installing xpdf and shelling out to it (it will convert to HTML if the right command-line arguments are passed in).
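For completeness, shelling out might look something like the sketch below. It runs against the question's "no Process.Start()" requirement, so treat it as a fallback only. It assumes the xpdf command-line tools are installed and on the PATH, and it uses `pdftotext` (plain-text output) rather than the HTML converter, since that tool's basic `pdftotext input.pdf output.txt` usage is the one I am sure of. `XpdfShell` and `ConvertToText` are illustrative names.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper around the external pdftotext tool.
public static class XpdfShell
{
    // Writes the text of pdfPath to txtPath by running pdftotext.
    public static void ConvertToText(string pdfPath, string txtPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "pdftotext", // assumed to be on the PATH
            Arguments = "\"" + pdfPath + "\" \"" + txtPath + "\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "pdftotext exited with code " + process.ExitCode);
            }
        }
    }
}
```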

  • I keep hearing talk of parsing via iTextSharp to get text/images, fair enough. Where are some samples of doing this other than using the PRTokeniser within iTextSharp?
    Commented Sep 10, 2010 at 21:25
0

I am not sure whether Word can open a PDF unless the PDF was created from a Word document.

I think the only quick solution would be to purchase or find a third-party library that does PDF handling, then use its API to pull out the text you need. In any case, I am sure the text would be extremely badly formatted at that point. Also be aware that some PDFs that show text actually have it saved as an image, so there would be no way to get the text out.
