I am extracting data from a large number of Microsoft Word documents. To make them viewable in Monarch, I printed them to PDFs. However, when I open perhaps a hundred of these files at a time (I have to extract data from thousands of them), the data seems garbled (e.g. text gets mixed together).
I think the cause is either the PDF printer I used or the number of PDF files opened at once, because I did not have the same problem when opening a smaller number of PDF files generated by another PDF printer.
I wanted to try using an XPS printer. The difficulty is that Microsoft's default XPS printer prompts the user for a file name for each document printed. This is problematic, since I have thousands of documents to convert. Does anyone know of an XPS printer that can be configured to auto-generate file names, or a way to automate printing to the Microsoft XPS printer?
Alternatively, if anyone knows of a better way of parsing Word documents in Monarch than converting them to XPS or PDF, I'd be eager to hear it. Please let me know.