Although PDF files look great on the outside, the internal structure can be horrible and, in some circumstances, a long way from any standard that Adobe would like to see used.
I would suggest 2 things.
Send the file(s) with problems (assuming you can make them available) to the Datawatch support team for assessment.
In parallel, try a few PDF editing applications that allow you to re-write the file to see what they make of it. If it's a mess internally, editing and re-writing might, with a decent program doing it, eliminate or reduce the problem.
As a quick check of the potential, how does an Adobe PDF-related program get on with the file when asked, say, to turn it into a text file?
There are quite a few low-cost or free PDF editor and writer tools that might be worth trying, to see whether a basic re-write of the file (saving it after some form of edit) looks like it might reduce or eliminate the problem.
Doing that may help to uncover more about why the process seems to hang up and whether it is file related rather than, say, some sort of extreme or obscure memory management problem that causes the hanging.
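One rough way to separate "file related" from "something else" is to time a command-line extractor on each file and see whether the slow ones are always the same files. The sketch below is only an illustration of that idea, not anything Monarch does itself: the function name time_extraction and the 60-second limit are my own choices, and the extractor command is whatever tool you happen to have installed.

```python
import subprocess
import time


def time_extraction(command, timeout_seconds=60.0):
    """Run an external extraction command and report how it behaved.

    `command` is a list such as ["some-tool", "in.pdf", "out.txt"].
    The tool itself is an assumption; nothing here is Monarch-specific.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            command, capture_output=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # The child process is killed once the limit is exceeded.
        return {"status": "timeout", "seconds": time.monotonic() - start}
    status = "ok" if result.returncode == 0 else "error"
    return {
        "status": status,
        "seconds": time.monotonic() - start,
        "returncode": result.returncode,
    }
```

For example, if Poppler's pdftotext happens to be installed, time_extraction(["pdftotext", "report.pdf", "report.txt"]) would time one file (the file names are placeholders). If the same few files time out or run far longer with an unrelated tool as well, that points at the files rather than at memory management.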
Thanks Grant. You may have finally shed some light on this. I went back and reviewed some of what I've done since getting v14. I do have other PDFs that have worked fine in Monarch. But a few give me problems. And some of them give odd looking data in the Report View. Basically borrower names sometimes get jammed together or spread out amongst other data. Datawatch has looked at that problem in the past and determined that the data is jumbled within the report itself. I will look into other PDF software with the ability to repair these files. Thanks again.
Seeing lines from a PDF that look like they are severely misplaced is a classic sign of some sort of internal referencing issue that the extraction engine is unable to cope with for some reason.
When I have looked at files showing those symptoms in the past, I have often found that different reader applications will give different interpretations when asked to output a text file. The PDF core data can become quite complex, especially where some "clever" formatting and font selection has been used by the author(s). PDF file templates that have been modified over time, possibly after being processed through different "Writer" applications (especially those embedded in older database applications), seem to be the most prone to problems.
Just for interest, you could test the problem files with different versions of the PDF interpretation engine available within Monarch. If your model has been in existence for a while, it might be set to use an older version than is currently available. It is just possible that changing to an older or newer version may make a difference. You could also try a few changes to the PDF settings to see what difference they make. From some of the things you have mentioned, I suspect that would not be a solution to the problem (and the model would likely need changes), but it may provide some insight as to the nature of the challenge.
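If you do export the same PDF to text from two different reader applications, a quick way to see how far they disagree is to compare the two outputs line by line. This is a minimal sketch using Python's standard difflib module; the function name compare_extractions is hypothetical, and it assumes you have already saved each reader's text output and read it into a string.

```python
import difflib


def compare_extractions(text_a, text_b):
    """Compare two text exports of the same PDF, line by line.

    Returns (ratio, changes): ratio is 1.0 when the non-blank lines
    match exactly, and changes lists each block of differing lines.
    """
    # Drop blank lines and outer whitespace; keep internal spacing,
    # since jammed-together or spread-out fields are the symptom.
    lines_a = [ln.strip() for ln in text_a.splitlines() if ln.strip()]
    lines_b = [ln.strip() for ln in text_b.splitlines() if ln.strip()]

    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    changes = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, lines_a[a1:a2], lines_b[b1:b2]))
    return matcher.ratio(), changes
```

A ratio well below 1.0, or changes that show the same fields in a different order, would suggest the readers are interpreting the internal layout differently, which fits the misplaced-lines symptom.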
Are you able to share the files without fear of data security issues? If so I would be happy to take a look to see if I can spot anything. If you could share the model(s) I could look at them in V13 and V14.
The objective for using a "re-write" edit for a troublesome report is to clean it up as part of the re-write and some writer programs may not do that. So if you try a few programs expect variable results!
It might be worth starting with Adobe Acrobat if you have it available.
Bear in mind that identifying a problem with a file does not necessarily solve your problem long term. You may need to look at the options for improving the reliability of the incoming reports. (Is the requirement for making changes in the model or the table an indication of new variation in the incoming reports?)
Bear in mind it's entirely possible that I could be way off target here but these things can be tried relatively quickly and the results should provide good insight into the nature of the problem whatever it turns out to be.
The files belong to a client so sharing them is out of the question. We have a number of clients that use the same servicing system that generated these files but they all seem to use different PDF software in the creation of their files, which would seem to explain some of the inconsistencies we see.
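Since the clients use different PDF software, it may help to note which writer produced each problem file and see whether the troublesome ones share one. The sketch below is a crude check under stated assumptions: it scans the raw bytes for a plain-text /Producer entry and counts %%EOF markers, so it will miss metadata held in compressed object streams, XMP packets, or hex/UTF-16 strings, and it cuts a value short at any unescaped parenthesis. The name pdf_fingerprint is my own.

```python
import re


def pdf_fingerprint(data):
    """Pull a few rough hints out of the raw bytes of a PDF file."""
    # The header is normally at the very start of the file.
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    # Literal-string /Producer values only; escaped characters allowed.
    producers = re.findall(rb"/Producer\s*\(((?:\\.|[^\\)])*)\)", data)
    return {
        "version": header.group(1).decode("ascii") if header else None,
        "producers": [p.decode("latin-1") for p in producers],
        # More than one marker often means the file was saved
        # incrementally (linearized files also carry two).
        "eof_markers": data.count(b"%%EOF"),
    }
```

Usage would be along the lines of pdf_fingerprint(open(path, "rb").read()) for each file, then comparing the results for the files that behave against the ones that do not. Treat the output as a hint only; a proper PDF library or Acrobat's document properties will be more reliable.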
I have tried using different PDF engine versions and have found 4.1 to work the best. The reports are still somewhat jumbled, but I've managed to compensate with some creative modeling. The only real problem at this point is Monarch bogging down and becoming non-responsive. As I said, I can work with it but it becomes tedious and time-consuming. So I will look into cleaning up these particular files. I'll let you know if I find anything that works. Thanks!
You have my sympathy for this problem.
If you try some of the Adobe "Export to Text" options in their various programs to see what happens you may work out a way to improve the source files via re-creation which, hopefully, might improve the code under the hood.
Changing the models to deal with the problem is usually possible but not an ideal thing to have to do frequently.
There are some generic approaches that one can use if the best assumption is that there are always likely to be problems ... however I suspect, based on your description, that the challenges you have would just tip the scale beyond what those techniques can really safely deal with from a repeatable reliability point of view.
If you get really stuck for a solution of any sort there are some more things we could look at, but they are so far from being "normal" that I would be a little reluctant to attempt to work through them publicly until we had a better understanding of what would work and what would not. So let's see how you get on and come back to that if required.