Sounds like your pdf files look like text but are in fact created as graphics blocks - in which case there is little Monarch can do about it.
If you open the pdf file in Acrobat Reader, highlight the text, copy it and paste it into another document, what do you get?
Or you could just open the pdf and try a 'Save as text' and see what results.
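If you would rather check a file without opening it, here is a rough sketch of the same test in Python. It only looks through the raw bytes of the file for font resources and embedded images, so treat it as a hint and not a verdict: a pdf that packs its objects into compressed streams can show no fonts here even when it holds real text, and an OCR'd scan will normally show both fonts and images. The function name and the file name in the usage note are made up for illustration.

```python
import re


def pdf_text_hints(data: bytes) -> dict:
    """Rough hints about whether a PDF carries any text at all.

    Heuristic only: it scans the raw bytes for font resources and
    image XObjects and does not parse or decompress the file.
    """
    # Any font-related key (/Font, /FontDescriptor, ...) suggests text is drawn.
    has_fonts = b"/Font" in data
    # Image XObjects are declared with "/Subtype /Image".
    image_count = len(re.findall(rb"/Subtype\s*/Image", data))
    return {
        "has_fonts": has_fonts,
        "image_count": image_count,
        # No fonts but at least one image: probably a scan saved as pictures.
        "likely_image_only": (not has_fonts) and image_count > 0,
    }
```

To try it on a real file you could run something like `pdf_text_hints(open("report.pdf", "rb").read())`, where `report.pdf` stands in for your own file. If `likely_image_only` comes back true, the copy and paste test will most likely give you nothing as well.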
If the pdf files are being written as graphics files at source you have two options I can think of.
The best option would be to go back to the source of the document and look for an alternative way of writing it - non-pdf, or pdf written as text.
The second best way would require the use of an OCR scanner to attempt to convert the graphics to text. Some of those are pretty good these days, but rarely perfect and rarely complete without asking for operator confirmation of accuracy of their interpretation in my experience (though that is limited.)
If, on the other hand, the PDF file does contain real text yet is still blank that would suggest further ideas are required!
Let us know what you find.
I rescanned the document using the HP SCANNER and checked off Scan for Editable Text (OCR) 300 ppi. I can now see the jumbled mess in Monarch. When I try to copy and paste, it is not the text I am scanning; if I do a search for text, it finds some of it, not all. If I save the file as text, it is the same as the copy and paste. We got the upgrade to the PRO thinking we could import PDF files, and I am very disappointed that my files are not working properly. Any other ideas? This scanner is next to my work area and I do not have access to any others.
"When I try to copy and paste, it is not the text I am scanning; if I do a search for text, it finds some of it, not all."
One of the wonders of pdf files is that you can present text as graphics and APPEAR to present graphics as text. Sounds like your document has a mix of the two and that even when you OCR a scanned version you don't get very usable results.
This is not unheard of and suggests that even the specialist graphics to text converters are struggling with the file(s) you get. All I can say is that there are some horrid pdf files around that only look reasonable as pdf files and are terrible as text files.
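If you want a number to put on "jumbled mess", one crude way is to save the pdf as text and count how many of the tokens look like ordinary words or figures. The sketch below is my own illustration, not anything Monarch or Acrobat does, and the pattern is an assumption that suits plain English reports: it ignores one-letter words and will mark codes such as part numbers as junk.

```python
import re

# A token counts as "word-like" if it is a plain alphabetic word of two or
# more letters (optionally followed by one punctuation mark) or a number
# made of digits and common separators. This is an assumed rule of thumb.
_WORDLIKE = re.compile(r"^[A-Za-z]{2,}[.,;:!?]?$|^(?=.*\d)[\d.,$%/-]+$")


def wordlike_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that look like words or numbers."""
    tokens = text.split()
    if not tokens:
        return 0.0
    good = sum(1 for token in tokens if _WORDLIKE.match(token))
    return good / len(tokens)
```

Clean report text should score close to 1, while the sort of OCR output described above, with letters swapped for digits and symbols, should score much lower. Where to draw the line is a judgment call for your own files.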
I think it is reasonable to say that if you have 8.01 installed and the results you can achieve through Monarch are not useful in any way, then you have a file which is not worth spending time on as text. If the Adobe program can't do much with it, what hope is there?
In theory, some OCR programs should be able to work directly with a pdf file rather than printing and re-scanning. However, I suspect that in this case you are not going to get much out of it anyway.
What is the source for the file? Is it coming from an external company or is it being produced by an in-house system? If it is in-house, can you obtain the report in another, non-pdf, format?
I am aware that Datawatch are interested in assessing problems with PDF processing - there are so many different pdf creation programs and other variables that it is a huge subject. But if it is clear that the information is presented as a graphics block image rather than as text, there is little that can be done successfully that is not already available as an OCR type system - and if those are not working well converting to text, the signs are not good.
[ December 09, 2005, 02:11 PM: Message edited by: Grant Perkins ]
I have this same problem with a PDF file that I receive from an external source.
If I convert the PDF in Monarch it is all BLANK.
What I do is open the PDF in the full version of Adobe Acrobat (not the Reader) and then re-save it as a PDF.
When I convert the re-saved PDF in Monarch; Monarch converts the PDF just fine.
If you have the full version of Adobe Acrobat, give that a try.