    Scanned/faxed PDFs

    Norm

      Can Monarch Pro v8 process scanned/faxed PDFs?

          Grant Perkins



          V8 analyses recognisable text within a PDF file, but not the much more complicated graphics content.


          As far as I know, most PDF files created from scans are graphics images (essentially bitmaps) rather than text documents.
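
          For anyone wanting a quick check of whether a given PDF is likely to be image-only, here is a minimal sketch in Python. It is only a heuristic and has nothing to do with Monarch itself: the function names are made up for illustration, and it assumes the font and image markers are visible in the raw file bytes. PDFs that store their objects in compressed streams can hide those markers, so treat the answer as a hint rather than a verdict.

```python
import re

# Hypothetical helpers, not part of Monarch: a rough check on raw PDF bytes.
# Assumption: text-bearing PDFs reference a /Font resource, while scans
# contain image XObjects (/Subtype /Image) and no fonts at all.
FONT_MARKER = re.compile(rb"/Font\b")              # does not match /FontDescriptor
IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image\b")


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Return True if the bytes mention image objects but no font resources."""
    has_font = FONT_MARKER.search(pdf_bytes) is not None
    has_image = IMAGE_MARKER.search(pdf_bytes) is not None
    return has_image and not has_font


def looks_image_only_file(path: str) -> bool:
    """Read a PDF from disk and apply the same heuristic."""
    with open(path, "rb") as handle:
        return looks_image_only(handle.read())
```

          A file flagged by this check would probably need an OCR pass before it contains any text to work with. A scan that has already been through OCR usually carries a hidden text layer with fonts, so it would not be flagged.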


          I'm not a great user of fax, so I'm not sure what angle you are looking at there, but I suspect the situation is much the same.


          The only way I know of to extract text from the graphics versions is to run them through an OCR process and see what comes out at the end. That is not usually a rapid process, being quite complex and specialised, and it tends to be interactive when correcting possible interpretation errors.


          Running OCR on a document to produce a text file before submitting it to Monarch is still a possibility, of course, and people used that approach successfully before version 8 to get at data content. Whether it is viable and useful to include OCR-level functionality in Monarch, I don't know.


          My initial feeling is that it is specialised enough that I can't quite see how embedding it in Monarch would provide much advantage to users over pre-processing externally to Monarch.


          However, experience and feedback over time may show that internal OCR would be valuable, and the development team could then seek a suitable and cost-effective way to implement it. I do still have reservations about the assumption that OCR can provide consistent results that are useful for the sort of tasks Monarch addresses. That may change in the future with more powerful hardware and more advanced OCR software engines.


          Does this answer your question? I would not be surprised if you tell me I missed the point!





