8 Replies Latest reply: May 15, 2014 10:16 AM by RalphB

    The Dreaded PDF

    GuinnessDrinker

      Good Afternoon (UK)

      Does anyone know if the latest version of Monarch will read PDF files generated from a scanned document?

      Thank You in anticipation

        • The Dreaded PDF
          Olly Bond

          Hello,

           

          No, Monarch v11.8 does not have OCR built in. I have had excellent results using ABBYY FineReader alongside Monarch. If you need to automate this, ABBYY FineReader has a server product, and there is also a corporate edition that offers some automation and could usefully be deployed alongside DataPump.

           

          Best wishes,

           

          Olly

            • The Dreaded PDF
              GuinnessDrinker

              Thanks Olly.

              My company will not allow any other software, so it looks like I am still stuck with the same problem until Monarch comes up with the solution.

                • The Dreaded PDF
                  Olly Bond

                  Hello,

                   

                  Perhaps you could get a quote for re-keying the scanned data, then a quote for ABBYY FineReader Corporate Edition (about EUR 1000 last time I checked)? How many pages, how complex a layout, and how many languages are we dealing with here?

                   

                  All the best,

                   

                  Olly

                  • The Dreaded PDF
                    Grant Perkins

                    GuinnessDrinker wrote:

                    "Thanks Olly. My company will not allow any other software, so it looks like I am still stuck with the same problem until Monarch comes up with the solution."

                     

                    I think the problem there is that OCR is a very specific technique that goes way beyond the scope of document modeling in Datawatch terms.

                     

                    Not only that, but if you are using OCR extraction for anything important, you would need to be extremely sure that the interpretation is accurate, and have some method of built-in verification and warnings to assist with the assessment. This would be especially true if you were automating the process due to the volume of work.
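
                    As a rough illustration of the kind of verification and warnings meant here, the sketch below checks OCR'd amounts before they are used anywhere else. It is only a sketch under assumptions: the function name, the row layout (description plus amount text) and the two rules (amounts must look like clean numbers, and rows must add up to the total printed on the document) are hypothetical and are not part of Monarch, DataPump or any OCR product.

```python
from decimal import Decimal
import re

# Hypothetical rule: an amount is digits (optionally with thousands commas)
# followed by exactly two decimal places, e.g. "1,250.00" or "40.50".
AMOUNT_RE = re.compile(r"-?(\d{1,3}(,\d{3})+|\d+)\.\d{2}")


def check_ocr_rows(rows, stated_total):
    """Return warnings for OCR'd (description, amount_text) rows.

    rows         -- list of (description, amount_text) tuples from the OCR step
    stated_total -- the total printed on the document, as text
    """
    warnings = []
    running_total = Decimal("0")
    for number, (description, amount_text) in enumerate(rows, start=1):
        text = amount_text.strip()
        # A letter read in place of a digit (O for 0, l for 1) fails here.
        if AMOUNT_RE.fullmatch(text) is None:
            warnings.append(
                f"row {number} ({description}): suspicious amount {amount_text!r}"
            )
            continue
        running_total += Decimal(text.replace(",", ""))
    if warnings:
        # The sum is meaningless while some rows are unreadable.
        warnings.append("total not checked: some amounts need manual review")
    elif running_total != Decimal(stated_total):
        warnings.append(
            f"rows sum to {running_total}, but the document states {stated_total}"
        )
    return warnings
```

                    An empty list means the rows passed both checks; anything else would go to a person for review rather than straight into an automated process.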

                     

                    Have you considered looking back towards the source to see if the PDFs could be supplied in a different format? Even just written out as a text-based PDF rather than something graphical?

                     

                    OCR type scanning, now possible via cameras on smartphones, is very clever but does have limitations that require human intervention to ensure accuracy, even from visually "good" sources, if the output is to be used elsewhere.

                     

                    I hope these thoughts help in some way.

                     

                    Grant

                      • The Dreaded PDF
                        RalphB

                        If you have the full version of Adobe Acrobat (Acrobat Standard), it has OCR capability. I have used it in the past. It isn't perfect and takes some work at times, but it may do what you want.

                          • The Dreaded PDF
                            GuinnessDrinker

                            Thank you all for your suggestions.

                            I am going to try to get the suppliers involved to send me text-based PDF files, as my company will not consider any other solutions.

                            I have been trying to find prices (in sterling) on the Datawatch site for a version 10 to 11 upgrade CD, but can get no results. The website has changed since I last went there and does not seem to advertise the products; it just gives a pitch as to what they can do.

                            Any suggestions?