7 Replies Latest reply: May 15, 2014 10:11 AM by Olly Bond

    Pdf

    Kayla _

      Hi all,

       

      I am very new to Monarch. I recently downloaded the trial version and am testing it before I commit to purchasing the software.

       

      I am having a lot of issues with PDF files. At my company, we want to use Monarch to convert all of our received invoices into one standardized format. We receive our invoices in many different ways: some principals send theirs as PDFs, others as hard copy or Excel files, to name a few. They all differ widely in format and design.

       

      I am trying to create a separate template for each individual principal. I didn't think this would be such a complicated process, since each principal typically sends their invoices the same way each time. However, with the PDFs, the fields are sometimes not located in exactly the same position each time. I've tried using floating traps, but they don't seem to work. I have only VERY basic skills/knowledge with Monarch, so it could be user error.

       

      Also, sometimes when I import PDFs, the text does not read correctly. For instance, periods are mistaken for commas and 5's look like S's, which makes it very hard to create a trap that captures the correct information. I realized that I have to run OCR within Adobe BEFORE importing the PDF file into Monarch in order to get Monarch to read anything. If I try to import a PDF into Monarch without running OCR in Adobe first, Monarch reads the PDF as a blank document. Is there any way to get Monarch to read these PDFs without using OCR? And is there any way to fix the characters that have been misread?

       

      I know this is very basic stuff, so any help to get me in the right direction would be greatly appreciated!

       

      Thanks!

      Kayla

        • Pdf
          Olly Bond

          Hello Kayla,

           

          It sounds like you're doing everything right using OCR and then trying the floating trap, but it is tricky.

           

          When you have data that hasn't been perfectly OCR'd you can have a problem with Date and Numeric fields, as 1234S will be read as either 1234, or Null, instead of 12345. Periods and commas can cause errors of a factor of a hundred or thousand quite easily too. You can avoid these by using Character fields, and then in the Table window defining a Calculated field that cleans it up a little.

           

          Replace(;"S";"5")

           

          would swap S for 5, and you can extend this approach with:

           

          Replace(Replace(;"S";"5");"I";"1")

           

          To convert the output into a number, just use:

           

          val()
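          For anyone who wants to try the same clean-up idea outside Monarch, here is a minimal Python sketch of it: swap the look-alike characters for digits, then convert to a number. The function name `clean_ocr_number` is hypothetical, and the extra substitutions beyond S and I (O for 0, lowercase l for 1) are assumptions that would need checking against each document source.

```python
from typing import Optional


def clean_ocr_number(raw: str) -> Optional[float]:
    """Swap common OCR look-alikes for digits, then parse as a number."""
    # S -> 5 and I -> 1 come from the thread; O -> 0 and l -> 1 are
    # assumed extras and may not suit every document source.
    swaps = {"S": "5", "I": "1", "O": "0", "l": "1"}
    cleaned = "".join(swaps.get(ch, ch) for ch in raw.strip())
    try:
        return float(cleaned)
    except ValueError:
        # Still not numeric after the swaps: leave it for manual review.
        return None


print(clean_ocr_number("1234S"))
print(clean_ocr_number("50.39OO"))
print(clean_ocr_number("n/a"))
```

          Returning `None` rather than guessing keeps unreadable values visible, so they can be checked by hand instead of silently becoming wrong numbers.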

           

          If someone has helpfully designed an invoice so the amount is floating in the centre of the page, without a field name like "Amount: " to anchor it, then you might find the Numeric Or trap to be more helpful than the floating trap. The Numeric Or trap symbol looks like "¦" - if you put a few of these in the range of columns where you'd expect to see numbers, and perhaps a few blank traps "B" in the space around it, you should find you can trap most data. Bear in mind that Monarch won't let you combine the Numeric Or trap and the Floating trap.
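          As a rough illustration of the matching rule described above (not Monarch's actual implementation), the following Python sketch treats a line as trapped when at least one of the "or" columns holds a digit and every "blank" column holds a space. The function name `line_matches` and the column numbers are hypothetical.

```python
from typing import Sequence


def line_matches(line: str, or_cols: Sequence[int], blank_cols: Sequence[int]) -> bool:
    """True if any 'or' column holds a digit and every 'blank' column is a space."""

    def char_at(col: int) -> str:
        # Columns past the end of the line are treated as blank.
        return line[col] if col < len(line) else " "

    has_digit = any(char_at(c).isdigit() for c in or_cols)
    all_blank = all(char_at(c) == " " for c in blank_cols)
    return has_digit and all_blank


# Columns 8-15 are where an amount is expected; columns 0-2 must be empty.
print(line_matches("          503.90", range(8, 16), [0, 1, 2]))
print(line_matches("PRODUCT TOTAL $", range(8, 16), [0, 1, 2]))
```

          Checking a range of columns rather than one fixed position is what lets the amount drift a little from page to page while still being picked up.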

           

          If you'd like to send me a model and some example files by email, I'd be happy to have a look.

           

          Best wishes,

           

          Olly

            • Pdf
              Grant Perkins

              Kayla,

               

              Just to add to Olly's advice...

               

              With all the PDFs coming from different sources, I would be slightly surprised if they were all created the same way, as 'graphics' images rather than text documents. Monarch can in principle, with some reservations at times, read PDFs containing text without the need for OCR. However, if you operate within a specific industry, it is possible that the source systems producing the PDF files are all the same or very similar and use 'graphics' representations of documents rather than 'originals', which forces you down the OCR route.

               

              The well-known problem with OCR is that it is rarely totally error free. This is why most OCR software products include potential error information at the review stage. If you can be certain that any errors (per document source) can be identified and corrected according to infallible rules (no matter whether it is a human or a computer doing the correcting), then certainly you can put Monarch to work doing the correcting. But you really do need to be certain about that, in my opinion. Humans can often spot and check possible errors that a machine will not, unless coded very cleverly indeed.

               

              One option would be to read the entire OCR output into Character fields 'as extracted' and then have Monarch generate a calculated field for each extracted field and build in error checking at that point. So, for a simple example, if the calculated field should be numeric but the extracted data does not translate to a pure numeric, that would flag an error. Potentially, errors in other field types could be spotted in the same way, excess spaces could be tidied up, and so on.

               

              You could then allow the operator to enter the correct information via an interactive session of Monarch and make the 'translated' fields the export at the end of the process.
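              To make the concept concrete, here is a minimal sketch in Python rather than Monarch: each value is kept as text, tidied, and then either converted to a number or flagged for an operator to correct. The function name `check_amount` and the rule for what counts as a valid amount (digits with an optional decimal part, after dropping a dollar sign) are assumptions for illustration.

```python
import re
from typing import Optional, Tuple


def check_amount(raw: str) -> Tuple[Optional[float], Optional[str]]:
    """Return (value, error) for an amount that was extracted as text."""
    tidy = " ".join(raw.split())  # collapse excess spaces
    candidate = tidy.replace("$", "").replace(" ", "")
    # Assumed rule: digits with an optional decimal part count as numeric.
    if re.fullmatch(r"\d+(\.\d+)?", candidate):
        return float(candidate), None
    return None, f"not numeric: {tidy!r}"


for raw in ["$ 503.90", "50.3900", "5O3.9O"]:
    value, error = check_amount(raw)
    print(raw, value, error)
```

              Rows that come back with an error are the ones a person would review, while the clean values can go straight to the export.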

               

              I won't go into detail at this point, as it is the concept that you need to consider at the moment, and I am reluctant to overload you with too much by way of Monarch instructions when you don't know the product well. Suffice it to say that if I wrote everything down step by step in a comprehensive document, it would look like a lot of work and quite complicated, which it is not. But how much effort is really required will be a function of the success or failure of the OCR process, and analysing those inputs thoroughly might take a while.

               

              As Olly suggested, if you have one or two PDF file examples that you could share, I would be happy to have a look and make some rather more specific proposals.

               

              HTH.

               

               

              Grant

                • Pdf
                  Kayla _

                  I cannot figure out how to post a sample document. I am copying and pasting the document, with [code] before the paste and [/code] after, but it is coming up as a scramble of words.

                   

                  Any help?

                   

                  Thanks,

                   

                  Kayla

                    • Pdf
                      Kayla _

                      ACCOUNT

                      NO.

                      10879187

                      Your partner beyond the plate:

                      INNOVATIVEISAMPLE 2220/VR

                      7950 SPENCE RD

                      FAIRBURN GA

                      30213

                      RAKFR

                      7950 SPENCE ROAD

                      NET 30 DAYS

                      QTY. SALES PRODUCT

                      SHIPPED 1/ UNIT NUMBER

                      FROZEN

                      INVOICE

                      NO.

                      2042732

                      SHIP

                      TO:

                      FAIRBURN

                      INVOICE CUSTOMER

                      DATE NO.

                      05/27/11 10879187

                      DaNERV ROI1TE: CJ::>::JL I

                      INNOVATIVE/SAMPLE

                      7950 SPENCE RD

                      FAIRBURN

                      30213

                      770 993 0111

                      DEPT # 00

                      GA SHIP DATE:

                      SPECIAL

                      INSTRUCTIONS:

                      DESCRIPTlON PACK SIZE

                      FOR BILLING

                      10 CS 4793048 APPETIZER. ASST PTITE QUICH 4/25 EA

                      *** INVOICE SUMMARY ***

                      PURCHASE ORDER

                      NUMBER

                      "

                      2220/VR REMIT

                      TO:

                      GA

                      OS/27/11

                      sharon pu pxs

                      C

                      SALES SALES

                      LOC. REP.

                      DATE

                      ORDERED

                      2220 8000 05/26/11

                      I\IUMRFR' 1 J,,"'il~"

                      U. S. FOODSERVICE. INC.

                      PO BOX 281945

                      ATLANTA

                      30384-1945

                      800 241 7677

                      Page 01

                      GA

                      of 01

                      EXTENDED

                      LABEL 0 WEIGHT

                      PRICING UNIT

                      D UNIT PRICE PRICE

                      E

                      ONLY

                      PRESENTATN CS 50.3900 $ 503.90

                      TOTAL ~GT SHIPPED: 52.00 PIECES ORDERED: 10 PIECES SHIPPED: 10 ITEMS SHIPPED: 1

                      PLEASE REMIT THIS AMOUNT BY

                      ir':fll::S: +"". t:01tta:IlC pry. 1ft! in d"!fa Qb.J.al1d n Ii:EDIWa 0"dI ~ 1/:_.

                      Visit www.us1ood.com for a fast and easy way to order. DISTRICT COpy

                      PRODUCT TOTAL $

                      TAXABLE AMOUNT $

                      GEN SALES TAX

                      .00

                      Yo

                      06126/11 AMOUNT $

                      x

                      CUSTOMERS SIGNATURE:

                      W.;1I¥udate. ~ If..,..,..,.

                      503.90

                        • Pdf
                          Olly Bond

                          Hello Kayla,

                           

                          Not sure if that's the OCR software or your text editor that's losing the layout, but perhaps it might be helpful if you emailed us a file as the original PDF so we could have a look at it properly?

                           

                          Best wishes,

                           

                          Olly

                            • Pdf
                              psamson _

                              Good morning,

                              I have an issue with a PDF report. Every week I import a payroll stub into Monarch. The PDF is generated by our payroll system. For a reason I can't understand, it worked properly for a long while, but now I have to modify my model every week because the fields move position. When we open the PDF, the size is exactly the same, but in Monarch there are huge differences.

                               

                              This is very time-consuming. Can someone help me? It's always the same reports from the same system, week after week.

                               

                              Thanks

                               

                              Patrick

                                • Pdf
                                  Olly Bond

                                  Hello Patrick, and welcome,

                                   

                                  In Monarch, there's an option when you define a template to use the "floating trap" - which will handle data that wobbles left and right across the line.
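                                  The general idea of matching a value by what surrounds it, rather than by a fixed column, can be sketched in Python as follows. This is not how Monarch implements its floating trap; it is only an illustration, and the "Net Pay" label and the amount format are hypothetical.

```python
import re
from typing import Optional

# "Net Pay" is a hypothetical label; use whatever text reliably
# precedes the value on the real payroll stub.
NET_PAY = re.compile(r"Net Pay:?\s+([\d,]+\.\d{2})")


def find_net_pay(line: str) -> Optional[str]:
    """Return the amount after the label, wherever it sits on the line."""
    match = NET_PAY.search(line)
    return match.group(1) if match else None


print(find_net_pay("   Net Pay:      1,234.56"))
print(find_net_pay("Employee 0042        Net Pay 987.00   "))
print(find_net_pay("Gross 100.00"))
```

                                  Because the pattern looks for the label and then skips any amount of whitespace, the value is still found when the whole field shifts left or right between weekly files.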

                                   

                                  Have a bash at that, and if you're still stuck drop me a mail.

                                   

                                  Best wishes,

                                   

                                  Olly