3 Replies. Latest reply: May 15, 2014 10:11 AM by Shmuli Wenger

    PDF import doesn't keep column format

      I have been using Monarch for many years, since version 4, and it is excellent at importing text files. In the last few years, however, we have been receiving more and more PDF files, and I have found an issue with many of them. Although these files view and print with the columns lined up perfectly in Adobe or any other third-party PDF reader, when they are imported into Monarch many rows are pushed to the right or left, which makes it virtually impossible to create a good model. I have tried everything from freeform to monospaced (which has so far been the best), and tried changing the stretch, etc.

       

      Are other users having the same issue, or is this perhaps just a limitation of the Monarch product?

       

      Thanks,

        • PDF import doesn't keep column format
          Olly Bond

          Hello Swenger and welcome,

           

          It isn't Monarch's fault. When you write a PDF, you can tell Acrobat to place a letter (a glyph, I think the geeks call it) x pixels along and y pixels down from the top corner of the page. That's a different scale of complexity from reading a fixed-width text file, where the letter is in column x of line y. And Monarch does a pretty good job of reading in PDF files and re-interpreting them as text files - I've handled scanned forms with multiple fonts, handwriting, Arabic and European characters, and recently a Visio diagram using Monarch, and got the data I needed.
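To make the difference concrete, here is a rough Python sketch of what any PDF-to-text conversion has to do: snap glyph positions onto a character grid. This is a simplified model, not Monarch's actual algorithm; `to_column`, `render_line`, the point values and the character widths are all made up for illustration.

```python
# Hypothetical model: each text run in a PDF carries an x-position in points.
# A text importer must decide which character column that position maps to.

def to_column(x_pt, char_width_pt):
    """Return the text column that a run starting at x_pt falls into."""
    return int(x_pt // char_width_pt)

def render_line(runs, char_width_pt, width=60):
    """Place (x_pt, text) runs onto a fixed-width line of characters."""
    line = [" "] * width
    for x_pt, text in runs:
        start = to_column(x_pt, char_width_pt)
        for i, ch in enumerate(text):
            if start + i < width:
                line[start + i] = ch
    return "".join(line).rstrip()

# Two rows of the same report; the amount in row_b sits 4.5 points further right.
row_a = [(72.0, "Widget"), (200.0, "12.50")]
row_b = [(72.0, "Gadget"), (204.5, "7.25")]

# With an assumed 6-point character width, the amounts land in different columns.
narrow = (to_column(200.0, 6.0), to_column(204.5, 6.0))

# With an assumed 10.5-point character width, both land in the same column.
wide = (to_column(200.0, 10.5), to_column(204.5, 10.5))
```

A small offset that is invisible on the printed page can cross a column boundary at one scale and not at another, which is why trying different scaling settings is worth the effort.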

           

          You're doing the right thing by checking out monospace and freeform, and changing the scaling. If you have a regular report that you have several versions of, try to find the most stable scaling that gives you a usable text layout. Then, in Monarch, experiment with the floating trap option on your templates, which should give you the chance to handle data that drifts left and right a little.

           

          If you need to handle multiple column regions in a PDF, please let me know, as there's a particular trick to deal with this that requires v10 or above.

           

          Best wishes,

           

          Olly

            • PDF import doesn't keep column format
              elginreigner _

              Welcome aboard.

               

              Everything Olly has stated is correct. Working with PDFs can be cumbersome. A PDF is only a graphical representation of the data stored within it. The other issue is the number of PDF engines and third-party writers that can produce PDFs. Sometimes the best results come from trapping entire data lines and parsing out what is needed via Monarch functions and logic.
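As an illustration of the "trap the whole line, then parse" idea, here is a minimal Python sketch rather than Monarch syntax. It assumes fields are separated by at least two spaces and never contain a double space themselves; `parse_line` and the sample rows are hypothetical.

```python
import re

def parse_line(line):
    """Split a whole report line into fields on runs of two or more spaces."""
    return re.split(r"\s{2,}", line.strip())

# The columns drift between rows, but the field order stays the same.
rows = [
    "INV1001   Acme Ltd        245.00",
    "INV1002      Bolt & Co  1,300.50",
]

parsed = [parse_line(r) for r in rows]
```

Because the split depends on the gaps between fields rather than on fixed character positions, it does not matter that the columns start in different places on each row.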

               

              If you have any questions, the forums are an excellent place for help.

                • PDF import doesn't keep column format

                  A floating trap won't help me, because the column I am using to trap is constant; it is some of the other columns that are not aligned.

                   

                  However, the last suggestion will work. I grabbed the two columns as one field and then used the formulas TRIM and FIND (looking for a space between the two columns) to split them.
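For anyone wanting to see the logic of that split outside Monarch, here is a rough Python equivalent. It assumes the first column never contains a space; `split_combined` is a made-up helper name, and Python's `strip` and `find` stand in for the trim and find steps.

```python
def split_combined(field):
    """Split a field holding two columns at the first space after trimming."""
    trimmed = field.strip()      # drop leading and trailing padding
    pos = trimmed.find(" ")      # position of the first space, or -1 if none
    if pos == -1:
        return trimmed, ""       # only one value was captured
    # Left part is the first column; the rest, trimmed again, is the second.
    return trimmed[:pos], trimmed[pos:].strip()

left, right = split_combined("  INV1001    245.00 ")
```

The second trim matters because the gap between the two columns varies from row to row.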

                   

                  Thanks everyone.