12 Replies. Latest reply: May 15, 2014 10:06 AM by Grant Perkins

    Missing rows when importing PDF

    jbvinny _

      Does anyone have any idea why I would be missing the first 6 rows or so of data on each page when importing a pdf?

        • Missing rows when importing PDF
          Data Kruncher

          Missing as in:

           

          six rows or so don't appear in the report window once the PDF has been read into Monarch, or

          six rows or so don't appear in the table window once your model has been applied?


            • Missing rows when importing PDF
              jbvinny _

              Missing as in not pulling into the report window.

                • Missing rows when importing PDF
                  Data Kruncher

                  Rats. Feared as much. :confused:

                   

                  Is it possible for you to email the PDF? If so, send me a Private Message with your email address and I'll contact you.

                  • Missing rows when importing PDF
                    Grant Perkins

                    jbvinny wrote: "Missing as in not pulling into the report window."

                     

                    First suspicion would be that those lines, whatever they are, are presented as a graphic, not text.

                     

                    If you use Adobe PDF Reader to convert to text do the lines appear or not?

                     

                     

                    Grant.

                      • Missing rows when importing PDF
                        jbvinny _

                        The document is not an image. No need to run OCR.  I am able to search the text that is in the first six lines. 

                         

                        I cannot send the PDF due to the nature of the document.

                          • Missing rows when importing PDF
                            Grant Perkins

                            jbvinny wrote: "The document is not an image. No need to run OCR. I am able to search the text that is in the first six lines."

                             

                            I seem to recall that it is possible to index the text that appears in a graphic embedded in a PDF, thus making it searchable in the PDF but still not text as such. I could be wrong about that.

                             

                            Did you try converting to text with Acrobat Reader? The Reader does not, afaik, OCR any graphics blocks for their content so normally what you get out of the process is just the text content - the same that Monarch will produce.
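One way to make the text-export check above concrete: save the PDF as plain text from the reader, then test whether a few phrases you can see in the missing rows show up in that export. The snippet below is only a sketch under assumptions. The function names are made up for illustration, and it works on text you have already exported; it does not read the PDF or talk to Monarch.

```python
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so layout differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_phrases(exported_text: str, phrases: Iterable[str]) -> List[str]:
    """Return the phrases that do not occur anywhere in the exported text."""
    haystack = normalize(exported_text)
    return [phrase for phrase in phrases if normalize(phrase) not in haystack]
```

If phrases from the first six rows come back as missing, the rows are probably not reaching the text layer at all; if they are found, the text is there and the loss is more likely happening during import.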

                             

                            Beyond that, without the opportunity to assess the file directly (for the good reason you mention), it's not obvious what else to suggest. Perhaps others have had similar experiences and will find the thread.

                             

                            Is there any chance of obtaining a test sample file that could be shared?

                             

                             

                            Grant

                              • Missing rows when importing PDF
                                Olly Bond

                                Hello jbvinny,

                                 

                                It's very tricky working blindfold, but I appreciate you can't send confidential data over.

                                 

                                I'd suggest printing the PDF as a new PDF - I've overcome similarly badly-behaved files this way. You could also print it as an XPS file and try that in Monarch.

                                 

                                HTH,

                                 

                                Olly

                                  • Missing rows when importing PDF
                                    jbvinny _

                                    I understand that it's very tricky to troubleshoot blindfolded. I was hoping someone out there had experienced this issue before and would know what I am talking about. This is a log file that I cannot share. As for creating a sample, I thought about that, but I am not sure doing so would produce the same result.

                                     

                                    I ran OCR again just to be sure and that did not correct the problem.

                                     

                                    I think a big part of the problem has to do with the width of the document. One page in the PDF view would actually be about 4 pages if I were to print it.

                                     

                                    I have tried printing as XPS and am in the process of running OCR on that document to see if that will work. I have also emailed the creator of the report to see if it can be sent in a different file format.

                                      • Missing rows when importing PDF
                                        Olly Bond

                                        Hello jbvinny,

                                         

                                        Monarch doesn't like lines longer than 4000 characters, and depending on the PDF input options and scaling it may be that the top six lines of each page (perhaps with some text over on the right-hand side?) exceed this limit. This might explain their having been dropped from the report window, although a warning message would be nice :-).

                                         

                                        Try adjusting the scaling to the tightest possible setting to see if this brings it back within the limit?
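If you can get a plain-text version of the report (for example a text export of the PDF), a quick check along these lines would show whether any lines actually cross that length. This is a rough sketch, not anything Monarch provides: the 4000 figure is taken from the post above and should be treated as an assumption for your version, and the function names and file path are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumption: 4000 is the line-length limit mentioned in the post above;
# confirm it for your own Monarch version before relying on it.
MAX_LINE_LENGTH = 4000


def find_overlong_lines(
    lines: Iterable[str], limit: int = MAX_LINE_LENGTH
) -> List[Tuple[int, int]]:
    """Return (line_number, length) for every line longer than `limit`.

    Line numbers are 1-based; trailing newline characters are not counted.
    """
    overlong = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))
        if length > limit:
            overlong.append((number, length))
    return overlong


def check_file(path: str, limit: int = MAX_LINE_LENGTH) -> List[Tuple[int, int]]:
    """Run the same check on a text file (the path is whatever you exported)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_overlong_lines(handle, limit)
```

Calling `check_file("report.txt")` on an exported copy returns an empty list when every line fits. Keep in mind that a text export may not lay lines out the same way Monarch's PDF import does, so a clean result here does not rule the limit out.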

                                         

                                        Best wishes,

                                         

                                        Olly

                                          • Missing rows when importing PDF
                                            jbvinny _

                                            I have tried adjusting the scaling within Monarch to every possible setting... It's as if the first six lines just do not exist when imported into Monarch.

                                             

                                            I have tried exporting as XPS, but because of the width of the pages in Acrobat it turns a 31-page report into 132 pages... and what looks to be a nightmare to trap.

                                             

                                             

                                            Really wish I could forward this to you all. It's driving me nuts.

                                            • Missing rows when importing PDF
                                              jbvinny _

                                              I swear I tried this once already and it didn't work. My solution was to print as a PDF using the "shrink to printable area" option. This made the data almost impossible to read (very small). However, I was then able to import into Monarch, and with a minor tweak of the scale it's perfect!

                                               

                                              I appreciate all the help!