5 Replies Latest reply: May 15, 2014 10:10 AM by Grant Perkins

    PDF Multi-Line Record Across Page Breaks

    Sam Chambers

      I have a large PDF file that's giving me fits. First, I can't seem to get the spacing right when importing the PDF file. I've played with the settings, and things don't line up like they do on the PDF.

       

      I was able to get the far left part of the file to line up well enough. I am using a multi-line field, with "MEMBER NAME" as my trap. I need to grab the Member Name on line 3 (DOE, JOHN), the second date field on the 4th line (7/15/10 in this example), and the date on line 5 (7/17/10).

       

       

      MEMBER NAME/ PAID DTE/ SERVICE **** DAYS **** COVERED

      MEMBER ID NO. ICN TOB DATES ********** CHARGES ********** ******* PAYMENTS *******

      DOE               ,JOHN

      11111111111 2222222222222 09/27/10 07/15/10 RTN NRY CCU ICU 1,445.37 MEDICAID 8,959.71

                                0111 07/17/10 1 0 0 INJECTABLE DRUGS 508.50 COB 0.00
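For anyone who wants to reproduce the trap-and-offset logic outside Monarch, here is a minimal Python sketch. It assumes the report has already been converted to plain text lines, that every record starts at a line beginning with "MEMBER NAME", and that the wanted fields sit on lines 3, 4 and 5 counted from the trap. The function name `extract_records` and the date pattern are illustrative assumptions, not anything Monarch does internally.

```python
import re

# Two-digit-year dates such as 07/15/10 (assumed format, taken from the sample).
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2}\b")

def extract_records(lines, trap="MEMBER NAME"):
    """Return (name, line4_second_date, line5_date) for each trapped record."""
    records = []
    for i, line in enumerate(lines):
        if not line.lstrip().startswith(trap):
            continue
        block = lines[i:i + 5]          # trap line plus the four lines after it
        if len(block) < 5:
            continue
        last, _, first = block[2].partition(",")   # line 3: "DOE   ,JOHN"
        line4_dates = DATE.findall(block[3])       # line 4: two dates expected
        line5_dates = DATE.findall(block[4])       # line 5: one date expected
        if len(line4_dates) < 2 or not line5_dates:
            continue                    # offsets do not match; skip this block
        name = f"{last.strip()}, {first.strip()}"
        records.append((name, line4_dates[1], line5_dates[0]))
    return records
```

Because the offsets are fixed, anything that lands between the trap line and the data lines shifts the record out from under them and the block is skipped.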

       

      It all works, until a page break happens in the middle of the record, like this:

       

      MEMBER NAME/ PAID DTE/ SERVICE **** DAYS **** COVERED

      MEMBER ID NO. ICN TOB DATES ********** CHARGES ********** ******* PAYMENTS *******

       

      (PAGE BREAK HERE)

       

      Report : CLM-0820-O DEPARTMENT OF HEALTH Run Date: 04/01/2011

      Process : CLMJO0820 BILLING MANAGEMENT INFORMATION SYSTEM Run Time: 13:56:24

      Location: CLMP0820 STATISTICAL AND REIMBURSEMENT REPORT Page: 20

      PAID CLAIMS AS INITIALLY PAID -- TYPE A

      BOB'S MEDICAL CENTER PROVIDER NUMBER PAYMENT DATES 00/00/00 THROUGH 00/00/00

      PO BOX 690 000000129A SERVICE DATES 10/01/09 THROUGH 09/30/10

      NOWHERE,GA 11111 ADMISSION DATES 00/00/00 THROUGH 00/00/00

       

      DOE               ,JOHN

      11111111111 2222222222222 09/27/10 07/15/10 RTN NRY CCU ICU 1,445.37 MEDICAID 8,959.71

                                 0111 07/17/10 1 0 0 INJECTABLE DRUGS 508.50 COB 0.00

       

      Does anyone have any ideas for dealing with this? I tried saving the PDF file as a text file, and it has issues.

        • PDF Multi-Line Record Across Page Breaks
          Steve Caiels

          Hi Sam,

          Highlight all the page header lines that appear at the page break, then create a new template. Create a trap using the normal Monarch techniques (maybe ‘Report:’). There is no need to highlight any fields. Set the template type to Page Header and ‘OK’ it.

          This tells Monarch to ignore all those page header lines. As long as the headers are a consistent number of lines throughout the report, this should do the trick.

          Regards,

          Steve.

          Edit: This can be easier to visualise if you turn off Monarch's display of page breaks by using the 'Options->Input->Ignore Form Feed Characters' setting.
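The same idea can be sketched in Python for a text dump of the report. This is only an illustration of what a Page Header template achieves, not how Monarch implements it. It assumes the header starts at a line beginning with "Report", is always 7 non-blank lines long (as in the sample above), and that blank lines and form feed characters can be dropped. The name `strip_page_headers` and its parameters are hypothetical.

```python
def strip_page_headers(lines, trap="Report", header_len=7):
    """Drop each page header so records split by a page break become contiguous."""
    cleaned = []
    skip = 0                                  # header lines still to discard
    for line in lines:
        text = line.replace("\f", "").rstrip("\n")   # ignore form feeds
        if not text.strip():
            continue                          # blank lines are dropped (assumption)
        if skip == 0 and text.lstrip().startswith(trap):
            skip = header_len                 # trap matched: start of a page header
        if skip:
            skip -= 1
            continue
        cleaned.append(text)
    return cleaned
```

As with the template, this only works while the header length stays constant. If some pages carry an extra header line, `header_len` would need to become a second trap that marks the end of the header.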

            • PDF Multi-Line Record Across Page Breaks
              Sam Chambers

              Thanks, Steve.  I've been using Monarch for nearly 5 years, and never realized what the Page Header button meant!

               

              Another question about loading PDFs...Why is it that I can't get the entire report to line up just like it is in the PDF?  I selected "Monospace" and played with the Stretch numbers until I got the left side of the report to line up, but the right side is still out of sync.

               

              Thanks again!

               

              Sam.

                • PDF Multi-Line Record Across Page Breaks
                  elginreigner _

                  To be honest, it is very rare for a PDF to look the same in Monarch as it does in a PDF viewer. The PDF document is only a graphical presentation of the data that is inside the PDF. Monarch tries its best, but because each letter in the font used has its own width, data can shift around and not look exactly the same as the PDF.

                   

                  I personally (using 10.5 pro) select the auto adjust, allowing Monarch to choose the 'Stretch' that it wants, then I select Monospaced and FreeForm. This has had the best results for me. I process about 500 - 600 PDF files a month for my clients. Thank god, only 1% of them place in PDF format.
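To show why a single setting rarely lines up both sides of a page, here is a rough Python sketch of placing positioned text fragments onto a monospaced grid. It assumes each fragment comes with a horizontal position in points and that one `points_per_char` value converts positions to columns. That parameter is only a stand-in for the idea behind 'Stretch'. How Monarch actually defines or applies Stretch is not something this sketch claims to know.

```python
def place_on_grid(fragments, points_per_char):
    """Lay out (x_in_points, text) fragments on one monospaced line."""
    line = []
    for x, text in sorted(fragments):
        col = round(x / points_per_char)      # nominal column for this fragment
        col = max(col, len(line))             # never overwrite; push right instead
        line.extend(" " * (col - len(line)))  # pad out to the target column
        line.extend(text)
    return "".join(line)
```

When the text to the left of a fragment is wider in characters than its position allows, the fragment gets pushed right, and every such push accumulates toward the right-hand side of the line. A value that suits the left columns can therefore still leave the right columns out of sync.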

                    • PDF Multi-Line Record Across Page Breaks
                      Sam Chambers

                      I'm also on 10.5 Pro, and I wish that method worked for me.  I ended up selecting Monospaced and 7.2 for the "Stretch".  That got the dates to line up on the left side so I could build a model to extract them.  Thankfully I didn't need anything from the right side, because it's all over the place.

                       

                      Thanks!

                       


                        • PDF Multi-Line Record Across Page Breaks
                          Grant Perkins

                          Sam,

                           

                          Elgin has it right. Plus there are a number of 'standards' and many independent PDF 'writer' programs and .... well, let's just say that unfortunately reverse engineering a PDF can be tricky - to say the least.

                           

                          Datawatch have spent a lot of time and effort on this matter over the years and, as far as I know, are still interested in assessing challenging PDF input files to look for new 'tweak' requirements. However, as a first assessment, it can often be interesting to see what the conversion to text option in the Adobe Reader does with a problem PDF file. In theory it will be attempting much the same sort of conversion to text that Monarch is attempting, so it is worth comparing the two results.

                           

                          Fonts, scaling, spacing and alignment rules can all combine to make the task difficult, along with other factors, even in fairly 'clean' files. Not all files are 'clean' though, even if they come, theoretically, from a corporate system as a substitute for the old 'greenbar' output types.

                           

                          One day there will be a more consistent solution - some say it may be XPS related. We will see.