3 Replies Latest reply: May 15, 2014 10:05 AM by Grant Perkins

    PDF Rpt Column Shift

    bbracken _

      I have a monthly PDF report that has the same format every month except for the left margin.  Sometimes the detail info starts 10 spaces from the left.  Sometimes it starts at 8 spaces or 15 spaces.  The detail I am trying to capture is multi-line and looks something like this:


      '^' = a space and the columns are aligned...











      Is there a way to capture this with one model if the margin continues to move?

        • PDF Rpt Column Shift
          mdyoung _

          Have you tried using a floating trap to capture the detail line?

            • PDF Rpt Column Shift
              Olly Bond

              Hello bbracken,


              From the sample you posted, it looks as if the fields themselves do not contain spaces, so you could trap blocks of text and use textline(), lsplit() and intrim() to get the data you need.
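
              To illustrate the idea outside Monarch, here is a minimal Python sketch. It is not Monarch syntax, and the sample lines and field values are invented for illustration. It assumes, as suggested above, that no field contains an embedded space, so splitting on runs of whitespace gives the same fields whatever the left margin is.

```python
def split_fields(line):
    """Split a detail line into fields on runs of whitespace."""
    # str.split() with no argument ignores leading and trailing whitespace
    # and treats any run of spaces as a single separator, so the size of
    # the left margin does not matter.
    return line.split()


# Hypothetical detail lines: same data, three different left margins.
sample_lines = [
    " " * 10 + "ACME001   2014-05-01   125.00",
    " " * 8 + "ACME001   2014-05-01   125.00",
    " " * 15 + "ACME001   2014-05-01   125.00",
]

rows = [split_fields(line) for line in sample_lines]
```

              All three lines yield the same list of fields. The approach breaks down as soon as a field can contain a space or be left empty, because the field count then changes from line to line.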


              Best wishes,



                • PDF Rpt Column Shift
                  Grant Perkins

                  Sounds like the PDF output may be centred and so position varies according to the data presented.


                  Olly's solution should cover most eventualities. In fact, if the report columns are always consistently spaced in relation to each other but are moving around for some other reason, you could simply grab an entire line as a field and then split it based on known relative positions for each component. However, with some types of multiline records that might get a little too interesting.
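
                  As a rough sketch of splitting on relative positions, written in Python rather than as a Monarch model: the field names, offsets and widths in LAYOUT are hypothetical, and the code assumes the first field is never blank, so that the first non-space character marks the start of the record.

```python
# Hypothetical layout: (field name, offset from start of record, width).
LAYOUT = [("code", 0, 10), ("date", 10, 13), ("amount", 23, 10)]


def split_relative(line, layout=LAYOUT):
    """Slice a line into fields measured from its first non-space character."""
    body = line.rstrip("\n")
    # Size of the left margin for this particular line.
    start = len(body) - len(body.lstrip(" "))
    return {
        name: body[start + offset:start + offset + width].strip()
        for name, offset, width in layout
    }


# Build one invented record whose first field contains a space,
# then present it with two different left margins.
record = "ACME 001".ljust(10) + "2014-05-01".ljust(13) + "125.00".rjust(10)
narrow = split_relative(" " * 8 + record)
wide = split_relative(" " * 15 + record)
```

                  Unlike a whitespace split, this copes with spaces inside a field. It fails on any line whose first column is empty, which is one way multiline records can get "interesting".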


                  One option might be to pre-process the initial PDF interpretation to assess the size of the left margin for that instance of the report, extract every line in its entirety, strip the leading spaces for the left margin from every line and then save the remainder as a new text report. That way, assuming the column relationships are always constant, your main data extraction model can be greatly simplified.
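
                  The pre-processing step could look something like the Python sketch below. It assumes the PDF has already been converted to plain text, that the margin is the smallest indent found on any non-blank line, and that the margin is made of spaces rather than tabs. The function names and file paths are hypothetical.

```python
def strip_left_margin(lines):
    """Remove the common left margin from every line of a text report."""
    non_blank = [ln for ln in lines if ln.strip()]
    if not non_blank:
        return ["" for _ in lines]
    # The margin for this instance of the report is the smallest number
    # of leading spaces on any non-blank line.
    margin = min(len(ln) - len(ln.lstrip(" ")) for ln in non_blank)
    # Blank lines are kept as empty strings so record spacing is preserved.
    return [ln[margin:] if ln.strip() else "" for ln in lines]


def normalise_report(src_path, dst_path):
    """Read a text report, strip its margin, and save it as a new report."""
    with open(src_path, encoding="utf-8") as src:
        lines = src.read().splitlines()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(strip_left_margin(lines)) + "\n")


# Invented report fragment with an 8-space margin.
cleaned = strip_left_margin([
    "        HDR   COL",
    "          detail 1",
    "",
    "        detail 2",
])
```

                  Because only the shared margin is removed, indentation within the report (such as the second line above) is preserved, and the main model can then rely on fixed column positions. If page headers or footers sit further left than the detail, the detected margin will be too small and would need to be measured from the detail lines only.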