1 Reply Latest reply: May 15, 2014 10:03 AM by Grant Perkins

    issue with "bottom align" format in pdf

    jimmyscookie

      hi there,

       

      I have a quick question.

       

      background:

      An Excel report was exported/saved as a PDF file.

      task:

      Extract the report from the PDF back into Excel (why did they convert it into PDF?!?!)

       

      Sample data in PDF:

       

      row               Name                 address                   note

      1               123123               new york, ny             123123

      2                                        14th st

      3               456456               new york, ny             123123

      4                                        12th st

      5                                        apt 123

      6               456456               new york, ny             123123

       

      Problem ... ...

      row 1 is one record

      rows 2 & 3 are one record

      rows 4, 5 & 6 are another record

       

      The original Excel report used bottom alignment, so the "beginning" of each record is hard to define ...

       

      What can I do to extract these records?

      there are millions of records in total ... damn ... it ... :(

       

      please help me out !!

      thanks a lot !!!!

       

      regards,

      ImSoLost

        • issue with "bottom align" format in pdf
          Grant Perkins

          Hello jimmyscookie and welcome to the forum.

           

          "(why did they convert it into pdf?!?!)"

           

          ... has to be one of the best questions I have seen in a long time ...:D

           

          I'm not able to test this suggestion. It's a bit data specific and further influenced by what the PDF interpretation is doing, but this may be one of the rare occasions to use the original (read: elderly) Postal Line Trap concept. I'm assuming that your sample indicates a regular US postal line as the 'always present' detail, and that the variable number of lines above it represents any extra information pertaining to the address.

           

          Do read the Help entry for the Postal Line Trap, in particular the section on "Capturing Address Blocks with Varying Lines via the Postal Line Trap".

           

          There are some logical constraints in how it works. Especially note the observations regarding the number of lines selected for the template. You need to assess the number of lines possible for the largest address record you are likely to find in the report and select the number of sample lines accordingly. The trap and template will deal with shorter records.
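
          To make the idea concrete outside the product, here is a rough Python sketch of the same grouping logic. It is not how the Postal Line Trap is implemented, only an illustration. It assumes the PDF text has already been extracted to plain lines, that columns are separated by two or more spaces, and that the row-number column really is present as in the posted sample. The names SAMPLE and group_records are made up for this example.

```python
import re

# Sample rows copied from the question (header line left out).
SAMPLE = """\
1               123123               new york, ny             123123
2                                        14th st
3               456456               new york, ny             123123
4                                        12th st
5                                        apt 123
6               456456               new york, ny             123123
"""


def group_records(lines):
    """Group bottom-aligned rows into records.

    Assumption: a line with four fields (row, name, address, note) is the
    'always present' line that closes a record, and any two-field lines
    (row, address fragment) directly above it belong to that same record.
    """
    records = []
    pending = []  # address fragments seen since the last closing line
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between rows
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) == 4:
            _row, name, last_address_line, note = fields
            records.append(
                {
                    "name": name,
                    "address": pending + [last_address_line],
                    "note": note,
                }
            )
            pending = []
        elif len(fields) == 2:
            pending.append(fields[1])
        else:
            raise ValueError(f"unexpected layout: {line!r}")
    if pending:
        raise ValueError("address lines found with no closing record line")
    return records


records = group_records(SAMPLE.splitlines())

# The largest record decides how many sample lines a template would need.
max_address_lines = max(len(r["address"]) for r in records)
```

          On the posted sample this yields three records, and max_address_lines is the figure you would use when deciding how many sample lines to select. Real extracted text may not split this cleanly, so treat the column-splitting rule as the part most likely to need adjusting.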

           

          If these are not addresses you may need a different solution - and I can't guarantee that this will work for you but it's where I would start (I think).

           

          HTH.

           

           

          Grant