1 Reply. Latest reply: May 15, 2014 10:04 AM by Grant Perkins

    PDF 'Shrink'

    Bubbers _

      I am using V10 Pro (upgrade) with a PDF file of W2cs (corrected W2s) to produce summary totals for W3cs (the corrected W3 for the company).

      The layout always starts with the tax year at or near the first row (depending on the scan). I keyed the retrieval of all the demographic data off that: every field is located based on where the tax year shows up.
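Anchoring on the tax year amounts to treating that line as the start of each record. Monarch models are not written in Python, so this is only a minimal sketch of the logic on text lines assumed to be already extracted from the PDF. It also assumes the tax year is the first token on its line; `split_records` and the sample lines are hypothetical.

```python
import re

# Assumption: a record begins on a line whose first token is a four-digit tax year.
TAX_YEAR = re.compile(r"^\s*(?:19|20)\d{2}\b")


def split_records(lines):
    """Group extracted text lines into one list per form, using the tax-year line as the anchor."""
    records = []
    for line in lines:
        if TAX_YEAR.match(line):
            records.append([line])      # tax year found: start a new record
        elif records:
            records[-1].append(line)    # continuation of the current record
        # lines before the first tax year (cover sheets, etc.) are ignored
    return records


# Made-up sample lines, not taken from a real W2c.
page = [
    "Cover sheet",
    "2013  Corrected wage statement",
    "Employee A",
    "2013  Corrected wage statement",
    "Employee B",
]
for rec in split_records(page):
    print(rec[1])   # prints "Employee A", then "Employee B"
```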

       

      The data in the W2c could start with box 1 or box 12. On my test W2c I filled out every field to its maximum capacity to establish the field limits. That gave me row numbers I could use to define which field was where, e.g. {IF row = 41, "BOX 12", IF ...}.
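The nested IF chain is effectively a lookup from row number to field name. A minimal Python sketch of the same idea follows; only the row 41 to "BOX 12" pairing comes from the post, and `ROW_TO_FIELD` and `field_for_row` are hypothetical names. The other rows would come from your own fully populated test form.

```python
# Hypothetical row-to-field map built from a fully populated test form.
# Only row 41 -> "BOX 12" comes from the post; add the other rows from your test form.
ROW_TO_FIELD = {
    41: "BOX 12",
}


def field_for_row(row):
    """Equivalent of the nested {IF row = 41, "BOX 12", IF ...} chain: look the row up, else None."""
    return ROW_TO_FIELD.get(row)


print(field_for_row(41))  # BOX 12
print(field_for_row(14))  # None: the same field, shifted up, is no longer recognized
```

A table is easier to maintain than a long IF chain, but it carries the same built-in assumption that a field never changes rows.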

       

      The problem is that when no information is entered in the fields above Box 12 (which is common), the row number for Box 12 becomes 14 (or at least something other than 41). The PDF is 'read' in a scrunched-up way, even though Box 12 sits in exactly the same relative position on the page as it does in my test document.
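One plausible reading of the 'scrunching' is that empty rows are dropped when the text is pulled out of the PDF. That is an assumption, not something confirmed in the thread. The toy example below uses a made-up five-line layout, not a real W2c, to show how that alone moves a value's row number even though its printed position has not changed.

```python
# Toy illustration (made-up layout, not a real W2c): the same page read two ways.
page = [
    "2013",          # tax year, row 1
    "",              # box 1 (empty on this form)
    "",              # box 2 (empty on this form)
    "",              # box 3 (empty on this form)
    "D 1500.00",     # box 12 value, row 5 on the fully laid-out page
]


def row_of(lines, text):
    """1-based row number of the first line containing text, or None if absent."""
    for number, line in enumerate(lines, start=1):
        if text in line:
            return number
    return None


laid_out = page
collapsed = [line for line in page if line.strip()]  # assumption: blank rows are dropped

print(row_of(laid_out, "1500.00"))   # 5
print(row_of(collapsed, "1500.00"))  # 2
```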

       

      Is there any remedy? There are no other identifying elements that tell me which field a value represents, so the row number seems to be the key.

       

      I go through LOTS of these by hand, using a 10-key to add up the various fields, sometimes for dozens of employees per company, so this would save a tremendous amount of work, assuming it can be made to work.
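Once each employee's values are pulled out by field, the 10-key step is a per-field sum. This is a minimal sketch assuming the amounts arrive as text such as "1,200.50". `Decimal` keeps cents exact where floats would not, and the field names and figures are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def total_by_field(records):
    """Sum each field across all employee records, skipping blank fields."""
    totals = defaultdict(Decimal)  # Decimal() starts every field at 0
    for record in records:
        for field, amount in record.items():
            if amount.strip():
                totals[field] += Decimal(amount.replace(",", ""))
    return dict(totals)


# Hypothetical extracted values for two employees (field names are illustrative).
employees = [
    {"BOX 1": "1,200.50", "BOX 12": "300.00"},
    {"BOX 1": "799.50", "BOX 12": ""},
]
totals = total_by_field(employees)
print(totals["BOX 1"], totals["BOX 12"])  # 2000.00 300.00
```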

       

      Bob

          Grant Perkins

          Hello Bob and welcome to the forum.

           

          I have no idea what a W2 looks like so my comments may be way off here.

           

          The typical solution when a field can 'float' up and down within a logical record area is to find a preceding string that identifies the field's vertical position, usually after establishing a known starting line for the first line of each record.
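A preceding-string approach locates the value relative to a label instead of by absolute row, so lines added or removed above the label no longer matter. Below is a minimal Python sketch on already-extracted text, not a Monarch model. The label "12a", the one-line offset and `value_after_label` are hypothetical, and it only helps if the extracted text actually contains a stable label near the value.

```python
import re

# A money amount such as "1,500.00"; requiring the cents keeps label text like "12a" from matching.
AMOUNT = re.compile(r"\d[\d,]*\.\d{2}")


def value_after_label(record, label, offset=1):
    """Find the first line containing `label` and return the first amount found
    `offset` lines below it (offset=0 reads the label's own line).
    Returns None if the label or the amount is missing."""
    for index, line in enumerate(record):
        if label in line:
            target = index + offset
            if target >= len(record):
                return None
            match = AMOUNT.search(record[target])
            return match.group() if match else None
    return None


# Two made-up records: the second has extra lines above the label, so the
# label's absolute row differs, but its position relative to the value does not.
short_form = ["2013", "12a Code/Amount", "D  1,500.00"]
long_form = ["2013", "Box 1  9,000.00", "Box 2  800.00", "12a Code/Amount", "D  1,500.00"]

print(value_after_label(short_form, "12a"))  # 1,500.00
print(value_after_label(long_form, "12a"))   # 1,500.00
```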

           

          I suspect you can't identify a suitable preceding string and therefore have to rely on the row number, but I don't think that will reliably give you what you want in this context (as you seem to have discovered already!).

           

          If you have already looked at the preceding-string option and found it wanting, I'm not sure what else to suggest. But if you can share your test PDF file, I would be happy to take a look and see whether anything in it might provide a way forward for modeling the PDF conversion.

           

          HTH.

          Grant