7 Replies Latest reply: May 15, 2014 10:01 AM by Grant Perkins

    Data "shifting" issue

    cmlauer _

      First off, let me say that I'm a Monarch newb.  I inherited the role of using Monarch from someone else on the team, but until now I was able to just use the models that they created.  This is my first attempt at creating my own model and I may be missing something very simple, so I apologize if I'm asking what may be a really dumb question.

       

      I have a PDF of contract data that I'm looking to extract into MS Access.  The records that I'm trying to extract follow a pretty straight-forward formatting convention, but I'm having a whale of a time getting data extracted properly.  The model I made went something like this:

       

      Header: Item/Qty/Purch Unit/Unit Price/Total Item Amount (these elements are always present)

       

      Detail: Noun field (only part of columnar "body" that is always present); JUST first line (more on this later)

       

      Append: Rest of the fields (ACRN, PR, etc.) except Descriptive Data; these fields are usually, but not always, present

       

      Footer:  Descriptive Data (always present)


      The problem that I'm having is that when I go to table view, all of the data gets "shifted" down by one record.  For example, the Item 0204 record will have all of the append data from the 0203 record.

       

      I also can't figure out how to make Monarch grab the additional lines that sometimes crop up for one of the fields.  For example, the Noun field is occasionally two lines long.  I can't just do a NOT ACRN trap because sometimes ACRN isn't present in a record and another element is next.  There's also the issue of the Descriptive Data being free form, with lots of line breaks.  If there were a way to have it grab everything until the next record, that would be nice, but it's really low priority relative to the other issues.

       

      I'm really hoping there's something simple that I'm missing here because the thought of manually pulling all of this data (~100 pages worth) is already ruining my day.  :confused:

       

                                                                     Qty                                      Unit Price                                                    

      ITEM              SUPPLIES OR SERVICES                         Purch Unit                        Total Item Amount                                                                               

      0203                                                           1                                         $9,999.00                                                                               

      Lot                                       $9,999.00                                                    

                        Noun:                        THIS IS WHERE THERE WOULD BE DESCRIPTIVE TEXT                                                                               

      ACRN:                        AC                                                                               

      PR:                          XXXXXXXXXXXXXX                                      $9,999.00                                                          

                        NSN:                         N - Not Applicable                                                                               

      Contract type:               U - COST PLUS FIXED FEE                                                                               

      Inspection:                  DESTINATION                                                                               

      Acceptance:                  DESTINATION                                                                               

      FOB:                         DESTINATION                                                                               

      Descriptive Data:                                                                               

      The contractor shall prepare blah, blah, blah.                                                                               

      0204                                                           1                                       $111,111.00                                                                               

      Lot                                     $111,111.00                                                    

                        Noun:                        THIS IS WHERE THERE WOULD BE DESCRIPTIVE TEXT                                                                               

      SOMETIMES IT LOOPS TO A SECOND LINE                                                                               

      ACRN:                        AE                                                                               

      PR:                          XXXXXXXXXXXXXX                                    $111,111.00                                                          

                        NSN:                         N - Not Applicable                                                                               

      Contract type:               U - COST PLUS FIXED FEE                                                                               

      Inspection:                  DESTINATION                                                                               

      Acceptance:                  DESTINATION                                                                               

      FOB:                         DESTINATION                                                                               

      Descriptive Data:                                                                               

      The contractor shall prepare blah, blah, blah.

                  As with the noun field, sometimes it loops around to a second field.

       

                  Worse, there will occasionally be info after a line feed.

       

      If anyone can help, you'll be a hero in my book!

       

      Thanks.

        • Data "shifting" issue
          Data Kruncher

          Hello and welcome to the forum!

           

          Ah, it looks like you have another likely candidate for the "guru trap" (http://www.monarchforums.com/showthread.php?p=9873#post9873).

           

          I totally feel for you as someone just beginning to create your own models. As you've discovered, it's one thing to use models that someone else has created; it's yet another to model the data yourself.

           

          Here's how I modeled your sample:

           

          1) Create a one line detail template on the Noun: line. Make the trap noun: and paint the field. Then, on the Advanced options for the field, set End Field On to None of the above.

           

          2) Next, I used the second record to select the rows for an append template. Select the 9 rows from Noun to FOB. Set the trap to be exactly the same as the trap used for the detail template, that being noun:.

           

          3) Paint each of the fields to capture, and for each field set the Advanced option for Start Field on to include a value for the Preceding string in current line, such as acrn:, pr:, nsn:, etc.

           

          This almost gets you there, but the noun field includes way too much data, since it doesn't stop capturing where it really should. So the last step is

           

          4) Create a one line footer template on the ACRN: line. Set the trap to be acrn:, and paint the title as a "dummy" field. You don't need this field, so simply hide it in the table.

           

          Now the detail template will stop capturing data when it runs into the footer template (since they can't overlap), and all of the fields in the append template will work just fine regardless of the footer template.
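For anyone who thinks better in code, the "preceding string" idea from step 3 can be sketched outside Monarch. The Python below is only an illustration under stated assumptions: the function name `capture_by_preceding_string` is made up, the label list is copied from the posted sample, and it assumes the columns on a line are separated by two or more spaces. It is not a description of how Monarch works internally.

```python
import re

# Labels copied from the posted sample. Matching is case-insensitive,
# mirroring the lower-case traps (noun:, acrn:, ...) used in the steps above.
LABELS = ["Noun", "ACRN", "PR", "NSN", "Contract type",
          "Inspection", "Acceptance", "FOB"]


def capture_by_preceding_string(lines):
    """Return {label: value}; a label that never appears stays None."""
    record = {label: None for label in LABELS}
    for line in lines:
        for label in LABELS:
            # The label must be the first non-blank text on the line.
            match = re.match(rf"\s*{re.escape(label)}:\s*(.*)", line, re.IGNORECASE)
            if match:
                rest = match.group(1).strip()
                # Assumption: two or more spaces separate columns, so the
                # first chunk is the field value (this drops a trailing amount).
                value = re.split(r"\s{2,}", rest)[0]
                record[label] = value or None
                break
    return record


sample = [
    "      Noun:        WIDGET ASSEMBLY",
    "ACRN:              AC",
    "PR:                XXXXXXXXXXXXXX          $9,999.00",
    "NSN:               N - Not Applicable",
]
record = capture_by_preceding_string(sample)
```

Because each value is located by its label rather than by its row position, a record with a missing line (say, no ACRN) simply leaves that entry empty instead of pushing the other fields out of place.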

           

          HTH,

          Kruncher

            • Data "shifting" issue
              Data Kruncher

              No sooner did I post the above than I re-read your post and noted that you did in fact want to pick up the "Descriptive Data". So some minor changes are in order.

               

              Make the append template a 12 line sample. Paint only the first line of the descriptive data. On the Advanced options, select Start Field on String descriptive data: anywhere in previous line, and End Field On None of the above.

               

              Make one more 2 line append template to pick up the Item Number and Lot information (if you want the lot stuff). Just as the acrn: trap prevented the noun from overrunning, this append will prevent your descriptive data field from overrunning too.

               

              Much better.

               

              Kruncher

                • Data "shifting" issue
                  Grant Perkins

                  Part of the trick here is that the Appends have to be trapped 'before' or 'the same as' the detail trap since appends are 'before' and footers are 'after' conceptually.
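To make the "appends are before" point concrete, here is a deliberately simplified Python model of it. The function `attach_appends` and the event list are hypothetical and only mimic the behaviour described in this thread: each detail row picks up the most recent append seen above it. It is a sketch of the concept, not of Monarch's engine.

```python
def attach_appends(events):
    """events: (kind, value) pairs in report order; kind is 'append' or 'detail'."""
    rows = []
    current_append = None  # most recent append seen so far, if any
    for kind, value in events:
        if kind == "append":
            current_append = value
        elif kind == "detail":
            # A detail row only sees appends that came BEFORE it.
            rows.append({"detail": value, "append": current_append})
    return rows


# Layout as in the original model: the block trapped as an append sits
# BELOW the line trapped as the detail.
events = [
    ("detail", "Noun for 0203"), ("append", "ACRN AC"),
    ("detail", "Noun for 0204"), ("append", "ACRN AE"),
]
rows = attach_appends(events)
```

In this model the 0204 row ends up carrying "ACRN AC", which belongs to 0203, and the 0203 row has no append at all. That is the same one-record shift described in the first post, and it is why the append trap needs to fire on or before the detail line.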

                   

                  That said, the preceding string concept allows us some 'adjustments' when the report is formatted as yours is.

                   

                  Kruncher's solution looks good. But I like a challenge as much as he does, and if you don't have the possibility of hundreds of fields like ACRN that may or may not appear, it looks to me like you could probably get everything out of a single detail template, provided the record samples you have presented are largely typical of the report.

                   

                  Multi-line fields and preceding strings to identify fields are still the order of the day, just all in a single template (maybe more than one if you have a lot of variable fields). I say that because the sections of the record are always present, so the main idea behind appends and footers, that they apply to several records, just does not come into play, provided you can create a sample that has enough lines in it (the data does not matter, just the number of lines) to hold all the fields somewhere. If you cannot, then yes, you will need an append record as a 'continuation' of the detail, using the exact same trap.

                   

                  If you are getting a good and stable extraction from the PDF file that too is a great bonus for this sort of strategy.

                   

                  The first model is always the hardest. Doubly so when you need to dive into some of the more 'interesting' possibilities!

                   

                  HTH.

                   

                   

                  Grant

                    • Data "shifting" issue
                      cmlauer _

                      OK, the alignment issues seem to have stopped, but there are a couple of issues:

                       

                      1.  All of the fields in the append (ACRN, PR, etc) are picking up everything in the fields below them (i.e. ACRN has all of its data, plus PR, plus NSN, etc.)

                       

                      2.  A couple of things (aside from #1) looked odd when I saw what came up in table view.  Closer inspection of the document showed that not only can the Noun field be more than one line, but some of the others can as well (especially PR).

                       

                      3.  Also, sometimes ACRN isn't present; I'm assuming that complicates the footer issue.

                       

                      It feels like we're probably a tweak or two away from having it.  If it would make things easier, we could drop the Descriptive Data field.  I'm guessing it wouldn't be hard to make a template that pulls just the Descriptive Data into a separate table and then do an update query to pull it all together.

                       

                      Thanks again for all your help guys.

                        • Data "shifting" issue
                          Grant Perkins

                          From the sample posted you can get everything into a single template.

                           

                           

                          "1. All of the fields in the append (ACRN, PR, etc) are picking up everything in the fields below them (i.e. ACRN has all of its data, plus PR, plus NSN, etc.)"

                           

                          Leave the fields that are never multi-line set to 'End on' 1 line. Any that can be multi-line (NOUN, PR, etc.) set to end on 'Blank Preceding string of (say) 30 characters', or whatever number of characters you need to get over to the tags on the left.
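One way to picture the multi-line rule is to treat the area to the left of the values, where the tags sit, as the signal: a line with a blank tag area continues the field above it, and a line with a tag starts a new field. The Python below is a sketch of that reading only. The function `join_continuations` is made up, the width of 30 is the "say 30" from the suggestion above, and the sample lines are built to that width rather than copied from the report. It illustrates the idea outside Monarch and makes no claim about how the End Field On option is implemented.

```python
LABEL_WIDTH = 30  # assumed width of the tag area; adjust to the actual report


def join_continuations(lines, label_width=LABEL_WIDTH):
    """Return (tag, value) pairs, folding continuation lines into the field above."""
    fields = []
    for line in lines:
        tag_area = line[:label_width].strip()
        value = line[label_width:].strip()
        if tag_area:
            # Non-blank tag area: a new field starts here.
            fields.append([tag_area.rstrip(":"), value])
        elif fields and value:
            # Blank tag area: this line continues the previous field.
            fields[-1][1] = (fields[-1][1] + " " + value).strip()
    return [(tag, value) for tag, value in fields]


sample = [
    "Noun:".ljust(LABEL_WIDTH) + "FIRST LINE OF THE NOUN",
    " " * LABEL_WIDTH + "SECOND LINE OF THE NOUN",
    "ACRN:".ljust(LABEL_WIDTH) + "AE",
    "PR:".ljust(LABEL_WIDTH) + "XXXXXXXXXXXXXX",
]
fields = join_continuations(sample)
```

The width matters: if it is set too small, part of a tag leaks into the value area, and if it is too large, the start of a value is mistaken for a tag.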

                           

                           

                          "2. A couple of things (aside from #1) looked odd when I saw what came up in table view. Closer inspection of the document showed that not only can the Noun field be more than one line, but some of the others can as well (especially PR)."

                           

                          See above.

                           

                           

                          "3. Also, sometimes ACRN isn't present; I'm assuming that complicates the footer issue."

                           

                          Not if everything is on one template. The preceding string concept will populate the field if it exists but otherwise leave the column empty for records with no value.

                           

                           

                          "It feels like we're probably a tweak or two away from having it. If it would make things easier, we could drop the Descriptive Data field. I'm guessing it wouldn't be hard to make a template that pulls just the Descriptive Data into a separate table and then do an update query to pull it all together."

                           

                          No need, it should work fine if you play with the 'End field on' settings. If all else fails try 'none of the above'. In fact I would start with that ....

                           

                           

                           

                          HTH.

                           

                           

                           

                          Grant