14 Replies Latest reply: May 15, 2014 10:15 AM by monarchman7

    Complex PDF documents -- help needed

    sdem _

      This is my first post, am new to Monarch, and I have so many questions, not sure where to start.  I'm working with large PDF documents from multiple sources that are pricing/options catalogs for large equipment, and I need the pricing data for a quote application we are developing.  I've managed to come a long way with Monarch, but some of the PDF documents are so varied in structure that I'll need multiple models for each.  There's not a lot of consistency from one section of the report to another.  My first question is can PDF documents be segmented in any way that makes modeling easier?  In other words, is it possible to restrict a model to a specific section of the report?  For example, a report contains a section for the base models which is fairly consistent, but not entirely, then subsequent sections for options and attachments, each with their own varying layouts but must be related back to the base model numbers.

       

      My second, more specific question is, what would cause the "Copy from previous record property" to NOT work on an append?

       

      I've spent much time on the forum and even spent a few months with trial copies of the software, but I'm still having trouble with these reports.  From what I can see, you guys almost always have a solution, so thanks in advance for whatever help you can provide to get me going in the right direction.

        • Complex PDF documents -- help needed
          Olly Bond

          Hello sdem, and welcome,

           

          You can't split the report - Monarch will read from the top of page one to the end looking for lines that match the detail trap. But there are a few things you can do - like adjusting the scaling for the appropriate section, and defining a filter based on the Page() function to exclude unwanted data.

           

          In ten years I've never come across the need to copy value from previous on an append - I'd attack that with less specific templates and Cleared By if needed.

           

          Hope this helps,

           

          Olly

            • Complex PDF documents -- help needed
              sdem _

              Thanks, Olly, I was afraid of that.  I'll research the page() function to see how I might use it.  As for the copy value problem, I usually see the append data copy down as needed without this, but have some data that won't copy, even when I specifically select the property.

                • Complex PDF documents -- help needed
                  Olly Bond

                  Hello sdem,

                   

                  I'm on the road for a few days but could take a quick look at your data towards the end of next week if you need. Let me know how you get on.

                   

                  Best wishes,

                   

                  Olly

                    • Complex PDF documents -- help needed
                      sdem _

                      That would be great, Olly.  I was hoping you would say that!  I will find the most troublesome of the reports to send, but will work on it in the meantime.  I just need some suggestions to get me going in the right direction, because I really do want to make this work.

                        • Complex PDF documents -- help needed
                          sdem _

                          I hope it's OK to resume this thread after so long an absence.  My project is huge, and extracting data from the PDFs is only part of it, so I'm only now getting back into it.  I'm narrowing my focus for this posting to ask how to "reassemble" data after using two models to extract it.  I want the two extracts, once recombined in a database table, to be in the same order as in the original report.  I've considered using recno() to add a sort field.  This will ensure that the first (and primary) extract is in the correct order, but it doesn't really help with getting the second extract re-integrated into the primary data.  It could be that I'm overlooking something very simple, but I want to avoid too much post-export manipulation if possible, since I will be loading the data into a SQL database.  My source data, as I mentioned earlier, is a PDF price list.  The SQL data will be presented via a web app to look very similar to the original price book or configurator, hence the need to maintain the original order.  I may even be overlooking a way to get all the data in one pass.

                           

                          The primary extract has detail defined by a PartID that is unique and has a description and price with several appends that define the categories to which the option belongs (not all represented in the example below).  Each detail record contains a PartID, description and price.  For each AB category, only one PartID can be selected.  There are some options, however, where the PartID is dependent upon another factor so that the PartID and description are on one line and the condition description and price are on separate lines.  In that case, in the second model, the PartID and its description become an append to the detail that contains a blank, description, and price.  Here's a sample of how it looks for the primary model:

                           

                          AB_54321   Category 1                                                                                (This is an append with two fields)

                                             1234567       description 1                                100.00                (These are details with 3 fields trapped on PartID, description, and price)

                                             2345678       description 2                                150.00

                                             3456789       description 3                                200.00

                           

                          Here's the layout for the exceptions requiring the second model

                                                                            

                          AB_12345    Category 2                                                                                (This is an append with two fields in both models)

                                             7654321        description 1                               100.00                     (This is a detail that will be picked up in the first model)

                                             6543212        description 2                                                               (This is an append in the second model)

                                                                   if condition 1                               150.00                     (These are details in the second model, trapped only on description and price; PartID is not repeated)

                                                                   if condition 2                               200.00

                                                                   if condition 3                               300.00

                           

                          Basically, I have both models giving me what I want, but wondering if one pass would do it, and if not, how to get them back into the original order in one table.  Sorting on the AB_##### is not an option as they are not sorted numerically in the price list.  I want the final combined data table to put 6543212 after 7654321 when it shows the options for AB_12345.

                           

                          Thanks for any help on this.
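
For readers following along, here is a rough Python sketch (not Monarch syntax) of the carry-forward logic described in the post above: a category line and a price-less PartID line act as appends, and every priced line becomes a detail row. The line shapes, the 7-digit PartID, and the names `CATEGORY`, `PART`, `CONDITION`, and `parse` are all assumptions drawn from the sample layout only; a real price list would almost certainly need different patterns.

```python
import re

# Assumed line shapes, taken from the sample layout only (hypothetical):
#   category line:  "AB_12345    Category 2"
#   part line:      "7654321   description 1   100.00"   (price may be absent)
#   condition line: "if condition 1   150.00"            (no PartID on the line)
CATEGORY = re.compile(r"^\s*(AB_\d+)\s+(.+?)\s*$")
PART = re.compile(r"^\s*(\d{7})\s+(.+?)(?:\s{2,}(\d+\.\d{2}))?\s*$")
CONDITION = re.compile(r"^\s*(\D.*?)\s{2,}(\d+\.\d{2})\s*$")


def parse(lines):
    """Return detail rows in report order, carrying append values forward."""
    rows = []
    category = part_id = part_desc = None
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        if m := CATEGORY.match(line):
            category = m.group(1)
            part_id = part_desc = None  # a new category clears the carried part
        elif m := PART.match(line):
            part_id, part_desc, price = m.groups()
            if price is not None:
                # PartID, description, and price all on one line: a plain detail
                rows.append((line_no, category, part_id, part_desc, "", price))
        elif m := CONDITION.match(line):
            # No PartID on this line: reuse the one carried from the line above
            rows.append((line_no, category, part_id, part_desc, m.group(1), m.group(2)))
    return rows


SAMPLE = """\
AB_12345    Category 2
        7654321        description 1          100.00
        6543212        description 2
                       if condition 1         150.00
                       if condition 2         200.00
                       if condition 3         300.00
"""

rows = parse(SAMPLE.splitlines())
for row in rows:
    print(row)
```

Because each row keeps the line number it came from, the rows stay in report order without any later sorting. That applies only to this sketch, which reads plain text; it says nothing about whether a single Monarch model can trap both layouts.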

                            • Complex PDF documents -- help needed
                              monarchman7 _

                              Hello sdem,

                               

                              AB_12345 Category 2 (This is an append with two fields in both models)

                              7654321 description 1 100.00 (This is a detail that will be picked up in the first model)

                              6543212 description 2 (This is an append in the second model)

                              if condition 1 150.00 (These are details in the second model, trapped only on description and price; PartID is not repeated)

                              if condition 2 200.00

                              if condition 3 300.00

                               

                              The XXX.00 numeric values only show up on lines 2, 4, 5, and 6 of the above?

                                • Complex PDF documents -- help needed
                                  sdem _

                                  In most cases, yes, in the secondary model which is intended to capture those details that don't have a partID on the same line as the price, although some of those conditional options could say No Charge instead of having a numeric value.   I discovered this the hard way when I didn't capture those rare occurrences.  I changed my trap from .00 to be three non-blanks and got lucky that it worked for the particular report I was extracting at the time.

                                    • Complex PDF documents -- help needed
                                      monarchman7 _

                                      Hmm, well, if you can use a floating trap on ".00", you could grab the characters to its left and right and add the ".00" back in Excel. For the cases without an "if" condition you will probably be forced to capture the PartID together with the description, but that can be parsed out in Excel. In summary, lines 1 and 3 would be appends, and lines 2, 4, 5, and 6 would be details. Lastly, you can trap "No Charge" as a footer and use "Cleared By" on the detail template. This will notify you when "No Charge" shows up, and if those are rare enough you can edit those lines in.

                                       

                                      I know this goes against you not wanting to do post-Monarch manipulation, but it might be one solution (hard to say much more without knowing more details about the report).
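
If some clean-up does end up happening after the export, the price-or-"No Charge" check discussed above could look roughly like this in Python. This is only a sketch: `PRICE_AT_END` and `split_price` are hypothetical names, treating "No Charge" as 0.00 is an assumption, and the pattern is a post-export analogue of the idea, not a Monarch trap definition.

```python
import re
from decimal import Decimal

# A price such as "150.00" or the text "No Charge" (any case) at the end of a line.
PRICE_AT_END = re.compile(r"(?:(\d[\d,]*\.\d{2})|(no\s+charge))\s*$", re.IGNORECASE)


def split_price(line):
    """Return (text_before_price, price, is_no_charge).

    price is None when the line carries neither a price nor "No Charge".
    """
    m = PRICE_AT_END.search(line)
    if m is None:
        return line.strip(), None, False
    text = line[:m.start()].strip()
    if m.group(2):
        # Assumption: "No Charge" is stored as 0.00 and flagged for review.
        return text, Decimal("0.00"), True
    return text, Decimal(m.group(1).replace(",", "")), False


print(split_price("   if condition 1      150.00"))
print(split_price("   if condition 4      No Charge"))
print(split_price("   6543212   description 2"))
```

Keeping the `is_no_charge` flag separate from the price lets those rare rows be reviewed rather than silently mixed in with genuine zero prices.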

                                        • Complex PDF documents -- help needed
                                          monarchman7 _

                                          One more question: are you grabbing page numbers from the report? If so, you can combine the files and do some crafty sorting. If the "if" conditions only show up before or after those without an "if" condition, you should be able to get the same order as shown in the report.

                                           

                                          This would be making use of your 2 models above, instead of the 1 model I posted.

                                            • Complex PDF documents -- help needed
                                              Olly Bond

                                              Hello everyone,

                                               

                                              Monarchman's right - we'd need to see fuller extracts from the report to see if there was a simpler approach to trapping, although with floating traps on PDFs it could be that multiple models are needed. If they are, and re-sorting is required, then the best key is not Recno() but Page() and Line(), which will give consistent coordinates across different models.

                                               

                                              I usually make a single field, str(Page();3;0;"0")+"."+str(Line();3;0;"0"), which gives me data like "002.054" for the details that Monarch found on line 54 of the second page. This is really helpful for debugging and auditing models - especially for PDFs with difficult traps.

                                               

                                              If you're working with multiple files at once then adding str(ID();3;0;"0")+"."+ to the start of this formula will give you "003.002.054" - the detail from line 54 on page 2 of the 3rd file.

                                               

                                              Hope this helps,

                                               

                                              Olly
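
To illustrate why a zero-padded page-and-line key works across models, here is a small Python sketch that rebuilds the same "002.054" style key outside Monarch and uses it to interleave two extracts. The function `sort_key`, the dictionary field names, and the sample rows are hypothetical; in practice the key would simply be exported from each model as a calculated field.

```python
def sort_key(page, line, file_id=None):
    """Build a zero-padded key such as "002.054", or "003.002.054" with a file id."""
    key = f"{page:03d}.{line:03d}"
    if file_id is not None:
        key = f"{file_id:03d}.{key}"
    return key


# Hypothetical rows from the primary model (PartID on the same line as the price)
primary = [
    {"page": 2, "line": 53, "part_id": "7654321", "text": "description 1", "price": "100.00"},
    {"page": 2, "line": 60, "part_id": "1234567", "text": "description 1", "price": "100.00"},
]
# Hypothetical rows from the second model (PartID carried down from an append)
secondary = [
    {"page": 2, "line": 56, "part_id": "6543212", "text": "if condition 2", "price": "200.00"},
    {"page": 2, "line": 55, "part_id": "6543212", "text": "if condition 1", "price": "150.00"},
]

# Because the key is zero-padded, a plain text sort matches report order.
combined = sorted(primary + secondary, key=lambda r: sort_key(r["page"], r["line"]))

for row in combined:
    print(sort_key(row["page"], row["line"]), row["part_id"], row["text"], row["price"])
```

The padding is what makes this safe as a text sort: without it, "10.5" would sort before "2.54". Three digits covers up to 999 pages and 999 lines per page, so a longer report would need a wider pad.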

                                                • Complex PDF documents -- help needed
                                                  sdem _

                                                  All,

                                                   

                                                  Olly's suggestion to use Page() and Line() functions solved the problem perfectly!  Monarch is such an awesome tool -- wish I had more time to experiment.  For now, this gets me the data the way I need it, and I plan to add it to all of my reports, whether they require single or multiple models so that I can be sure that the data as presented will match the original price books.

                                                   

                                                  Thanks all for your help.

                                                    • Complex PDF documents -- help needed
                                                      monarchman7 _

                                                      very nice Olly

                                                      • Complex PDF documents -- help needed
                                                        Grant Perkins

                                                        "All,

                                                         

                                                        Olly's suggestion to use Page() and Line() functions solved the problem perfectly!  Monarch is such an awesome tool -- wish I had more time to experiment.  For now, this gets me the data the way I need it, and I plan to add it to all of my reports, whether they require single or multiple models so that I can be sure that the data as presented will match the original price books.

                                                         

                                                        Thanks all for your help."

                                                         

                                                        Good idea. Check out User Defined Functions and Linked Objects - both of which may help you for commonly used activities in your models.

                                                         

                                                         

                                                        Grant

                                                          • Complex PDF documents -- help needed
                                                            monarchman7 _

                                                            Just to add, the Monarch table keeps the order presented in the report, but an export out of Monarch doesn't always maintain that exact order. So, if order is important, it's always nice to export with a recno() field (or Olly's key, which works with either single or multiple models), both to maintain the order and to make post-Monarch cleanups easier if needed, since you can always re-sort by the calculated field that has the right order.
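
Since the data in this thread is headed for a SQL database, the same point can be sketched on the database side: store the exported sort field as a column and always read with ORDER BY, because row order in a table is not guaranteed. The example below uses Python's built-in sqlite3 purely as a stand-in for whichever database is actually used; the table name `price_list`, its columns, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_list ("
    " sort_key TEXT PRIMARY KEY,"  # e.g. the exported page-and-line key
    " part_id TEXT,"
    " description TEXT,"
    " price TEXT)"
)

# Rows arrive in arbitrary order, as they might after combining two exports.
exported = [
    ("002.056", "6543212", "if condition 2", "200.00"),
    ("002.053", "7654321", "description 1", "100.00"),
    ("002.055", "6543212", "if condition 1", "150.00"),
]
conn.executemany("INSERT INTO price_list VALUES (?, ?, ?, ?)", exported)

# Reading back with ORDER BY restores the order of the original report.
ordered = conn.execute(
    "SELECT sort_key, part_id, description FROM price_list ORDER BY sort_key"
).fetchall()

for row in ordered:
    print(row)
```

Making the sort key the primary key also catches a double load of the same export, since the second insert would fail on the duplicate key. That is a design choice for this sketch, not something the thread prescribes.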