10 Replies Latest reply: May 15, 2014 10:12 AM by Grant Perkins

    Problem opening a saved monarch model

    Mandvika _

      I created a monarch model that contains extraction templates for customer invoice pdf files.  I saved the model.  To reopen it I opened the pdf and then opened the model.  I see the saved model data field definition come in correctly but when I go to data view no data is extracted and no data from the pdf shows in the table.  Do I need to do anything else so that the model extracts data from the open pdf?

       

      Also, can I save the model as a project so that it opens the pdf from a location automatically?

        • Problem opening a saved monarch model
          Chickenman _

          Hi Mandvika,

           

          Sounds like you need to reverse your steps here.

           

          Open Monarch and, from the File menu, choose Report from the dropdown, then browse to your PDF and select it to bring it into Monarch.

          Then from File again choose Model and select your Model, and you should have your data in table view.

           

          Yes, creating a Project will allow you to simply invoke the project and it will automatically fetch the report and open the Model.

           

          CM

            • Problem opening a saved monarch model
              Grant Perkins

              Mandvika,

               

              Further to Chickenman's suggestions I just wanted to check if you are opening the exact same report you used to develop the model. PDF files can be very variable (for reasons discussed in several posts here and also covered in an article in the blog iirc.) and there are some specific approaches recommended for dealing with input files which may come from the same source and look OK but are not always consistent.

               

              If you ARE using the same report and it all worked OK when you developed the model (and are opening Monarch and the report and the model as described by Chickenman) then something strange is going on. We could speculate on what that might be but it would probably be best to know the answers to the above before heading into that particular forest!

               

              HTH.

               

              Grant Perkins

                • Problem opening a saved monarch model
                  Mandvika _

                  Thanks Grant and Chickenman!

                   

                  I finally managed to open the pdf and then the model. I could see all the data in the table, but only for the pdf I used to create the model.  If I open any other pdf with the same layout no data shows up.  Does that mean that the template I created to fetch the data in the model is incorrect?  Also, the names of the pdfs I am using are different; can that affect the extraction through the model too?

                    • Problem opening a saved monarch model
                      Mandvika _

                      It seems the templates I made are not working on all the pdfs, even though they are all similar, because the spacing between data varies from one pdf to another.

                      example-

                       

                                        ßßßÑÑÑÑÑÑÑÑÑÑ                                                                    ßßßÑÑ/ÑÑ/ÑÑÑÑ                

                                                   4520089103                                                                    ßßß10/12/2011   

                       

                      Please guide me on how I can define templates to extract data from inconsistent pdfs.

                       

                      Thanks!

                        • Problem opening a saved monarch model
                          Grant Perkins

                          Hi Mandvika,

                           

                          Good to know you have identified a reason for the problem  - that's a good step towards working out the solution.

                           

                          If you have not already read it can I mention that the Monarch Blog post by Anwar Ali

                           

                          http://blog.datawatch.com/the-secret-to-extracting-data-from-pdf-files/#more-967

                           

                          is a useful backgrounder for explaining the sorts of issues that some PDF sources can throw up to play with.

                           

                          There may not be a single piece of advice that helps you to resolve your problem.  However I would first try some experiments with the scaling and formatting adjustment tools available for the PDF conversion to text. You may need to modify your model as you go to re-position the traps. You will be looking for something that makes all the positions consistent even if the format looks 'wrong'.  It may be worth mentioning that less (fewer trap character positions) can be more in that context - so long as the trap is still reliable and obtains what you need.

                           

                          You may also need to consider using the floating trap. If you have not used that feature already be aware that it is very powerful but that you need to understand well how it needs to work to make it effective. Or at least that has been my experience when used with troublesome PDF files with trapping requirements that are more than basic level.

                           

                          If neither of those approaches seems to be getting you anywhere you could consider at least 3 more options.

                           

                          Firstly  - recreate the PDF files if that is an option.

                           

                          If they should be consistent but are not, it can sometimes help to simply open the files in a program that can edit them and re-write them. Doing so can, sometimes, eliminate the internal issues that cause the PDFs to look the same as presented but be structured differently internally and so mess with the analysis and extraction routines. Of course it may not be practical to do that in your final production process, but if it offers a benefit in testing it means that you have some idea about where the problem originates.

                           

                          Secondly you could try extracting 'as it is'. Just grab every line of a number of sample PDF files and convert them to text as complete lines, writing out a new text file with whatever formatting appears! You could run the same conversion using the Adobe PDF reader's own text conversion and compare that with what Monarch produces.

                           

                          That alone might provide some ideas for fresh approaches. But you can take things further by combining the files into a single document (concatenate them) and then opening it in Monarch and, assuming you have a suitable version of Monarch, using the Auto trap generation feature to work through the file to see what is suggested. Repeat the approach for different spacing and format adjustment values if necessary.
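The concatenation step can be done with any tool; as a minimal sketch, assuming the sample PDFs have already been converted to plain text files, something like the following Python would do. The function names `combine_extracts` and `combine_files` and any file names you pass in are hypothetical, chosen only for this illustration.

```python
from pathlib import Path

def combine_extracts(texts):
    """Join text extracts end to end, making sure each one ends with a newline."""
    pieces = []
    for text in texts:
        pieces.append(text if text.endswith("\n") else text + "\n")
    return "".join(pieces)

def combine_files(paths, out_path):
    """Read each text extract and write them all into one combined file."""
    texts = [Path(p).read_text(encoding="utf-8") for p in paths]
    Path(out_path).write_text(combine_extracts(texts), encoding="utf-8")
```

The combined file can then be opened in Monarch as a single report, so any variation in spacing between the original files is visible in one place.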

                           

                          Thirdly, if the individual report outputs just won't play nicely, go back to the straight text extracts (as a starting point) and see what they offer by way of a basis for text line manipulation for slicing and dicing the contents into the required fields. It may mean a little more work with calculated fields in the model but if it simplifies the trapping and makes the extraction more reliable and future 'PDF format proof' it can be a very good approach indeed.

                           

                          So, those are the main approaches I can think of although they are not necessarily a complete list of all options.

                           

                          See where they take you. If you have any sample PDF files that you can share with others that would be good. It's always easier to work with real stuff than trying to explain options remotely with no view of the source of the problem nor options to experiment.

                           

                          Rest assured there are many approaches to be tried - which is actually quite a challenge when trying to offer targeted responses ...  !

                           

                          HTH.

                           

                           

                          Grant Perkins

                            • Problem opening a saved monarch model
                              Mandvika _

                              This is a sample pdf. I want to capture the line items starting with 10, 20..... The data fields are the same but the spacing between them differs. Can you suggest a good approach to handle this problem?

                               

                                                                                                              10            21320963            2770882KXXS           162-277088-KXS  Unchanged / H619 2KX N-NewOrleans SaintsT. Whi         8.91            EA  112         11/01/2011 //                  //

                                                                                                              20       21320964            2770882KXS            162-277088-2KX-SUnchanged / H619 2KX N-New    8.91            EA  260             11/01/2011                 //                                           //


                                • Problem opening a saved monarch model
                                  Olly Bond

                                  Hello Mandvika,

                                   

                                  10 21320963 2770882KXXS 162-277088-2KX-XS Unchanged / H619 2KX N-New Orleans SaintsT. Whi 8.91 EA 112 11/01/2011 // // 33901

                                  20 21320964 2770882KXS 162-277088-2KX-S Unchanged / H619 2KX N-New OrleanSaintsWhi 8.91 EA 260 11/01/20 // // 33902


                                   

                                  If you trap the lines with NNB (numeric, numeric, blank) as the trap characters, and select the whole line as one character field called , then you should be able to get the data you need by using functions like:

                                   

                                  lsplit(intrim();15;" ";3)

                                   

                                  for the "2770882KXXS" data, or:

                                   

                                  lsplit(intrim();15;" ";5)

                                   

                                  for "Unchanged".

                                   

                                  Combine lsplit(intrim()) with substr(), extract() and rsplit(intrim()) and you should be able to get most data.
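To check the part numbers outside Monarch, the same idea can be sketched in Python. This is only an illustration of the approach, not Monarch's implementation: `intrim_like` and `lsplit_like` are hypothetical helpers that approximate collapsing runs of spaces and picking the Nth space-delimited part from the left, and the sample line is the first line item above with uneven spacing put back in.

```python
import re

def intrim_like(text):
    """Collapse runs of whitespace to single spaces and strip the ends.
    Hypothetical approximation of what intrim() is used for here."""
    return re.sub(r"\s+", " ", text).strip()

def lsplit_like(text, max_parts, sep, n):
    """Split from the left into at most max_parts parts; return part n (1-based).
    Returns an empty string if the line has fewer than n parts."""
    parts = text.split(sep, max_parts - 1)
    return parts[n - 1] if 1 <= n <= len(parts) else ""

line = ("10      21320963    2770882KXXS   162-277088-2KX-XS  Unchanged / H619 2KX "
        "N-New Orleans SaintsT. Whi   8.91    EA  112   11/01/2011 //    // 33901")

clean = intrim_like(line)
item_code = lsplit_like(clean, 15, " ", 3)   # third part from the left
status = lsplit_like(clean, 15, " ", 5)      # fifth part from the left
print(item_code, status)
```

Because the spaces are collapsed first, the part numbers stay the same however wide the gaps are in a particular pdf.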

                                   

                                  Hope this helps,

                                   

                                  Olly

                                  • Problem opening a saved monarch model
                                    Mandvika _

                                    I am having a problem capturing the Quantity field that comes after EA (for example, EA  260 in the second line item). The position of this field differs in all the pdfs. Is there an easy way to capture this?

                                      • Problem opening a saved monarch model
                                        Olly Bond

                                        Assuming that you've captured the whole line as one big character field called as per my previous post, then try defining a numeric calculated field called Qty as:

                                         

                                        val(rsplit(intrim();20;" ";5))
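As a rough cross-check of the counting, here is a Python sketch of the same "fifth part from the right" idea. `rsplit_like` is a hypothetical stand-in, not Monarch's function, and the sample line is the second line item quoted earlier. The second helper, `qty_after_unit`, is an alternative that is not from this thread: it takes the token that follows the unit code "EA", which may be worth considering if the number of trailing fields varies between pdfs. It assumes "EA" does not also appear as a separate word in the description.

```python
def rsplit_like(text, max_parts, sep, n):
    """Split from the right into at most max_parts parts; part 1 is the rightmost.
    Returns an empty string if the line has fewer than n parts."""
    parts = text.rsplit(sep, max_parts - 1)
    parts.reverse()
    return parts[n - 1] if 1 <= n <= len(parts) else ""

def qty_after_unit(text, unit="EA"):
    """Return the number that follows the first standalone unit token, or None."""
    tokens = text.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok == unit:
            try:
                return float(tokens[i + 1])
            except ValueError:
                return None
    return None

line = ("20   21320964     2770882KXS    162-277088-2KX-S Unchanged / H619 2KX "
        "N-New OrleanSaintsWhi    8.91   EA  260     11/01/20  //   // 33902")

clean = " ".join(line.split())            # collapse runs of spaces first
qty_from_right = float(rsplit_like(clean, 20, " ", 5))
qty_by_anchor = qty_after_unit(line)
print(qty_from_right, qty_by_anchor)
```

Both routes give the same quantity for this line; counting from the right depends on the trailing fields always being present, while the anchor approach depends on the unit code being reliable.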

                                         

                                        HTH

                                         

                                        Olly

                                          • Problem opening a saved monarch model
                                            Grant Perkins

                                            Hi Mandvika,

                                             

                                            Looking at your sample suggests you have pdf content that really should be treated as a database record dump. Each record should be on one line and then, preferably, read into Monarch as a database (assuming the pdf creation process has not changed the structure) or be dealt with as a single line record in a text 'report' where you can map the field by the character position on the line. In both cases having the record data structure will help to define the model template.

                                             

                                            If the process, especially the pdf write part of it, has messed around with the format of the line by compressing spaces (for example) the exercise may become a little trickier. However, so long as the character count is correct and consistent, mapping the field should be easy enough.

                                             

                                            If the pdf write process has added extra page formatting (i.e., set a restricted 'page' width and embedded some line feeds) you may need to put a preparatory process in place to undo that work. Monarch Utility offers some tools for the task but most technical scripting editors will also allow substitution scripts to be developed to return the format to the original database output.
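A preparatory step of that kind might look like the Python sketch below. It rests on assumptions that would need checking against the real files: that each record starts with a line item number of two or more digits followed by a blank (as with 10, 20 in the sample), and that any other non-blank line is a wrapped continuation of the record before it. The function name `rejoin_wrapped` is hypothetical.

```python
import re

# Assumption: a record starts with an item number such as 10 or 20, then a blank.
RECORD_START = re.compile(r"^\s*\d{2,}\s")

def rejoin_wrapped(lines):
    """Merge wrapped continuation lines back onto the record they belong to."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if RECORD_START.match(raw):
            records.append(line)          # a new record begins here
        elif records:
            records[-1] += " " + line     # continuation of the previous record
        # lines before the first record (page headers etc.) are ignored
    return records

sample = [
    "Invoice header",
    "10 21320963 2770882KXXS 8.91",
    "EA 112 11/01/2011",
    "",
    "20 21320964 2770882KXS 8.91",
    "EA 260 11/01/2011",
]
for record in rejoin_wrapped(sample):
    print(record)
```

A continuation line that itself happens to start with digits and a blank would be mistaken for a new record, so the start pattern may need tightening for real data.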

                                             

                                            If my assessment is correct (based on what I see in your sample) and if it was my project I would not consider trying to deal with the pdf file as a multi-line text report unless some other factor made it impossible to work with it any other way. (And then I would be looking to ask the source to provide the information in an entirely different form if possible!)

                                             

                                            HTH.

                                             

                                             

                                             

                                            Grant