6 Replies. Latest reply: May 15, 2014 9:55 AM by datapro _

    PDF and Models

    datapro _

      First of all, I want to say that my English is not very good, because I'm a German who learned French in school.

       

      My problem is that when I open a report from a PDF file and apply the model to it, I can't see the darker marks on the lines. But that's not the main problem. The problem is that when I look at the analysis of the report, it looks as if I have used the wrong model, yet when I check the information I see that it's the right one. So my question is: is it a bug, or is it my mistake?

       

      I hope you can give me some answers.

       

      bye bye

        • PDF and Models
          Grant Perkins

          Hi datapro,

           

          Welcome!

           

          Your English reads OK for me.

           

          My understanding is that you have created a model for a PDF file, it has worked OK and you have saved the model.

           

          Now you have a new version of the file and the model does not seem to work - it is not recognising the data fields defined in the existing model templates.

           

          Is that correct?

           

          Here are some suggestions.

           

          When you apply a model to a text file the expectation is that the format of the information in the file will be the same for the loaded report as it was for the original report. For most reports this is true. But sometimes the report format changes and the model will not work without being changed as well.

           

          Sometimes models are created using specific traps for templates, and the characters those traps rely on do not appear in later versions of the report. It can then look like the data is not recognised when really it is a problem with the trap definition.

           

          The PDF conversion process can suffer from both of these problems because, in effect, the PDF file is first being translated to a text file and then the text file is being mapped in the model.

           

          The extra potential problem with PDF files lies in that first conversion to a text file. If the 'new' PDF file has been created slightly differently (internally, by the PDF creation program), its interpretation may differ from that of the original file and the horizontal positioning of the lines may change. If this happens the traps for the templates might not work unless they are slightly modified.
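
          To illustrate the point about horizontal positioning, here is a minimal Python sketch. It is not how Monarch works internally: the Trap class, the column numbers and the sample report lines are all made up for the example. It only shows why a trap tied to a fixed column can miss data that is still present after a one-character shift.

```python
from dataclasses import dataclass


@dataclass
class Trap:
    """Hypothetical fixed-position trap: literal text expected at one column."""
    column: int   # zero-based column where the literal must start
    literal: str  # characters that must appear at that column

    def matches(self, line: str) -> bool:
        # The line is selected only if the literal sits at exactly this column.
        return line[self.column:self.column + len(self.literal)] == self.literal


# Trap defined against the original report, where "INV" starts at column 2.
trap = Trap(column=2, literal="INV")

# Invented sample line from the original report.
original_line = "  INV 10442   2014-05-01    125.00"
# Same data, but the new PDF-to-text conversion added one leading space.
shifted_line = " " + original_line

print(trap.matches(original_line))  # True
print(trap.matches(shifted_line))   # False: the data is there, the trap misses it
```

          In a case like this the model itself is still "right", which would match the symptom of the correct model appearing not to recognise the report.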

           

          It may be possible to fix such a problem by adjusting the way the PDF file is interpreted. This may help to correct any variable spacing introduced by interpretation of characters and fonts for example.

           

          Alternatively, if the problem is not consistent for each version of the report, there may be a way to work around the problem of the traps using the floating trap feature in Monarch.
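
          As a loose analogy for tolerating inconsistent positioning, the sketch below searches a few columns either side of the expected position. This is an assumption-laden illustration in Python, not a description of Monarch's floating trap feature, whose actual behaviour is described in the Monarch Help and may differ. The function name, the tolerance parameter and the sample line are hypothetical.

```python
def find_trap_offset(line, column, literal, tolerance=2):
    """Return the horizontal shift at which the literal is found, or None.

    Offsets are tried nearest-first: 0, -1, +1, -2, +2, ...
    """
    candidates = [0]
    for step in range(1, tolerance + 1):
        candidates += [-step, step]
    for offset in candidates:
        start = column + offset
        if start < 0:
            continue  # cannot look to the left of the first column
        if line[start:start + len(literal)] == literal:
            return offset
    return None


# Invented sample line: "INV" was expected at column 2 but now starts at column 3.
shifted_line = "   INV 10442   2014-05-01    125.00"

print(find_trap_offset(shifted_line, column=2, literal="INV"))  # 1
print(find_trap_offset(shifted_line, column=2, literal="TOT"))  # None
```

          The returned offset shows how far a given version of the report has drifted, which can help decide whether the drift is consistent or varies from file to file.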

           

          It is difficult to be specific about problems like the one you describe - there are many possible variables. But I think you might find the reason for your problem if you consider the points made here.

           

          If you are still left with the problem unidentified (I assumed this is not a question of fonts and character sets) then it may be a good idea if you can allow someone else to have a look at the PDF file and the model you have made to see whether any other reason can be identified.

           

          When you open the PDF file I assume you can see some text displayed? There must have been some when you defined the model. If you do not see any text it might be that the entire report has been produced as graphic content with no separate text. (The text is part of the graphics.) If that is the case then Monarch would not be able to read it and a very different approach would be necessary.

           

          I hope this helps and that what I have written is OK for you to understand.

           

           

          Grant

          • PDF and Models
            FM DJ

            In the quote below to datapro you mentioned "It may be possible to fix such a problem by adjusting the way the PDF file is interpreted." Having always worked in the past with Monarch on electronic mainframe reports, I've never had to deal with interpreting PDF files. I now find myself in that situation. Exactly where would you make these adjustments to interpreting PDF files: in Acrobat, in Monarch, or in both?

             

            Thank you,

            FM DJ

             

            Originally posted by Grant Perkins:

            It may be possible to fix such a problem by adjusting the way the PDF file is interpreted. This may help to correct any variable spacing introduced by interpretation of characters and fonts for example.

            • PDF and Models
              Grant Perkins

              Originally posted by FM DJ:

              Exactly where would you make these adjustments to interpreting PDF files, in Acrobat or Monarch or both?

               

              Hi,

               

              You have my sympathy. Mostly mainframe reports show more consistency than the 'newer' options.

               

              There are better descriptions elsewhere, but the summary is that PDF files are more complex to interpret programmatically because, even within text that Monarch can work with, they allow different fonts, leading to potentially different character spacing and all that goes with that complex area of computing.

               

              Monarch needs to read the fonts and decide how to handle them for presentation as more consistent text - in the way that mainframe reports would be presented for example - by converting the characters to the current default Monarch font. This requires some compromise sometimes and there will be situations where the underlying format of the text will not be completely unambiguous at first assessment.

               

              I think Gareth Horton has documented, elsewhere in the forum, the updates that were made for the 8.01 release based on initial feedback from version 8.0. The update represents an improvement in the initial automated assessment that Monarch makes when asked to open a PDF file. (It's not that the initial release was bad, just that more challenging examples of what PDF writers can produce came to light when people started to use the new features!)

               

              So I would recommend the 8.01 update as a starting point.

               

              On opening the file, Monarch makes a best-assessment analysis and presents the format that appears likely to give a suitable result. Often this will be OK. But sometimes, and in some sections of a PDF file, the format may be less clear or may conflict with the general assessment. (Remember that at this point Monarch has no idea what you are intending to extract.) So there are adjustment values provided.

               

              The best starting point for understanding the options is the section of the Help linked from the PDF Import Options dialog window.

               

              For the background to what is being considered in the process, have a look at the link to "Customizing the PDF Import Options" in the second paragraph of that help text.

               

              I found the quickest way to get a feel for what was going on was to read the information and then spend a little time just playing with the facilities, using a couple of the less 'helpful' PDF files littering my system. Getting it right first time is not too important, since the results can be quickly amended and observed using the adjustments. So I felt that a simple general understanding of what does what was enough, and in practice that seems to have been effective enough for me. So far ...

               

              HTH.

               

               

              Grant

              • PDF and Models
                datapro _

                So as I understand it, I have to make a new model for the PDF reports. I was just trying my old models, which I had made for the "normal" reports. That means that if I want to use the PDF reports I have to build a new model. OK, then it wasn't the fault of the software, it was my fault.

                 

                bye bye

                 

                thanks for helping

                • PDF and Models
                  Grant Perkins

                  Originally posted by datapro:

                  So as I understand it, I have to make a new model for the PDF reports.

                  Ah!

                   

                  Yes, a new model is required because of the extra activity in the process to convert the PDF file to text.

                   

                  I suppose it may just be possible to get the PDF to text conversion into the same format as the original reports when you convert. If so the original model might work.

                   

                  But my guess would be that the chance of getting a match is very small indeed.

                   

                  If the models are quite simple re-creating them should not be too time consuming.

                   

                  If they are large and complex, but the format of the report is still the same basically, you may be able to apply the old model to the imported PDF and adjust the trap positions and then save as a new model. You would also need to 'move' the fields I would expect. There is a new facility in Version 8 to allow you to adjust both of these on screen by 'pushing' the current definitions to the left or right. It may help you for this re-modelling.
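
                  The idea of 'pushing' definitions left or right can be pictured with a small Python sketch. This is only an illustration of applying one horizontal offset to every field; it does not reproduce the Version 8 on-screen facility. The Field class, shift_fields function, column positions and sample line are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    """Hypothetical field definition: a name plus a fixed column range."""
    name: str
    start: int  # zero-based first column
    width: int  # number of characters

    def extract(self, line: str) -> str:
        return line[self.start:self.start + self.width].strip()


def shift_fields(fields, offset):
    """Return copies of the fields moved `offset` columns right (negative = left)."""
    return [replace(f, start=max(0, f.start + offset)) for f in fields]


# Field positions as defined against the original (non-PDF) report.
old_fields = [Field("number", 6, 5), Field("date", 14, 10), Field("amount", 26, 8)]

# Invented line from the imported PDF: same layout, one column further right.
pdf_line = " " + "  INV 10442   2014-05-01    125.00"

new_fields = shift_fields(old_fields, 1)

# The old positions clip the last character of each value.
print({f.name: f.extract(pdf_line) for f in old_fields})
# The shifted positions recover the full values.
print({f.name: f.extract(pdf_line) for f in new_fields})
```

                  A single offset only helps when the whole line has moved by the same amount. If the conversion changes the spacing between columns as well, each field would need its own adjustment.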

                   

                  HTH.

                   

                  Grant

                  • PDF and Models
                    datapro _

                    Thanks for your answers, you've helped me a lot.

                     

                    So if I have another question, I know where to post it.