3 Replies Latest reply: May 15, 2014 9:54 AM by Grant Perkins

    PDF file text not wrapping correctly

    JJ _

      Hi All,


      I am trying to extract text from a PDF file. I created templates, but when I view the text in the table format, what used to be a single entry (wrapped text) appears on multiple lines.  Can anyone suggest how I can correct this?


      Thank you.

        • PDF file text not wrapping correctly
          Grant Perkins

          Hi JJ,


          I'll take a guess that you have hit the difficulty of defining traps for files which are mainly formatted text.


          What you need to do, as you probably know already, is use the multi-row field capability and most likely define the field as MEMO type.


          The problem is that you then need to be able to define a template that applies ONLY to the first row in the block. If it applies to all rows the template will select each row separately.


          From that you can see that it would likely be quite easy to define a trap to select a single block of text - just use as much of the text as you need from the first row to make the row unique and then set the advanced properties for the field to a suitable 'end on' value.


          The problem with that is that it will be unlikely to pick up any other text blocks - unless all first lines start the same way.
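
          As a rough illustration of the idea outside Monarch, the sketch below groups lines into blocks in Python. It is not Monarch syntax; the function name, the "Note:" prefix and the blank-line 'end on' rule are all assumptions made up for the example. It shows why the trap must match only the first row: a trap that matches every row yields one record per line.

```python
def collect_blocks(lines, is_first_row, is_end):
    """Group lines into multi-row fields.

    A block starts where is_first_row(line) is true and runs until
    is_end(line) is true or another first row appears.
    (Hypothetical helper, not part of Monarch.)
    """
    blocks, current = [], None
    for line in lines:
        if is_first_row(line):
            if current is not None:
                blocks.append(" ".join(current))
            current = [line.strip()]
        elif current is not None:
            if is_end(line):
                blocks.append(" ".join(current))
                current = None
            else:
                current.append(line.strip())
    if current is not None:
        blocks.append(" ".join(current))
    return blocks


# Made-up sample report: the first entry wraps over two lines.
report = [
    "Note: The first entry wraps",
    "over two lines.",
    "",
    "Note: Second entry.",
]

def blank(s):
    return s.strip() == ""

# Trap that matches only the first row of each block.
blocks = collect_blocks(report, lambda s: s.startswith("Note:"), blank)

# Trap that matches every non-blank row: each row becomes its own record.
per_row = collect_blocks(report, lambda s: not blank(s), blank)
```

          With the first trap the wrapped entry comes back as a single field; with the second it is split again, which is the symptom described in the original question.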


          If there is anything on the line above the text block - a paragraph title for example - you might be able to trap on that line and use a multi-line sample in the template to include the start of the text block and make it multi-row from there.


          If there is nothing obvious that can be used directly from the original I would consider the possibility of extracting in multiple rows and finding a way to add an identifier to those rows which represent the first row of a paragraph or a section of text that you want to treat as a single field. Put that in a column in the table. Export the result to a new report file. Then use that as a source file for a new Monarch model in which you can identify the single start row for the text block you require.


          Whilst that is a fairly easy idea to write down it may be a little more difficult to work out how to do it! Each case will need to be considered according to the layout of the report.


          However you might consider something like -


          Define "Page header" templates that split the report into logical blocks of text as you want to see them. Add a calculated field for the Page Number for each 'record' (i.e. line of text).


          Create a Summary with the Page number in column 1 and the text fields in column 2. Suppress the display of duplicate data rows for column 1.


          Export the resulting summary to a PRN file (though this may be a little interesting if you have some very large Memo text fields ...).


          You will now have a new report with an identifier at the beginning of every row where a new section is indicated. Build a template that uses this identifier and the multi-line field will work as you need it to, assuming the amount of text does not exceed the MEMO field limit. I would have thought exceeding it is unlikely.
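
          To make the two-pass idea concrete, here is a minimal Python sketch of the same logic. It is only an analogy for the Monarch steps, not how Monarch does it: the function names, the 4-character identifier column and the "SECTION" marker are assumptions for the example.

```python
def tag_sections(lines, starts_section):
    """Pass 1: number each section, showing the number only on its first row.

    This mimics a summary with duplicate values suppressed in column 1.
    Assumes fewer than 10,000 sections so the number fits in 4 characters.
    """
    out, section = [], 0
    for line in lines:
        if starts_section(line):
            section += 1
            out.append(f"{section:>4} {line}")
        else:
            out.append(f"{'':>4} {line}")
    return out


def merge_tagged(tagged):
    """Pass 2: a row with an identifier in columns 1-4 starts a new field."""
    fields = []
    for row in tagged:
        ident, text = row[:4].strip(), row[5:].strip()
        if ident:
            fields.append(text)
        elif fields and text:
            fields[-1] += " " + text
    return fields


# Made-up sample data.
lines = [
    "SECTION A",
    "first line of text",
    "continues here",
    "SECTION B",
    "other text",
]
tagged = tag_sections(lines, lambda s: s.startswith("SECTION"))
fields = merge_tagged(tagged)
```

          The intermediate list plays the role of the exported PRN file: only rows that begin a section carry an identifier, so the second pass has a reliable first-row trap.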


          You may need to play around with the settings for the export to work as expected - not sure as I haven't played with anything like that recently and don't recall the details of what happened in older versions of Monarch. The theory is sound though.


          Does this help at all?




          • PDF file text not wrapping correctly
            JJ _

            Hi Grant,


            Thank you for the comprehensive solution you've posted.  You're correct in assuming that the document I am working with contains a lot of memo-type entries. Unfortunately, I am not familiar with the workings of Monarch, and I can't do what you're suggesting in terms of using a combination of page headers, calculated fields and summaries.  Would you be able to elaborate further?


            I have tried using the advanced field definitions. However, some of the entries extend over multiple pages, so even the advanced field definitions failed to catch those.  I wonder what else I can do?


            Thank you very much.

            • PDF file text not wrapping correctly
              Grant Perkins



              Hmm. Sounds like quite a challenge.


              I'm not sure what I can elaborate on without trying to cover everything - which would not really help you.


              If you have the training guide I could point you to the most likely useful chapters or the help files and examples which are very good, though there is always the possibility that what you have to deal with is not covered by 'normal', if there is such a thing, approaches.


              Two suggestions.


              Firstly, if the PDF file you have to work with is not too confidential or too large I would be happy to take a look at it and develop a model to demonstrate the concepts that should give you the results you need (assuming there are not really strange things going on in there!) and so provide a head start for your task. I would need the file and a description of what you want from it. You could email both to me, and I will give you an address if the idea is feasible.


              Secondly, it sounds like your text over multiple pages problem should be addressed by setting up a Page Header template so that the lines (including blank lines) that can be considered to be part of the repeating page header, rather than the body of the report, become 'invisible' to the detail or append templates that need to read either side of them.


              I could be wrong there but from your description it sounds like that might help.
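
              The effect of a Page Header template can be pictured with a small Python sketch. This is an illustration only, and it assumes things I can't confirm about your file: that pages are separated by a form feed character and that the header is a fixed number of lines at the top of each page.

```python
def strip_page_headers(report_text, header_lines=2, page_break="\f"):
    """Drop the first header_lines lines of every page and join the bodies.

    Assumes a fixed-height header and form-feed page breaks (both are
    assumptions for this sketch, not facts about any particular file).
    """
    body = []
    for page in report_text.split(page_break):
        body.extend(page.splitlines()[header_lines:])
    return body


# Made-up two-page report where one entry runs across the page break.
report = (
    "ACME REPORT  Page 1\n"
    "\n"
    "Entry one starts here\n"
    "and runs on\f"
    "ACME REPORT  Page 2\n"
    "\n"
    "to the next page.\n"
)
body = strip_page_headers(report)
```

              Once the header lines are out of the way, the entry's rows sit next to each other and a multi-row field can read straight across what used to be the page break.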


              If the file (or a reasonably similar file) can be made available send me a Private Message via the forum and I will respond with my email address.


              Hope this helps.