Greetings consu and welcome to the forum.
One possible approach to this type of challenge is to paint the fields wider than initially seems necessary, to compensate for the shifting that occurs, and then clean up the extracted information with calculated fields as required.
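To illustrate the "paint wider, tidy later" idea outside Monarch, here is a little Python sketch. The line layout, column positions, and helper name are all invented for illustration; in Monarch the cleanup would be a calculated field rather than Python string functions.

```python
# Sketch of "painting wider": capture a generous slice around a field that
# drifts a character or two between pages, then tidy the result afterwards.
# Positions and sample lines are invented for illustration.
line_page1 = "Total:    1,200.00   "
line_page2 = "Total:   950.50      "

def wide_amount(line):
    # deliberately wider than the field ever needs to be,
    # so the one-or-two character drift does not matter
    return line[7:20].strip().replace(",", "")

amounts = [float(wide_amount(l)) for l in (line_page1, line_page2)]
```

The wide capture tolerates the drift; the strip-and-convert step plays the role of the calculated field.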
Of course, each extraction problem such as this can be different and have its own unique particularities, so it would help enormously in attempting to assist if you could [URL="http://www.monarchforums.com/showthread.php?t=2290"]post a sample[/URL] of what you're seeing.
I agree with Kruncher 100%.
PDFs and floating traps are each, individually, good sources of real challenges; working with them together is, shall we say, brave.
Variable PDFs are, it seems, not at all uncommon. In fact I seem to be finding more and more of them, and they can be very frustrating to deal with. Font types and sizes can be bad influences, and now and again the internal structure of the file may be 'unusual'. Try using the Adobe PDF Reader to convert the file to text and see what it offers. I ran that against a fairly simple-looking columnar report today and found that the resulting lines included a few where columns had been repositioned. It made me wonder what the internal information looked like and how on earth Monarch managed to keep the columns in the right order.
As for floating traps ... if you will be defining a lot of fields, I think you will struggle to get a floating trap to extract the fields in one attempt. It would be unusual for the data content to be that consistent even if the positions of the fields were not changing from one line to the next or one page to the next.
The basic rule for floating traps is that, ideally, you need a sample line that matches the MAXIMUM width of each field you need to define, and that has clear and unambiguous trap character positions that do not themselves form part of the data you need to extract. You may need to create that sample line specifically for the purpose when you build the model. There is a good chance that the reports you work with will not have that perfect line to use for building the trap and template.
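As a rough analogy (Monarch's floating trap is not regex, but the requirement is similar), here is a Python sketch with invented sample lines: the trap character must be unambiguous and must never appear inside the data itself, and then the fields can float left or right without breaking the match.

```python
import re

# Analogy only: '|' plays the role of the trap character. It must never
# occur inside the data being extracted. Sample lines are invented.
lines = [
    "ACME Corp      |  1,200.00|2023-01-15",
    "Widgets Ltd|950.50    |  2023-02-01",
]

pattern = re.compile(r"^(?P<name>[^|]+)\|(?P<amount>[^|]+)\|(?P<date>[^|]+)$")

records = []
for line in lines:
    m = pattern.match(line)
    if m:  # fields drift between lines, but the traps still anchor them
        records.append(tuple(g.strip() for g in m.groups()))
```

Note how both lines parse cleanly despite their different field widths; that is exactly what a good trap character buys you.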
If you connect this requirement to the potentially variable PDF extraction ... well, you would be very lucky to get more than a two- or three-field extraction to work.
However, it may be possible to extract 2 or 3 fields from each line that can then be further processed by 'slicing and dicing' techniques to deliver the information you ultimately need. The benefit of thinking in those terms is that the problems of the PDF extraction may also be minimised: you can accept a 'rough' extraction because you will in all cases be processing the results further.
It's certainly not guaranteed to work as a single process - but there is a good chance that it will have the potential to do so.
Is there any chance that you have a version of the PDF file that you could share with people for assessment? If you have, that would probably offer the quickest and most effective way to obtain suggestions about the possible approaches you could take.
I'm assuming that you can regularly trap 100% of the appends; it's just that, thanks to the inconsistencies of the PDF, you can't get all of the detail with any single trap combination.
In that case, trap everything (a blank character in the right-hand margin is usually efficient) as a detail and grab the whole line as one field into a table. Now you can use calculated fields and filters to get rid of the stuff you don't need, leaving yourself with just the data you want from the detail rows.
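The same "trap everything, slice later" idea, sketched in Python with invented data (the row layout, the INV- prefix, and the footer line are all made up; in Monarch the parsing would be calculated fields and the keep/discard step would be a filter):

```python
# Each detail row arrives as one wide field; plain string functions stand
# in for calculated fields, and the startswith() check stands in for a
# filter that discards non-detail rows. All sample data is invented.
raw_rows = [
    "  INV-1001  Widgets Ltd        950.50 ",
    "   INV-1002 ACME Corp         1200.00 ",
    "  Page 1 of 3                         ",   # footer noise to filter out
]

def parse(row):
    parts = row.split()
    if parts and parts[0].startswith("INV-"):   # the "filter"
        return {"invoice": parts[0],
                "vendor": " ".join(parts[1:-1]),
                "amount": float(parts[-1])}
    return None

details = [d for d in map(parse, raw_rows) if d]
```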
Another approach is to attack it in two passes, using Page () and Line() to connect the two interpretations of the same report data. Add Page() and Line() as a calculated field - I tend to use Row = Page()+(Line()/1000) to give me row numbers like 1.001, 1.002 etc. You can then join in external lookups - this technique allows you to combine multiple detail traps on one report.
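To show what the Row = Page()+(Line()/1000) key buys you, here is a small Python sketch. The data values and pass contents are invented; the point is only that two passes over the same report, keyed the same way, can be joined row for row.

```python
# Sketch of the Row = Page() + Line()/1000 keying idea: two interpretations
# of the same report share a row key, so their fields can be combined.
def row_key(page, line):
    return page + line / 1000   # page 1, line 2 -> 1.002

pass_one = {row_key(1, 2): "detail A", row_key(1, 3): "detail B"}
pass_two = {row_key(1, 2): "extra columns for A"}

# join the two passes on the shared key, like an external lookup in Monarch
merged = {k: (v, pass_two.get(k)) for k, v in pass_one.items()}
```

Dividing by 1000 assumes fewer than 1000 lines per page; pick a larger divisor if your reports are longer.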
Hi guys! Thanks a lot for taking the time to help! Since my thread I have tried several approaches. I am (was) an experienced v5 user, and we recently purchased v10.5 to get the PDF import capability, and I must say that in general it is a great breakthrough! We receive various data in PDF format, but it seems that sometimes the characters that normally "should be" in the same position on every page shift by one or more characters. I have found that by trying different PDF import settings the result is sometimes "almost" perfect. The monospace option solves some problems, but not always. Then, when specifying the floating trap, the float acts a little "illogically", and I can't understand why it does not trap in the correct position. I would like to try and fail some more, and come back to this thread with a better example if I still struggle!
Thanks again for keeping the forum an active place for assistance!:rolleyes:
You have picked a good challenge with PDF files.
If the initial problem you face is that some pages have their line start points shifted compared to other pages, you could consider pre-processing the first, closest interpretation of the PDF to get a new 'improved' output file and then running another model against that.
For example, if you create a model that simply grabs the entire length of every line as a field and then use TRIM() to remove leading spaces, you would get a more consistent start position. Export that to a new file and run a new model against that file.
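A minimal Python sketch of that pre-processing step, with invented lines (in Monarch the cleanup would be a TRIM() calculated field and the intermediate file an export):

```python
# The same logical line arrives with different leading-space counts on
# different pages; stripping the left margin makes the start position
# consistent before a second model reads the cleaned output. Sample
# lines are invented for illustration.
raw_lines = [
    "   ACME Corp      1200.00",
    " ACME Corp      1200.00",
]

cleaned = [line.lstrip() for line in raw_lines]
```

After the strip, both lines are identical, so a fixed-position template in the second model will hit every time.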
You may be able to apply more than one adjustment at that stage.
For some observations about floating traps, I made a recent post on herzberg's thread [URL="http://www.monarchforums.com/showpost.php?p=13926&postcount=3"]here[/URL]. I suspect from the post timing that herzberg applied Kruncher's solution, and in any case the problem is different from yours, but the general comments about what is required for floating traps to succeed will still apply.
Of course you might be able to break a difficult line into multiple parts, process each part and then build the result back together. That could take some work in modelling but, once done, could provide a good solution for whatever variable format future versions of the report(s) would present to Monarch. At least the computer would be doing the work ....
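The break-apart-and-rebuild idea above can be sketched in a few lines of Python. The column positions and sample line are invented; the point is that once each part is cleaned, you can emit a cleanly delimited line that a follow-up model can trap reliably.

```python
# Slice a difficult line into parts, clean each part, then rebuild it as
# a delimited line for a second model. Positions are invented.
line = "ACME Corp      1,200.00   2023-01-15 "

name, amount, date = line[:15], line[15:25], line[25:]
rebuilt = "|".join(part.strip() for part in (name, amount, date))
```

The '|' delimiter is chosen here because it never appears in the data, which is exactly the property a trap character needs.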