It sounds like you're doing everything right using OCR and then trying the floating trap, but it is tricky.
When you have data that hasn't been perfectly OCR'd you can have a problem with Date and Numeric fields, as 1234S will be read as either 1234, or Null, instead of 12345. Periods and commas can cause errors of a factor of a hundred or thousand quite easily too. You can avoid these by using Character fields, and then in the Table window defining a Calculated field that cleans it up a little.
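A calculated field that replaces one character with another would swap S for 5, and you can extend this approach to the other characters that OCR commonly misreads.
To convert the output into a number, just wrap the cleaned-up text in a numeric conversion.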
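To make the clean-up idea concrete, here is a rough sketch in Python rather than in Monarch's own expression language. Everything in it is an assumption for illustration: the name clean_amount, the particular look-alike characters in the map, and the choice to treat commas as thousands separators. You would need to adjust all of these to what your OCR output actually looks like.

```python
import re

# Hypothetical look-alike map: letters that OCR often produces in place of digits.
# Adjust this to the mistakes you actually see in your own documents.
OCR_DIGIT_FIXES = str.maketrans(
    {"S": "5", "O": "0", "o": "0", "l": "1", "I": "1", "B": "8"}
)


def clean_amount(raw):
    """Return the amount as a float, or None if the text is still not numeric."""
    text = raw.strip().translate(OCR_DIGIT_FIXES)
    # Assumption: commas are thousands separators and "$" is the only currency sign.
    text = text.replace("$", "").replace(",", "").replace(" ", "")
    if not re.fullmatch(r"-?\d+(\.\d+)?", text):
        return None  # leave it for a human to review
    return float(text)


print(clean_amount("1234S"))     # the 1234S case mentioned above
print(clean_amount("$ 503.90"))  # an amount with a currency sign and a space
print(clean_amount("n/a"))       # cannot be repaired, so None
```

Returning None instead of guessing keeps the doubtful values visible, which matters because a wrongly "repaired" amount is harder to spot than a blank one.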
If someone has helpfully designed an invoice so the amount is floating in the centre of the page, without a field name like "Amount: " to anchor it, then you might find the Numeric Or trap to be more helpful than the floating trap. The Numeric Or trap symbol looks like "¦" - if you put a few of these in the range of columns where you'd expect to see numbers, and perhaps a few blank traps "B" in the space around it, you should find you can trap most data. Bear in mind that Monarch won't let you combine the Numeric Or trap and the Floating trap.
If you'd like to send me a model and some example files by email, I'd be happy to have a look.
Just to add to Olly's advice...
With all the PDFs coming from different sources I would be slightly surprised if they were all created the same way into 'graphics' images rather than text documents. Monarch can, in principle but with some reservations at times, read PDFs containing text without the need for OCR. However if you operate within a specific industry it is possible that the source systems producing the PDF files are all the same or very similar and use 'graphics' representations of documents rather than 'originals' and that forces you to head down the OCR route.
The well-known problem with OCR is that it is rarely totally error free. This is why most OCR software products include potential error information at the review stage. If you can be certain that any errors (per document source) can be identified and corrected according to infallible rules (no matter whether it is a human or a computer doing the correcting) then certainly you can put Monarch to work doing the correcting. But you really do need to be certain about that, in my opinion. Humans can often spot and check possible errors that a machine will not unless coded very cleverly indeed.
One option would be to read the entire OCR output into Character fields 'as extracted' and then have Monarch generate a calculated field for each extracted field and build in error checking at that point. So for a simple example - if the calculated field should be numeric but the extracted data does not translate to a pure number, that would flag an error. Potentially errors in other field types could be spotted in the same way, excess spaces could be tidied up and so on.
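As a rough illustration of that concept (in Python, not Monarch syntax), the sketch below keeps every field as text, tidies the spacing, and flags any field that should be numeric but is not. The function name check_record, the field names and the sample values are all made up for the example.

```python
import re

# A "pure numeric" here means optional minus sign, digits, optional decimal part.
NUMERIC = re.compile(r"-?\d+(\.\d+)?")


def check_record(record, numeric_fields):
    """Tidy each raw text field and list the numeric fields that fail to parse."""
    cleaned, errors = {}, []
    for name, raw in record.items():
        value = " ".join(raw.split())  # trim and collapse excess spaces
        cleaned[name] = value
        if name in numeric_fields and not NUMERIC.fullmatch(value):
            errors.append(name)  # flag for the operator to correct
    return cleaned, errors


# Hypothetical record: unit_price contains letter O instead of zero.
record = {
    "qty": " 10 ",
    "unit_price": "50.39OO",
    "description": "APPETIZER.  ASST   PTITE QUICH",
}
cleaned, errors = check_record(record, {"qty", "unit_price"})
print(cleaned)
print(errors)
```

The list of flagged field names is what an operator would then work through, correcting only the values that failed the check.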
You could then allow the operator to enter the correct information via an interactive session of Monarch and make the 'translated' fields the export at the end of the process.
I won't go into detail at this point, as it is the concept that you need to consider at the moment and I am reluctant to overload you with too much by way of Monarch instructions when you don't know the product well. Suffice it to say that if I wrote everything down step by step in a comprehensive document it would look like a lot of work and quite complicated, which it is not. But how much effort is really required will be a function of the success or failure of the OCR process, and analysing those inputs thoroughly might take a while.
As Olly suggested - if you have one or two PDF file examples that you could share I would be happy to have a look and make some rather more specific proposals.
Your partner beyond the plate:
7950 SPENCE RD
7950 SPENCE ROAD
NET 30 DAYS
QTY. SALES PRODUCT
SHIPPED 1/ UNIT NUMBER
DaNERV ROI1TE: CJ::>::JL I
7950 SPENCE RD
770 993 0111
DEPT # 00
GA SHIP DATE:
DESCRIPTlON PACK SIZE
10 CS 4793048 APPETIZER. ASST PTITE QUICH 4/25 EA
INVOICE SUMMARY ***
sharon pu pxs
2220 8000 05/26/11
I\IUMRFR' 1 J,,"'il~"
U. S. FOODSERVICE. INC.
PO BOX 281945
800 241 7677
LABEL 0 WEIGHT
D UNIT PRICE PRICE
PRESENTATN CS 50.3900 $ 503.90
TOTAL ~GT SHIPPED: 52.00 PIECES ORDERED: 10 PIECES SHIPPED: 10 ITEMS SHIPPED: 1
PLEASE REMIT THIS AMOUNT BY
ir':fll::S: +"". t:01tta:IlC pry. 1ft! in d"!fa Qb.J.al1d n Ii:EDIWa 0"dI ~ 1/:_.
Visit www.us1ood.com for a fast and easy way to order. DISTRICT COpy
PRODUCT TOTAL $
TAXABLE AMOUNT $
GEN SALES TAX
06126/11 AMOUNT $
W.;1I¥udate. ~ If..,..,..,.
I have an issue with a PDF report. Every week I want to import a payroll stub into Monarch. The PDF is generated by the payroll system. For a reason that I can't understand, it worked properly for many weeks, but since then I have had to modify my model every week because the fields move position. When we open the PDF the size is exactly the same, but in Monarch there are huge differences.
This is very time consuming. Can someone help me? It's always the same reports from the same system, week after week.