3 Replies Latest reply: May 15, 2014 9:53 AM by Grant Perkins

    model-Problem

    Oli _

      Hello,

      I have a text file produced by OCR software. How can I build a model that puts all the information into the correct columns:

      Beleg|BA|Ihr Beleg|Datum|Skonto|Bruttobetrag

       

       

      [font="courier"]Beleg     BA Ihr Beleg  Datum             Skonto     Bruttobetra

       

      300009228 GS061945301   24.03.2005        1,67-         55,80-

      300006868 GS108465702   03.03.2005        0,26-          8,64-

      300005587 GS108477702   07.03.2005        0,23-          7,64-      

       

             Beleg       BA  Ihr Beleg      Datum                  Skonto        Bruttobetra

       

              

             100027982   ER  303366101      14.02.2005             4,47            148,90

             100032790   ER  303366201      14.02.2005            41,24          1.374,60

             100040692   ER  303366401      14.02.2005            44,77          1.492,20

             100040688   ER  303366501      14.02.2005            19,36            645,41

             100023874   ER  303366601      14.02.2005             7,44            247,84

       

       

       

             Bankverbindung:

             xxxxxxxxxxxx xxxx 12000        xxxxxxxxxx xxxxxxxxxxxxxx

      23/05 '05 MO 07:56 FAX xxxxxxxxxxxxxxxxx      xxxxxxxxxxxxxxxxx                               U

               xxxxxxxxxxxxxxxxxxxxx Ges.m.b.H.

               Europastraße 3, xxxxxxxxxxxxxx

       

       

       

       

       

       

       

       

               xxxxxxxxxxxxxxxxxxxx                         Beleg / Datum                  Seite

               xxxxxxxxxxxxxxxxxxxx 6                         3000058302 / 10.05.2005 27

               xxxxx xxxxxxxxx

       

       

       

               Beleg        BA  Ihr Beleg        Datum                     Skonto          Bruttobetra

       

               Übertrag                                              10.890,01          363.008,90

               100033141    ER  303370502        14.02.2005                4,87              162,17

               100030493    ER  303370601        14.02.2005                4,28              142,75

               100030492    ER  303370701        14.02.2005                1,16               38,76

               100030490    ER  303370801        14.02.2005                2,14               71,38

               100030491    ER  303370901        14.02.2005                3,21              107,09

               100030495    ER  303371001        14.02.2005              15,55               518,40

               100027975    ER  303371101        14.02.2005              16,80               559,94

               100027981    ER  303371201        14.02.2005                9,68              322,70

               100027976    ER  303371301        14.02.2005              16,06               535,46

               100027978    ER  303371401        14.02.2005                5,51              183,60[/font]

      Thanks for your help.

       

      Kind Regards

      Oli

        • model-Problem
          Grant Perkins

          Oli,

           

          I will guess that you have already experimented with the floating trap approach (based on your earlier post).

           

          I have not yet experimented with the sample data you posted, so I do not know how well the columns align once every line is shifted to the same left-hand start position. Some problems may remain, but that is what I would try first if the floating trap is not enough.

           

          You can modify the report in one pass to give a new report.

           

          Select the whole of every line as a single data field, then use a calculated field to LTRIM that field.

           

          Print the resulting table to a file or save it as a text file and you should have a slightly easier report to work with.
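
          Outside Monarch, the same one-pass clean-up can be sketched in Python. This is only an illustration of the idea, not the model itself; the function name left_align and the sample text are made up for the example.

```python
def left_align(report_text: str) -> str:
    """Strip leading whitespace from every line so all lines start in column 1."""
    return "\n".join(line.lstrip() for line in report_text.splitlines())


# Two lines from the posted sample, each with a different left margin.
sample = (
    "      300009228 GS061945301   24.03.2005        1,67-         55,80-\n"
    "             100027982   ER  303366101      14.02.2005             4,47            148,90\n"
)

aligned = left_align(sample)
print(aligned)
```

          The spacing between the fields is left untouched here, so the result still has variable gaps inside each line.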

           

          Alternatively, use the result of the first extract to split the field (LSPLIT, RSPLIT or some of the other functions) into the separate fields directly from the single field selection, rather than making it a two-stage process.
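
          As a rough Python analogue of splitting from the left and from the right (not Monarch syntax), the sketch below peels the last three fields off the right-hand end and then splits the remainder from the left. It assumes a line with exactly six whitespace-separated values; the function name split_detail_line is hypothetical.

```python
def split_detail_line(line: str) -> dict:
    """Cut one detail line into named fields.

    Assumes six whitespace-separated values; a line with a missing
    space between two fields would raise ValueError here.
    """
    # Right-hand side first: date, Skonto and Bruttobetrag are the last three tokens.
    head, datum, skonto, brutto = line.rsplit(None, 3)
    # Whatever remains holds Beleg, BA and Ihr Beleg, split from the left.
    beleg, ba, ihr_beleg = head.split()
    return {
        "Beleg": beleg,
        "BA": ba,
        "Ihr Beleg": ihr_beleg,
        "Datum": datum,
        "Skonto": skonto,
        "Bruttobetrag": brutto,
    }


fields = split_detail_line(
    "       100027982   ER  303366101      14.02.2005             4,47            148,90"
)
print(fields)
```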

           

          If I get an opportunity I will experiment with the sample you posted.

           

           

          Grant

          • model-Problem
            Oli _

            Thanks Grant,

            As you presumed, I played with the floating trap, but I did not get any useful results. I will try to improve the OCR output to get better-quality data that I can work with in Monarch. If you succeed in this challenging task, it would be great to hear from you.

             

            Oli

            • model-Problem
              Grant Perkins

              Oli,

               

              The floating trap will have problems here because the OCR output results give variable horizontal positions for several fields with a variable number of spaces between the fields.

               

              An additional complexity is that the output does not always give a space between the BA and Ihr Beleg columns.
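
              To illustrate the missing-space case (for example GS061945301 in the posted sample), here is a small Python sketch. It assumes that BA is always exactly two capital letters and that Beleg and Ihr Beleg contain only digits, which holds for the sample but is not guaranteed for other OCR output. The names ID_PATTERN and split_ids are invented for the example.

```python
import re

# Beleg digits, a two-letter BA code, then the Ihr Beleg digits.
# \s* between BA and Ihr Beleg tolerates a missing space.
ID_PATTERN = re.compile(r"^\s*(\d+)\s+([A-Z]{2})\s*(\d+)\b")


def split_ids(line: str):
    """Return (Beleg, BA, Ihr Beleg), or None if the line does not fit the pattern."""
    match = ID_PATTERN.match(line)
    return match.groups() if match else None


fused = split_ids("300009228 GS061945301   24.03.2005        1,67-         55,80-")
spaced = split_ids("100027982   ER  303366101      14.02.2005             4,47            148,90")
other = split_ids("Übertrag                              10.890,01          363.008,90")
print(fused, spaced, other)
```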

               

              There are ways of dealing with these problems but the formulae required can become very complicated to read and interpret. You may also need to build in some 'just in case' redundancy of features. The missing space mentioned above is one example and I have developed a model that deals with the problem for that field. HOWEVER the OCR program MIGHT produce the same problem for other fields at other times. So when creating a generic model you need to consider whether all fields need to be extracted with some sort of definitive safety feature in place.

               

              The other problem here is that there are a variable number of spaces between each of the fields so we cannot just substring the data by their position on the line. The formulae have to attempt to deal with variables. (If you can be sure that the solution is only required for Version 8 equipped users you could use the new INTRIM feature early in the process and make the formulae much easier to follow.)
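
              The effect of reducing every run of spaces to a single space can be sketched in Python as below. This is an illustration of the idea only, assuming an interior trim that also drops leading and trailing spaces; collapse_spaces is a made-up name.

```python
def collapse_spaces(line: str) -> str:
    """Reduce every run of whitespace to one space and drop leading/trailing spaces."""
    return " ".join(line.split())


raw = "   100030495    ER  303371001        14.02.2005              15,55               518,40"
clean = collapse_spaces(raw)
print(clean)
```

              Once the gaps are uniform, each field can be picked out by counting single spaces instead of guessing at column positions.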

               

              All of that means that you will need to test the results carefully and maybe look for some way to provide an internal check on the validity of the data being extracted.
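
              One possible internal check, sketched in Python: confirm that the date parses and that Skonto and Bruttobetrag are consistent with each other. In the posted sample Skonto is always about 3% of Bruttobetrag; that rate is an observation from the sample only, so treat it as an assumption. The names parse_amount and looks_valid are hypothetical.

```python
from datetime import datetime


def parse_amount(text: str) -> float:
    """Convert amounts such as '1.374,60' or '55,80-' (trailing minus) to a float."""
    text = text.strip()
    sign = -1.0 if text.endswith("-") else 1.0
    digits = text.rstrip("-").replace(".", "").replace(",", ".")
    return sign * float(digits)


def looks_valid(datum: str, skonto: str, brutto: str, rate: float = 0.03) -> bool:
    """True if the date parses and Skonto is within one cent of rate * Bruttobetrag.

    The default rate of 3% is assumed from the posted sample.
    """
    try:
        datetime.strptime(datum, "%d.%m.%Y")
        s, b = parse_amount(skonto), parse_amount(brutto)
    except ValueError:
        return False
    return abs(s - rate * b) <= 0.01


print(looks_valid("24.03.2005", "1,67-", "55,80-"))
print(looks_valid("14.02.2005", "4,47", "14,89"))
```

              A row that fails such a check is a candidate for a misread digit or a field that was cut in the wrong place.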

               

              I will send a model that provides some answers for these issues and WORKS WITH THE POSTED DATA SAMPLE, but I cannot be sure it will work with all possible outputs from your OCR program, though it might! This was developed for V7 and would work with V6 or V5, and probably V4 as far as I can remember.

               

              It also includes a filter to keep only the lines you require in the table.

               

              The fields you need to create are extracted in a single process.

               

              The steps are:

               

              1. Read the file and select all of every line.

               

              2. In the table, create a filter to leave you with only the detail lines you really want.

               

              3. Trim leading spaces to get the left line alignment by creating a new calculated field. (It also makes the other variables more visible.)

               

              4. Using various techniques involving the LSPLIT, RSPLIT, TRIM, LTRIM, RTRIM, SUBSTR and similar functions, cut the line up to separate the various data into the fields required. (6 calculated fields.)

               

              5. Re-order the fields you wish to publish so they display in the correct order and hide the rest.

               

              6. Export the data fields in the required format.
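
              For readers without the model, the six steps above can be approximated in Python as follows. This is a sketch under assumptions, not the Monarch model: the pattern DETAIL assumes numeric Beleg and Ihr Beleg values, a two-letter BA code and a dd.mm.yyyy date, and the names DETAIL, COLUMNS, extract_rows and to_delimited are invented for the example.

```python
import csv
import io
import re

# Assumed shape of a detail line; \s* after BA tolerates a missing space.
DETAIL = re.compile(
    r"^(\d+)\s+([A-Z]{2})\s*(\d+)\s+(\d{2}\.\d{2}\.\d{4})\s+(\S+)\s+(\S+)\s*$"
)
COLUMNS = ["Beleg", "BA", "Ihr Beleg", "Datum", "Skonto", "Bruttobetrag"]


def extract_rows(report_text: str) -> list:
    rows = []
    for raw in report_text.splitlines():   # step 1: take every line in full
        line = raw.lstrip()                # step 3: trim leading spaces
        match = DETAIL.match(line)         # steps 2 and 4: keep detail lines, split fields
        if match:
            rows.append(list(match.groups()))
    return rows


def to_delimited(rows: list, delimiter: str = "|") -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(COLUMNS)               # step 5: fields in the required order
    writer.writerows(rows)                 # step 6: export
    return buffer.getvalue()


sample = """\
      Beleg     BA Ihr Beleg  Datum             Skonto     Bruttobetra
      300009228 GS061945301   24.03.2005        1,67-         55,80-
               Übertrag                                              10.890,01          363.008,90
               100033141    ER  303370502        14.02.2005                4,87              162,17
             Bankverbindung:
"""

table = to_delimited(extract_rows(sample))
print(table)
```

              Header, carry-over (Übertrag) and address lines do not match the pattern and are dropped, which plays the role of the filter in step 2.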

               

               

              A mail with the model and the sample file I used will be on its way shortly.

               

              Grant