5 Replies Latest reply: May 15, 2014 9:57 AM by Nick Osdale-Popa RSS

    Cleaning up text

    Michael J. Hansen

      I have a huge file with help information from a mainframe system, that contains commands, that will result in text being highlighted or colored, lines skipped etc when looking at it on a mainframe screen. I need to clean out these commands to give me just the relevant text, but cannot really find a good way of doing it - I know that I will likely have to do it in several steps, but that would be ok. Let me give an example and state how far I have gotten sofar - my text string could look like this:

       

      %AON (n/g/n/n/n=) %AON (h/t/n/n/n=`) %PAR (fl=61,SL=1,ti=7) When you input B/L number, Voyage code will be displayed from RKDS.

       

      I obviously want the finished result to be:

       

      When you input B/L number, Voyage code will be displayed from RKDS.

       

      So far, I have been able to identify the commands by using the Extract function, like this:

       

      Extract(Descr,"%",")") - but if combining it with e.g. the REPLACE function, it will remove all blanks, slashes, and letters n,g etc from the text, not just the exact string "AON (n/g/n/n/n=". Does anyone have some ideas? Again, it does not matter if I have to do it in a number of steps

        • Cleaning up text
          Data Kruncher

          Hi Michael,

           

          The flexibility offered by Monarch will likely permit several solutions to this challenge; here's my option.

           

          I wanted to be sure that any solution would handle any number of % commands, just in case your system varies and doesn't always give you three per line.

           

          As a test, I worked with this sample:

          [font="courier"]%AON (n/g/n/n/n=) %AON (h/t/n/n/n=`) %PAR (fl=61,SL=1,ti=7) When you input B/L number, Voyage code will be displayed from RKDS.

                                         

          %AON (h/t/n/n/n=`) %PAR (fl=61,SL=1,ti=7) When you input B/L number, Voyage code will be displayed from RKDS.                  

                                         

          %PAR (fl=61,SL=1,ti=7) When you input B/L number, Voyage code will be displayed from RKDS.                                      /font[/quote]To extract your desired string, I built this formula:

          [font="courier"]lsplit(Descr,

          len(Descr)-len(strip(Descr,")"))+1,

          ") ",

          len(Descr)-len(strip(Descr,")"))+1)  /font[/quote]This uses the logic of the character counting solution to determine the number of %commands in the string: len(Descr)-len(strip(Descr,")"))

           

          It then adds 1 to the result to account for the string at the end that you need. Then the Lsplit function divides the whole string into that number of parts, and returns the last part.

           

          Of course, this solution falls apart if you have the string ") " within the text you're trying to extract, as in "will be displayed (test) from RKDS."

           

          How does this work for you?

           

          Kruncher

          • Cleaning up text
            Grant Perkins

            Hi Michael,

             

            After your kind words on the other post (for which thanks) I rushed straight here to see what I could offer and discovered that Kruncher has suggested an excellent solution already.

             

            If you are lucky there may be another alternative with the 'command' sections always finishing in a similar string - ",ti=xxx)" for example. As with DK's solution you could find the end position of that string and create a calculated field using the result to remove the leading characters.

             

            With the same caveat about the use of the ")" in the info you want to keep, a RSPLIT() on ")" might be worth looking at.

             

            I'm wondering if there are more constraints in the full file.

             

            HTH.

             

             

            Grant

            • Cleaning up text
              Michael J. Hansen

              That certainly does work, but as you say does not really take care of text in brackets thats not a command. So, I replaced lunch with a couple of brewskis - always makes me think better - and then I came up with this:

              - since this is a one-shot deal, i cleaned out the obvious commands in wordpad with a number of 'find-replace' commands

              - then, I made two fields to look for commands start and end using the function Instr [font="courier"]Instr("%",Descr) and Instr(")",Descr)  /font[/quote]- Now made a field Length: [font="courier"] (end-start)+1 /font[/quote]- then, I made a field 'Trash':  [font="courier"]Substr(Descr,start,length) /font[/quote]- Lastly (and to acommodate for the problem with parenthesis in the text, not in commands) I made a field 'cleaned_description': [font="courier"]If(length<=1,InTrim(Replace(Descr,trash,"")),descr)[/font][/quote]That did it!!!! Thank Heaven for beer

              • Cleaning up text
                Data Kruncher

                As is often said here, where there's a will, there's a way.

                 

                Glad to hear that you've found a solution. Especially one that requires beverages.      

                • Cleaning up text
                  Nick Osdale-Popa

                  I was working on this solution, then got interrupted by lunch.

                   

                  This is assuming that your "commands" are always % ( indicating a space). In most[/b] situation, you'd rarely find a "%" preceding regular words or by itself unless it's referring to command structure such as your data.

                   

                  Here's what I came up with.

                  First make a new Character Field that appends a _  to your current field and replaces all occurrences of _% with some abstract set of a characters. I used the pipe "||" character as it's usually not present in regular text and used for other purposes in string functions.

                   

                  [font="courier"]Replace(" " + Descr," %","||")[/font][/quote]I called this field NewDesc.

                   

                  Then using a modified solution of Kruncher's:

                  [font="courier"]LSplit(lsplit(NewDesc,len(NewDesc)-len(strip(NewDesc,"|"))+1,

                  "|",len(NewDesc)-len(strip(NewDesc,"|"))+1),2,")",2)[/font][/quote]This would alleviate the problem of having "()" characters in your returning string as well as any percentiles (10%, 20%, etc). Again this still runs afoul as DK points out that if the % is by itself in the returning string.

                   

                  If your "commands" don't have a space between as your text does, but is followed by the last command like such:

                  [font="courier"]%AON (n/g/n/n/n=)%AON (h/t/n/n/n=`)%PAR (fl=61,SL=1,ti=7)[/font][/quote]Then you can append another ")" at the beginning of the string instead of a space and adjust the formulas as necessary.