4 Replies Latest reply: May 15, 2014 10:10 AM by Grant Perkins RSS

    Spliting Text Line on numeric (or dates)

    rsbeatty _

      New User; First Time Post

       

      I am pulling a line off of a .pdf file.  Because the columns tend to vary in position, I grab the entire line, and parse the data.  In one scenario, it fails and I cannot find a way to solve it. There are thousands of lines to process, but for simplicity, I have attached the basic scenario I am solving for.  See below

       

      Property Inspections 5/12/2009

      Property Inspections12/22/2010

      Utilities                   5/25/2010

      Property Maintenance5/25/2010

       

       

      For the majority of lines, this works perfectly.

      rsplit(intrim(lsplit(,2,'/',1)),2,' ',2)

       

      As you can see in the other examples, there are no spaces. Ideally, if I could find a way to look for the first numeric character, that might work.  I tried to insert a .in. for the split character, but that appears to be a invalid option.  Any help would be greatly appreciated.

        • Spliting Text Line on numeric (or dates)
          Data Kruncher

          Hello and welcome to the forum!

           

          You've definitely got the right idea there.

           

          Try this:

          If(Substr(RptLine,Len(RptLine)-9,1) .in. ("0","1","2","3","4","5","6","7","8","9");

          CtoD(Right(RptLine,10),"m/d/y");

          CtoD(Right(RptLine,9),"m/d/y"))[/CODE]Works for me.

           

           

          EDIT:You're really only interested in two digit months, so if the character in question is a "1" then...

           

          If(Substr(RptLine,Len(RptLine)-9,1) = "1";

          CtoD(Right(RptLine,10),"m/d/y");

          CtoD(Right(RptLine,9),"m/d/y"))[/CODE]HTH,

          Kruncher

          • Spliting Text Line on numeric (or dates)
            Grant Perkins

            Hello rsbeatty and welcome to the forum.

             

            From your post I would guess you are quite comfortable with the functions available so I'll outline an idea rather than try to spell things out.

             

            If you can assume that the last 10 characters of the line (or field) will contain a date in all cases BUT that the first character of that 10 char string may be either;

             

            a valid number;

            a 'valid' space; or

            an invalid character that has to be removed/substituted ....

             

            then it should be possible to to work out a calculated field extraction.

             

            At a simple level and with a warning about the potential for the need to TRIM blanks, if you use LEN() to get the length of the field then -10 to get the start point for a SUBSTR to grab the end of the line you are nearly there.

             

            Check the first character of the resulting string. If it is 1 it's OK. If it is not 1, then set it to 0.

             

            That should do it. I see Kruncher has suggested much the same solution but offered more detail about the formula required. Looks good to me!

             

            Grant.

              • Spliting Text Line on numeric (or dates)
                rsbeatty _

                I ended up using the STRIP() command. By parsing the first slash, I was left with numbers at the end of my character string.  I striped out all numerics, and that left me with the desired result.  Thank you both for your posts, they have given me some other idea's on solutions I have been struggling with.

                 

                This ended up being my solution:

                strip(intrim(lsplit(,2,'/',1)),'0123456789')

                  • Spliting Text Line on numeric (or dates)
                    Grant Perkins

                    I ended up using the STRIP() command. By parsing the first slash, I was left with numbers at the end of my character string. I striped out all numerics, and that left me with the desired result. Thank you both for your posts, they have given me some other idea's on solutions I have been struggling with.

                     

                    This ended up being my solution:

                    strip(intrim(lsplit(,2,'/',1)),'0123456789')[/QUOTE]

                     

                    Ah, OK, didn't read your first post correctly and thought you wanted the date.

                     

                    Two observations.

                     

                    Firstly the strip idea is great provided you can be absolutely certain that the text you wish to capture will never contain any of the strip characters. now or in the future. It probably won't but it's always worth remembering. If there is any doubt build in a check that would highlight the occurrence.

                     

                    Secondly, off the top of my head and probably a minor thing, I think using intrim after strip would be better. I could be wrong (I have not checked) but the order of processing suggests you might end up with a space at the end of the extracted string. Whether or not this matters to your result I have no idea but it might in certain circumstances.

                     

                    HTH.

                     

                     

                    Grant