7 Replies Latest reply: May 15, 2014 10:07 AM by Data Kruncher RSS

    nsplit()

    elginreigner _

      I use this function heavily, and have just notice inconsistent results.

       

      Example:

       

      MCALONIS KELLY A

      MCCABE PATRICK J

       

      Both have last name, first name and middle initials.

       

      I use the following for a test during any new client setup:

       

      '1'NSPLIT(GNAME,1)'2'NSPLIT(GNAME,2)'3'NSPLIT(GNAME,3)'4'NSPLIT(GNAME,4)'5'+NSPLIT(GNAME,5)

       

      GNAME represents the whole name

       

      In this case I am getting results that are inconsistent:

       

      TEST

      12MCALONIS3KELLY4A5

      12MCCABE34PATRICK5J

       

      As you can see both have last name returned in the same part, but the first name is being returned differently.

       

      Anyone ever run into this?

        • nsplit()
          Grant Perkins

          My understanding of this function is that it is intended to allow a name in the standard written format of

           

          Title, First Name, Middle Name, Last Name, Suffix

           

          to be split into 5 parts. (See the Help for a more lucid description.)

           

          It will attempt to spot Titles (Prefix) like Mr, Mrs, Dr., etc, and suffixes like MD.

           

          Now I have no idea what internal rules it seeks to apply but a quick test suggests that if the 'initial' at the end of the name is A, E, O, U or Y, nsplit will treat these as part 4 of a name (i.e. does not interpret them as Suffixes and will accept them as a Last Name) but any other letter is understood as a Suffix and so treated as the 5th part of a split. There may be some other criteria in play as well but I have only checked for the last initial changes.

           

          Your source names are somewhat out of sequence for the automated intentions of nsplit(). Given that data input I think I would most likely work with lsplit or rsplit or both.

           

          Alternatively if the input is always Last Name/First Name/Middle Initial one could re-order the name and pass that through the nsplit function if your  records have appropriate Prefix and Suffix identities to deal with.

           

          I'm not sure that would offer so much benefit compared to dealing with the structures directly but experimenting with the options should quickly identify the positives and negatives of that.

           

          HTH.

           

           

          Grant

            • nsplit()
              Data Kruncher

              FWIW, I only ever use NSplit with the data in the expected sequence as Grant detailed above, even if that means involving RSplit or LSplit to develop that sequence, otherwise you run the risk of what I would call unexpected results.

               

              That said, the results would not entirely be "unexpected" though when the NSplit function isn't used as intended.

                • nsplit()
                  elginreigner _

                  Rsplit() and Lsplit() would not work too well for names, being there can be any given number of spaces and the part that could be recognized as first or middle could easily be a part of the last name or vice versa.

                   

                  Yes, I agree the names are out of order, but I have to work with what can be provided. The data is even worse than the example shows, I get last name, first name, middle name, suffix and no prefix and none of them are required.

                   

                  You would think first and last name would be required fields in a medical system but who ever programmed this one didnt agree.

                   

                   

                  Thanks for the help.

                    • nsplit()
                      Grant Perkins

                      Rsplit() and Lsplit() would not work too well for names, being there can be any given number of spaces and the part that could be recognized as first or middle could easily be a part of the last name or vice versa.

                       

                      Yes, I agree the names are out of order, but I have to work with what can be provided. The data is even worse than the example shows, I get last name, first name, middle name, suffix and no prefix and none of them are required.

                       

                      You would think first and last name would be required fields in a medical system but who ever programmed this one didnt agree.

                       

                       

                      Thanks for the help.[/quote]

                       

                      The problem you face seems to me, in my experience over the years, to be quite common, especially in medical systems where few if any standards have applied. I agrre you would hope that patiatne records could be uniquely linked to an individual but even in a single system - like the UK's National Health Service - that is not always quite as comprehensively applied as you might expect. Recent attempts to completely integrate everything into a single system seem to be proving even more problematic.

                       

                      That said I think the specific intent of the nsplit function is no more help to you than your own creative use of the L and R splits if the data is as inconsistent is it seems to be (and very likely is. Bet you can't combine records reliably based on name anyway - there will always be spelling anomalies and typos to deal with.)

                       

                      One of the problems with expecting programmers to insist on first name/last name is that they will quickly realise that in a large population (typical of medical systems) such combinations will not be unique. ANd in different cultures the order of 'family' and 'given' names will be different. So they quickly conclude that there are no rules they can successfully apply ...

                       

                      It's a challenge, certainly. Can you get anywhere thinking of the 'names' as codes if you strip the spaces and treat them as one long string? Take what works, flag the rest as probably anomalies and pass the problem to someone to sort out the problem data before re-submitting (or provide a cross ref file you can use as a lookup tool in some way for records identified as possible error sources.)

                       

                      Nasty - but is may be the only real option you have, depending on the  purpose of the extraction.

                       

                      There may be some mileage in asking DW for guidance about the internal processing performed by nsplit. I don't think it looks like it will give up that information readily by experimentation and I don't remember reading anything about it in any documentation. (But I'll get around to checking the Learning Guide just in case.)

                       

                       

                      Grant

                        • nsplit()
                          Data Kruncher

                          Avoidable data challenges seem to present themselves at every opportunity.

                           

                          Take a customer master data table for example. If I didn't know better, I'd swear that some people who perform the data entry for work such as new customer setups simply enjoy putting data in the incorrect yet clearly labeled field, probably knowing full well that doing anything useful with the data extracts afterward would be all but impossible. City names in state fields, state abbreviations in province fields, country names in the county[/I] field, flipping first names and last names, combining what should be separate data items in one field. The list goes on...

                           

                          Never mind the horrors that are created with phone numbers.

                           

                          One of Monarch's strengths is its ability to clean up the data (to one degree or another, when good model design is implemented) when things are incorrect, and to simultaneously good handle when all is well.

                           

                          You know that you've got significant problems when it takes hours and days to scrub a list with the intention of sending out greeting cards or simple marketing materials to people and companies that are expecting to get them.

                           

                          Putting a little smart programming in the data entry side to immediately logically validate what's being input is money well spent.

                            • nsplit()
                              elginreigner _

                              Putting a little smart programming in the data entry side to immediately logically validate what's being input is money well spent.[/QUOTE]

                               

                              Smart programming? I'd like to see that. Yes I agree, simply data validation would save countless hours.

                               

                              Data and Grant, based on your answers, I would assume both of you have a programming background?

                                • nsplit()
                                  Data Kruncher

                                  Some time ago I spent a few years doing 4GL programming work on an integrated financial and manufacturing system that was very similar to Oracle. I maintained existing code and systems and developed new ones.

                                   

                                  While much of my time was spent on developing and optimizing the programs that created the reports, I did have an extended period in which I gave several key data entry screens some smarts to help avoid the GIGO scenario.

                                   

                                  But alas, with Monarch doing the heavy lifting for me of late, my SQL chops are decidedly rusty now. I can live with that.