4 Replies Latest reply: May 15, 2014 10:01 AM by Grant Perkins

    NSplit - What logic does it use?

    AmyNRA

      I am using NSplit today and along comes a name where the first name is only two letters. 

      Example:  BB SMITH


      NSplit gives me no first name!  This two-letter portion of the name isn't falling into any of the other four name parts, either.


      (The last name is being identified correctly.)


      I have wondered before what logic Monarch uses to make these determinations.  Is there anything somewhere I can read about it?

        • NSplit - What logic does it use?
          RalphB

          Hi Amy,


          Yes, there are a couple of places where you can find more information about functions.  One is to click 'Help' on the menu bar.  Type in 'NSplit' and it will give you info on that function.


          You can also go to the Downloads and Updates page on the Datawatch website.  There you will find the Monarch Functions Reference Guide, which you can download and print out.





            • NSplit - What logic does it use?
              Grant Perkins



              Bear in mind that functions like NSPLIT will only work reliably with suitably formatted data that fits the rules for which they have been coded (including any anticipated anomalies those rules allow for).


              It's the same for addresses.


              Most databases will have quite high numbers of records with non-conforming entries for one reason or another. Really all one can do is try to spot and report the obvious anomalies for special corrective attention after the extraction, and give thanks for the reduced work effort on the records which extracted as required.


              If you look at the 'failures' there may be some patterns there that could be dealt with by some additional rules you could build into the model but the chance of getting 100% conversion on a large set of records is small - unless that set has been recently treated to some attentive cleaning and housekeeping. At least that is my experience over the years.
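              As a rough illustration of looking for patterns in the 'failures', the sketch below (Python, run outside Monarch) reduces each failed name to its "shape" and counts how often each shape occurs. The function name shape and the sample names are made up for the example; they are not Monarch features or real data.

```python
from collections import Counter


def shape(name):
    """Reduce a name to its shape: letters become A, digits become 9,
    and spaces or punctuation are kept as they are."""
    out = []
    for ch in name:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


# Made-up sample of names that did not split as expected.
failures = ["BB SMITH", "JR JONES", "J.R. EWING", "3M COMPANY"]

# Count how many failed names share each shape, most common first.
counts = Counter(shape(name) for name in failures)
for pattern, n in counts.most_common():
    print(pattern, n)
```

              A shape that turns up often (two letters followed by a surname, say) is a candidate for an extra rule in the model; the one-off shapes are probably best left for manual correction.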


              There are some names, especially short names, that can be difficult to categorize within the rules and may be ambiguous in some cases.


              NSPLIT can produce up to 5 fields from a 'name' extraction, and it uses punctuation and spaces to help recognize the break points between sections.


              If you use your BB Smith as an example you can see the effect by setting up 5 fields, one for each possible field in the NSPLIT function. "BB" would be interpreted as some form of Title and appear in field 1.


              If you use "B B Smith" you would get "B" for first name, "B" for middle Initial and "Smith" for family name.


              If the name had been "BA Smith" the first name field would have returned "BA" with nothing for the middle initial.


              Any likely name part must contain at least one vowel or the letter Y to be identified as a name. There may be other rules as well, but that is the most likely check to come into play.
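              To make the vowel check concrete, here is a minimal sketch in Python. It is not Monarch's code; it only reproduces the three examples above (BB, B B and BA), and treating a single letter as an initial is my assumption based on the "B B Smith" result. The name classify_leading_token is made up for the illustration, and the sketch does not model whatever list of known titles NSPLIT may also use.

```python
VOWELS_AND_Y = set("AEIOUY")


def classify_leading_token(token):
    """Guess how the first token of a name might be treated.

    Assumed rules, inferred only from the examples in this thread:
    a single letter is an initial, a token with a vowel or Y is a
    name, and anything else falls through to 'title'.
    """
    letters = [ch for ch in token.upper() if ch.isalpha()]
    if not letters:
        return "empty"
    if len(letters) == 1:
        return "initial"
    if any(ch in VOWELS_AND_Y for ch in letters):
        return "name"
    return "title"


for token in ["BB", "B", "BA"]:
    print(token, classify_leading_token(token))
# BB title
# B initial
# BA name
```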


              For those who have Monarch V9 you can make use of the Test facility for a calculated field to check the effects.


              Open a Monarch session and go to the table window (you don't need a report or other input). Add a calculated field with the formula


              nsplit("BB Smith",1)


              Change the text string in the quotation marks (BB Smith in this case) to re-space it or amend the 'spelling' and then use the Test facility to see the results. Change the number of the NSPLIT part to be extracted to ascertain where the component parts will fit.


              One way of dealing with such matters would be to create fields for all 5 possible fields and then assess the contents. For example part 1 fields would normally be for titles such as Mr., Mrs., Miss., Ms., Dr., and so on which probably do not appear in many databases. So that opens the possibility of spotting possible anomalies simply because the field ends up populated - as it would in the BB Smith example.


              Next one could compare the result - BB - to a list of expected values - Mr.,  Mrs.,  etc. - and if there is no match perhaps assume that the entry is really something that should be spread across part 2 and part 3 of an NSPLIT extraction and likely to be initials. So one could make a calculated field that takes the first character only and a second field that takes whatever is left after the first character has been removed.


              The final step would be to use another calculated field, probably using an IF() function in most cases as a precaution, to make up the field you really want from either the original direct extraction OR the re-worked calculated fields described above.
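              The steps above can be sketched outside Monarch as follows. This is Python rather than Monarch formula syntax, and it is only an outline of the logic under stated assumptions: the title list KNOWN_TITLES and the function rework_parts are hypothetical names, and the inputs are taken to be the values NSPLIT returned for parts 1, 2 and 3 (title, first name, middle initial).

```python
# Hypothetical list of expected titles; extend it to suit your data.
KNOWN_TITLES = {"MR", "MRS", "MISS", "MS", "DR"}


def rework_parts(title, first, middle):
    """Return corrected (title, first, middle) values.

    If part 1 holds something that is not an expected title, and
    parts 2 and 3 are empty, treat it as initials: the first
    character becomes the first name and the rest the middle initial.
    Otherwise the original extraction is kept unchanged.
    """
    raw = title.strip()
    cleaned = raw.rstrip(".").upper()
    # Part 1 is empty or a genuine title: keep the direct extraction.
    if not cleaned or cleaned in KNOWN_TITLES:
        return title, first, middle
    # Precaution (the IF() step): only rework when nothing else was found.
    if first or middle:
        return title, first, middle
    return "", raw[0], raw[1:]


print(rework_parts("BB", "", ""))       # ('', 'B', 'B')
print(rework_parts("Dr.", "John", ""))  # ('Dr.', 'John', '')
```

              In a model the same idea would be spread over a few calculated fields, with the final field choosing between the direct extraction and the reworked values.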


              One would be very lucky to find a 100% correct result - there may well be a few records that defy any rules you can safely use! However it could certainly offer an even greater one-pass success rate.


              Just remember that with records in databases, especially anything that is fundamentally 'free text entry' rather than controlled by predefined acceptable data tables, the 'data entry sources' are very likely to have found ways around the rules intended for the data content!






                • NSplit - What logic does it use?
                  AmyNRA

                  Thanks, Grant.  That's what I was looking for:


                  Absence of a vowel (or Y) makes Monarch determine it's a title.


                  I also made a table, as you suggested, so I can see examples of Monarch's name determinations.  That's helpful, too.


                  I don't mind challenges of irregular data, but I needed to understand the logic.

                    • NSplit - What logic does it use?
                      Grant Perkins

                      "I don't mind challenges of irregular data, but I needed to understand the logic."


                      I agree. But one should also have a feeling for the limitations of the logic, given the human ability to fail to follow 'rules' at some point in some way!


                      Of course that implies that some names from around the world may be considered to have broken the 'rules', but such things are probably two way trades ...


                      Glad you know what you needed to know.