This sounds more like a data variability problem, irrespective of which method you use (even manual) to do the analysis and selection.
If you are getting returns for doctors who are not in your list, is there any way of filtering those out by some other means? Hospital or department codes of some sort, for example? (Slightly worrying, I think, that this is happening at all, but I guess that is a different problem...)
You could also consider taking the entire doctor-name string and simply searching it for names you recognise from your doctor list, rather than worrying about parsing it into a particular format. The format would really only be of special use if you were trying to create some sort of mailing list, I would have thought.
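That "just search the whole string" idea might look something like this sketch in Python (the doctor names and the raw lab string here are invented for illustration; a real list would come from your lookup data):

```python
# Hypothetical sketch: scan the raw doctor-name string on a lab return
# for any surname that appears on a known-doctor list, instead of
# normalising the name format first.

KNOWN_DOCTORS = ["SMITH", "FERNANDEZ", "BROWN"]

def find_known_doctor(raw_name):
    """Return the first known surname found in the raw string, else None."""
    upper = raw_name.upper()
    for surname in KNOWN_DOCTORS:
        if surname in upper:
            return surname
    return None

print(find_known_doctor("Dr. J. Smith (Cardiology)"))  # SMITH
print(find_known_doctor("Dr. A. Jones"))               # None
```

As the next post notes, this falls down when several doctors share a surname, which is where a uniqueness check comes in.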
For many names this will probably work quite easily but could be a problem if the Lab sending back the information services other hospitals with many Doctors who may also have the same names - John Smith, Juan Fernandez, Mary Brown and so on.
One approach might be to assess the potential uniqueness of each name, accept those which are almost certainly unique (and match your doctor!), and apply varying degrees of manual checking to the other combinations. It should at least reduce and assist the checking process.
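A minimal sketch of that triage idea, assuming the known-doctor list is available in memory (the names are invented): accept a surname automatically only when exactly one known doctor carries it, and route everything else to manual checking.

```python
from collections import Counter

# Hypothetical sketch: auto-accept a surname match only when exactly one
# known doctor has that surname; shared surnames go to manual review.
KNOWN_DOCTORS = [("SMITH", "JOHN"), ("SMITH", "ANNE"), ("PATEL", "RAVI")]

surname_counts = Counter(surname for surname, _ in KNOWN_DOCTORS)

def triage(surname):
    n = surname_counts.get(surname.upper(), 0)
    if n == 1:
        return "accept"        # unique on our list
    if n > 1:
        return "manual-check"  # shared surname: ambiguous
    return "unknown"           # not on our list at all

print(triage("Patel"))  # accept
print(triage("Smith"))  # manual-check
print(triage("Jones"))  # unknown
```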
One thought I had was to compare the names on the returns with a list of your known doctors held in a lookup table. Potentially an easy way to spot mismatches.
Another option might be to identify whether the labs consistently use a specific format on their reports. If they do, and the report can be used to identify the lab, then you could potentially convert the format to your preferred format (or the most common format) before doing the comparisons for name matches. The comparison might include two or more stages: a family name match, then a given name match, then perhaps some form of other-name or initial match for names that are relatively common and therefore less unique than you might prefer.
Finally, there will almost certainly be exceptions which make identification difficult, so it would be worth including a 'probable error' flag in the output.
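Putting the staged comparison and the 'probable error' flag together might look like this sketch (the known-doctor tuples are invented; a real version would read them from your lookup table):

```python
# Hypothetical sketch of a staged comparison: family name first, then
# given name, then initial, with a flag for anything not matched cleanly.
KNOWN = {("SMITH", "JOHN", "A"), ("BROWN", "MARY", "E")}

def match(family, given, initial):
    family, given, initial = family.upper(), given.upper(), initial.upper()
    if (family, given, initial) in KNOWN:
        return ("matched", False)       # clean three-part match
    # Family and given name agree but the initial does not: probable error.
    if any(f == family and g == given for f, g, _ in KNOWN):
        return ("matched", True)        # flagged for manual checking
    return ("unmatched", True)

print(match("Smith", "John", "A"))  # ('matched', False)
print(match("Smith", "John", "B"))  # ('matched', True)
print(match("Jones", "Tom", "C"))   # ('unmatched', True)
```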
I worked on a similar sort of problem (two databases with different name formats and data held, but which needed to have their data combined into a new output) two years ago. We had enough success to justify the activity BUT there were some records where, with spelling errors or 'variability', the links were perhaps questionable. So we used a 'possible error' flag so that the users (same department, two separate systems for HR needs) could easily see the questionable records and decide where best to go to correct the data to give improved results next time around. A double benefit!
As I recall we needed a multi-stage process for that solution: two passes to prepare the two files to a common 'name' format, and then a third to make the data joins and produce the required output.
Part of the process involved eliminating as many variables - like spaces and punctuation marks - as possible in order to create a common key.
I think the key we ended up with was something like FAMILYNAMEGIVENNAMEINITIAL with all spaces and punctuation removed. It worked well as long as the component parts of the names were entered relatively consistently on both files and without spelling errors.
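That common-key idea is easy to sketch. This is not the original batch process, just a minimal Python illustration of the same normalisation (upper-case, strip everything that is not a letter, concatenate family name, given name, and first initial of any middle name):

```python
import re

# Hypothetical sketch of the common-key idea: strip spaces and punctuation,
# upper-case, and concatenate FAMILYNAME + GIVENNAME + INITIAL.
def make_key(family, given, middle=""):
    initial = middle[:1]
    raw = f"{family}{given}{initial}"
    return re.sub(r"[^A-Z]", "", raw.upper())

# The same person entered differently on two files yields the same key.
print(make_key("O'Brien", "Mary-Jane", "T"))   # OBRIENMARYJANET
print(make_key("o brien", "MaryJane", "Tom"))  # OBRIENMARYJANET
```

The second call shows why removing punctuation and spaces matters: two quite different entries collapse to one joinable key, though a spelling error in either file would still break the join, exactly as noted above.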
The whole thing was then set up to run as a batch file activated process. It would have been easier with V8 and Data Pump ...
As well as NSPLIT consider just using RSPLIT, LSPLIT and INSTR functions to break the name field up or, in the case of INSTR, simply assess whether the name is likely to match one of your doctors.
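For anyone who thinks better in general-purpose code, those Monarch formula functions have rough Python equivalents; this sketch (with an invented sample field) splits a "FAMILY, GIVEN INITIAL" string into parts and does an INSTR-style containment test:

```python
# Rough Python equivalents of LSPLIT/RSPLIT (split around a delimiter)
# and INSTR (find a substring), applied to an invented name field.
name = "SMITH, JOHN A"

family = name.split(",")[0].strip()       # like LSPLIT on the comma
rest = name.split(",", 1)[1].strip()      # remainder after the comma
given, _, initial = rest.partition(" ")   # a further left split

print(family, given, initial)             # SMITH JOHN A

# INSTR-style test: does the field contain a surname we recognise?
print(name.find("SMITH") != -1)           # True
```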
I have Datawatch ES
I have Monarch V8.01
I just have to make these things sing. Thanks for the ideas.
Hey, being a grand master, how come there are only 10 posts for Datawatch ES over 3 years?

Keith,
The name thing could be quite tricky in the detail no matter how you do it. I think my assumption would be that the backstop position is to refer anything that is not clear to humans, who are much better at working out subtle differences and spotting errors in original data (or should be!).
Data Pump should make a multi-stage process relatively easy to generate and control whichever way you do it.
As for the ES question - good question, well presented but I have no idea what the answer is. I am not an ES practitioner by any means, though I do have some knowledge of the product, but I think the answer may be that it is a more technical 'team' sort of product and therefore people seek support in a different way.
As you can see from the forum, Monarch, a desktop solution, gets far more traffic than any other product. Maybe it takes too much courage to ask questions on a public forum if you are working in a technical role with an enterprise-level product in an IT department?
VorteXML and, to some extent, Data Pump seem to be similar to ES in that respect. That said, I know of several Data Pump users/administrators who are not considered to be IT department people. And I have met at least one ES user and system administrator who is not an IT person in any way at all and who set up and maintains ES as part of a regular finance-related role.
The other possibility is that, because the product can be highly tailored to corporate requirements, sharing information may not be very useful, and may even unwittingly give away corporate governance information, which could be considered unwise.
That said, since Monarch functionality underpins both Data Pump and ES, it will be commonly used by vastly more people much more frequently than, say, Data Pump scheduling or ES report database structuring would be, so it is not surprising that there are more Monarch correspondents, just that the difference in numbers is so large.
Comments from those others who have several Datawatch products to keep them busy would make interesting reading.