Last summer there was a [url="http://mails.datawatch.com/cgi-bin/ultimatebb.cgi?ubb=get_topic;f=1;t=001289#000000"]similar discussion here[/url] on HL7 files. Unfortunately there was not a clear solution posted to the problem at the time.
A quick Google of parsing HL7 files (with which I have zero experience) didn't turn up much concrete information either. This basically reinforces the notion quoted on the web site in the other thread, which alluded to all formatting standards being optional. Some help, huh?
The only other item I would [i]speculate[/i] on is that there could be an embedded end-of-file character within your file, and that [i]could[/i] be the reason that you don't see the entire message in Monarch. It would read the end-of-file character and stop loading the remainder. Just a guess though. That said, I can't imagine why someone would purposefully build a file with that structure.
[quote]When I open the text file in Monarch, I don't see the complete HL7 message.[/quote]
Carolyn,
I assume what you do see you can clearly identify as a part of a message and that you can also see the full message by some other means. Does that mean you can see whether the messages 'stop' (in Monarch) at a consistent point or character type?
My reading around the subject, such as it was, back at the time of the post that Kruncher referenced, plus some finds from another search engine just a few minutes ago, suggests that there are a large number of potential (variable) uses and standards, which may make a generic answer somewhat difficult to derive. Perhaps a seasoned HL7 user might offer a different conclusion, but looking at an overview of the methods and structure(s) that seem to be employed, and allowing for possible 'local' adaptations, suggests that the only way for any of the forum members to approach this (unless we have some new HL7-experienced people around these days) is to consider your problem as entirely divorced from the fact that the data source is HL7 based.
In other words, we have a string of data in a file. For some reason Monarch does not seem to see all of it. Some of the possible reasons might be:
1. The entire string is too long (unlikely, and you would have expected to see a message warning you).
2. Something in the string is interpreted as an end-of-"string/line/paragraph/whatever" character for some reason.
3. Monarch does see the entire string but for some reason you are not seeing it on the screen (e.g. the field is longer than a normal field can hold and needs to be defined as a Memo field, or the field is within type limits but exceeds the screen display and therefore needs to be viewed using the "Show Field Contents" option).
Three basic suggestions, mainly to eliminate any simple but less obvious oversight.
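One way to test reason 2 (and the end-of-file guess made earlier in the thread) outside Monarch is to count the control bytes in the raw file. The following is only a rough Python sketch, not anything Monarch itself provides; the function names and the sample bytes are made up for illustration, and it assumes the log can be read as plain bytes.

```python
def summarize_control_bytes(data: bytes) -> dict:
    """Count line terminators and a few suspicious control bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "bare CR": data.count(b"\r") - crlf,   # CR not followed by LF
        "bare LF": data.count(b"\n") - crlf,   # LF not preceded by CR
        "EOF (0x1A)": data.count(b"\x1a"),     # legacy DOS end-of-file marker
        "NUL (0x00)": data.count(b"\x00"),
    }


def first_offset(data: bytes, marker: bytes) -> int:
    """Byte offset of the first occurrence of marker, or -1 if absent."""
    return data.find(marker)


if __name__ == "__main__":
    # Hypothetical stand-in for a real log; to inspect an actual file use
    # something like: data = open("hl7.log", "rb").read()  (path is made up)
    data = b"MSH|a\rPID|b\r\x1aEVN|c\r\n"
    print(summarize_control_bytes(data))
    print("first 0x1A at offset", first_offset(data, b"\x1a"))
```

A large "bare CR" count with few or no CRLF pairs would point towards a line terminator problem, while a non-zero 0x1A count would support the embedded end-of-file theory.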
I also wonder what you mean by "the complete message". My reading suggests that a complete TRANSACTION could be a series of message exchanges over time and therefore may not be sequential in a file - as you allude to when you mention 'other information'.
I doubt that such an observation is relevant to your problem but thought it best to raise it just in case there is a connection somewhere!
The documentation reference I saw (and skimmed through) suggested there were over 500 message type codes (iirc), and it looked like each one could potentially have a different record (message) structure. If so, unless you are setting out to select and analyse only some of them, you have a fairly significant task to undertake.
Could you 'sanitise and anonymise' a sample of a log file and post it here with a comment about where the interpretation seems to stop so that we can have a more specific look at the problem?
Thanks; you guys helped me figure it out! I copied and pasted part of the Log file below. In the original document, Segments ("MSH", "EVN","PID", etc.) are delimited by a carriage return but I noticed that the delimiters disappeared when I copied the messages below.
After some experimentation, I discovered that if I open the original Log file with Word and then save it as a text file, Word gives me the option to "insert line breaks" and "allow character substitution". Word, in effect, "Preps" the file for me. Now when I open the "prepped" file in Monarch, I see the entire HL7 messages.
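For anyone without Word to hand, the same "prepping" could probably be done with a few lines of script. This is a minimal Python sketch, assuming (as Carolyn describes) that the segments in the original log end with a bare carriage return; the function name and the file names in the comments are hypothetical, and it makes no attempt at Word's character substitution.

```python
import re


def normalize_segment_breaks(data: bytes) -> bytes:
    """Rewrite every CRLF, bare CR, or bare LF as a CRLF pair."""
    # The alternation tries \r\n first, so existing CRLF pairs are not doubled.
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


if __name__ == "__main__":
    # In practice the bytes would come from the log file, for example:
    #   raw = open("original.log", "rb").read()         (hypothetical name)
    #   open("prepped.txt", "wb").write(normalize_segment_breaks(raw))
    raw = b"MSH|1\rPID|2\rEVN|3\r"
    print(normalize_segment_breaks(raw))
```

Unlike the Word route, this does not wrap long segments to a page width, so each segment stays on one line.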
Now, on to the task of parsing the individual fields (delimited by "|")!
PID|0001|450283|45-02-83|000428129|PITTBRAD||19630101|M||W|4201 HOLLYWOOD LANELOS ANGELESCA^59123||5558765331|5558907689||S||00110203353|500333333
GT1|1|000428129|PITTBRAD||4201 HOLLYWOOD LANELOS ANGELESCA59123|5558765331|5558907689|19630101|M||99|500333333||||MGM MOTION PICTURE|1701 VINEHOLLYWOODCA^90210|401555-7881|||||||||||59746|S||||||||Y|||||||||||ACTOR||||||W
IN1|1||2|AETNA INDEMNITY||||58689|PARAMOUNT|||20070114||||PITTBRAD|99|19630101|4201 HOLLYWOOD LANELOS ANGELESCA59123|||1||||||||||||||526981010|||||||M|1701 VINEHOLLYWOODCA^90210|N||||000428129
IN1|2||12|CIGNA INDEMNITY||||87890|JOLIE'S PROD.|||20060101||||PITTBRAD|99|19630101|4201 HOLLYWOOD LANELOS ANGELESCA59123|||2||||||||||||||9090878000|||||||M|1701 VINEHOLLYWOODCA^90210|N||||000428129
Progress - that's great news Carolyn!
I copied your sample text to Notepad and saved and opened the file with Monarch. I created a detail template with only alpha traps in columns 1 and 2. I then defined a 254 character field to capture the entire line as one field, named TextField.
Now parsing the TextField to extract the individual field values between the | characters is quite straightforward. Using "my" character count function, I determined that your longest line has 56 fields.
To extract the fields I would create 56 calculated fields named, for lack of more meaningful descriptions, "Field 1" through "Field 56", with the following formula, taking advantage of the great LSplit function:
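[font="courier"]LSplit(TextField,56,"|",56)[/font]
(That is the formula for Field 56; for the other fields, change the final argument to the number of the part you want returned.) A bit tedious to set up, but once it's done, it's done.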
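For comparison outside Monarch, the same idea (count the fields on the longest line, then pick out the n-th part between the "|" characters) can be sketched in Python. This is only a rough analogue of the calculated-field approach, not a description of how LSplit works internally; the function names are invented, and the sample line is a shortened piece of the PID line posted above.

```python
def split_fields(line, sep="|"):
    """Split one segment line into its fields, keeping empty fields."""
    return line.rstrip("\r\n").split(sep)


def nth_field(line, n, sep="|"):
    """Return the n-th field (1-based), or "" if the line has fewer fields."""
    fields = split_fields(line, sep)
    return fields[n - 1] if 1 <= n <= len(fields) else ""


def max_field_count(lines, sep="|"):
    """Largest number of fields found on any non-blank line."""
    return max((len(split_fields(l, sep)) for l in lines if l.strip()), default=0)


if __name__ == "__main__":
    sample = "PID|0001|450283|45-02-83|000428129|PITTBRAD||19630101|M"
    print(max_field_count([sample]))   # number of fields on the longest line
    print(nth_field(sample, 6))        # sixth field
    print(repr(nth_field(sample, 7)))  # empty field between the two bars
```

Returning an empty string for a missing field keeps short segments and long segments usable with the same set of 56 column definitions.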
The information document I found indicates that the sections of the message are delimited by Carriage Return (cr) and Line Feed (lf) in combination. However there seem to be a number of iterations of the HL7 concept and I can't be sure that they are all the same, though for this sort of thing they most likely are.
Now that leads me to suggest (from a point of ignorance, having only scanned briefly through the document I mentioned, which relates to an implementation in New Zealand and therefore has its own local adaptations ...) that Word may well have dumped the (crlf) delimiter, which, I suspect, does not really help you in this case.
Some of the very long records look like they are now simply wrapped to fit the page size and margins defined in Word. My guess is that one very long row would make life easier in some ways.
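If the wrapping does turn out to be a nuisance, the wrapped pieces could in principle be glued back into one long row per segment. The Python sketch below is a heuristic only: it assumes every real segment starts with a three-character ID followed by "|" (as MSH, PID, GT1 and IN1 do in the sample), and that any other line is a continuation of the line before it. The names are hypothetical, and a continuation that happens to begin with something like "ABC|" would be misread as a new segment.

```python
import re

# Assumed pattern for the start of a segment: three characters, then a bar.
SEGMENT_START = re.compile(r"^[A-Z][A-Z0-9]{2}\|")


def unwrap_segments(lines):
    """Join continuation lines onto the segment line that precedes them."""
    segments = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if SEGMENT_START.match(line) or not segments:
            segments.append(line)
        else:
            # No separator is re-inserted; a space lost at the wrap point
            # cannot be recovered by this sketch.
            segments[-1] += line
    return segments


if __name__ == "__main__":
    wrapped = [
        "PID|0001|450283|PITT",
        "BRAD||19630101",
        "GT1|1|000428129",
    ]
    for seg in unwrap_segments(wrapped):
        print(seg)
```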
If you have the PRO version of Monarch you could simply use the file as a database with "|" as the field delimiter.
Well, not quite so simply, because a message contains several different sections, each of which has its own data structure. (I even got the impression that there may be subtle differences in the data structure of a similarly identified section in some cases, but that may have been a misreading on my part!)
I also read (I think) that some sections can have sub-sections. That will add to the fun!
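To illustrate why a single "|"-delimited database import is not quite enough, here is a small Python sketch that groups lines by their section ID and then splits each field again on "^", which is the usual HL7 component separator and appears in the address fields of the sample. The function names are invented for this post, and the MSH section is deliberately left out of the demo because its second field lists the delimiter characters themselves and would need special handling.

```python
from collections import defaultdict


def parse_segment(line, field_sep="|", component_sep="^"):
    """Return (segment_id, fields); each field is a list of its components."""
    segment_id, *fields = line.rstrip("\r\n").split(field_sep)
    return segment_id, [f.split(component_sep) for f in fields]


def group_by_segment(lines):
    """Collect the parsed fields of every line under its segment ID."""
    grouped = defaultdict(list)
    for line in lines:
        if line.strip():
            segment_id, fields = parse_segment(line)
            grouped[segment_id].append(fields)
    return dict(grouped)


if __name__ == "__main__":
    lines = [
        "PID|0001|LOS ANGELESCA^59123",
        "IN1|1||2|AETNA INDEMNITY",
        "IN1|2||12|CIGNA INDEMNITY",
    ]
    grouped = group_by_segment(lines)
    print(sorted(grouped))        # the section IDs that were found
    print(grouped["PID"][0][1])   # components of the second PID field
    print(len(grouped["IN1"]))    # number of IN1 lines
```

Each section ID then gets its own list of rows, which is roughly what separate templates or separate imports per section would give you.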
I have doubts that the records, especially those that obviously wrap after Word has finished with them, will always be consistent. So if you are trying to parse the file as a REPORT it may be difficult to be certain that the section you want within a template always has the same number of lines. Even if it does the delimiter positions will clearly not be consistent.
I doubt a definition based on a floating trap would be workable - too many fields and variations I reckon. If you wanted to go that way you would need to swap the "|" for something else since in a template trap "|" is reserved for a special purpose. (Prep or the Monarch Utility for that job.)
Of course I could well be wrong about all of this and to some extent the level of complexity will depend upon what you are required to extract from the records. Maybe that is quite simple?
Whatever the need I hope you have access to some complete and accurate HL7 usage information as it relates to the files you have to analyse. I could see life being rather difficult without that.
Please let us know how you get on with your challenge!
All the best,
Edit to add:
Data Kruncher posted while I was looking at the documentation!
I would recommend the same approach for splitting the fields if a database import is not a viable option, but I have some reservations about the potential for wrapped lines in some of the sections, and a few other concerns about the section/record formats, which might make the field definitions conditional and therefore more complex.