10 Replies Latest reply: May 15, 2014 10:06 AM by Data Kruncher

    Suggestions for extracting data from an RTF file?

    MonUserCJ _

      Hey all,

       

         I want to extract data from some reports that are RTFs. I was wondering about the different options that I have for doing this using Monarch. I think that my best bet is to print the reports to PDF files, and try to open the PDF reports in Monarch that way. However, I'm a little wary of using this approach, since PDFs often open in Monarch with strange spacing.

       

         I am interested in seeing if I can save the RTF files as text files, since they don't have much complicated formatting. When I try to save one as a text file, the data is still discernible, but it saves with garbage strings in it.

       

         I was wondering if anyone has any suggestions for saving the RTF file as a file format that Monarch will be able to successfully open. In the File Conversion dialog box in Word that appears when I try to save the RTF file as a text file, there are options to save with different text encoding, and options like "Insert line breaks", "End lines with:..." and "Allow character substitution". Does anyone know if those are useful?

       

         I would appreciate any suggestions that people have concerning this issue.

       

         Thanks.

        • Suggestions for extracting data from an RTF file?
          Data Kruncher

          To convert RTFs to plain text, I'd be tempted to use WordPad instead of Word. WordPad does not have file size limits (AFAIK), and does the job in a very straightforward manner, without any options at all.

           

          That said, I've also done RTF conversions with Word without any issues, just by accepting the defaults presented.

           

          WordPad should be able to generate absolutely clean text files.

           

          Does it work for your file(s)?
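If you want to check whether a saved text file really is clean, here is a rough Python sketch (nothing to do with Monarch itself). It assumes the report should contain only printable ASCII, so it will also flag legitimate accented characters; `find_stray_characters` is just a name made up for this example.

```python
import string

# Assumption: a plain report should only contain printable ASCII
# (letters, digits, punctuation, spaces, tabs, CR/LF, form feeds).
ALLOWED = set(string.printable)


def find_stray_characters(text):
    """Return (line_number, column, character) for each unexpected character."""
    hits = []
    # Split on "\n" only, so odd control characters are reported
    # instead of being treated as line separators.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch not in ALLOWED:
                hits.append((line_no, col, ch))
    return hits
```

An empty list means nothing unexpected was found; otherwise the line and column numbers show where the garbage sits in the saved file.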

            • Suggestions for extracting data from an RTF file?
              Grant Perkins

              If you have up-to-date MS applications and Monarch 10 Pro (are you still on 9, as your profile indicates?), you could also consider XPS format.

               

              But otherwise I would use WordPad, as Kruncher already suggested, especially if there is no complicated formatting to deal with.

               

               

              Grant

                • Suggestions for extracting data from an RTF file?
                  MonUserCJ _

                  WordPad does seem to do a pretty good job saving the RTF files as text files. Thanks for the advice.

                    • Suggestions for extracting data from an RTF file?
                      Olly Bond

                      Hello everyone,

                       

                      Sounds like WordPad can handle the ad-hoc requests. But if you had a steady stream of such files and needed an automated solution, I wonder whether you could control WordPad via a script?

                       

                      I've seen one application that can monitor a folder, select files of type *.rtf and convert them to text-readable PDFs - that's FineReader from Abbyy - but perhaps there's a more elegant solution?

                       

                      Best wishes,

                       

                      Olly

                        • Suggestions for extracting data from an RTF file?
                          Data Kruncher

                          If a number of files need to be converted, then AutoIt would help with scripting it. But given the likely learning curve, for a one-off job one would probably be better off doing it manually.
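For anyone who would rather script the batch without driving WordPad's window at all, a rough Python sketch of the idea follows. To be clear, this is not AutoIt and not a Monarch feature: it swaps in a small hand-rolled RTF stripper, `rtf_to_text` and `convert_folder` are names invented for the example, and it only copes with simple RTF (plain text, tabs, paragraph breaks, basic escapes). Tables, fields and embedded objects would need a proper converter.

```python
import re
from pathlib import Path

# Assumed subset of RTF destinations whose contents are not report text.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict",
                     "header", "footer"}
# Control words that stand for characters.
SPECIAL = {"par": "\n", "line": "\n", "tab": "\t"}

TOKEN = re.compile(
    r"\\([a-z]{1,32})(-?\d{1,10})? ?"  # control word, optional number, delimiter space
    r"|\\'([0-9a-fA-F]{2})"            # hex-escaped byte such as \'e9
    r"|\\([^a-z])"                     # control symbol such as \{ or \*
    r"|([{}])"                         # group open or close
    r"|[\r\n]+"                        # raw line breaks carry no meaning in RTF
    r"|(.)"                            # ordinary character
)


def rtf_to_text(rtf):
    """Strip RTF markup from a string (simple documents only)."""
    out = []
    stack = []        # saved 'ignoring' flag for each open group
    ignoring = False  # True while inside a skipped destination
    skip = 0          # fallback characters to drop after a \uN escape
    for m in TOKEN.finditer(rtf):
        word, arg, hexbyte, symbol, brace, char = m.groups()
        if brace == "{":
            stack.append(ignoring)
        elif brace == "}":
            if stack:
                ignoring = stack.pop()
        elif symbol is not None:
            if symbol == "*":
                ignoring = True  # \* marks an optional destination
            elif not ignoring and symbol in "\\{}":
                out.append(symbol)
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                ignoring = True
            elif ignoring:
                pass
            elif word in SPECIAL:
                out.append(SPECIAL[word])
            elif word == "u" and arg is not None:
                code = int(arg)
                out.append(chr(code + 65536 if code < 0 else code))
                skip = 1  # assumes the default of one fallback character
        elif hexbyte is not None:
            if skip:
                skip -= 1
            elif not ignoring:
                out.append(bytes.fromhex(hexbyte).decode("cp1252", "replace"))
        elif char is not None:
            if skip:
                skip -= 1
            elif not ignoring:
                out.append(char)
    return "".join(out)


def convert_folder(folder):
    """Write a .txt next to every .rtf in folder and return the new paths."""
    written = []
    for rtf_path in sorted(Path(folder).glob("*.rtf")):
        # RTF is normally 7-bit ASCII; latin-1 never fails to decode.
        text = rtf_to_text(rtf_path.read_text(encoding="latin-1"))
        txt_path = rtf_path.with_suffix(".txt")
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path)
    return written
```

Calling `convert_folder` on the folder that holds the reports leaves each original RTF untouched and puts a text file beside it, which can then be opened in Monarch. It would be worth comparing a few results against a WordPad save before trusting it on a whole batch.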

                          • Suggestions for extracting data from an RTF file?
                            MonUserCJ _


                            Hi Olly,

                             

                               As for automating the process of converting RTFs to PDFs, you might be able to use the PDF printer "PDFCreator". It has an auto-save feature that automatically saves anything printed to it into a folder, using a naming convention that you specify with placeholders. With that set up, select the RTF files in Explorer, right-click, and select Print.

                             

                            Hope that helps.

                              • Suggestions for extracting data from an RTF file?
                                RalphB _

                                In the past, I have successfully converted Microsoft Word docs to a basic text (.txt) format by saving as Plain Text (.txt).

                                 

                                At one time I had it scripted but since we are no longer using that Word doc, I have deleted the script.

                                • Suggestions for extracting data from an RTF file?
                                  Grant Perkins


                                  I'm not sure I would want to add extra conversions to the process if it could be avoided. PDF writers seem to have been quite individual in what they do internally, certainly historically, and I have not heard anything to say that has changed. That can create extra challenges when the output has to be re-interpreted. I figure if Adobe's own convert-to-text programs can have problems, then anyone else's will too.

                                   

                                  If working one file at a time this may be no big deal. Running batches of Monarch analysis .... hmm, possibly not so comfortable.

                                  If you can go direct to plain text from the RTF form, you have a simpler solution, and keeping things as simple as possible is usually a good goal.

                                   

                                   

                                  HTH.

                                   

                                   

                                  Grant