19 Replies Latest reply: Nov 8, 2017 3:12 PM by Grant Perkins

    To extract multiple lines from a text document

    Anu Moorthy

      Hi Datawatch community members,

           I just need to extract a block of data from a text file and save it into a separate text file.  Is this possible using Monarch?

       

      Thank you,

      Anu

        • Re: To extract multiple lines from a text document
          Stephen Smay

          Anu, when you say you need to "extract a block of data," do you mean to extract it into a table, or to extract the text exactly as it appears?

          • Re: To extract multiple lines from a text document
            Stephen Smay

            Anu, Monarch is designed to extract data from reports into tables. It can also export reports, but as far as I know, that exports the entire text file rather than a specific defined section.

             

            For this situation, I would build a Model with Start and End Region Templates defining the boundaries of what you need to extract. Then capture everything in between as the Detail Template. This Detail would need to be a Regular Expression trap so it could capture whether there is anything on the line or not. The expression would be ^(?<Data>.*)$

             

            Then create a field for that capture and set the field width to 254 so that it captures the entire line. From there, you will have a table with only one column, but that column will contain all the text of the region you need to extract. After that, create an Export, give the file name a .TXT extension, and select delimited text as the export file type. Finally, run the export; that should give you a text file containing only the defined region.
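For readers who want to see the start/end region idea outside Monarch, here is a minimal Python sketch of the same logic. It is not Monarch's implementation, and the function name `extract_region` and the `BEGIN`/`END` markers are hypothetical. It keeps every line between a start marker and an end marker, blank or not, using the same catch-all expression (Python writes the named group as `(?P<Data>...)` rather than `(?<Data>...)`).

```python
import re

# Same catch-all trap as in the post; Python spells the named group (?P<Data>...).
LINE_TRAP = re.compile(r"^(?P<Data>.*)$")


def extract_region(text, start_marker, end_marker):
    """Return the lines strictly between the first line containing
    start_marker and the next line containing end_marker.

    Both marker strings are assumptions for illustration; in Monarch they
    would correspond to the Start and End Region Templates.
    """
    captured = []
    inside = False
    for line in text.splitlines():
        if not inside:
            # Still looking for the start of the region.
            if start_marker in line:
                inside = True
            continue
        if end_marker in line:
            # End of the region reached; stop capturing.
            break
        # The trap matches any line, including an empty one.
        captured.append(LINE_TRAP.match(line).group("Data"))
    return captured


if __name__ == "__main__":
    sample = "header\nBEGIN\nline one\n\nline three\nEND\nfooter"
    for row in extract_region(sample, "BEGIN", "END"):
        print(row)
```

The result is a single "column" of text lines, which mirrors the one-column table described above; writing those lines to a file plays the role of the delimited text export.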

            • Re: To extract multiple lines from a text document
              Grant Perkins

              Anu,

               

              Stephen raises a number of interesting questions here.

               

              Firstly, is this one block of data from one report at a time?

               

              Or one block of data from multiple reports in a single process?

               

              Or many blocks of data from a single report?  (Or many blocks from many reports ...?)

               

              Do you need to export to a file or files or do you just need to cut and paste?

               

              Do you have the latest version of Monarch with Data Prep Studio or are you running with an older version?

               

              How do you need to identify the block of data/text that is to be extracted? Is this something you would be doing interactively on a screen or does it need to be a full model and export?

               

              If you can share with us an example of what you are working with (or a mock up to illustrate what it might look like) that would be very useful.

               

               

              Grant

              • Re: To extract multiple lines from a text document
                Anu Moorthy

                Hi Steve and Grant,

                Firstly, sorry for being a little late in replying to your questions.  Thanks to you both for taking the time to help me out with this :-)  Answers to your questions:

                 

                1.  Yes, this is to extract one block of data from one report at a time.

                2.  Not from multiple reports but just one report.

                3.  No, just one block of data from a single report.

                4.  Extract that portion and save it as a text file on a monthly basis.

                5.  I currently have Monarch 9 but will be moving to Monarch v14.2; I am not sure whether 14 is the latest version.

                6.  I do not want to identify the data in an interactive fashion, as the block of data I need always appears in a predefined place.  So, I want a full model that will extract all the data between 2 defined regions and export that to a text file.

                7.  Here is an example

                Input file:

                                      NY - STATE 1

                Fact1 xxxxyyyyyyzzzzzzz

                1. data

                2 .data .....

                till

                30. data

                                   NY - STATE 1

                Fact 2  xxxxyyyyyyzzzzzzz

                1. data

                2 .data .....

                till

                30. data

                                NY - STATE 1

                Fact 3 xxxxyyyyyyzzzzzzz

                1. data

                2 .data .....

                till

                30. data

                           CA - STATE 2

                Fact1 xxxxyyyyyyzzzzzzz

                1. data

                2 .data .....

                till

                30. data

                           CA - STATE 2

                Fact 2 xxxxyyyyyyzzzzzzz

                1. data

                2 .data .....

                till

                30. data

                           CA - STATE 2

                Fact 3 xxxxyyyyyyzzzzzzz

                1. data

                2 .data .....

                till

                30. data

                 

                I want the output to contain all the details for NY - STATE 1, starting from Fact 1 through Fact 3, and to save it into a separate text or doc file.  Same for CA.

                Is this possible?

                TIA

                Anu
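To make the requested output concrete, here is a rough Python sketch of the grouping described in the example: every line is assigned to the most recent state header, so all of the Fact sections under "NY - STATE 1" end up in one block and those under "CA - STATE 2" in another. This is not a Monarch model. The header pattern (two capital letters, " - STATE ", then a number) is an assumption read off the mock-up, and the names `split_by_state` and `write_blocks` are hypothetical.

```python
import re
from pathlib import Path

# Assumed header shape, taken from the mock-up: e.g. "   NY - STATE 1".
STATE_HEADER = re.compile(r"^\s*(?P<state>[A-Z]{2}) - STATE \d+\s*$")


def split_by_state(text):
    """Group the report's lines by the state header they fall under.

    Returns a dict mapping the state code (e.g. "NY") to its lines, in
    report order. Header lines are kept; lines before the first header
    are ignored.
    """
    blocks = {}
    current = None
    for line in text.splitlines():
        header = STATE_HEADER.match(line)
        if header:
            current = header.group("state")
            blocks.setdefault(current, [])
        if current is not None:
            blocks[current].append(line)
    return blocks


def write_blocks(blocks, directory):
    """Write one text file per state, e.g. NY.txt and CA.txt."""
    for state, lines in blocks.items():
        Path(directory, f"{state}.txt").write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    report = (
        "      NY - STATE 1\nFact1 xxxx\n1. data\n"
        "      NY - STATE 1\nFact 2 xxxx\n1. data\n"
        "      CA - STATE 2\nFact1 xxxx\n1. data\n"
    )
    for state, lines in split_by_state(report).items():
        print(state, len(lines))
```

Because the grouping keys on the header alone, the inconsistent spacing in "Fact1" versus "Fact 2" in the sample does not matter for this sketch.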