
    Working with x-large data files

    Michelle _

      I have to process data files that range in size from approximately 400,000 KB to almost 2,000,000 KB.  Loading the data into a table takes several minutes and the filtering takes forever! 

       

      Does anyone have any experience with files this large and any tips on what I can do to improve the processing speed?

       

      Michelle

        • Working with x-large data files
          Data Kruncher

          Hi Michelle, and welcome to the forum!

           

          My first tip isn't related to Monarch at all, but is usually easy, and relatively inexpensive: add memory to your computer. Bring the total memory to the maximum your computer permits. This helps enormously.

           

          Secondly, as you have the Pro version, there are a couple of steps you can take that I believe should speed up your filters. I don't have V7 Pro, so the next bit describes the steps using V8 Pro. Hopefully they're very similar.

           

          Open your data set as you normally do. I'll make an assumption and say that you're applying a model to a report file or files, instead of opening a database source.

           

          Now export the Table window to a new Access database file on your C: drive. Don't save it to a network drive as this will slow you down too, on both the save and the subsequent accesses.

           

          Close all your reports and the model.

           

          Now use Open Database and select your new Access database, then select the table you want to use. Here's the important part, which I hope exists in V7: there's a section for Source Filtering. Check the Apply Filter box and click the Change button.

           

          Now define the filter just as you normally would for Table filters. Go on to select the fields you want to import and click Finish to open the data.

           

          By using a source filter, you only retrieve the records you want, and Monarch won't have to work so hard (and so slowly) picking and choosing the right records the way it does when you create and use regular table filters.

           

          Now save this model. The next time you want this data, use Open Database and open the model instead of opening the mdb file directly.

           

          Create as many model files with customized Source Filters as you need.

           

          If Source Filters are available to you, I hope this helps with your performance issues. OK, not yours, but Monarch's on your computer!

          Kruncher

          • Working with x-large data files
            Michelle _

            Dear Data Kruncher:

             

            The IT group has already upgraded my computer and memory.

             

            I tried your suggestion of exporting the table to an Access database on my local drive.  I chose a file that is an average size for the month (e.g., 558,072 KB).  This particular file had 1,365,438 records in it.  The export took 1.5 hours.  Does this sound correct?  Could I possibly have done something wrong?

             

            Once exported, I followed your directions and applied a Source Filter, which worked great.  Unfortunately, the time saved using the source filter was more than offset by the time it took to export the file to Access.

             

            I hope that I have missed something in the export process that caused it to take so long.  Please let me know what you think.

             

            Thanks,

            Michelle

            • Working with x-large data files
              Grant Perkins

              Hi Michelle,

               

              I sometimes work with files of similar sizes to yours and can confirm that when you get up to that sort of size things do take a while.

               

              There is more to estimating how long it will take than simply the number of records, as you probably realise already. Any additional processing, and to some extent the number of fields in each record, can also have an effect.

               

              If I needed to do that sort of thing regularly, especially with repeating inputs and report formats, I would hope to find an automated way of running the process and having the results 'delivered' to me.

               

              So that might mean setting up a batch script and scheduling it to run during non-work hours, for example.
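
              If your version of Monarch supports the command line switches for applying a model to a report and exporting the result, the batch file itself can be very short. The sketch below is only an illustration and assumes the /rpt, /mod, /exp and /T (table window) switches; the install path and file names are placeholders, so check the "command line" topic in the help for your version before relying on it.

                  @echo off
                  rem Apply a Monarch model to the month's report and export
                  rem the whole table to a local Access file. Paths, file names
                  rem and switches are placeholders - confirm them against the
                  rem Monarch help for your version.
                  set MONARCH="C:\Program Files\Monarch\Program\Monarch.exe"

                  %MONARCH% /rpt:"C:\Reports\sales_200609.prn" ^
                            /mod:"C:\Models\sales.mod" ^
                            /exp:"C:\Data\sales_200609.mdb" /T

              Running that once from a command prompt is a good way to confirm the switches behave as expected before handing it over to the scheduler.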

               

              Even better might be to consider whether Monarch Data Pump would make a sound investment for the tasks - it might well do if they are always likely to take as long as they do currently. And let's not forget that 1.3 million records is still quite a large number! Data Pump would let you schedule the long process to run at any time you set, or whenever the file is delivered to a specified watched folder, and it could notify you when the process had finished. I am assuming of course that you already have a model available to process any files you get.

               

              My own observations were that my system, which was a fairly high spec 4 years ago but would be considered less than budget spec now of course, was restricted by disk access rather than by processor (Pentium 4, 2 GHz) or memory (512 MB).

               

              I hope this helps you in some way with your approach to the problem. I expect Kruncher will have some further ideas as well.

               

               

              Grant

               


              • Working with x-large data files
                Data Kruncher

                1.3 million records - that's some serious Krunching.

                 

                Grant's correct: when writing that many records, the classic bottleneck isn't so much RAM, CPU speed or other motherboard-related factors as it is disk drive read/write access.

                 

                Unfortunately, as Grant has already discussed time-shifting the data processing via Data Pump or batch files, I've got nothing to add there.

                 

                I just haven't had occasion recently to work with individual files that are that large. Although exporting to a CSV file might involve less overhead and therefore process (marginally) more quickly, I'm not certain it would be much better or, at that volume, as stable. And I would speculate that reading data back from the CSV would be nowhere near as quick as reading it back from the mdb.

                 

                I'd say that since you've proven that reading back the filtered data source works nicely (good news, by the way), your best option is to run the export overnight. And the easiest way for you to do that will be to create a batch file which will do it for you.

                 

                There are a number of batch file programmers here who I'm sure will be pleased to offer their assistance.
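
                In the meantime, as a rough starting point, the Windows schtasks command can run such a batch file in the small hours. The task name, start time and path below are only placeholders, and the exact switch syntax varies a little between Windows versions (schtasks /create /? will show yours):

                    rem Run the overnight Monarch export batch file at 1:00 AM each day.
                    rem Task name, start time and path are placeholders; on newer
                    rem versions of Windows the start time is given as HH:MM.
                    schtasks /create /tn "MonarchOvernightExport" ^
                             /tr "C:\Scripts\monarch_export.bat" ^
                             /sc daily /st 01:00:00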

                 

                Kruncher.