9 Replies. Latest reply: May 15, 2014 9:57 AM by Gareth Horton

    Pro V8 and VERY BIG PDF files?

    Stephen _

      I've got a model and project in Pro V8 that work great for PDFs of a few thousand pages (8 MB is the largest so far).  When I try our biggest challenge, a 27,000-page file of some 65 MB, the Monarch PDF to Text import runs for quite some time and gets about 90% of the way through.  The program then stops responding, and I eventually get an error that the import failed.

       

      PC is Pentium 4, 1.7 GHz, 640MB RAM, 4 GB paging file, over 10 GB free disk space.  I turn off the McAfee "scan on access" to improve performance during this task.  Memory allocated doesn't hit the virtual limit - it got to about 2 GB out of total 4.5 GB available.  CPU stays at 95 to 100% during first 50-75% of the import, then suddenly drops to 5%.  Allocated memory then begins to drop, before the "failed" message appears.

       

      File is stored locally on my PC, Monarch is installed locally as well.  So I don't think any network issues are involved.

       

      Has anyone else worked with PDFs this large yet?

       

      Thanks in advance,

      Stephen J Voss

      Sr Systems Analyst

      MHMRA of Harris County (Houston, TX)

        • Pro V8 and VERY BIG PDF files?
          Grant Perkins

          Hi Stephen,

           

          I am wondering if you are hitting the same limit that some of my very large database-sourced extracts hit: the internal work files max out on the data extracted. That usually happens at several million records with around 100 fields or more per record, so to be fair it is a very big analysis in my case, and frankly the shortfall in extracted records is really a reminder to take a different and more time-efficient approach!

           

          If you have a look in your Monarch work area (probably in the Windows temp folder, but I can't remember precisely and I re-allocated mine elsewhere), you should find a number of files with a related date and time.

           

          FILEnnnn.PRN

          FILEnnn1.pgx

          Filennn2.ldb

          Filennn2.MDB

          Filennn2.rdf

          Filennn3.TDX

           

          for example. A couple of these may be significantly large, perhaps large enough to cause the process to stop adding records. I would be interested to know what you have there.
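
          A quick way to check this is to list the work files with their sizes and timestamps. The following is only a minimal sketch: the `FILE` prefix comes from the example names above, while the folder location and the helper name `list_work_files` are assumptions for illustration, so point it at wherever your own work area lives.

```python
from datetime import datetime
from pathlib import Path


def list_work_files(folder, prefix="FILE"):
    """Return (name, size_bytes, modified) for matching files, largest first.

    The prefix match is case-insensitive because the example names above
    mix "FILE" and "File". The prefix itself is an assumption.
    """
    rows = []
    for path in Path(folder).iterdir():
        if path.is_file() and path.name.upper().startswith(prefix.upper()):
            info = path.stat()
            modified = datetime.fromtimestamp(info.st_mtime)
            rows.append((path.name, info.st_size, modified))
    # Largest files first, since those are the ones most likely to matter.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    import tempfile

    # Assumption: the work area is the system temp folder; change as needed.
    work_folder = tempfile.gettempdir()
    for name, size, modified in list_work_files(work_folder):
        print(f"{name:<24} {size:>14,d} bytes  {modified:%Y-%m-%d %H:%M}")
```

          Running it while the import is in progress, and again after it fails, would show whether any of these files are being written at all.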

           

           

          Grant

          • Pro V8 and VERY BIG PDF files?
            Stephen _

            Grant,

             

            Process is failing during the import/conversion of the PDF to text, before the report can be presented on the screen.  There are no external links (e.g. MDB or SQL tables).

             

            I checked the temp folders and can find nothing left over.

             

            BTW, forgot to mention this box is running Win2K Pro w/ Svc Pack 4 installed.

             

            Thanks for the pointer, though - the additional data sources will be used in the near future on other projects.

            • Pro V8 and VERY BIG PDF files?
              Grant Perkins

              Hi Stephen,

               

              Is that before you get anything displayed in the import parameter control screen or after that but before you get the report window displayed?

               

              I assume the import control. If you get past that the .PRN file will be created along with the .PGX and .TDX files.

               

               

              Grant

              • Pro V8 and VERY BIG PDF files?
                Stephen _

                It's after the Import control settings.  The "Importing text from PDF file" progress window comes up, and the progress bar starts across.  Nothing ever gets written into the WORKPATH folder (set to C:\TEMP, with lots of free space).  The progress bar reaches about 90% of the way across, then suddenly CPU drops to 2-5% and memory utilization starts downward (watching through Windows Task Manager).

                 

                I'm going to use Acrobat 7.0 Professional to split the file into smaller pieces this weekend, then try again Monday.

                • Pro V8 and VERY BIG PDF files?
                  Dee Moore

                  test

                  • Pro V8 and VERY BIG PDF files?
                    Brie _

                    Testing 2

                    • Pro V8 and VERY BIG PDF files?
                      Grant Perkins

                      I have been checking what happens when I open a 30 page pdf doc.

                       

                      Ultimately this document produces a 353 KB PRN file, and the PRN and associated files mentioned previously appear in the folder when the progress bar is showing about 60% load completion. So in effect they are probably created earlier than that. I would expect them to be created as soon as the process starts, though how soon the system commits to disk from memory may be another matter.

                       

                      On that basis (very non-technical I am afraid) I would certainly expect your enormous file to be generating something well before the process fails.

                       

                      Very strange.

                       

                      It will be interesting to see what happens when you split the file.

                       

                       

                      Grant

                      • Pro V8 and VERY BIG PDF files?
                        Stephen _

                        Final results:

                        Broke the 2 files into pieces using Acrobat 7 Professional, each of 7-8K pages or 15-20 MB. 

                         

                        These imported fine through my project file and exported to Excel spreadsheets of around 11K rows.
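
                        For anyone planning a similar split, the page ranges can be worked out ahead of time. This is just a rough sketch, not what Acrobat does internally: the helper name `plan_chunks` is made up for illustration, the 27,000-page total is the figure from the first post, and the 8,000-page cap is an assumption based on the piece sizes that worked here.

```python
def plan_chunks(total_pages, max_pages_per_chunk):
    """Split pages 1..total_pages into near-equal, contiguous ranges.

    Returns a list of (first_page, last_page) tuples, 1-based and inclusive,
    with no range larger than max_pages_per_chunk.
    """
    if total_pages <= 0 or max_pages_per_chunk <= 0:
        raise ValueError("page counts must be positive")

    # Ceiling division: the fewest chunks that respect the cap.
    chunk_count = -(-total_pages // max_pages_per_chunk)

    # Spread the pages evenly; the first `extra` chunks get one more page.
    base, extra = divmod(total_pages, chunk_count)

    ranges = []
    start = 1
    for index in range(chunk_count):
        size = base + (1 if index < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    # Assumed inputs: 27,000 pages in total, at most 8,000 pages per piece.
    for first, last in plan_chunks(27000, 8000):
        print(f"pages {first}-{last} ({last - first + 1} pages)")
```

                        Evenly sized pieces avoid ending up with one small leftover file, but any split that stays under a size known to import cleanly should do.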

                         

                        The PRN and other temporary files don't get created until the progress bar is finished, actually.

                         

                        Maybe there is something about system resources other than disk space or virtual memory?  But at least it worked.  The PDF functionality is fast and reliable.

                         

                        Thanks for the help!

                        • Pro V8 and VERY BIG PDF files?
                          Gareth Horton

                          Stephen

                           

                          I would very much like to get hold of your large PDF file, so we can test it here at Datawatch.

                           

                          If this is possible, either by sending in a CD or by providing it on an FTP site for download, please contact me at gareth_horton@datawatch.com

                           

                          Thanks in advance

                           

                           

                          Gareth

                           
