Latest reply: Jun 22, 2014 2:49 PM by Dean Gwilliam

    all of pdf to text calling monarch pro 8.01 from shellexecute?

    Dean Gwilliam

      Hi, I've been using Xpdf's free pdftotext.exe to convert PDFs to text, but I wonder whether I can use the above instead

      and, if so, how, i.e. what parameters should I use?

      Thank you in anticipation and BTW what an awesome product!

        • Re: all of pdf to text calling monarch pro 8.01 from shellexecute?
          reeves

          Hi Dean,

           

          First of all you'd have to create a model and a project file, and then you could use a batch file. A batch file generator can be found at: [URL is no longer valid]

           

          Regards,
          Koen

            • Re: all of pdf to text calling monarch pro 8.01 from shellexecute?
              Dean Gwilliam

              That's great. Thanks very much Koen.

              Regards

              Dean

                • Re: all of pdf to text calling monarch pro 8.01 from shellexecute?
                  Dean Gwilliam

                  I used the template to select certain parts of the pdf and that's fine but I want to select ALL LINES and can't see how you do that.

                  I'm also not sure what the relationship between a template and a model is, i.e. I'm saving a project and opening a template. Where does the model come into this?

                   

                  Sorry for my naivety!

                    • Re: all of pdf to text calling monarch pro 8.01 from shellexecute?
                      reeves

                      The project file holds the combination of the source file (e.g. a pdf or txt file which holds the data) and the model. The model file holds all traps and filters.

                       

                      Basically that answers your first question: the project knows which file holds the data and the model tells it which lines to extract. Apparently not all lines in the data file match the traps you've set in the model. Open the project in Modeller, change the traps and you should be able to fix it.

                        • Re: all of pdf to text calling monarch pro 8.01 from shellexecute?
                          Dean Gwilliam

                          Thanks very much for your responsiveness and explanation re Monarch's project structure.

                          That helps a lot.

                          Re the trap...If I was just processing a single file...I'd be very happy to adjust the trap to get all lines.

                          The problem is I'm trying to automate the collection of text from a lot of disparate PDFs, so I'm after the foolproof trap that will catch every line of every PDF ever presented. Is this possible, i.e. is there an equivalent of the wildcard '*'?

                          Sorry if I wasn't clear and thank you for your consideration.

                            • Re: all of pdf to text calling monarch pro 8.01 from shellexecute?
                              reeves

                              I'm more into trapping txt files or processing csv files in Modeller, but I guess this should work: simply don't put any traps on the trap line, but highlight the entire row and make it a memo field.

                               

                              This is not really an intelligent way of extracting the data. Better said, it's data conversion rather than data extraction. There are probably better tools for this than Modeller.

                                • Re: all of pdf to text calling monarch pro 8.01 from shellexecute?
                                  Dean Gwilliam

                                  Thanks I'll try that and let you know how I get on.

                                  I realise I'm effectively using a scalpel as a cold chisel.

                                  I was using pdftotext.exe but it's not converting accurately and Monarch's sitting there for those pdfs that won't convert automatically.

                                  Unfortunately, after testing, the number of files that won't convert accurately is prohibitive, so Monarch's taking over as the primary rather than remedial tool. When you say there might be a more appropriate straight conversion tool, did you have something in mind?

                                  Much appreciated!

                                    • Re: all of pdf to text calling monarch pro 8.01 from shellexecute?
                                      Dean Gwilliam

                                      I couldn't find a memo field per se, but I didn't put anything on line S and highlighted all of line T.

                                      I saved this template as "no_trap_in_T_hilite_all_S" and...it selected everything like you said.

                                      I then clicked the table window which let me then do...file/export/table/nameofpdf.txt and choose output dir i.e. exporting to a text file.

                                      Phew!

                                      Mindful that my project only has a single pdf...I'm just wondering if it's possible to specify a command line that changes in a loop without needing a project for each pdf that I need to convert e.g. something like

                                      for  X = 1 to 10

                                      monarch     path_to_pdf+str$(X)    path_to_SAME_model_file   path_to_txtfile+str$(X)

                                      next X

                                      Nearly there hopefully and thanks very much for your help.
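
The loop in the pseudocode above could be sketched in Python roughly as follows. This is only an illustration of reusing one model file across many PDFs: the executable name `monarch`, the model file name `all_lines.xmod`, and the positional argument layout are assumptions copied from the pseudocode, not Monarch's documented command-line syntax, so the real switches would have to be checked before running anything.

```python
from pathlib import Path


def build_commands(pdf_dir, model_path, out_dir, exe="monarch"):
    """Build one argument list per PDF, reusing the same model file.

    The layout (exe, pdf, model, output) mirrors the pseudocode in the
    post; it is an assumption, NOT Monarch's real command-line syntax.
    """
    commands = []
    # "*.pdf" is matched case-sensitively on some platforms.
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        txt = Path(out_dir) / (pdf.stem + ".txt")
        commands.append([exe, str(pdf), str(model_path), str(txt)])
    return commands


if __name__ == "__main__":
    # Hypothetical folder and file names, for illustration only.
    for cmd in build_commands("pdfs", "all_lines.xmod", "out"):
        # Once the real switches are known, each list could be handed
        # to subprocess.run(cmd, check=True).
        print(cmd)
```

Building the argument lists separately from running them makes it easy to print and inspect every command before anything is executed.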

                                        • Re: all of pdf to text calling monarch pro 8.01 from shellexecute?
                                          reeves

                                          You can add multiple report files to a project; I'm sure that's the case for PDF files as well. However, you have to do this in Modeller (or Monarch). If you're still using xprj files (Monarch Project files), you could also add reports via a text editor, as xprj files are XML files.
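
Because the project file is XML, the text-editor approach could in principle be scripted. The sketch below shows one generic way to do it in Python: copy an existing entry and change its text. The element names `Project`, `Reports` and `Report` in the sample are made up for illustration and are not the real xprj schema; you would need to open one of your own xprj files to see how reports are actually stored (they may use attributes or namespaces, which this sketch does not handle).

```python
import copy
import xml.etree.ElementTree as ET


def add_entry_like(xml_text, entry_tag, new_value):
    """Duplicate the last <entry_tag> element and give the copy new text."""
    root = ET.fromstring(xml_text)
    # Find the first element that directly contains an <entry_tag> child.
    parents = [p for p in root.iter() if p.find(entry_tag) is not None]
    if not parents:
        raise ValueError(f"no <{entry_tag}> element found")
    parent = parents[0]
    last = parent.findall(entry_tag)[-1]
    clone = copy.deepcopy(last)
    clone.text = new_value
    # Insert the copy directly after the entry it was cloned from.
    parent.insert(list(parent).index(last) + 1, clone)
    return ET.tostring(root, encoding="unicode")


# Hypothetical structure, NOT the real xprj layout.
sample = "<Project><Reports><Report>C:\\in\\1.pdf</Report></Reports></Project>"
print(add_entry_like(sample, "Report", "C:\\in\\2.pdf"))
```

Working on a copy of the project file and opening the result in Monarch afterwards would be a sensible check that the edited file is still valid.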

                                           

                                          Modeller cannot be instructed to take all files from one folder. For this you'd need Datawatch Automator (the successor to DataPump). This tool allows you to use the asterisk (*) as a wildcard. So "C:\SomePath\Sales *.txt" would pick all text files in the folder C:\SomePath\ which start with Sales as well as Sales.txt.
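
To get a feel for how such a wildcard selects files, here is a small sketch using Python's standard `fnmatch` module. It shows ordinary shell-style matching only, not Automator's own matcher, and the file names are invented. Under these standard rules a pattern with a space before the asterisk does not match `Sales.txt`; whether Automator treats that case differently is not confirmed here.

```python
from fnmatch import fnmatchcase


def select_files(names, pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [n for n in names if fnmatchcase(n, pattern)]


# Invented file names, for illustration only.
names = ["Sales.txt", "Sales 2014.txt", "Sales_Q1.txt", "Costs.txt"]

print(select_files(names, "Sales *.txt"))  # ['Sales 2014.txt']
print(select_files(names, "Sales*.txt"))   # ['Sales.txt', 'Sales 2014.txt', 'Sales_Q1.txt']
```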

                                            • Re: all of pdf to text calling monarch pro 8.01 from shellexecute?
                                              Dean Gwilliam

                                              >You can add multiple report files to a project, I'm sure that's the case for pdf files as well.

                                              That's good

                                               

                                              >However, you have to do this in Modeller (or Monarch).

                                              >If you're still using xprj files (Monarch Project files)...

                                              I am

                                               

                                              >...you could also add reports via a text editor as xprj files are XML files.

                                              Thanks! I don't quite understand this at the moment but will look into it.

                                               

                                              >Modeller cannot be instructed to take all files from one folder.

                                              >For this you'd need Datawatch Automator (the successor to DataPump).

                                              Sounds like a limitation when processing a large number of PDFs,

                                              but then I suppose Modeller/Monarch is intended as a scalpel, not a chain saw <smile>.

                                               

                                              Thank you very much indeed for your advice.

                                              It's extremely helpful and very kind of you.