7 Replies. Latest reply: Oct 10, 2017 10:05 AM by Rebecca Cronin

    Brand new to this and boy do I have questions

    Esdras Vera

      I am looking to extract information from "investment type" PDF statements. These will look similar to statements you may receive from any of the large brokerage firms out there. Each statement has data that we would extract to Excel, but the data may span more than one line (for example, the description of a bond or other type of investment, which usually provides details).

       

      Also, would each monthly statement need to be passed three times in order to obtain the tables I need?

       

      For example

       

      1. Asset Allocation

      2. Valuation (prior balance and balance this period)

      3. Daily Activity

      I could add

      4. Portfolio holdings, etc.

       

      Any ideas where to start learning this at a QUICK speed?

       

      Thanks

       

      FS

        • Re: Brand new to this and boy do I have questions
          Rebecca Cronin

          Hi Esdras,

           

          I would like to share a link to a "How To" video that may help with what you are looking to accomplish: How to Quickly Convert a PDF to Excel

           

          -R

          • Re: Brand new to this and boy do I have questions
            Grant Perkins

            Hi Esdras and Rebecca,

             

            When dealing with PDF files, a useful early question to consider is how well the content you need to extract from the PDF behaves (does it retain a "report structure" well, or is it erratic?) and how consistent it is from one report to the next.

             

            The answers are very likely to influence the way you approach the extraction decisions if you are seeking to create a model or workspace to be re-used with no (or at least very minimal) interaction time after time.
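
            As a rough illustration of the consistency question, here is a minimal Python sketch. It assumes the statement text has already been pulled out of the PDF by some tool, and it simply reports which expected section headings are missing from each month. The heading names come from the tables listed in the original question; the function name `missing_sections` and the sample text are invented for the example, and none of this is a Monarch feature.

```python
# Hypothetical section headings, taken from the tables named in the question.
EXPECTED_SECTIONS = ["Asset Allocation", "Valuation", "Daily Activity"]


def missing_sections(statement_text, expected=EXPECTED_SECTIONS):
    """Return the expected headings that never appear at the start of a line."""
    starts = [line.strip().lower() for line in statement_text.splitlines()]
    return [
        name for name in expected
        if not any(s.startswith(name.lower()) for s in starts)
    ]


# Invented sample text standing in for two months of extracted statements.
statements = {
    "2017-08": "Asset Allocation\n...\nValuation\n...\nDaily Activity\n...",
    "2017-09": "Asset Allocation\n...\nDaily Activity\n...",
}

# Map each month to the headings it lacks; an empty list means "consistent".
report = {month: missing_sections(text) for month, text in statements.items()}
```

            A month with a non-empty list is a hint that the layout has changed and that a model built on the earlier statements may need attention.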

             

            If the consistency of the PDF sources looks good, the next thing is to decide how the data structure "patterns" compare.

             

            A document presented as a Statement most often does not have the same sort of repeating structure of content that a typical operational report will have. Therefore, although it still makes sense to check for a usable repeating pattern to model, the chances are that a few variations may be required.
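
            One way to picture why a statement tends to need several variations: each section has its own layout, so it helps to separate the sections before modelling any of them. The Python sketch below is only an illustration under assumptions (the statement is already plain text, and each section begins with a heading on its own line). The names `SECTION_HEADINGS` and `split_sections` and the sample figures are hypothetical.

```python
# Assumed headings; each one is expected to sit alone on its own line.
SECTION_HEADINGS = ("Asset Allocation", "Valuation", "Daily Activity")


def split_sections(statement_text, headings=SECTION_HEADINGS):
    """Group the lines of a statement under the most recent heading seen."""
    sections = {}
    current = None
    for line in statement_text.splitlines():
        stripped = line.strip()
        if stripped in headings:
            # A heading opens a new section.
            current = stripped
            sections[current] = []
        elif current is not None and stripped:
            # Non-blank lines belong to the section currently open.
            sections[current].append(stripped)
    return sections


# Invented sample statement text.
sample = """\
Asset Allocation
Equities 60%
Bonds 40%

Valuation
Prior balance 1,000.00
Balance this period 1,050.00
"""

sections = split_sections(sample)
```

            Each entry in `sections` can then be handled with its own pattern, which mirrors the idea of building one variation per table from a single read of the text.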

             

            Then we come to the question about "what is a record?"

             

            In order to nicely present information onto pieces of paper in a format that humans like to read (e.g. Legal/Foolscap/A4) reports often use 2 or more lines to present data that computers (and Excel) would like to see in one line. If the data for a "Single detail record" is reported on more than one line it is likely that Monarch Classic may become the modelling tool of choice.
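
            To make the "single detail record on more than one line" idea concrete, here is a small Python sketch. It is not how Monarch works internally; it only shows the general technique of folding continuation lines into the record above them. It assumes, purely for illustration, that every new record starts with a date such as 09/15/2017 and that any other non-blank line continues the previous record. The sample lines and the name `merge_records` are invented.

```python
import re

# Assumption: every new record starts with a date such as 09/15/2017.
RECORD_START = re.compile(r"^\d{2}/\d{2}/\d{4}\b")


def merge_records(lines):
    """Join continuation lines onto the record line that precedes them."""
    records = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        if RECORD_START.match(text) or not records:
            records.append(text)          # start of a new record
        else:
            records[-1] += " " + text     # continuation of the last record
    return records


# Invented sample: a bond purchase described over three printed lines.
sample = [
    "09/15/2017  Bought  ACME CORP BOND       5,000.00",
    "            4.25% due 06/01/2027",
    "            Callable",
    "09/18/2017  Dividend  XYZ FUND              12.34",
]

merged = merge_records(sample)
```

            After merging, each element of `merged` is one logical record, which is the shape Excel expects. Real statements rarely have a marker this clean, so the record-start rule is the part that usually needs the most thought.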

             

            Once you have "seen" the data, and assuming you know how it eventually needs to look wherever you plan to send it after extraction, it becomes relatively easy to take the basic concepts and cover the best part of the functionality you need. But there are likely to be a few special tweaks required for something like a PDF-based Statement document (there always are in my experience). That is where the process can be a little more "interesting" when making it as efficient and effective as possible on a case-by-case basis.

             

            I hope this helps. I think having to dive into PDF files early in your acquaintance with Monarch (or indeed any other PDF reading and representing application) can easily be baffling in some ways, and sometimes completely baffling.

             

            The worst examples seem often to be those that look like they come close to giving a good result straight from the start but have a few imperfections. Sometimes the small imperfections can be easily dealt with. Other times they are a symptom of a deeper problem with the way the PDF file is being created and handling its internal data and a more drastic approach is required. Spotting this at an early stage can help to avoid a lot of frustrating effort that ends up going nowhere really useful.

             

            Very powerful and effective results are always possible, but in the more difficult cases there is no step-by-step guide that is specific yet works for all.

             

            However, once you have a working model or workspace, you have a self-documenting set of tools that can be adapted and redeployed many times for future challenges.

             

            I hope that helps a little.

             

            For more incisive guidance it is almost invariably beneficial to be able to work with representative or substantially representative samples of the reports that need to be modelled. I recognise that this may raise privacy and security concerns but often such a very specific experience is the only way to identify how to ensure a successful outcome consistently if such a result is not forthcoming at an early stage of modelling.

             

             

            Grant