8 Replies Latest reply: Sep 19, 2016 11:24 AM by A MAN

    Multiple Lines for Random Column Entries

    A MAN

      Hello Everyone

       

      I'm having trouble finding an easy way to extract the data I need without having to use several formulas in Excel to rearrange the data once I extract it with Monarch Classic v13.4. Below is a sample of the data I am working with. It is in PDF format. As you can see, there are several spots where the information in a column flows into the next row. I can extract it row by row and then put it back together in Excel, but I was wondering if there was a way to do it all at one time within Monarch. The sample doesn't show it, but column E also has this issue for a few lines of the multi-page report.

       

                     A                                  B                    C        D                  E     

       

      Thank you in advance.

       

      Aaron M

        • Re: Multiple Lines for Random Column Entries
          Olly Bond

          Hello Aaron,

           

          Yes there is a way to do this in one model in Classic. You need a single line detail template, but define each field (using the Advanced Field Properties) as ending on "none of the above".

           

          Best wishes,

           

          Olly

            • Re: Multiple Lines for Random Column Entries
              A MAN

              Thank you, Olly, for the quick reply. I tried what you suggested, but it does not give me the results I want. It is not adding the parts of the line that flow into the next line. For example, on the sixth line down in column A, "Advertising, community health", I can't get the next line down, "education", to be added to the "Advertising, community health" line to make one whole line, "Advertising, community health education".

               

              It just adds it as a separate line below the one above.

               

              example:

              Advertising, community health

              education

               

              Not like what I want:

              Advertising, community health education

               

              Thanks again for all the help you give on this forum.

               

              Aaron

                • Re: Multiple Lines for Random Column Entries
                  Olly Bond

                  Hello Aaron,

                   

                  If you have an N numeric trap somewhere in column D as well, then it should work. It sounds like you're trapping the secondary lines as well, so you need more trap characters to get Monarch to distinguish the first line from the following lines.
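
                  Outside Monarch, the same idea can be sketched in Python: a line starts a new record only when its column D holds a digit (the analogue of the numeric trap), and any other non-blank line is appended to column A of the record above it. The column offsets, the function name merge_wrapped_lines, and the sample codes are assumptions made up for illustration; they are not taken from the real report or from Monarch.

```python
import re

# Hypothetical fixed-width layout (assumed, not from the real report):
# column A occupies characters 0-39, column D (numeric) characters 40-47.
COL_A = slice(0, 40)
COL_D = slice(40, 48)


def merge_wrapped_lines(lines):
    """Join continuation lines onto the record that precedes them.

    A line starts a new record only if column D contains a digit.
    Any other non-blank line is treated as overflow text for column A
    of the current record.
    """
    records = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        text_a = line[COL_A].strip()
        text_d = line[COL_D].strip()
        if re.search(r"\d", text_d):
            # Numeric value found in column D: this is a first line.
            records.append({"A": text_a, "D": text_d})
        elif records:
            # No numeric value: append the text to the previous record.
            records[-1]["A"] = (records[-1]["A"] + " " + text_a).strip()
    return records


# Made-up sample lines; the codes 1234 and 5678 are placeholders.
sample = [
    "Advertising, community health".ljust(40) + "1234",
    "education",
    "Telephone expense".ljust(40) + "5678",
]
merged = merge_wrapped_lines(sample)
```

                  With this sample, the "education" line has nothing in column D, so it is folded into the record above it rather than becoming a record of its own.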

                   

                  Best wishes,

                   

                  Olly

                    • Re: Multiple Lines for Random Column Entries
                      A MAN

                      Thanks again for the help. Adding the N trap did help, but the inconsistency of my data causes Column A to put together several lines that do not go together. My example did not show all of the crazy formatting that is causing the issues.

                       

                      Again thank you for the help. I think I can work with your suggestions to get what I need.

                        • Re: Multiple Lines for Random Column Entries
                          Grant Perkins

                          Aaron,

                           

                          PDF files very often produce extremely erratic text outputs no matter which of many available commercial tools you might try to use for the extraction.

                           

                          Adobe's own tools, when used with files produced by their own software, are usually not too bad (unless the PDFs were written by some old versions of the Writer program), but anything produced by a third-party writer program, as often built into large database-based applications, can vary from excellent to unusable.

                           

                          There are usually ways around the problems but sometimes they are not at all obvious.

                           

                          Is this a report you could share at all? Not publicly perhaps, but maybe on a secure server or perhaps as an anonymized "sample" version?

                           

                           

                          Grant

                            • Re: Multiple Lines for Random Column Entries
                              A MAN

                              Grant

                               

                              The file is a government produced file available to anyone.

                               

                              Here is the link to the original file if you want to take a look. We need it in an Excel format for additional manipulation and reporting.

                               

                              http://www.oshpd.ca.gov/hid/Products/Hospitals/AnnFinanData/Manuals/ch3000.pdf

                               

                              Thank you

                               

                              Aaron

                                • Re: Multiple Lines for Random Column Entries
                                  Grant Perkins

                                  Hi Aaron,

                                   

                                  Well there are some "interesting" structure and formatting decisions but try these settings for the PDF interpretation stage of the process before going to model creation:

                                   

                                  You'll be using Classic here.

                                   

                                  PDF settings

                                   

                                  Tick "Monospaced"

                                   

                                  Set Stretch to    6.3

                                   

                                  I ended up with crop set to "1" but I don't think that matters much (other than that it may confuse any previously generated "Auto" definition).

                                   

                                  PDF Engine set to the default 4.1

                                   

                                  That looks like it traps most lines in good shape and in presentable columns. It will allow use of the multi-line method for field definition.

                                   

                                  However there are a few sections where the regular format "rule" is "modified".

                                   

                                  The examples seem to have a number of subsections which, in the normal way of things, would have been on their own lines.

                                   

                                  Telephone Expense:

                                   

                                  Travel Expense:

                                   

                                  Uniforms (hospital furnished):

                                   

                                   

                                  are examples. However, I noticed 2 sections under "repairs" that do not present in the same way.

                                   

                                  I assume, based on the date of the document, that this is effectively a fixed and non-changing file for practical purposes.

                                   

                                  If so I'm tempted to suggest that, although I can imagine ways to deal with this anomaly of style within Monarch, the pragmatic solution would be to consider excluding these lines from the processing to Excel and simply add them afterwards via cut and paste or similar.

                                   

                                  However, if you have further processing to do on the extracted data before passing it to Excel there are some approaches that would be interesting to compare for ease of development and practicality in use.

                                   

                                  One approach would be to identify a trap that would include the indented lines of the sections mentioned and treat them as "normal" lines.

                                   

                                  Another might be to trap them as multiple text lines and then split them out into sets of calculated fields using the TEXTLINE() function. Ultimately those sets would need to be exported to become separate records in the Excel document. There are 2 or 3 practical ways to do that - especially for a one-off creation.
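
                                  As a rough illustration of that second approach outside Monarch, the Python sketch below takes a multi-line text block, picks out each line by number, and emits one record per line. The helper textline() here is a hypothetical stand-in written for this example; it is not Monarch's TEXTLINE() implementation, and the sub-item text is made up.

```python
def textline(block, n):
    """Return the n-th line (1-based) of a multi-line block, or "" if absent.

    Hypothetical helper for illustration only; not Monarch's TEXTLINE().
    """
    lines = block.splitlines()
    return lines[n - 1].strip() if 1 <= n <= len(lines) else ""


def split_block(heading, block, max_lines=10):
    """Turn each non-empty line of a block into its own record."""
    rows = []
    for n in range(1, max_lines + 1):
        text = textline(block, n)
        if text:
            rows.append({"section": heading, "item": text})
    return rows


# Made-up sub-items standing in for the indented lines of one section.
block = "Long distance\nLocal service\nEquipment rental"
rows = split_block("Telephone Expense", block)
```

                                  Each entry in rows would then become a separate row in the Excel output, with the section heading repeated on every row.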

                                   

                                  See how you get on with these ideas. If you get stuck we can pick it up again and see where it takes us.

                                   

                                  Do you need to extend the output at all - for example by providing the text associated with the (a), (b), etc. notes from Page 3 - before sending on to Excel? Or maybe add your locally relevant codes to fully and explicitly interpret the notes?

                                   

                                   

                                  Grant