5 Replies Latest reply: May 15, 2014 10:14 AM by Grant Perkins

    SQL Queries and locking tables

    ZJSoule _

      Hello all,

       

      I am starting to venture into the world of SQL databases.  I have been granted read access to a test environment of our production system.  I have the ODBC connections set up and I can access the tables. 

       

      My question is this:  When I am querying these tables, am I locking them?  I assume that once the table has been loaded, Monarch releases it (but I could be wrong).  Is the table being locked while it's loading?  This is a concern since the end goal is to pull a significant amount of data, and I wouldn't want to keep production tables locked for any real length of time.

       

      Any info is appreciated!

       

      Zach

        • SQL Queries and locking tables
          Olly Bond

          Hello Zach,

           

          I'm no programmer, nor a DBA, but I'd always worry about the load on a production system. It might crash, which has a cost in recovery time; it might slow down, which has a cost in annoying lots of users; or you might be paying more for CPU usage on a mainframe, which has a rather large cost of its own.

           

          Most BI and reporting is pretty functional - it aims to support human decision making, and most of us have to make decisions in reasonable human time frames. There's very little advantage to be gained in querying real-time production data unless you have the ability to influence events based on real-time decisions. Ordering more parts for a widget factory, hiring someone, or filing your year-end accounts doesn't need real-time BI.

           

          Many production systems run 24/7 but don't have any users logged in for twelve hours overnight. That window is quite handy for backing up your data or making system changes without impacting users, but it's also useful for running off all the standard reports you need. Then let Monarch do the work of parsing, joining, splitting, and delivering the data to the people who need it.

           

          Best wishes,

           

          Olly

            • SQL Queries and locking tables
              ZJSoule _

              Olly,

               

              What you've described is what we currently have.  I just wanted to float the idea out there and see what people's thoughts were.  Thanks for the reply!

               

              Zach

                • SQL Queries and locking tables
                  Grant Perkins

                  Further to Olly's comments, with which I fully agree ....

                   

                  I would expect your access to be set up as "read only", in which case locking a live database should not be a problem per se. You may need to check this with your IT admin people to be sure.
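 
                  If it turns out to be a SQL Server type database, for example, a query along these lines would read data without taking the shared locks that could block writers - at the cost of the occasional dirty read. Treat it as a sketch only; the table and column names here are made up for illustration.
 
                  SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;  -- read without holding shared locks
 
                  SELECT OrderID, OrderDate, Amount   -- hypothetical columns
                  FROM dbo.Orders                     -- hypothetical table
                  WHERE OrderDate >= '2014-01-01';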

                   

                  However ....

                   

                  If you don't have a 'frozen' database and the extraction of a large amount of data takes some time, you may not always get consistent results without some very specific work (or scheduling of extractions) to ensure the integrity of the data you are working with. This may or may not matter - it depends on the purpose of the analysis or data work, on a task-by-task basis.
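 
                  If that consistency does matter for a long-running extract, and the database happens to be SQL Server with snapshot isolation enabled by the DBAs, something like the sketch below would give the whole extraction one consistent point-in-time view without blocking the writers. Again, the object names are invented for illustration.
 
                  -- The DBA would first need to enable snapshot isolation, e.g.:
                  -- ALTER DATABASE SalesDb SET ALLOW_SNAPSHOT_ISOLATION ON;   (hypothetical database name)
                  SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
                  BEGIN TRANSACTION;
 
                  SELECT InvoiceID, CustomerID, Amount     -- hypothetical table and columns
                  FROM dbo.Invoices;
 
                  SELECT PaymentID, InvoiceID, PaidAmount  -- both reads see the same point-in-time data
                  FROM dbo.Payments;
 
                  COMMIT TRANSACTION;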

                   

                  Whilst the obvious choice for running large extractions from a database (assuming the purpose is not 'real time reporting') is, as Olly has pointed out, to use a system's low-activity times, there are probably other people out there seeking to use the same assumed benefits, and things may be busier with system admin type tasks than you might think. Plus you don't want to be working in the middle of the night, so you would need an automated extraction that might fail ... etc, etc. How mission critical is the activity that will be based on live access to the DB?

                   

                  If you really need real time data you just have to go with what you've got.

                   

                  If you don't need real time then I would be looking for a couple of options to work with, in this order:

                   

                  1) Any existing 'trusted' reports from the system that already deal with 'freezing' the data entries if that might be an issue.

                   

                  2) Working with a 'frozen' copy of the database. This might be a backup file (although not much use if only incremental backups are taken?) or possibly a temporary copy of the DB made available for analysis work. Obviously that would require storage space, but it would save the overhead of running anything, including trusted reports, on the live DB.
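 
                  On a SQL Server type system, for instance, making that frozen copy can be as simple as a backup and a restore under a different name - roughly as sketched below, with the database names, logical file names and paths all invented for illustration. Your DBAs will have their own preferred way of doing this.
 
                  -- Take a full backup of the live database (hypothetical names and paths)
                  BACKUP DATABASE ProductionDb
                    TO DISK = 'D:\Backups\ProductionDb_ForReporting.bak';
 
                  -- Restore it alongside as a separate reporting copy
                  RESTORE DATABASE ProductionDb_Reporting
                    FROM DISK = 'D:\Backups\ProductionDb_ForReporting.bak'
                    WITH MOVE 'ProductionDb'     TO 'D:\Data\ProductionDb_Reporting.mdf',
                         MOVE 'ProductionDb_log' TO 'D:\Data\ProductionDb_Reporting.ldf';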

                   

                  I have used this approach in the past for a 'database' (a flat file system in that case, but it makes no difference to the concept) where the client was running live 24/7. Activity during the night was low but existed in specific areas of the operation. Since there was a need to back up the entire system with no activity occurring, we agreed a time with the key overnight users at which the system could be taken offline for a brief period. All data files were copied to a backup location and the system was returned to operation. This took no more than 20 minutes - the target time was 10 minutes or less, even as the file sizes grew.

                   

                  The copied files were then used to create backup volumes (to tape at that time) and for reporting purposes, where overnight analysis was performed automatically and the results distributed either to pre-ordained folders on the network (for large outputs) or directly by email (as email systems matured to cope with useful file sizes), or some combination of the two. In this way the live database was not overly loaded, data integrity was improved, and there was better consistency of report delivery, since nothing happening in the live system adversely affected the reporting operation (unless something was going badly wrong already ...).

                   

                  One other aspect of working with a database, depending on what you will be doing with the contents, is that structures can be impenetrably complicated and how the 'data' is set up may be less than clear in many cases. You can almost certainly bet that there will be stuff in there that is used in ways 'not as originally intended by the design teams'.

                   

                  For example our systems included trusted (i.e. fully tested and assessed) reports that performed calculations based on table driven information applied to data field values where the calculations, in the main, only existed in the reports and one or two screens used for live operations. I don't think anyone ever attempted to recreate the calculations (certainly not successfully) using an external report writer. This is not an unusual situation by any means in my experience.

                   

                  However, the trusted reports that were available within the system catered for all of that need ... albeit maybe not in quite the presentation format that an individual requirement dictated. Usually they were a little 'verbose' at best.

                   

                  Verbose was great for use with Monarch of course - making it easy, most of the time, to get just what was necessary from a trusted greenbar report or two.

                   

                   

                  A few thoughts to consider that might lead to some conversations with your IT people.

                   

                  HTH.

                   

                   

                  Grant

                    • SQL Queries and locking tables
                      ZJSoule _

                      Grant,

                       

                      Thank you very much for the reply! 

                       

                      We have two goals: 

                       

                      One is to hopefully create a year end "dashboard" that we can run during working hours and get an idea of where we are in relation to financial goals during the last few days of the year.  This would require real time data, BUT 100% accuracy would not be expected. 

                       

                      The second is a little more complicated.  We currently have a weekly/monthly extract out of our production system that acts as a mini data warehouse for us.  It works great, but it's HUGE and it takes upwards of 8-10 hours to be created (it's currently up to a 300 MB file in .txt format). This in turn delays the creation of other files and is a strain on our system.

                       

                      We were hoping that maybe creating a SQL snapshot, or just having Monarch grab targeted data when needed (rather than having a massive file), would alleviate some of the issues we have with this file.  Unfortunately, we don't currently have a lot of SQL knowledge in our organization, so we (the non-programmers) are muddling through as best we can.

                        • SQL Queries and locking tables
                          Grant Perkins

                          [QUOTE=ZJSoule _]Grant,

                           

                          Thank you very much for the reply! 

                           

                          We have two goals: 

                           

                          One is to hopefully create a year end "dashboard" that we can run during working hours and get an idea of where we are in relation to financial goals during the last few days of the year.  This would require real time data, BUT 100% accuracy would not be expected.[/QUOTE]

                           

                          I would guess that you can easily identify which transaction types are important for this. Usually there will be some form of transaction record created in a specific DB table that, hopefully, will be well indexed. My thought here is that if you can identify such a table or tables, you should be able to grab the related information from them - hopefully using a good key - to feed your dashboard database. Append to whatever you have previously extracted and then run the dashboard from that extracted archive.
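 
                          As a very rough sketch, with the table, column and key names all invented for illustration, the sort of query Monarch would fire off over the ODBC connection might look something like this, picking up only the recent entries the dashboard needs:
 
                          -- Pull just this year's transactions for the dashboard feed (hypothetical names)
                          SELECT TransactionID, TransactionDate, AccountCode, Amount
                          FROM dbo.FinancialTransactions       -- hopefully indexed on TransactionDate
                          WHERE TransactionDate >= '2014-01-01'
                          ORDER BY TransactionDate;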

                           

                          If you are lucky the system will have an option to output a 'copy' of the new transactions, maybe even as an email feed, in which case you may be able to get real time updates delivered somewhere ready for ingestion into some Monarch processing. You may need to check it for reasonable accuracy. If your system is providing management with an overnight summary of the state of play, that might be something you could check the dashboard figures against each night.

                           

                           

                           

                           

                          [QUOTE=ZJSoule _]The second is a little more complicated.  We currently have a weekly/monthly extract out of our production system that acts as a mini data warehouse for us.  It works great, but it's HUGE and it takes upwards of 8-10 hours to be created (it's currently up to a 300 MB file in .txt format). This in turn delays the creation of other files and is a strain on our system.

                           

                          We were hoping that maybe creating a SQL snapshot, or just having Monarch grab targeted data when needed (rather than having a massive file), would alleviate some of the issues we have with this file.  Unfortunately, we don't currently have a lot of SQL knowledge in our organization, so we (the non-programmers) are muddling through as best we can.[/QUOTE]

                           

                          Right, well, this sounds like exactly the sort of requirement that the Datawatch Enterprise level applications are intended to cater for. One immediate benefit is that you can refine the extraction with some precision, produce only periodic updates, and append them to an existing data store. You then get a frozen history and no significant load on the production system from 'dumping' the entire content (or a lot of it) to an external file on a weekly basis. As with the dashboard, but with less frequency, the idea will be to find a suitable feed or feeds that allow you to gather only the most relevant data, and only for entries that have happened since the previous extract.

                           

                          This approach may not reduce the ultimate size of your extracted database, but it should reduce the amount of processing required to keep it up to date. Providing you can identify the records using a suitable key into the data, it ought not to take too long to process. Obviously, repeated extraction of old data just to populate a new file is somewhat wasteful of production system resources (if I read your description correctly), yet it still leaves you with the same size of extracted data after each run. It would usually be better to select a subset (assuming you can) and append to an existing file than to re-create it each time.
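 
                          A rough sketch of the kind of incremental pull I mean is below, again with every name invented, and assuming (purely for illustration) that the warehouse copy and the production database sit on the same SQL Server. The warehouse keeps what it already has, and each run appends only the rows newer than the last one it loaded:
 
                          -- Append only the rows added since the last run (all names hypothetical)
                          DECLARE @LastLoaded datetime;
 
                          SELECT @LastLoaded = MAX(TransactionDate)
                          FROM Warehouse.dbo.FinancialTransactions;      -- the local 'mini data warehouse' copy
 
                          INSERT INTO Warehouse.dbo.FinancialTransactions
                                 (TransactionID, TransactionDate, AccountCode, Amount)
                          SELECT TransactionID, TransactionDate, AccountCode, Amount
                          FROM   Production.dbo.FinancialTransactions    -- the live source
                          WHERE  TransactionDate > ISNULL(@LastLoaded, '1900-01-01');  -- first run loads everything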

                           

                          That said there are situations when re-creation may make more sense.

                           

                          Whilst financial transactions tend to be dealt with as daily records of money in and out, a single transaction, say a credit note, might have a retrospective effect on values in an account for other reporting purposes. In such a case the way one would extract and store the information might need a different approach, and a rolling re-extraction could make more sense, provided that people understood the basis of the information.

                           

                          If, after careful consideration, there seems to be no way of reducing the load on the production system for the weekly/monthly reports, I would be very tempted to seek a way to make a temporary copy of the database in order to run the analysis extraction offline and spread the load. Either way I guess you end up with some sort of investment to deal with the load.

                           

                          The other option might be to look at all of the outputs being produced by the system to see if there might be a better way of getting them.

                           

                          For example, in one production environment I was involved with, I realised that most of the large report runs, often produced as individual report-writer type outputs, many per night, could all be provided with the data from 2 of the standard and trusted system reports, forms of which were also being run with the output printed, distributed and largely ignored.

                           

                          So we could cut production system overhead, create 2 well tested standard report files, run a few Monarch processes on them, and provide people with all of the information they were already given - but in an electronic (useful) form that was searchable and therefore potentially usable. It may be that the same thing applies to your systems, and that the production system load could be reduced by doing fewer, less complex extractions and simply dumping information to 'desktop' based resources for final preparation and distribution. To put that another way - you still have your 300 MB file extraction overhead (at least in theory) but you do a lot more with it, saving the prod system from having to perform some of its existing tasks. I have no idea if that is practical for your situation, but it has to be worth looking at.

                           

                          It's difficult to suggest anything more specific without getting to know the systems in some detail but I hope these thoughts may spark some ideas for you.

                           

                           

                          Grant