By Craig Shallahamer
Most Oracle performance analysis is now time-based, but it is focused on total time: the time to process a SQL statement or a batch process, or the CPU consumed plus the Oracle wait time that occurred over an interval of time. This is a fantastic way to approach optimization because it is easy to monitor improvement and it is closer to what a user experiences. And, with just a couple of twists, we can unite Operations Research (OR) queuing theory with the Oracle time-based approach, opening up an entirely new arena for performance analysis.
The Setup
It takes time to process work, so let's clearly define both the time and the work. We'll describe an Oracle system experiencing cache buffers chains latch contention in terms of buffer accesses. This is like describing the weather in terms of inches of rain or the wear on a tire by the miles driven. Our unit of work will be a buffer access, which is commonly called a buffer get or logical IO (LIO for short).
Now, suppose that over a 30-minute period, 218M buffers were accessed, 9,000 seconds of CPU time were consumed, and 23,000 seconds of Oracle non-idle wait time occurred. (Oracle processes are either consuming CPU or waiting for something; there is no other option.) Another way to express this is that, on average over the 30-minute period, it took 0.1468 ms to process a single logical IO. We have just calculated the response time, which is the time it takes to process a single unit of work.
If you're familiar with queuing theory, you're ready for the service time and queue time calculations! Interestingly, as the database workload increases, the CPU consumed to process a single unit of work is nearly constant, yet the Oracle wait time increases. If this sounds a lot like service time and queue time, it is because the behavior is very similar - so similar in fact that we can use this Oracle "service time" and "queue time" just as OR queuing theory indicates.
Continuing with our example, the service time is the CPU consumed per unit of work, which is 0.0413 ms/LIO. The queue time is the Oracle wait time consumed per unit of work, which is 0.1055 ms/LIO. Adding the service time and the queue time results in a response time of 0.1468 ms/LIO.
Response-Time Curve
To complete the picture, we also need the arrival rate. Loosely speaking, this is the rate at which Oracle processes work, which in our case is represented by logical IOs. Since 218M logical IOs were processed over the 30-minute period, the arrival rate is 121.1111 LIO/ms. With this metric, we can plot a single point on the classic response-time curve: response time on the vertical axis and arrival rate on the horizontal axis.
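If you like to check the arithmetic, here is a minimal Python sketch (my own illustration, not an OraPub tool) that derives the service time, queue time, response time, and arrival rate from the raw 30-minute sample figures above.

```python
# Reproduces the arithmetic from the 30-minute sample:
# 218M buffer gets, 9,000 s of CPU, 23,000 s of non-idle wait.

interval_ms   = 30 * 60 * 1000        # 30-minute sample, in milliseconds
lio_processed = 218_000_000           # buffer gets (logical IOs)
cpu_ms        = 9_000 * 1000          # CPU consumed, seconds -> ms
wait_ms       = 23_000 * 1000         # non-idle wait time, seconds -> ms

service_time  = cpu_ms / lio_processed          # ~0.0413 ms/LIO
queue_time    = wait_ms / lio_processed         # ~0.1055 ms/LIO
response_time = service_time + queue_time       # ~0.1468 ms/LIO
arrival_rate  = lio_processed / interval_ms     # ~121.1111 LIO/ms

print(f"S = {service_time:.4f} ms/LIO, Q = {queue_time:.4f} ms/LIO")
print(f"R = {response_time:.4f} ms/LIO, L = {arrival_rate:.4f} LIO/ms")
```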
Here is the response-time equation for a CPU-based system. Since we are focusing on in-memory CPU activity, the CPU response-time formula (as opposed to the IO formula) is more appropriate.
R = S / (1 - (L*S/M)^M)
Where:
R is the response time; 0.1468 ms/LIO
S is the service time; 0.0413 ms/LIO
L is the arrival rate; 121.1111 LIO/ms
M is the number of effective servers. A server is what services the work; in a CPU-based system, this roughly equates to the number of CPU cores.
We have one missing variable, M. If you are using MS-Excel, you can use the goal seek function to solve for M, but I have also created a simple web application to do this at http://filezone.orapub.com/cgi-bin/msolve.cgi. In this example, M is 5.3222. Figure 1 shows the M-Solver with our data entered.
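If you prefer to script the "goal seek" step yourself, a plain bisection works because the response time falls steadily as M increases. The sketch below is my own illustration of that idea in Python; it is not the M-Solver's code, and the function and variable names are mine.

```python
# Solve R = S / (1 - (L*S/M)^M) for M, given R, S and L from the sample.

def response_time(m, s, lam):
    """CPU response-time approximation used in the article."""
    rho = lam * s / m                 # utilization per server
    return s / (1.0 - rho ** m)

def solve_m(r_target, s, lam, tol=1e-6):
    lo = lam * s * (1 + 1e-9)         # M must exceed L*S, or utilization >= 100%
    hi = 256.0                        # generous upper bound
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if response_time(mid, s, lam) > r_target:
            lo = mid                  # still too slow: need more servers
        else:
            hi = mid                  # already fast enough: fewer servers suffice
    return (lo + hi) / 2.0

print(solve_m(0.1468, 0.0413, 121.1111))   # ~5.32 effective servers
```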
Figure 2 is based on our data input; the call to WolframAlpha.com creates a wonderful, interactive graph. I have also moved the cursor to the intersection of our response time and arrival rate, which is the point where our system was operating over the 30-minute sample period.
How to Use this Unification
The response-time curve is a wonderful communication tool. Most technical and non-technical people inherently know you don't want your system operating in the elbow of the curve (i.e., the steep part). What I tell people is that our performance solutions will move us out of the elbow of the curve. With just a little more explanation, my audience is thinking, "Okay, Craig, but how do we get out of the elbow?" To which I reply, "Great question! Here are our solutions, which will move us out of the elbow." A perfect setup!
Every performance solution will impact one of the response-time equation variables. For example, if the SQL is tuned, the workload will be reduced, which means the arrival rate will be reduced. And if an Oracle instance parameter change makes Oracle more efficient, the CPU required to process a single unit of work will decrease; that is, the service time will be reduced. By quantifying these changes, injecting them into the response-time equation, and plotting the resulting points alongside the initial curve, we can visually compare competing performance solutions.
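As a rough illustration of that comparison, the Python sketch below plugs two hypothetical solutions into the same response-time formula. The 20 percent arrival-rate reduction and 15 percent service-time reduction are made-up example numbers, not measurements from this system.

```python
# Compare two hypothetical solutions against the baseline sample.

M, S, L = 5.3222, 0.0413, 121.1111     # effective servers, ms/LIO, LIO/ms

def R(s, lam, m=M):
    return s / (1.0 - (lam * s / m) ** m)

baseline     = R(S, L)                 # ~0.1468 ms/LIO
tuned_sql    = R(S, L * 0.80)          # SQL tuning: arrival rate down 20% (illustrative)
tuned_oracle = R(S * 0.85, L)          # instance tuning: service time down 15% (illustrative)

print(f"baseline        : {baseline:.4f} ms/LIO")
print(f"tune SQL        : {tuned_sql:.4f} ms/LIO")
print(f"tune instance   : {tuned_oracle:.4f} ms/LIO")
```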
Conclusion
Uniting OR with time-based database performance analysis is fascinating, it works, and it opens up a powerful analytical tool when combating intense performance situations. Over the past few years, I have used this method many times and taught hundreds of people to do the same. The process will deepen your understanding of performance analysis and your ability to help non-technical people understand what we are doing, why we are doing it, and the benefits we expect, and it is well worth the few minutes it takes to perform.