Assuring optimal performance of database applications starts with coding properly formulated SQL. Poorly written SQL and application code is the cause of most performance problems. As much as 75% of poor relational performance is caused by "bad" SQL and application code.
But writing efficient SQL statements can be tricky. This is especially so for programmers new to a relational database environment or those who have never been trained to properly write SQL. Of course, it is impossible to adequately address all SQL performance issues in a column like this, but we can address the basics of SQL coding for performance.
Steps to Efficient SQL Statements
The first step is to move from thinking about file-level processing to thinking relationally. Code join instead of implementing cursors for every table and reading from each of them like they were files. Joins are optimized by the DBMS (database management system) and will return only the data that is needed to your program. When using cursors like files much more data needs to be accessed and performance will suffer.
The next step is to always provide only the exact columns that you need to retrieve in the SELECT-list of each SQL SELECT statement. Another way of stating this is "do not use SELECT *". The shorthand SELECT * means retrieve all columns from the table(s) being accessed. This is fine for quick and dirty queries but is bad practice for inclusion in application programs. This is so because adding columns to a table will cause application programs to fail unless changed, and additional resources are consumed by the DBMS for every column you request. So if your program does not need the data, it should not ask for it.
Next rule: Be sure to use the WHERE clause to filter data in the SQL instead of bringing it all into your program to filter. This is a common rookie mistake. It is much better for the DBMS to filter the data before returning it to your program, because I/O and CPU resources are required to obtain each row of data. The fewer rows passed to your program, the more efficient your SQL will be. So, the following SQL
SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE SALARY > 50000.00;
is better than simply reading all of the data without the WHERE clause and then checking each row in your program to see if the SALARY is greater than 50000.00.
Another important rule is to avoid asking for what you already know. This may sound simplistic, but many developers violate this rule at one time or another. For example, what is wrong with the following SQL statement?
SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE EMPNO = '000010';
Give up? The problem is that EMPNO is included in the SELECT-list. You already know that EMPNO will be equal to the value '000010' because that is what the WHERE clause instructs the DBMS to do. But with EMPNO listed in the WHERE clause, the DBMS will dutifully retrieve that column too. This causes additional overhead to be incurred thereby degrading performance.
From an administrative perspective, it is important to keep your database statistics up-to-date. Without the statistics stored in DBMS dictionary or catalog, the optimizer will have a difficult time optimizing anything. Statistics provide the optimizer with information pertinent to the state of the tables that will be accessed by the SQL statement that is being optimized. Typical statistical details stored in the system catalog include:
- Information about tables including the number of rows, compression details, and total number of pages;
- Information about columns including the number of discrete values for the column and the distribution range of values stored in the column;
- Current status of the index including its organization state (number of levels and leaf pages), the number of discrete values for the index key, and whether the index is clustered;
- Information about the table space and index nodegroups or partitions.
Follow the Rules
Building appropriate indexes for your queries is also critically important, but we cannot adequately get into all the nuances of that topic in this month's column. Suffice it to say, that if you follow all of the above rules and experience performance problems, the next things to check are your query's access path and what indexes are available.
These rules are not difficult to follow and additional, in-depth tuning will be required. But following the above rules will ensure that you are not making "rookie" mistakes that can kill application performance.