Catalogue of SQL & PL/SQL Bad Practices

gojko's picture
articles: 

I have started compiling a list of Oracle PL/SQL and SQL bad practices, with the intention of producing a comprehensive catalogue of common programming errors, that can be used as a check-list for code reviews or given to junior developers so that they can learn from the mistakes of others.

For each bad practice, I provided a list of symptoms in the code, an explanation why it causes problems and a list of preferred solutions. I have also listed exceptions to the rules, when they exist. One of the four severity levels is assigned to each bad practice, identifying the danger:

  • Risk of data corruption – writing the code like this will probably lead to loss of data
  • Makes system harder to use – writing the code like this will prevent reuse or make it harder for clients to develop on top of it; it may also make the system harder to maintain
  • Reduced performance – writing the code like this will cause sub-optimal performance and waste resources (typically CPU or storage)
  • Security risk – writing the code like this introduces a security hole in the system, risking sensitive data to be exposed to unauthorised users.

You can download the first version of the catalogue, in PDF form, from http://gojko.net/effective-oracle

Please provide feedback and help make this list more complete. You can do that by sending information on similar bad practices you have noticed or commenting on issues I have outlined in this document. I would especially like to hear from you if you have a better solution for any of the issues or if you do not agree with any of the explanations or potential problems.

So far, this is the list of bad practices with symptoms:

Using non-deterministic functions directly in conditions: Reduced performance


- Non-deterministic function that does not depend on current row values (or depends on a subset of row values) is used in Where clause. Examples might be functions that fetch data from a referential table depending on current database context, or perform some calculations on derived data; Functions are not enclosed in select from dual or a subquery.

Catch-all error handling: Risk of data corruption


- A catch-all exception block (WHEN OTHERS THEN) used to process an error (or a group of errors) in a PL/SQL procedure. This is done typically either to ignore an expected error which should not affect a transaction (for example, when duplicate key in index should be disregarded, or if a problem with inserting should be quietly logged). Catch-all error handling might also be written when a
single block of code is expected to throw several types of exceptions (no data found, duplicate value on index, validation…) and the developer wants to handle them all at once.
- Another example is using WHEN OTHERS block to catch domain exceptions thrown by RAISE ERROR because they cannot be individually checked.

Client access as object owner: Security risk


- External applications, application servers, web servers or client applications accessing database with usernames/
passwords of object owners (schema containing objects).

Client access to tables: Security risk


-External client code (web, applications) reads and writes data directly to tables

Embedding complex SQL code into PL/SQL: Makes system harder to use


- Large chunks of SQL code (typically a complex Select statement), used inside PL/SQL function or procedure.
- Using an optimiser hint inside PL/SQL function or procedure.

Error handling with magic numbers: Makes system harder to use


- RAISE_APPLICATION_ERROR used in pl/sql procedures, with arbitrary error codes that conform to some implicit contract with other users/front end developers (no pl/sql exception declaration for that error code).

Error handling with output parameters: Risk of data corruption


-Procedures or functions declared with return code (and/or error message) output parameters. In case of errors or problems, these parameters get special values that should signal the caller about problems.

Formatting data in views: Makes system harder to use


- to_char used on date or numeric values in views
- strings concatenated in view code to form pretty printed values

Hardcoding local Varchar2 variable size: Makes system harder to use


- PL/SQL function or procedure declares local Varchar2 variables for temporary storage of table values, with hard-coded length
- Views declared with Varchar2 types with hard-coded length

Ignoring exceptions: Risk of data corruption


- "Exception When others then NULL;" This kind of code is written when errors such as attempts to insert a duplicate record or modify a non-existing row should not affect the transaction. It is also common in triggers that must be allowed to fail without effecting the operation which caused them (best-effort synchronisation with an external resource)
- Less frequently, this code is written by junior developers who do not knowing what to do in case of an error, so they just disregard exceptions.

Not using bound variables for changing parameters: Reduced performance


-Frequently executing the same query with different parameter values, but specifying parameter values literally, without using bound variables.

Relying on conditional column predicates in triggers: Risk of data corruption


-A trigger uses conditional predicates to check whether a particular column is being updated. if updating('ColumnName') then do something with column end if

Relying on context information in package variables: Risk of data corruption


- Variables in PL/SQL packages used to store context (session) values between DB API calls. They are typically initialised by calling a procedure explicitly, or using a log-on trigger. Stored procedures rely on these variables being properly initialised and do not check whether they are empty.

Relying on external context initialisation: Risk of data corruption


- Views and stored procedures use sys_context() to retrieve a global context, depending on that context not being empty. Context is not initialised automatically, typically requiring the caller to initialise it by calling a stored procedure.

Secondary processing with unshielded triggers: Risk of data corruption


- Trigger used to propagate updates of a table to a secondary subsystem (vertically adding functionality, like external publishing, auditing, etc not part of the main workflow). Subsystem transfer can happen either synchronously or asynchronously (with a queue/job combination). In any case, problematic symptom is that the trigger does not protect the primary system from secondary subsystem exceptions.

Storing ROWID for later reference: Risk of data corruption


- ROWID values for references are stored in a table or kept in client code in order to access referent rows easier

Storing empty LOBs: Reduced performance


- Empty CLOB values used instead of NULL for CLOB fields that do not hold a value

Too many levels of views: Reduced performance


- A large hierarchy of views containing sub-views or subqueries is in place. Such hierarchy is usually established as several layers of abstraction over abstraction, typically when adding new core features to underlying models, but keeping client API for backward compatibility using a set of higher-level views.

Transactional control in non-autonomous procedures: Risk of data corruption


-Commit or rollback statement in a stored procedure or function without PRAGMA AUTONOMOUS TRANSACTION

Trigger depending on order of execution: Risk of data corruption


-Triggers relying on other triggers for the same action. Examples might be sharing contextual information, assuming that a log/audit record added by another trigger exists.

Using Sequence nextval without curval: Risk of data corruption


-Sequence currval method used in a procedure or trigger without calling nextval first typically in a trigger that updates a log record, or a procedure that partially processes data-sequence currval used to calculate the next value which will be read by nextval

Using a sequence as a counter: Risk of data corruption


- A sequence object used in a manner which relies on consecutive numbering without gaps. Often with arithmetic operation on sequences used to guess the next or previous value without actually using sequence.nextval.

Using bound variables for constants: Reduced performance


-Bound variables used in queries for values that are never changing (often when client developers bind all variables in a query).-Bound variables used for parameters where actual value can significantly effect the optimisation plan

Using derived column values for existence checks: Reduced performance


- count, sum or some other aggregate function (i.e. custom made csv) executed in a subquery, and then compared to 0 or 1 or NULL in the WHERE clause of the outer query in order to check whether at least one (or exactly one) related element in the subsctructure exists; results of the aggregate function are then discarded (not passed on through the outer view).

Using exceptions for flow control: Risk of data corruption


- Half-successful procedures, throwing exceptions when the work is not totally complete, but a part of it is
- exception when XXX then blocks used to control workflow other than error-handling (exceptions used instead of state-codes)

Using magic numbers for non-existing values: Makes system harder to use


- Special numerical or string value used as not applicable or non-existing in stored procedure parameters, return values or table values. Usually this value is 0 or -1 if parameters are numerical IDs.

Using non-dedicated packages for continuous jobs: Makes system harder to use


- A continuous or frequent database job executes a long-running procedure from a pl/sql package. That same package is also used for client access or by other schemas.
- The body for a continuous or frequent database job is a procedure from such a package.

Wrapping everything into stored procedures: Makes system harder to use


- Client access layer consists exclusively or mostly of stored procedures - both for writing and for reading. Ref Cursor output parameters of stored procedures used to read data. -
- Stored procedures are used to fill in additional details when inserting or updating data (automatically calculated columns, timestamps and similar).

Comments

Just feel inspired to add my opinions:

1) Using non-deterministic functions directly in conditions: Reduced performance
Agree. Do it only under duress

2) Catch-all error handling: Risk of data corruption
Agree. Also "makes system harder to use" and "Security Risk"

4) Client access to tables: Security risk
Er...and exactly is a client app supposed to do anything if it cannot access any tables? Such a system would be "secure", but in a very trivial way. I suspect you meant to say something about all access being though stored procedures (see below for my comments on _that_).

5) Embedding complex SQL code into PL/SQL: Makes system harder to use
Disagree. This has been discussed recently at http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:672724700346558185
and I have to say I am on Tom Kyte's side on this one. Putting the SQL where it is being used makes for easier debugging and improves performance. And, in a well-designed system, there is only one (or possibly two) places where a specific query will be used. Actually, putting everything into simple pl/sql functions promotes the practice of using these functions in-line in queries, which is Bad Practice #1

7) Error handling with output parameters: Risk of data corruption
Agree, with the clarification that no exception is raised (or re-raised). As long as an exception is raised, I see no problem with also returning additional diagnostic information (as long as you are careful how much of it you make public).

8) Formatting data in views: Makes system harder to use
Can go either way -- which means it is not a Bad Practice. Some front-end tools require this
(and I'm not sure I want non-technical end-users trying out the fine nuances of TO_CHAR)

9 - 12) Agree

13) Relying on external context initialisation: Risk of data corruption
If I understand you correctly, you are calling FGAC/VPD a bad practice. I think this bold statement requires a bit more development.

14) Secondary processing with unshielded triggers: Risk of data corruption
Agree. Been there; failed miserably

15 - Lost count) Generally agree, but I have to think about what it is exactly you are saying

?) Using exceptions for flow control: Risk of data corruption
Except that this is the #1 best way to handle this common situation:

IF Row X exists
THEN DO THIS
ELSE DO THAT
END IF

You put the fetch of Row X into a new BEGIN/END block, and if NO_DATA_FOUND do the ELSE clause.

3rd from end) Using magic numbers for non-existing values: Makes system harder to use
I agree: SQL has the NULL for non-existing values. Be aware that NULLs are somewhat controversial.

Last One) Wrapping everything into stored procedures: Makes system harder to use
I personally view this as a Best Practice, but I have to go to work in 6 hours, and there is not enough time left to give this the discussion it is worth.

All in all, I think this is a Good Start.