A better view

gojko's picture
articles: 

Accessing the database from the outside world basically comes down to two options - direct querying or executing stored procedures. Procedural access is often chosen for the wrong reasons - making maintenance significantly harder.

Using exclusively or mostly stored procedures seems to be an established practice in SQL Server community, due to various limitations of the optimiser and security model in earlier versions of that server. However, even Oracle developers are not immune to procedural overdose. It seems to me that people coming from Forms background prefer procedures to relational access, and I occasionally come across a data access layer consisting completely of stored procedures, designed and implemented by seasoned Oracle professionals. Most popular explanations for that design approach are:

  • Direct table access is a crime, so procedures must be built around them to restrict, control and audit access
  • Procedures are required because some columns must be calculated on the fly
  • Procedures provide clients with a way to request only 10 (50 or 100) records, and not retrieve everything
  • Procedures are needed so that client programmers do not have to understand all the details of the underlying table model

To (most of) that, I say "Rubbish". If you are reading this line and becoming furious, before scrolling down to write an angry comment, I'd like to offer you a (different point of) view:

  • Direct table access really is a crime, but procedures are not the only solution. Any restrictions you can impose on procedures, you can also implement with views.
  • Additional data can be calculated on the fly with PL/SQL functions and incorporated into views to look just like normal columns. Analytical functions can perform even very exotic calculations like ranking and finding local maximums and minimums.
  • Limiting the number of rows transferred from database to client code can be solved easily in client access drivers, using features like server-side cursors and paging. Even if the clients are using a driver without such support, it can trivially be implemented by adding "WHERE rownum<" or analytical ranking.
  • Procedures are not really good for hiding the complexity of the underlying data model - views are much better as they are a lot easier to maintain and use.

How Deborah and Webber avoided overtime

Imagine a software company - let's call them Effective Oracle Inc. Effective Oracle gets a contract from Clint (named after the world famous spaghetti western star) to develop his Web bookshop, and they give that task to their two star developers, Deborah and Webber. Deborah is a database specialist, and Webber is programming the front end.

Deborah and Webber are working on the offer listing page, and Deborah knows that she cannot let Webber read directly from the book table, as that would get her fired from any self-respecting software company (and Effective Oracle is self-respecting). She might give Webber a stored procedure to call. A month later, Clint asks for another page, similar to the offer listing, but also displaying sales and web browsing statistics. Just to make things more interesting, those statistics must be calculated using estimated forex rates, not just read from tables. Now Deborah is faced with a choice - either create an additional procedure, very similar to the old one, or add a few columns into the cursor returned by the old procedure. If she chooses the first option, then she is duplicating code and making it harder to maintain in the future. If she takes the second, then Oracle will calculate sales and popularity even for the first report, though they will just be discarded. If she originally used a view, than she could just add two columns to it, and Webber could choose which columns to read. On the old page, he will not retrieve additional columns, and Oracle will not execute the calculations.

Here is another scenario: Clint's marketing people ask for some weird report, in order to hunt down customers which have not spent enough money in the last few months. Deborah and Webber do it again as a combination of database stored procedure and a web page. Everything is OK for a while, but then Clint wants another similar report, including only clients registered this year. Deborah is again faced with a tough choice - either do a new procedure (duplicating code), or add a 'filter' parameter to the old one, with some default value that signals 'no filtering'. If she starts adding parameters, Deborah knows that very soon she will have to check whether to use the first filter, second filter, none or both. That procedure would quickly turn into a hydra, and she would be scared to look at it, let alone modify it. Making Webber fetch unnecessary rows and filter them in the Web code is not really a choice. Had she used a view for the report, Webber could just add a condition into the WHERE clause.

The situation gets even more complicated when that report has to be combined with some other data (customers' account information, for example). If the report was produced by a procedure, Deborah could not just join it with the Account table - she would have to develop a stored procedure that would duplicate all the code. I've actually seen one more solution to this problem, with both cursors retrieved and then joined in Web code1 - but I would not suggest doing that, since you would soon find if your employer is self respecting.

By giving Webber a view, Deborah is letting him filter, combine and order data using the database engine. Oracle will be able to optimise queries better, less data will travel from database to web servers, and Webber does not have to reinvent the wheel and implement relational model functionality. I would trust guys from Oracle much more for this task than Web programmers - no matter how good those Web programmers are. Deborah does not have to maintain duplicated code. If Clint wants to change the ordering or filtering - Webber can do it by modifying the query. Deborah does not even have to do anything. If she is currently busy, Webber does not have to wait for her, and they will actually finish the job faster. Deborah can, of course, re-use the views and join them with other views and tables easily, again not duplicating code.

It gets even better. Half a year later, in the middle of a busy trading day, system starts running slow and a DBA finds that one of Deborah's queries is the bottle-neck. 'Just add a RULE', they say, 'and it will be fine'. But if the query is in PL/SQL code, changing that during busy hours is not always an option. Any other procedures and packages that depend on that procedure will be invalidated, connected clients will get an exception first time when they try to execute changed procedures, and if any jobs depend on those procedures (or packages), they might need to restart. So, a simple optimiser hint now becomes a major problem, requiring downtime, which might not be possible during busy hours. If Deborah used a view instead of the procedure, then the DBA would be able to insert the hint without much trouble.

Read-write abuse

Read-only procedures are not the only ones that get abused, though they are the most obvious example. I consider "Update" procedures generally a bad idea, since they quickly grow into monsters. Imagine a customer service application with three important use cases - in one use case, just the name and address should be updated in the customer table; another use case updates all customer data, and the third just deletes the address, leaving other customer data intact. All cases can be implemented with three different update procedures, or a single procedure encapsulating three update statements, selecting which one to use based on parameters. When another developer takes over that little monster pet, he will wonder what to do if the address parameter is null, so this code is very likely to attract errors.

Procedures are often used as a wrapper over inserts, in order to simplify the API or define default values. Personally, I don't like using procedures to restrict inserts or simplify them by defining column values, and think that triggers and default value specifications are much better for that. If some other developer decides to update the tables directly, triggers and defaults will still execute, unlike the values I defined using a procedure.

Views can also be updateable, so in most cases writing an 'update' procedure as a simple wrapper around a single SQL statement is just a waste of time. If the clients already access the data using a view, then let them update it - you will not have to write separate procedures for different parameter combinations, and they don't have to worry about calling the appropriate procedure. What ever data they want to modify, they should just update it.

And again, if they want to update or insert a number of records, you don't have to write a new procedure and they don't have to execute database calls in a loop. Sure, not all views are updateable, but instead-of triggers are a good workaround for that problem.

Procedures are often used to implement column-level security - to let a user update only certain columns, without the ability to modify others. Even in those cases, implementing a view that hides columns is much easier than writing a stored procedure wrapper over Update statements. Separating access by privileges is not really an excuse to force updates through procedures.

Any logging or auditing should not be done in an update stored procedure anyway, but as a trigger, since it should catch updates even if somebody runs them directly on the table.

When to use functions and procedures

Functions are excellent for encapsulating business rules (like "should discount apply" or "estimated forex rate" mentioned above), and I suggest using functions for business rules for the same reason I suggested views as read-write access layers: code will be easier to maintain. When the discount rate rule changes (and it will one day, no doubt), I will have to change just one function, and will not have to search through dozens of views and modify them.

Procedures are most useful for capturing business processes and automating operations, such as multi-step updates or creating a hierarchy of elements. Also, having different combinations of "default" values is really hard to implement without procedures, so they are good as "wizards" for creating records in complex cases. Since execution order of triggers cannot be guaranteed, procedures are excellent as wrappers around several trigger bodies that would need to be executed in a specific order.

I can even name a few cases when it's better to use a procedure for read-only access - typically involving several tables merged and filtered all by date ranges, when data could not be properly filtered with views. Though in most of those cases the report can be expressed as a query, and then filtered with an external Where clause, the execution plan would be significantly less optimal then if the query was parametrised from inside. In that case, I would suggest extracting the query into a parametrised cursor, and then (if the clients cannot access the parametrised cursor directly), writing an additional stored procedure that would just open the cursor and return it.

Prefer relational to procedural access

Procedures have a big advantage in familiarity with other programming environments, so they are easier to understand at first - but try to resist them. Using procedures just as dumb wrappers over SQL code does not pay off - and only brings maintenance problems. Stored procedures and functions are not really good for simplifying the API, hiding the underlying data model or implementing security restrictions - views are much better for that. However, this does not mean that procedures and functions don't have a place in the client access layer - but I strongly suggest preferring the relational approach when possible. You should do that because of a very selfish reason - your code will be much easier to maintain, so you will be able to spend time doing something smarter.

About the author

Gojko Adzic is an IT consultant specialised in designing and building high-throughput transaction processing systems and enterprise system integrations. His story so far includes equity and energy trading, mobile content delivery, e-commerce, online betting and complex configuration management. In his free time, Gojko maintains a blog about programming on gojko.net.

--

Footnotes


1 I've actually seen this once - profiler found that piece of code and politely warned us that the merging loop was consuming 90% of the processor - it took us quite some time to figure out what the author really wrote, but on the end, it turned out to be another fresh invention of the Cartesian product.

Comments

I must say that I do not share your opinion.
In my approach PL/SQL API is the only possible way to retrieve and manipulate data in Oracle Database. Especialy when it goes for Web Applications.

Logging user activity. Using triggers is fantasting to logging changes to data (Journaling), but usually is useless for logging user activity. User activity should be logged based on his/her actions and it’s parameters not data modifications. You do not want your batch processes to generate so many log.

Performance Monitoring. With well-written API you can get very detailed statistics for time consumed by individual routine. Based on this statistics You can decide where are your bottlenecks and you can improve that.

Simplicity. For easy, standard operations You should prepare generators that automatically generate API procedures for many simple tables.

Cursors. PL/SQL procedures can return REF CURSORS as an output. With dynamic SQL you can return different variations of data depending on input parameters.

API can be used for many Applications. In many big organizations one database is often accessed by different client applications. You do not have to write the same (similar) SQL’s in PHP, java or whatever other tool You want/plan to use.

Paweł Barut

gojko's picture

Different result data structure depending on input params is impossible without stored procs, agreed, and that is one very good application of procedures (though different structure typically means for me that it's a different report, so I would use a different view or procedure for it, but there are exceptions to the rule). For the rest, I don't really see much difference.

The logging depends on company policy, so I cannot really argue about your situation. However, excluding batch jobs from logging or auditing is not so hard even with autonomous transaction auditing triggers - just turn off logging depending on sys context or current user.

For tracing+tkprof and statspack, I don't see any difference between a view or a procedure. And views are, from my experience, easier to change in production than procedures.

Not having to write the same (similar) SQL’s in PHP, Java and others is not really a benefit over direct querying - calling a stored procedure requires about the same amount of code as selecting from a view or executing an update (sometimes even less, since reading a ref cursor output parameter may require more code than just fetching the query result). I was not arguing that client code should contain complex queries, and I am very much in favour of providing a standardised API to clients. But views can hide the complexity as good as pl/sql code, and can be used as the client API.

As for simplicity+generating standard API - insert,update,select and delete are pretty much a standard API which you do not have to generate at all.

Regards,

Gojko Adzic
http://www.gojko.com

Kevin Meade's picture

Though I love PL/SQL (so easy to use, so powerful, so well integrated with Oracle), I must say, the magic is in the views. I have been trying to get this idea across to my collegues for years. Some get it, some don't. Often enough, it takes something dramatic for them, a big time savings, solving a specific problem, I'm sure you know what I mean. But in the end, so satisfying when they get catch on.

Again you prove your experience level gojko, by demonstrating a perspective only had from seeing the bigger picture.

As for should we or shouldn't we have code in client apps, I like not having too much code in client apps. When its in the app, I can't tune it or change it, when its in the database, I can do so easily. The more simple the code in the app, the better.

I am especially impressed with your reference to Oracle Forms. How many times did I show people, use a view as the base table, less code in your app. Less code in a Forms app is a way good thing, no post-query triggers, not complex computations or edits in the form, its all taken care of, just show the data.

Ah well I talk to much and this is your show. Another good article, thanks. Kevin Meade

gojko's picture

Kevin - I completely agree with you on not having too much code in client apps. For me, the rule of thumb is to push it into the database if it breaks into more than 5 lines or if it needs an optimiser hint. Having the option to modify a view and speed up 300 instances of client apps in a second is a miracle that cannot be beaten by any software deployment tool or policy.

Gojko Adzic
http://www.gojko.com

Kevin Meade's picture

It was a financial reporting system. I found a better way to code a view that improved performance 100%, everything went at least twice as fast. So I tested the view change (a lot to be said about how views affect testing strategies in a positive way), and then implemented it.

On Monday the manager of the system had people all over the country wondering if there was something wrong with the system because it was running wicked fast (twice as fast). I guess people get used to having X time available to get a cup of coffee in the morning and get upset when it is suddenly gone. They thought their answers were wrong because they had come to expect that when their querys ran a lot faster than expected run times, it was likely because they were not getting all the right data.

He calls me "Hey Kev, did you guys do something to the system over the weekend?". Why yes, we put in a new set of views that reduce query resouce consumption for most of your app. Everything should be running about twice as fast. "Well it is... but I wish you would have told me!, I got people all over the country wanting to know what happened.".

I said I was sorry, because in reality I had made a mistake, I didn't go through the right release procedures and thus blindsided the people I was supposed to be supporting. But he was the consumate salesman, and put a profitable spin on it.

In the end I learned a valuable lesson I try not to repeat.

With regards to "How Deborah and Webber avoided overtime" I think the point made is that having the information for the 'web report' encapsulated in a view makes any updates i.e. additional columns/information very straightforward requiring no coding effort and thus very maintainable.

In additions its a a lot easier to tune a sql statement than a collection of stored procedures when the report suddenly starts taking 50 minutes to runs as opposed to 5 seconds !!!

Rubbish? That’s one strong word!

I think you just don’t get it.

Tell you what … there is an implicit assumption here that a data access interface between the database and the client application side is a “good thing”. So why don’t you list why exactly it is a good thing to have one vs. not having one at all. No need for Beavis and Butthead scenarios to make the list. Then judge the two approaches, view-based and procedure-based against that list.

If one of the goals is to separate the SQL code from the client code (java, c#, etc.) then a view-based approach does not event meet this one goal. There is, in practice, an almost limitless number of ways to write SQL against views V1, V2, … Vn … there is a finite number, by comparison, a relatively small number of ways of using procedures P1, P2, … Pn returning handles to cursors opened against ‘select from Vk’ SQL statements. In one case you do have something worthy of the terms “interface” or “API” … in the other case, with views, “layer” is just about most one can call it.

Once you’ve given access to the views, you have essentially relinquished and lost control of your SQL access … the client programmer can write just about anything against them: the good, the bad, the ugly … anything.

Procedures are not really good for hiding the complexity of the underlying data model - views are much better as they are a lot easier to maintain and use

The “maintenance” of database objects and “hiding the complexity of the underlying data model” are orthogonal issues. For the “use” part … true, a system of views, can hide some of the intricacies of the underlying data model [and so can the procedures … actually, even better because they can totally encapsulate the data model!], but a system of views is still a virtual data model (most of the times, a denormalized one). The client doesn’t escape the need to understand some sort of data model … you’ve only shifted a bit the goal posts … not understanding that [virtual] data model is akin to dealing with silos of data. Furthermore, to safely use a view, most of the time, one has to have knowledge of its actual definition.

Funny you should mention analytic functions and the use of rownum … a search for “views on views” on asktom should provide enough research material on some of the issues surrounding view merging and predicate pushing, etc.

One your quest to show that views are better for the task on hand you’ve essentially lost sight of why the task is needed in the first place. In fact you reach the wrong conclusion from the beginning:

Procedural access is often chosen for the wrong reasons - making maintenance significantly harder

Responding to an application change can be measured as harder/easier based on who does it. With the views, the SQL is in the realm of the client/application developer … with procedures, the SQL is in the realm of the database developer. Who do you think would find it easier to work on that SQL? And, as for quality, guess who’s better for the job?

But let me put this to you … imagine an application without any data access interface between the database and the client application (or just to eliminate that direct table access that’s so “criminal” … have simple views acting more or less like synonyms for the tables). Whether the client developer writes SQL against tables, views or synonyms … what‘s the difference? None, zilch, nada. What’s the difference for the database developer, DBA though? … could it be that the SQL is outside their realm embedded all over the place in a language code they are not best at?

Views have their places too ... but in the context you've described here, they hardly can do the proper job.

gojko's picture

Quote:

Tell you what … there is an implicit assumption here that a data access interface between the database and the client application side is a “good thing”. So why don’t you list why exactly it is a good thing to have one vs. not having one at all. No need for Beavis and Butthead scenarios to make the list. Then judge the two approaches, view-based and procedure-based against that list.

As with anything else, this will depend on the type of applications/systems. For small and simple projects, I guess that having a client access layer or just executing plain SQL on tables is the same. Most of my projects include various teams, and lots of different internal and external applications connecting to the same database. From that perspective, my list is (not in the order of importance):

1 - hiding complexity; giving client developers a narrow view (not necessarily DB view, but in abstract sense as representation, aspect) so they can be concerned only with relevant details, enabling each of the client application teams to focus on their problems
2 - division of work between db and client developers, enabling people to work in parallel above and below a specified client interface
3 - security: especially preventing unauthorised access and providing different levels of access for different client applications/roles
4 - encapsulating parts of system to isolate the impact of changes, and enable separate maintainance of different parts of the system (especially performance optimisations)
5 - encapsulating business rules for different applications that cannot share code (i.e. web, java and win32 applications)

on points 2 and 3, I consider procedural or SQL access more-less the same, though reusing views to build different levels of access to the same concepts is easier then duplicating code in procedures.

For hiding complexity, I prefer views to procedures (when it is possible to write a view) because a single view gives clients the option to access the same logical database concept in different ways, i.e. update it, read, sort or filter it. With procedures, different ways to access the same logical database concept would have to be implemented as different procedures, or as a 'one procedure to rule them all' which would be very complex and hard to maintain. Requiring that one procedure is used for retrieving a plain list, second for just the last 10 records, third for updating the price and fourth to get summary details does not reduce complexity, it increases it. Maintaining a monster catch-all procedure is much harder than maintaining a view.

For encapsulating and isolating the impact of changes, I find views much better. They are easier to optimise, and much easier to re-use (several layers of views can be built upon each other, to apply filtering, translations or additional data without code duplication).

For encapsulating business rules, procedures are usually better, as they can express multi-step operations.

Quote:
If one of the goals is to separate the SQL code from the client code (java, c#, etc.) then a view-based approach does not event meet this one goal.

I don't see this as an important goal, client code will typically have a layer that converts DB concepts into their platform concepts, be it from ref cursors into objects or from views into data sets or objects. Anyway, calling a procedure that returns a ref cursor is for me the same level of binding with database concepts as selecting from a view.

Quote:
Once you’ve given access to the views, you have essentially relinquished and lost control of your SQL access … the client programmer can write just about anything against them: the good, the bad, the ugly … anything.

I'm not sure if I understand your point, to me this sounds like allowing client developers to write SQL code will enable them to cause problems. First, I think that any complex SQL should be in the database, so client developers do not have to think about optimal execution plans or db tricks - but they should know how to select from a view or update it, as they should know how to call stored procedures. Some common sense, occasional code reviews and a bit of tracing will typically be enough to catch and correct errors, and that effort is, in my eyes, a good price to pay for time not spent on writing and maintaining additional database code.

On the other hand, if you are working with people who were going to kill system performance or fetch wrong data, restricting them to use only procedures will help a bit, but not too much - they will find another way to shoot themselves in the foot.

Quote:

times, a denormalized one). The client doesn’t escape the need to understand some sort of data model … you’ve only shifted a bit the goal posts … not understanding that [virtual] data model is akin to dealing with silos of data. Furthermore, to safely use a view, most of the time, one has to have knowledge of its actual definition.

From my experience, there is no important difference in this aspect between views and procedures. Knowing which columns a view contains is the same as knowing the structure of the returned ref cursor. So I prefer views as they are easier to maintain.

Quote:

Responding to an application change can be measured as harder/easier based on who does it. With the views, the SQL is in the realm of the client/application developer … with procedures, the SQL is in the realm of the database developer. Who do you think would find it easier to work on that SQL? And, as for quality, guess who’s better for the job?

Views are also the responsibility of database developers. I would not trust web programmers to develop views as I would not trust database developers to set up middleware messaging.

Quote:
could it be that the SQL is outside their realm embedded all over the place in a language code they are not best at?

I really don't think that having a simple select or update against a view in the client code is a big problem. Any complex code should be in the database anyway, either in procedures or views.

Gojko Adzic
http://www.gojko.com

This was a great article and the comment above could have been written back in the early 70s when that guy Codd was pushing something called relational databases.
SQL is just another type of code. It is as good a method of data abstraction as stored procedures and is not harder than procedural code just different, but it is optimised for handling lots of data.

If your tables have no more than 100 records (say a small shopping site) and don't run any reports then as it doesn't make any difference.
If you run any sort of volume of data using SQL to leverage the RDBMS engine is the way to go -for the reasons mentioned in the article.
Cheers

John

There are valid points to both sides of this discussion, and I'd like to throw a little gasoline on the fire by proposing a different approach. First of all, I'm assuming that the database team has created a standard, (mostly-)normalized, relational database design, and that the application team is using an OO language such as Java. The goal with this approach is to create an object data model that the application team can work with (perhaps it is derived from their object model, perhaps not...) and to implement it in Oracle as a set of objects and collections. By this I mean just the definitions (transient), not as persistent 'object tables'. Next, define a set of PL/SQL functions that accept these collections as parameters and return them as return types. Your application team should have no trouble with this approach, as long as they have a mechanism for pushing the data back and forth (such as Oracle's JDBC driver, which maps a collection of objects to an array of structs).

On the PL/SQL side, your database developers will need to decompose these objects and put the data into the normalized tables, but there are lots of tools within SQL and PL/SQL to make this easier, such as multi-table inserts, merge, cast..multiset, treat(), bulk-binds, the TABLE operator, the 'is of type' expression, pipelined functions (to avoid building large collections in memory), etc.

Where this gets really cool is when my object model utilizes inheritance (via the UNDER keyword). I can create a collection of a top-level object, and the collection can include instances of various objects that are derived from this top-level object. I can pass around a collection of 'customer' objects which includes both commercial and residential accounts, and each customer object can include a collection of 'product' objects, where each object in the collection is a product that the customer is associated with. This is also very useful for versioning my data-access layer, since I can sub-type my objects to add additional attributes, and I don't even have to re-compile my package specification.

Anyway, this approach has its own set of pros/cons, but I prefer this approach to the use of views or the use of 'standard' PL/SQL methods. Additionally, the object model may also be exposed as a set of 'object views', and you could define instead-of-triggers on these views which call function in your data-access layer, thus providing multiple ways to access the data using the same code base.