Difference between two rows

Kevin Meade's picture
AttachmentSize
Plain text icon column_diffs.txt5.88 KB
articles: 

One of my recent clients had a real data twisting process to validate. It didn't help much that their source had been corrected by many "ONE-SHOTS" over the years to deal with bugs, and accounting methods mandated by Act of Congress. What we needed was a way to see changes in a stream of related rows. In the end I created for them, a solution that allowed developers to pick two rows from a table at random and compare them in SQL with a result set returned showing only differences. Its a simple thing, but kind of neat, and very useful. So I'd like to share it with you.

The basic idea is as was stated, identify a table and two rows from that table, and feed these to a function, letting it tell you the differences. As you might have guessed, a sql statement is built on the fly and executed via EXECUTE IMMEDIATE, and an Oracle Collection type is used to return a result set. Comments in the function spell out how to use it and code is attached. So lets walk through a quick example.

First we build some data.

create table cust_h
(
 cust_id number not null
,row_eff_dt date not null
,row_exp_dt date not null
,cust_name varchar2(10) not null
,credit_limit number
,value_group number not null
)
/

alter table cust_h add primary key (cust_id,row_eff_dt)
/

insert into cust_h values (1001,trunc(sysdate-10),trunc(sysdate-8)-(1/(24*60*60)),'SMYTHE',0,0);
insert into cust_h values (1001,trunc(sysdate-8),trunc(sysdate-7)-(1/(24*60*60)),'SMYTH',0,0);
insert into cust_h values (1001,trunc(sysdate-7),trunc(sysdate-3)-(1/(24*60*60)),'SMYTH',10000,1);
insert into cust_h values (1001,trunc(sysdate-3),trunc(to_date('31-dec-4712'))-(1/(24*60*60)),'SMITH',20000,1);

commit
/

SQL> select * from cust_h order by 1,2
  2  /

   CUST_ID ROW_EFF_DT           ROW_EXP_DT           CUST_NAME  CREDIT_LIMIT VALUE_GROUP
---------- -------------------- -------------------- ---------- ------------ -----------
      1001 18-mar-2007 00:00:00 19-mar-2007 23:59:59 SMYTHE                0           0
      1001 20-mar-2007 00:00:00 20-mar-2007 23:59:59 SMYTH                 0           0
      1001 21-mar-2007 00:00:00 24-mar-2007 23:59:59 SMYTH             10000           1
      1001 25-mar-2007 00:00:00 30-dec-4712 23:59:59 SMITH             20000           1

4 rows selected.

We note the following about this data:

1) it is a customer credit file of some kind
2) it is historical in nature with contiguous non-overlapping date ranges (pretty standard)
3) a row is identified uniquely by CUST_ID,ROW_EFF_DT
4) the following basic events happened:
a) we created the customer
b) we corrected a name mis-spelling
c) we approved a credit line and assigned a starting risk level (value group)
d) we increased the credit line and corrected a second name mis-spelling

But because wide tables are not so easily deciphered for changes, it is handy to be able to see just the changes in some form. This is particularly true if we want to examine changes across a set of rows over time. So here is one possible way to do this.

SQL> break on cust_id on row_eff_dt skip 1
SQL> select a.cust_id,a.row_eff_dt
  2        ,b.*
  3  from (
  4        select lag(rowid) over (partition by cust_id order by row_eff_dt) a_rowid
  5              ,rowid b_rowid
  6              ,cust_h.*
  7        from cust_h
  8        where cust_id = 1001
  9       ) a
 10       ,table(cast(kev_utils.show_column_diffs 
 11                          (user,'CUST_H',a_rowid,b_rowid,'row_eff_dt,row_exp_dt')
 12                   as c_column_value_difference)) b
 13  order by 1,2
 14  /

   CUST_ID ROW_EFF_DT           COLUMN_NAME                    DATA_VALUE_PAIR
---------- -------------------- ------------------------------ --------------------------
      1001 18-mar-2007 00:00:00

           20-mar-2007 00:00:00 CUST_NAME                      SMYTHE/SMYTH

           21-mar-2007 00:00:00 CREDIT_LIMIT                   0/10000
                                VALUE_GROUP                    0/1

           25-mar-2007 00:00:00 CREDIT_LIMIT                   10000/20000
                                CUST_NAME                      SMYTH/SMITH


6 rows selected.

What is going on here:

1) we are using analytics to construct a stream of rows where each row has access to the next row's rowid.
2) we are calling a function with these two related rowids to get differences
3) we are converting the function's returned object to rows, and displaying them

There are thess points of interest to interpreting this data. First understand that all we did was feed the identity of two rows to a function and it compared the rows to find differences. Second, it was our query, not the function, that has constructed a set of input data using analytics such that each row is compared to its next row. Third, in this version of the report we chose to exclude the ROW_EFF_DT and ROW_EXP_DT columns because they would always be different and thus just constitute noise.

So this report shows the changes that occurred between rows. The report reads just like the steps we saw when we dumped the table. Here they are again so you can follow the report.

4) the following basic events happened:
a) we created the customer
b) we corrected a name mis-spelling
c) we approved a credit line and assigned a starting risk level (value group)
d) we increased the credit line and corrected a second name mis-spelling

How you use this is up to you. My testers like it because it helps them figure out how composite rows built using lots of complex logic, got constructed the way they did. This in turn helps them see anomalies in the data and possible errors in the row building processes we use. See attached for full code and restrictions. We used the overloaded function call that accepts rowids.

create or replace type o_column_value_difference is object (
                                                            column_name varchar2(30)
                                                           ,data_value_pair varchar2(4000)
                                                           )
/

create or replace type c_column_value_difference is table of o_column_value_difference
/

create or replace
package kev_utils
is

--
-- key_value_pair_p must look like this  emp_id=1:emp_ver_no=1|emp_id=1:emp_ver_no=2
-- the routine will change single quotes to double quotes for you
-- and add the WHERE and AND components to finish up a set of where clauses
-- please note we are using the pipe (|) and colon (:) as delimiters
-- so if you are using date formats make sure not to use either in your format string
-- it is expected that each expression yeilds one row thus you should be supplying primary or unique key data
-- maybe some day I'll make this into a generice two where clause deal
--
   function show_column_diffs (
                               owner_p in varchar2
                              ,table_name_p in varchar2
                              ,key_value_pair_p in varchar2
                              ,excluded_columns_list_p in varchar2
                              ) return c_column_value_difference;

   function show_column_diffs (
                               owner_p in varchar2
                              ,table_name_p in varchar2
                              ,a_rowid_p in rowid
                              ,b_rowid_p in rowid
                              ,excluded_columns_list_p in varchar2
                              ) return c_column_value_difference;

end;
/
show errors

Thanks, Kevin

Comments


Hi,

Thanks for a wonderul insight about how analytical functions can be used to a very great extent.

Thanks,
Priya.

Hi Kevin,

That was great and something out of the box ...really appreciated.

Regards
Baldeep singh

it doesn't seem to work.
I try to execute the given query but I get an error message ORA-00942 when it tries to execute the line

10 ,table(cast(kev_utils.show_column_diffs

saying that the table kev_utils.show_column_diffs does not exist.
Any idea about why does this occur?
Thank you very much.

Kevin Meade's picture

either
1) you do not have a user package kev_utils
2) the package kev_utils does not have the function show_column_diffs
3) you are executing from a user who does not have the proper privileges to see or execute the routine

Figure out which one it is and fix the problem.

Good luck, Kevin

I tried this and ended up with an error:

Error at line 10
ORA-04067: not executed, package body "MARKF.KEV_UTILS" does not exist
ORA-06508: PL/SQL: could not find program unit being called: "MARKF.KEV_UTILS"
ORA-06512: at line 1

I did figure out that is because only the package spec is given here not the package body.

Kevin Meade's picture

You need to download the attachment

Kevin Meade's picture

I believe that will only see and thus only be able to download the attachement, if you are logged into OraFAQ when you are viewing this article. Create and account if you do not already have one, log in and then review the article again. You will see the attachment after the article but before any replies.

Good luck, Kevin

Thanks Kevin, your solution will come handy for my use.

Hi Kevin,
I'm trying to create the package it seems like I have a problem with permissions.

The user I'm using to create the package can make a query on dba_tab_columns, but when I try to compile the package I get "PL/SQL: ORA-00942: table or view does not exist" on that line.

What am I missing??

Thanks in advance.

cardura.doc

I know now the reason it doesn't compile (found it on the forum).
The user needs to be granted a direct permision, permision to roles wouldn't work.

So my question now is, how do you make it work for your users?

Thanks again,

cardura.doc

Thanks Kevin!

Can you provide an example of how to split out the DATA_VALUE_PAIR into separate columns and/or to include additional columns in the display?

For my purposes, I would like to display the date, modified_by, old value and new value.

I can't seem to figure out how to change the procedure or the types in order to return the additional values.

Any assistance is greatly appreciated.

Thanks.
Mark

Kevin... Thanks for sharing. This ROCKS!

When customizing the above sample query for your own table, be sure to give your table name using upper case ('CUST_H', not 'cust_h').

I was seeing empty/no differences until I did that right.

Kevin, this code is great and exactly what I need but for some reason I cannot get the KEV_UTILS package to compile. It seems to not like the inner for loop with the r1 variable. I am desperate. Can you provide me with some assistance? Here's the error messages I am receiving.

PACKAGE BODY KEV_UTILS compiled
Errors: check compiler log
42/26 PL/SQL: ORA-00942: table or view does not exist
41/20 PL/SQL: SQL Statement ignored
51/16 PLS-00364: loop index variable 'R1' use is invalid
51/13 PL/SQL: Statement ignored
56/218 PLS-00364: loop index variable 'R1' use is invalid
56/13 PL/SQL: Statement ignored

First of all Kevin, this is a really nice tool to have. I have several problems that could be solved this way. However I get exactly the same errors as Aoliver. So, Aoliver, have you or any others in here found what can cause these messages?

Thank you very much in advance, Peter

Well, reading the thread more carefully I found out that the problem has to do with user/role rights regarding the table dba_tab_columns. The loop over i in the package contains a select from that specific table. Without having 100 insight in the where clause in that select I have know had success with compiling using this select query instead:

select column_name,data_type
from all_tab_cols
where table_name = table_name_p
and data_type in ('DATE','NUMBER','VARCHAR2','CHAR')
and column_id between (i-1)*10+1 and (i*10)
order by column_name

It cannot do exactly the same (if it does it is by pure luck) since I have removed some parts of the where clause that contained columns that does not exist in the all_tab_cols table.

I hope this can help those of you having trouble and maybe some one with more insight than me can see if there are any problems with this approach compared to Kevin's original.

Hi Kevin,

It is really nice article. Could you please provide an example of how to use this function with key value pairs (like emp_id=1:emp_ver_no=1|emp_id=1:emp_ver_no=2) instead of rowids? Can we call the second functions directly in that case?

I have to achieve something like , there is a daily dump of a table (daily 5K records approx) which have an extract date,modified-by and modified-date and key columns (constants) along with all the changable columns in a row. I need to compare each columns of the rows for the given fromdate and todate. I am facing a another challenge; some times my key columns (constants) also can be edited which makes it difficult to track.

Thanks
JJ.