Jonathan Lewis

Just another Oracle weblog

Scalar Subquery Costing

Thu, 2019-06-06 13:54

A question came up on the Oracle-L list-server a few days ago about how Oracle calculates costs for a scalar subquery in the select list. The question included an example to illustrate the point. I’ve reproduced the test below, with the output from an 18.3 test system. The numbers don’t match the numbers produced in the original posting but they are consistent with the general appearance.

rem
rem     Script:         ssq_costing.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem

create table t_1k ( n1 integer ) ;
create table t_100k ( n1 integer ) ;

insert into t_1k
  select
         level
    from dual
    connect by level <= 1e3;

insert into t_100k
  select level
    from dual
    connect by level <= 1e5;

commit ;

begin
  dbms_stats.gather_table_stats ( null, 'T_1K') ;
  dbms_stats.gather_table_stats ( null, 'T_100K') ;
end ;
/

explain plan for
select 
        /*+ qb_name(QB_MAIN) */
        (
        select /*+ qb_name(QB_SUBQ) */ count(*)
        from t_1k
        where t_1k.n1 = t_100k.n1
        )
from t_100k
;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |   100K|   488K|  1533K  (2)| 00:01:00 |
|   1 |  SORT AGGREGATE    |        |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T_1K   |     1 |     4 |    17   (0)| 00:00:01 |
|   3 |  TABLE ACCESS FULL | T_100K |   100K|   488K|    36   (9)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T_1K"."N1"=:B1)

The key point to note is this – the scalar subquery has to execute 100,000 times because that’s the number of rows in the driving table. The cost for executing the scalar subquery once is 17 – so the total cost of the query should be 1,700,036 – not 1,533K (and for execution plans the K means x1000, not x1024). There’s always room for rounding errors, of course, but a check of the 10053 (CBO trace) file shows the numbers to be 17.216612 for the t_1k tablescan, 36.356072 for the t_100K tablescan, and 1533646.216412 for the whole query. So how is Oracle managing to get a cost that looks lower than it ought to be?

There’s plenty of scope for experimenting to see how the numbers change – and my first thought was simply to see what happens as you change the number of distinct values in the t_100K.n1 column. It would be rather tedious to go through the process of modifying the data a few hundred times to see what happens, so I took advantage of the get_column_stats() and set_column_stats() procedures in the dbms_stats package to create a PL/SQL loop that faked a number of different scenarios, lying about the actual table data.


delete from plan_table;
commit;

declare

        srec                    dbms_stats.statrec;
        n_array                 dbms_stats.numarray;

        m_distcnt               number;
        m_density               number;
        m_nullcnt               number;
        m_avgclen               number;


begin

        dbms_stats.get_column_stats(
                ownname         => user,
                tabname         => 't_100k',
                colname         => 'n1', 
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => srec,
                avgclen         => m_avgclen
        ); 

        for i in 1 .. 20 loop

                m_distcnt := 1000 * i;
                m_density := 1/m_distcnt;

                dbms_stats.set_column_stats(
                        ownname         => user,
                        tabname         => 't_100k',
                        colname         => 'n1', 
                        distcnt         => m_distcnt,
                        density         => m_density,
                        nullcnt         => m_nullcnt,
                        srec            => srec,
                        avgclen         => m_avgclen
                ); 


        execute immediate
        '
                explain plan set statement_id = ''' || m_distcnt || 
        '''
                for
                select
                        /*+ qb_name(QB_MAIN) */
                        (
                        select /*+ qb_name(QB_SUBQ) */ count(*)
                        from t_1k
                        where t_1k.n1 = t_100k.n1
                        )
                from t_100k
        ';
        
        end loop;       

end;
/

The code is straightforward. I’ve declared a few variables to hold the column stats from the t_100k.n1 column, called get_column_stats(), then looped 20 times through a process that changes the number of distinct values (and corresponding density) recorded in the column stats, then used execute immediate to call “explain plan” for the original query.

You’ll notice I’ve given each plan a separate statement_id that corresponds to the num_distinct that generated the plan. In the code above I’ve changed the num_distinct from 1,000 to 20,000 in steps of 1,000.

Once the PL/SQL block ends I’ll have a plan table with 20 execution plans stored in it and, rather than reporting those plans with calls to dbms_xplan.display(), I’m going to be selective about which rows and columns I report.

select
        statement_id, 
        io_cost,
        io_cost - lag(io_cost,1) over (order by to_number(statement_id)) io_diff,
        cpu_cost,
        cpu_cost - lag(cpu_cost,1) over (order by to_number(statement_id)) cpu_diff,
        cost
from 
        plan_table
where 
        id = 0
order by 
        to_number(statement_id)
;

I’ve picked id = 0 (the top line of the plan) for each statement_id and I’ve reported the cost column, which is made up of the io_cost column plus a scaled down value of the cpu_cost column. I’ve also used the analytic lag() function to calculate how much the io_cost and cpu_cost changed from the previous statement_id. Here are my results from 18c:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
1000                                17033            1099838920                 17253
2000                                34033      17000 2182897480 1083058560      34470
3000                                51033      17000 3265956040 1083058560      51686
4000                                68033      17000 4349014600 1083058560      68903
5000                                85033      17000 5432073160 1083058560      86119
6000                               102033      17000 6515131720 1083058560     103336
7000                               119033      17000 7598190280 1083058560     120553
8000                               136033      17000 8681248840 1083058560     137769
9000                               153033      17000 9764307400 1083058560     154986
10000                              170033      17000 1.0847E+10 1083058560     172202
11000                              197670      27637 1.2608E+10 1760725019     200191
12000                              338341     140671 2.1570E+10 8962036084     342655
13000                              457370     119029 2.9153E+10 7583261303     463200
14000                              559395     102025 3.5653E+10 6499938259     566525
15000                              647816      88421 4.1287E+10 5633279824     656073
16000                              725185      77369 4.6216E+10 4929119846     734428
17000                              793452      68267 5.0565E+10 4349223394     803565
18000                              854133      60681 5.4431E+10 3865976350     865019
19000                              908427      54294 5.7890E+10 3459031472     920005
20000                              957292      48865 6.1003E+10 3113128324     969492
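
Side note: the plan table doesn’t expose the scaling factor that converts cpu_cost into its contribution to the cost column, but that contribution is simply (cost – io_cost), so a quick sketch like the following derives the implied divisor – on this system it comes out close to 5,000,000:

select
        statement_id,
        cost - io_cost                          cpu_component,
        round(cpu_cost / (cost - io_cost))      implied_divisor
from
        plan_table
where
        id = 0
order by
        to_number(statement_id)
;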

The first pattern that hits the eye is the constant change of 17,000 in the io_cost in the first few lines of the output. For “small” numbers of distinct values the (IO) cost of the query is (33 + 17 * num_distinct) – in other words, the arithmetic seems to assume that it will execute the subquery once for each distinct value and then cache the results so that repeated executions for any given value will not be needed. This looks as if the optimizer is trying to match its arithmetic to the “scalar subquery caching” mechanism.

But things change somewhere between 10,000 and 11,000 distinct values. The point comes where adding one more distinct value causes a much bigger jump in cost than 17, and that’s because Oracle assumes it’s reached a point where there’s a value that it won’t have room for in the cache and will have to re-run the subquery multiple times for that value as it scans the rest of the table. Let’s find the exact break point where that happens.

Changing my PL/SQL loop so that we calculate m_distcnt as “10910 + i”, this is the output from the final query:


-- m_distcnt := 10910 + i;

STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
10911                              185520            1.1834E+10                187887
10912                              185537         17 1.1835E+10    1083059     187904
10913                              185554         17 1.1836E+10    1083058     187921
10914                              185571         17 1.1837E+10    1083059     187938
10915                              185588         17 1.1838E+10    1083058     187956
10916                              185605         17 1.1839E+10    1083059     187973
10917                              185622         17 1.1841E+10    1083059     187990
10918                              185639         17 1.1842E+10    1083058     188007
10919                              185656         17 1.1843E+10    1083059     188025
10920                              185673         17 1.1844E+10    1083058     188042
10921                              185690         17 1.1845E+10    1083059     188059
10922                              185707         17 1.1846E+10    1083058     188076
10923                              185770         63 1.1850E+10    4027171     188140
10924                              185926        156 1.1860E+10    9914184     188298
10925                              186081        155 1.1870E+10    9912370     188455
10926                              186237        156 1.1880E+10    9910555     188613
10927                              186393        156 1.1890E+10    9908741     188770
10928                              186548        155 1.1900E+10    9906928     188928
10929                              186703        155 1.1909E+10    9905114     189085
10930                              186859        156 1.1919E+10    9903302     189243

If we have 10,922 distinct values in the column the optimizer calculates as if it will be able to cache them all; but if we have 10,923 distinct values the optimizer thinks that there’s going to be one value where it can’t cache the result and will have to run the subquery more than once.

Before looking at this in more detail let’s go to the other interesting point – when does the cost stop changing? We can see the cost increasing as the number of distinct values grows, but we saw at the start that the total cost didn’t get as large as we expected, so there must be a point where it stops increasing before it “ought” to.

I’ll jump straight to the answer: here’s the output from the test when I start num_distinct off at slightly less than half the number of rows in the table:


 -- m_distcnt := (50000 - 10) + i;

STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
49991                             1514281            9.6488E+10               1533579
49992                             1514288          7 9.6489E+10     473357    1533586
49993                             1514296          8 9.6489E+10     473337    1533594
49994                             1514303          7 9.6490E+10     473319    1533601
49995                             1514311          8 9.6490E+10     473299    1533609
49996                             1514318          7 9.6491E+10     473281    1533616
49997                             1514325          7 9.6491E+10     473262    1533624
49998                             1514333          8 9.6492E+10     473243    1533631
49999                             1514340          7 9.6492E+10     473224    1533639
50000                             1514348          8 9.6493E+10     473205    1533646
50001                             1514348          0 9.6493E+10          0    1533646
50002                             1514348          0 9.6493E+10          0    1533646
50003                             1514348          0 9.6493E+10          0    1533646
50004                             1514348          0 9.6493E+10          0    1533646
50005                             1514348          0 9.6493E+10          0    1533646
50006                             1514348          0 9.6493E+10          0    1533646
50007                             1514348          0 9.6493E+10          0    1533646
50008                             1514348          0 9.6493E+10          0    1533646
50009                             1514348          0 9.6493E+10          0    1533646
50010                             1514348          0 9.6493E+10          0    1533646

The cost just stops changing when num_distinct = half the rows in the table.

Formulae

During the course of these experiments I had been exchanging email messages with Nenad Noveljic via the Oracle-L list-server (full monthly archive here) and he came up with the suggestion of a three-part formula that assumed a cache size and gave a cost of

  • “tablescan cost + num_distinct * subquery unit cost” for values of num_distinct up to the cache size;
  • then, for values of num_distinct greater than the cache size and up to half the number of rows in the table, added a marginal cost representing the probability that some values would not be cached;
  • then for values of num_distinct greater than half the number of rows in the table reported the cost associated with num_distinct = half the number of rows in the table.

Hence:

  • for 1 <= num_distinct <= 10,922, cost = (33 + num_distinct * 17)
  • for 10,923 <= num_distinct <= 50,000, cost = (33 + 10,922 * 17) + (1 – 10,922/num_distinct) * 100,000 * 17
  • for 50,000 <= num_distinct <= 100,000, cost = cost(50,000).

The middle line needs a little explanation: (1 – 10,922 / num_distinct) is the probability that a value will not be in the cache; this has to be multiplied by 100,000 to give the expected number of rows that will not be cached, and then multiplied by 17 as the cost of running the subquery for those rows.

The middle line can be re-arranged as 33 + 17 * (10,922 + (1 – 10,922/num_distinct) * 100,000)
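
Before moving on it’s worth turning the three-part formula into something executable. Here’s a sketch of a query that evaluates it (using the initial guess of 10,922 for the cache limit) for a range of num_distinct values:

select
        num_distinct,
        round(
                case
                        when num_distinct <= 10922
                        then 33 + 17 * num_distinct
                        when num_distinct <= 50000
                        then 33 + 17 * (10922 + (1 - 10922/num_distinct) * 100000)
                        else 33 + 17 * (10922 + (1 - 10922/50000)        * 100000)
                end
        )       predicted_io_cost
from    (
        select 1000 * rownum num_distinct
        from   dual
        connect by level <= 100
        )
;

As the next section shows, the predictions from this version of the formula are close to the optimizer’s io_cost figures but not exact.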

Tweaking

At this point I could modify my code loop to report the calculated value for the cost and compare it with the actual cost to show you that the two values didn’t quite match. Instead I’ll jump forward a little bit to a correction that needs to be made to the formula above. It revolves around how Oracle determines the cache size. There’s a hidden parameter (which I mentioned in CBO Fundamentals) that controls scalar subquery caching. In the book I think I only referenced it in the context of subqueries in the “where” clause. The parameter is “_query_execution_cache_max_size” and has a default value of 131072 (power(2,17)) – so when I found that the initial formula didn’t quite work I made the following observation:

  • 131072 / 10922 = 12.00073
  • 131072 / 12 = 10922.666…

So I put 10,922.66667 into the formula to see if that would improve things.

For the code change I added a variable m_cost to the PL/SQL block, and set it inside the loop as follows:

m_cost := round(33 + 17 * (10922.66667 + 100000 * (1 - (10922.66667 / m_distcnt))));

Then in the “execute immediate” I changed the “explain plan” line to read:

explain plan set statement_id = ''' || lpad(m_distcnt,7) || ' - ' || lpad(m_cost,8) ||

This allowed me to show the formula’s prediction of the IO cost in the final output, and here’s what I got for values of num_distinct in the region of 10,922:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
  10911 -   183901                 185520            1.1834E+10                187887
  10912 -   184057                 185537         17 1.1835E+10    1083059     187904
  10913 -   184212                 185554         17 1.1836E+10    1083058     187921
  10914 -   184368                 185571         17 1.1837E+10    1083059     187938
  10915 -   184524                 185588         17 1.1838E+10    1083058     187956
  10916 -   184680                 185605         17 1.1839E+10    1083059     187973
  10917 -   184836                 185622         17 1.1841E+10    1083059     187990
  10918 -   184992                 185639         17 1.1842E+10    1083058     188007
  10919 -   185147                 185656         17 1.1843E+10    1083059     188025
  10920 -   185303                 185673         17 1.1844E+10    1083058     188042
  10921 -   185459                 185690         17 1.1845E+10    1083059     188059
  10922 -   185615                 185707         17 1.1846E+10    1083058     188076
  10923 -   185770                 185770         63 1.1850E+10    4027171     188140
  10924 -   185926                 185926        156 1.1860E+10    9914184     188298
  10925 -   186081                 186081        155 1.1870E+10    9912370     188455
  10926 -   186237                 186237        156 1.1880E+10    9910555     188613
  10927 -   186393                 186393        156 1.1890E+10    9908741     188770
  10928 -   186548                 186548        155 1.1900E+10    9906928     188928
  10929 -   186703                 186703        155 1.1909E+10    9905114     189085
  10930 -   186859                 186859        156 1.1919E+10    9903302     189243

The formula is only supposed to work in the range 10923 – 50,000, so the first few results don’t match; but in the range 10,923 to 10,930 the match is exact. Then, in the region of 50,000 we get:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
  49991 -  1514281                1514281            9.6488E+10               1533579
  49992 -  1514288                1514288          7 9.6489E+10     473357    1533586
  49993 -  1514296                1514296          8 9.6489E+10     473337    1533594
  49994 -  1514303                1514303          7 9.6490E+10     473319    1533601
  49995 -  1514311                1514311          8 9.6490E+10     473299    1533609
  49996 -  1514318                1514318          7 9.6491E+10     473281    1533616
  49997 -  1514325                1514325          7 9.6491E+10     473262    1533624
  49998 -  1514333                1514333          8 9.6492E+10     473243    1533631
  49999 -  1514340                1514340          7 9.6492E+10     473224    1533639
  50000 -  1514348                1514348          8 9.6493E+10     473205    1533646
  50001 -  1514355                1514348          0 9.6493E+10          0    1533646
  50002 -  1514363                1514348          0 9.6493E+10          0    1533646
  50003 -  1514370                1514348          0 9.6493E+10          0    1533646
  50004 -  1514377                1514348          0 9.6493E+10          0    1533646
  50005 -  1514385                1514348          0 9.6493E+10          0    1533646
  50006 -  1514392                1514348          0 9.6493E+10          0    1533646
  50007 -  1514400                1514348          0 9.6493E+10          0    1533646
  50008 -  1514407                1514348          0 9.6493E+10          0    1533646
  50009 -  1514415                1514348          0 9.6493E+10          0    1533646
  50010 -  1514422                1514348          0 9.6493E+10          0    1533646

Again, the formula applies only in the range up to 50,000 (half the rows in the table) – and the match is perfect in that range.

Next steps

The work so far gives us some idea of the algorithm that the optimizer is using to derive a cost, but this is just one scenario and there are plenty of extra questions we might ask. The most pressing one: what is the significance of the number 12 in the calculation 131,072/12? From previous experience I guessed that it was related to the lengths of the input and output values of the scalar subquery – as in “value X for n1 returns value Y for count(*)”.

To pursue this idea I recreated the data sets using varchar2(10) as the definition of n1 and lpad(rownum,10) as the value – the “breakpoint” dropped from 10,922 down to 5,461. Checking the arithmetic: 131,072 / 5,461 = 24.001465, then 131,072 / 24 = 5,461.333… And that’s the number that made the formula work perfectly for the modified data set.

Then I used set_column_stats() to hack the avg_col_len of t_100K.n1 to 15 and the break point dropped to 4,096. Again we do the two arithmetic steps: 131,072 / 4,096 = 32 (but then we don’t need to do the reverse step since the first result is integral).

Checking the original data set, when n1 was numeric the avg_col_len was 5, so we have three reference points:

  • avg_col_len = 5: “cache unit size” = 12
  • avg_col_len = 11: cache unit size = 24 (don’t forget that avg_col_len includes the length byte, so our padded varchar2(10) has a length of 11)
  • avg_col_len = 15: cache unit size = 32

There’s an obvious pattern here: “cache unit size” = (2 * avg_col_len + 2). Since I hadn’t been changing the t_1k.n1 column at the same time, that really does look like a deliberate factor of 2. (I’d thought initially that the 12 might be affected by the lengths of both columns in the predicate – but that doesn’t seem to be the case.)

The scientific method says I should now make a prediction based on my hypothesis – so I set the avg_col_len for t_100K.n1 to 23 and guessed that the break point would be at 2,730 – and it was (131,072 / (2 * 23 + 2) = 2,730.6666…).
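
For reference, here’s a quick sketch that turns the hypothesis into predicted break points for the avg_col_len values tested so far (count(*) output assumed, hence the “+ 2”):

select
        avg_col_len,
        2 * avg_col_len + 2                     cache_unit_size,
        trunc(131072 / (2 * avg_col_len + 2))   predicted_break_point
from    (
        select column_value avg_col_len
        from   table(sys.odcinumberlist(5, 11, 15, 23))
        )
;

This returns 10,922, 5,461, 4,096 and 2,730 respectively – matching the observed break points.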

The next question, of course, is “where does the spare 2 come from?” Trying to minimize the change in the code I modified my subquery to select sum(to_number(n1)) rather than count(*), then to avg(to_number(n1)) – remember I had changed n1 to a varchar2(10) that looked like a number left-padded with spaces. In every variant of the tests I’d done so far, all I had to do to get an exact match between the basic formula and the optimizer’s cost calculation was to use “2 * avg_col_len + 22” as the cache unit size – and 22 is the nominal maximum length of an internally stored numeric column.

Bottom line: the cache unit size seems to be related to the input and output values, but I don’t know why there’s a factor of 2 applied to the input column length, and I don’t know why the length of count(*) is deemed to be 2 when other derived numeric outputs use the more intuitive 22 for their length.

tl;dr

The total cost calculation for a scalar subquery in the select list is largely affected by:

  • a fixed cache size (131,072 bytes) possibly set by hidden parameter _query_execution_cache_max_size
  • the avg_col_len of the input (correlating) column(s) from the driving table
  • the nominal length of the output (select list) of the subquery

There is an unexplained factor of 2 used with the avg_col_len of the input, and a slightly surprising value of 2 if the output is simply count(*).

If the number N of distinct values for the driving column(s) is less than the number of possible cache entries, the effect of the scalar subquery is to add N * the estimated cost of executing the subquery once. As the number of distinct values for the driving column(s) goes above the limit, the incremental effect of the subquery is based on the expected number of times an input value will not be cached. When the number of distinct values in the driving column(s) exceeds half the number of rows in the driving table the cost stops increasing – there is no obvious reason why the algorithm does this.
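
To pull these observations together, here’s a PL/SQL sketch of the model – emphatically not Oracle’s actual code, just a function (name and all) of my own that reproduces the io_cost figures seen in this test case:

create or replace function ssq_io_cost (
        p_tab_cost      number,        -- IO cost of scanning the driving table
        p_subq_cost     number,        -- IO cost of one execution of the subquery
        p_num_rows      number,        -- rows in the driving table
        p_num_distinct  number,        -- distinct values of the driving column(s)
        p_avg_col_len   number,        -- avg_col_len of the input column
        p_out_len       number         -- nominal output length (2 for count(*))
) return number is
        m_cache_entries number := 131072 / (2 * p_avg_col_len + p_out_len);
        m_eff_distinct  number := least(p_num_distinct, p_num_rows / 2);
begin
        if m_eff_distinct <= m_cache_entries then
                -- every distinct value assumed cached after its first execution
                return round(p_tab_cost + p_subq_cost * m_eff_distinct);
        else
                -- all cache entries used, plus the expected number of uncached executions
                return round(
                        p_tab_cost + p_subq_cost * (
                                m_cache_entries +
                                (1 - m_cache_entries / m_eff_distinct) * p_num_rows
                        )
                );
        end if;
end;
/

For example, ssq_io_cost(33, 17, 100000, 50000, 5, 2) returns 1514348 – the io_cost reported for every value of num_distinct from 50,000 upwards in the earlier output.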

There are many more cases that I could investigate at this point – but I think this model is enough as an indication of general method. If you come across a variation where you actually need to work out how the optimizer derived a cost then this framework will probably be enough to get you started in the right direction.

 

So long …

Mon, 2019-06-03 08:09

… and thanks for any Douglas Adams references.

I decided on Friday that after more than 31 years using the Oracle software and more than 37 years of being self-employed in the computer industry – not to mention a further 8 years being an enthusiastic amateur – the time had finally come for a change of pace.  As from 1st June 2019 I am retired.

I have a couple of long-term commitments that I’ll still be doing a little work for over the next few months, and I wouldn’t be surprised if someone occasionally tempts me into doing a few days solving interesting problems. I’ll still be answering questions on the Oracle Developer Community forum and (if the email gets through) on the Oracle-L listserver, and I’ll still be writing blog notes about Oracle and submitting abstracts to conferences for a few years. Who knows, I may even get round to writing another book – after all, I’ve already downloaded Oracle 19 so I’ll still be experimenting, investigating, and trying to break things. Some people retire to do the Times crossword or Sudoku; my retirement pastime is going to be playing with Oracle.

But now I’ve got 3 musical instruments to re-learn, a garden to re-engineer, apple trees to train, and a lot of rowing [U.S. “crew”] time to catch up on so that I can get fit enough to get back to playing squash (which might be an interesting experience now that my eyesight has been fixed). And for relaxation I’ll be sitting in the garden watching the birds, drinking fine wine and reading a good book.

Update (6th June 2019)

Thanks for all the appreciative comments. It’s nice to have a record of the fact that I have made a difference for so many people.

 

Ignoring Hints

Mon, 2019-06-03 02:36

One of the small changes (and, potentially big but temporary, threats) in 18.3 is the status of the “ignore hints” parameter. It ceases to be a hidden (underscore) parameter, so you can now officially set the parameter optimizer_ignore_hints to true in the parameter file, or at the system level, or at the session level. The threat, of course, is that some of your code may use the hidden version of the parameter (perhaps in an SQL_Patch as an opt_param() option rather than in its hint form) which no longer works after the upgrade.
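
For example, code carrying the hidden version of the parameter through the opt_param() hint – whether written directly in the SQL or injected through an SQL Patch – might look something like this hypothetical sketch:

select
        /*+ opt_param('_optimizer_ignore_hints' 'true') */
        count(*)
from    t1
;

After the upgrade it’s the non-underscore name that carries the functionality, so code like this needs reviewing.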

But there’s more. The parameter (whether the old hidden version or the new revealed version) doesn’t make the optimizer ignore parallel() hints. But 18.3 now has a related parameter optimizer_ignore_parallel_hints to address this limitation. Here’s a quick demo – we start by creating a table and then running a query where the full tablescan is clearly the default strategy that the optimizer would take if we didn’t hint an indexed access path:

rem
rem     Script:         ignore_parallel_hints.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem 

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,10)                  n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(id);

set serveroutput off

prompt  =============
prompt  Baseline test
prompt  =============

select
        /*+ index(t1) */
        n1, sum(id)
from
        t1
where
        id > 0
group by
        n1
order by
        n1
;

select * from table(dbms_xplan.display_cursor);


SQL_ID  gudnnk7j7q5bz, child number 0
-------------------------------------
select  /*+ index(t1) */  n1, sum(id) from  t1 where  id > 0 group by
n1 order by  n1

Plan hash value: 356059923

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |       |       |   198 (100)|          |
|   1 |  SORT GROUP BY                       |       |    10 |    70 |   198   (2)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    | 10000 | 70000 |   196   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 | 10000 |       |    22   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">0)


Now we repeat the exercise with the version-specific “alter session” command below – and you should try each option with each version of Oracle if you want to do the complete test cycle – to see that the session will ignore hints and the plan will change (side note – using the underscore version  with 18.3 doesn’t raise an error, the statement is silently ignored):


alter session set "_optimizer_ignore_hints" = true;
alter session set "optimizer_ignore_hints" = true;

SQL_ID  gudnnk7j7q5bz, child number 1
-------------------------------------
select  /*+ index(t1) */  n1, sum(id) from  t1 where  id > 0 group by
n1 order by  n1

Plan hash value: 3946799371

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    27 (100)|          |
|   1 |  SORT GROUP BY     |      |    10 |    70 |    27  (12)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   | 10000 | 70000 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID">0)

Then, of course, we have to test a query with a parallel() hint – or shared() hint, which is the internal equivalent you will notice occasionally in outlines or the “remote” statement for distributed execution plans – to show that we don’t yet ignore parallel hints – the plans following the code are from 18.3:


select
        /*+ parallel(t1 3) */
        n1, sum(id)
from
        t1
where
        id > 0
group by
        n1
order by
        n1
;

select * from table(dbms_xplan.display_cursor);

alter session set "optimizer_ignore_parallel_hints" = true;

select
        /*+ parallel(t1 3) */
        n1, sum(id)
from
        t1
where
        id > 0
group by
        n1
order by
        n1
;

select * from table(dbms_xplan.display_cursor);




SQL_ID  7jynurdtc48kv, child number 0
-------------------------------------
select  /*+ parallel(t1 3) */  n1, sum(id) from  t1 where  id > 0 group
by  n1 order by  n1

Plan hash value: 2919148568

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |       |       |    10 (100)|          |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |    10 |    70 |    10  (10)| 00:00:01 |  Q1,01 | P->S | QC (ORDER) |
|   3 |    SORT GROUP BY         |          |    10 |    70 |    10  (10)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE           |          |    10 |    70 |    10  (10)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND RANGE       | :TQ10000 |    10 |    70 |    10  (10)| 00:00:01 |  Q1,00 | P->P | RANGE      |
|   6 |       HASH GROUP BY      |          |    10 |    70 |    10  (10)| 00:00:01 |  Q1,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR |          | 10000 | 70000 |     9   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  8 |         TABLE ACCESS FULL| T1       | 10000 | 70000 |     9   (0)| 00:00:01 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access(:Z>=:Z AND :Z<=:Z) -- > comment added to avoid wordpress format issue
       filter("ID">0)

Note
-----
   - Degree of Parallelism is 3 because of table property


Session altered.


SQL_ID  7jynurdtc48kv, child number 1
-------------------------------------
select  /*+ parallel(t1 3) */  n1, sum(id) from  t1 where  id > 0 group
by  n1 order by  n1

Plan hash value: 3946799371

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    27 (100)|          |
|   1 |  SORT GROUP BY     |      |    10 |    70 |    27  (12)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   | 10000 | 70000 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID">0)


tl;dr

In 18.3 Oracle exposes the parameter optimizer_ignore_hints – any code depending on the hidden version of this parameter will no longer behave as expected. 18.3 also introduces optimizer_ignore_parallel_hints to allow you to ignore parallel hints as well.

 

Timestamp Oddity

Wed, 2019-05-29 12:17

[Editorial note: this is something I started writing in 2013, managed to complete in 2017, and still failed to publish. It should have been a follow-on to another posting on the oddities of timestamp manipulation.]

Just as national language support used to be, timestamps and time-related columns are still a bit of a puzzle to the Oracle world – so much so that OEM could cripple a system if it was allowed to do the check for “failed logins over the last 30 minutes”. And, just like NLS, it’s one of those things that you use so rarely that you keep forgetting what went wrong the last time you used it. Here’s one little oddity that I reminded myself about recently:

rem
rem     Script:         timestamp_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          April 2013
rem
create table t1 (
        ts_tz   timestamp(9) with time zone,
        ts_ltz  timestamp(9) with local time zone
);

insert into t1 values(systimestamp, systimestamp);
commit;

alter table t1 add constraint ts_ltz_uk unique (ts_ltz);
alter table t1 add constraint ts_tz_uk  unique (ts_tz);


Nothing terribly difficult – just a table with two variants on the timestamp data type and a unique constraint on both: except for one problem. Watch what happens as I create the unique constraints:

SQL> alter table t1 add constraint ts_ltz_uk unique (ts_ltz);

Table altered.

SQL> alter table t1 add constraint ts_tz_uk  unique (ts_tz);
alter table t1 add constraint ts_tz_uk  unique (ts_tz)
                                        *
ERROR at line 1:
ORA-02329: column of datatype TIME/TIMESTAMP WITH TIME ZONE cannot be unique or a primary key

Slightly unexpected – unless you’ve memorized the manuals, of course, which I hadn’t. I wonder if you can create a unique index on timestamp with time zone:


SQL> create unique index ts_tz_uk on t1(ts_tz);

Index created.

You can’t have a unique constraint, but you CAN create a unique index! How curious – did that really happen?

SQL> select index_name, column_name from user_ind_columns where table_name = 'T1';

INDEX_NAME           COLUMN_NAME
-------------------- --------------------
TS_LTZ_UK            TS_LTZ
TS_TZ_UK             SYS_NC00003$

The index is on a column called SYS_NC00003$ – which looks suspiciously like one of those “function-based-index” things:


SQL> select * from user_ind_expressions where table_name = 'T1';

INDEX_NAME           TABLE_NAME           COLUMN_EXPRESSION                        COLUMN_POSITION
-------------------- -------------------- ---------------------------------------- ---------------
TS_TZ_UK             T1                   SYS_EXTRACT_UTC("TS_TZ")                               1

Oracle has silently invoked the sys_extract_utc() function on our (free-floating) timestamp column to normalize it to UTC. This is really not very friendly but it does make sense, of course – it would be rather expensive to enforce uniqueness if there were (at least) 24 different ways of storing the same absolute value – and 24 is a conservative estimate.
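
A quick sketch to demonstrate the effect of the normalisation – the literals below are arbitrary, but any two representations of the same absolute time will do:

insert into t1 (ts_tz) values (timestamp '2019-05-29 12:00:00 +01:00');
insert into t1 (ts_tz) values (timestamp '2019-05-29 11:00:00 +00:00');

The second insert should fail with ORA-00001 against the unique index ts_tz_uk, even though the two literals are textually different.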

 

 

Re-partitioning 2

Mon, 2019-05-27 14:20

Last week I wrote a note about turning a range-partitioned table into a range/list composite partitioned table using features included in 12.2 of Oracle. But my example was really just an outline of the method and bypassed a number of the little extra problems you’re likely to see in a real-world system, so in this note I’m going to bring in an issue that you might run into – and which I’ve seen appearing a number of times: ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION.

It’s often the case that a system has a partitioned table that’s been around for a long time, and over its lifetime it may have had (real or virtual) columns added, made invisible, dropped, or marked unused. As a result you may find that the apparent definition of the table is not the same as the real definition of the table – and that’s why Oracle has given us (in 12c) the option to “create table for exchange”.

You might like to read a MoS note giving one example of a problem with creating an exchange table prior to this new feature: ORA-14097 At Exchange Partition After Adding Column With Default Value (Doc ID 1334763.1). I’ve created a little model by cloning the code from that note.


rem
rem     Script:         pt_exchange_problem.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

create table mtab (pcol number)
partition by list (pcol) (
        partition p1 values (1),
        partition p2 values (2)
);

alter table mtab add col2 number default 0 not null;

prompt  ========================================
prompt  Traditional creation method => ORA-14097
prompt  ========================================

create table mtab_p2 as select * from mtab where 1=0;
alter table mtab exchange partition P2 with table mtab_p2;

prompt  ===================
prompt  Create for exchange
prompt  ===================

drop table mtab_p2 purge;
create table mtab_p2 for exchange with table mtab;
alter table mtab exchange partition P2 with table mtab_p2;



Here’s the output from running this on an instance of 18.3:


Table created.

Table altered.

========================================
Traditional creation method => ORA-14097
========================================

Table created.

alter table mtab exchange partition P2 with table mtab_p2
*
ERROR at line 1:
ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION

===================
Create for exchange
===================

Table dropped.


Table created.


Table altered.

So we don’t have to worry about problems creating an exchange table in Oracle 12c or later. But we do still have a problem if we’re trying to convert our range-partitioned table into a range/list composite partitioned table by using the “double-exchange” method. In my simple example I used a “create table” statement to create an empty table that we could exchange into; but without another special version of a “create table” command I won’t be able to create a composite partitioned table that is compatible with the simple table that I want to use as my intermediate table.

Here’s the solution to that problem – first in a thumbnail sketch:

  • create a table for exchange (call it table C)
  • alter table C modify to change it to a composite partitioned table with one subpartition per partition
  • create a table for exchange (call it table E)
  • Use table E to exchange partitions from the original table to the (now-partitioned) table C
  • Split each partition of table C into the specific subpartitions required

And now some code to work through the details – first the code to create and populate the partitioned table.


rem
rem     Script:         pt_comp_from_pt_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

drop table t purge;
drop table pt_range purge;
drop table pt_range_list purge;

-- @@setup

create table pt_range (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

insert into pt_range
select
        rownum-1,
        mod(rownum,2),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        all_objects
where
	rownum <= 600 -- > comment to avoid WordPress format issue
;

commit;

Then some code to create the beginnings of the target composite partitioned table. We create a simple heap table “for exchange”, then modify it to be a composite partitioned table with a named starting partition (and its high_value), a template defining a single subpartition and, as a variant on the example from last week, interval partitioning.


prompt	==========================================
prompt	First nice feature - "create for exchange"
prompt	==========================================

create table pt_range_list for exchange with table pt_range;

prompt	============================================
prompt	Now alter the table to composite partitioned
prompt	============================================

alter table pt_range_list modify
partition by range(id) interval (200)
subpartition by list (grp) 
subpartition template (
        subpartition p_def      values(default)
)
(
	partition p200 values less than (200)
)
;

If you want to do the conversion from range partitioning to interval partitioning you will have to check very carefully that your original table will be able to convert safely – which means you’ll need to check that the “high_value” values for the partitions are properly spaced to match the interval you’ve defined and (as a special requirement for the conversion) that there are no omissions from the current list of high values. If your original table doesn’t match these requirements exactly you may end up trying to exchange data into a partition where it doesn’t belong; for example, if my original table had partitions with high values of 200, 600, 800 then there may be values in the 200-399 range currently stored in the original “600” range partition which shouldn’t go into the new “600” interval partition. You may find you have to split (and/or merge) a few partitions in your range-partitioned table before you can do the main conversion.
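
A sketch of the simplest sanity check – list the partition positions and high values so you can eyeball the spacing against the intended interval (high_value is a LONG column, hence the “set long”):

set long 4000

select
        partition_position, partition_name, high_value
from
        user_tab_partitions
where
        table_name = 'PT_RANGE'
order by
        partition_position
;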

Now we create the table that we’ll actually use for the exchange and go through each exchange in turn. Because I’ve got an explicitly named starting partition the first exchange takes only two steps – exchange out, exchange in. But because I’m using interval partitioning in the composite partitioned table I’m doing a “lock partition” before the second exchange on all the other partitions, as this will bring the required target partition into existence. I’m also using the “[sub]partition for()” syntax to identify the pairs of [sub]partitions – this isn’t necessary for the original range-partitioned table, of course, but it’s the only way I can identify the generated subpartitions that will appear in the composite partitioned table.


create table t for exchange with table pt_range;

prompt	=======================================================================
prompt	Double exchange to move a partition to become a composite subpartition
prompt	Could drive this programmatically by picking one row from each partition
prompt	=======================================================================

alter table pt_range exchange partition p200 with table t;
alter table pt_range_list exchange subpartition p200_p_def with table t;

alter table pt_range exchange partition for (399) with table t;
lock  table pt_range_list partition for (399) in exclusive mode;
alter table pt_range_list exchange subpartition for (399,'0') with table t;

alter table pt_range exchange partition for (599) with table t;
lock  table pt_range_list partition for (599) in exclusive mode;
alter table pt_range_list exchange subpartition for (599,'0') with table t;

prompt	=====================================
prompt	Show that we've got the data in place
prompt	=====================================

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

break on partition_name skip 1

select  partition_name, subpartition_name, num_rows 
from    user_tab_subpartitions 
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;

Now that the data is in the target table we can split each default subpartition into the four subpartitions that we want for each partition. To cater for the future, though, I’ve first modified the subpartition template so that each new partition will have four subpartitions (though the naming convention won’t be applied, of course; Oracle will generate system names for all new partitions and subpartitions).


prompt  ================================================
prompt  Change the subpartition template to what we want
prompt  ================================================

alter table pt_range_list
set subpartition template(
        subpartition p_0 values (0),
        subpartition p_1 values (1),
        subpartition p_2 values (2),
        subpartition p_def values (default)
)
;

prompt  ====================================================
prompt  Second nice feature - multiple splits in one command
prompt  Again, first split is fixed name.
prompt  We could do this online after allowing the users in
prompt  ====================================================

alter table pt_range_list split subpartition p200_p_def
        into (
                subpartition p200_p_0 values(0),
                subpartition p200_p_1 values(1),
                subpartition p200_p_2 values(2),
                subpartition p200_p_def
        )
;

alter table pt_range_list split subpartition for (399,'0')
        into (
                subpartition p400_p_0 values(0),
                subpartition p400_p_1 values(1),
                subpartition p400_p_2 values(2),
                subpartition p400_p_def
        )
;

alter table pt_range_list split subpartition for (599,'0')
        into (
                subpartition p600_p_0 values(0),
                subpartition p600_p_1 values(1),
                subpartition p600_p_2 values(2),
                subpartition p600_p_def
        )
;

Finally a little demonstration that we can’t add an explicitly named partition to the interval partitioned table; then we insert a row to generate the partition and show that it has 4 subpartitions.

Finishing off we rename everything (though that’s a fairly pointless exercise).


prompt  ==============================================================
prompt  Could try adding a partition to show it uses the new template
prompt  But that's not allowed for interval partitions: "ORA-14760:"
prompt  ADD PARTITION is not permitted on Interval partitioned objects
prompt  So insert a value that would go into the next (800) partition
prompt  ==============================================================

alter table pt_range_list add partition p800 values less than (800);

insert into pt_range_list (
        id, grp, small_vc, padding
)
values ( 
        799, '0', lpad(799,10,'0'), rpad('x',100,'x')
)
;

commit;

prompt  ===================================================
prompt  Template naming is not used for the subpartitions,
prompt  so we have to use the "subpartition for()" strategy 
prompt  ===================================================

alter table pt_range_list rename subpartition for (799,'0') to p800_p_0;
alter table pt_range_list rename subpartition for (799,'1') to p800_p_1;
alter table pt_range_list rename subpartition for (799,'2') to p800_p_2;
alter table pt_range_list rename subpartition for (799,'3') to p800_p_def;

prompt  ==============================================
prompt  Might as well clean up the partition names too
prompt  ==============================================

alter table pt_range_list rename partition for (399) to p400;
alter table pt_range_list rename partition for (599) to p600;
alter table pt_range_list rename partition for (799) to p800;

prompt  =======================================
prompt  Finish off by listing the subpartitions 
prompt  =======================================

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

select  partition_name, subpartition_name, num_rows 
from    user_tab_subpartitions 
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;

It’s worth pointing out that you could do the exchanges (and the splitting and renaming at the same time) through some sort of simple PL/SQL loop – looping through the named partitions in the original table and using a row from the first exchange to drive the lock and second exchange (and splitting and renaming). For example, something like the following, which doesn’t have any of the error-trapping and defensive mechanisms you’d want to use on a production system:



declare
        m_pt_val number;
begin
        for r in (select partition_name from user_tab_partitions where table_name = 'PT_RANGE' order by partition_position) 
        loop
                execute immediate
                        'alter table pt_range exchange partition ' || r.partition_name ||
                        ' with table t';
        
                select id into m_pt_val from t where rownum = 1;
        
                execute immediate 
                        'lock table pt_range_list partition for (' || m_pt_val || ') in exclusive mode';
        
                execute immediate
                        'alter table pt_range_list exchange subpartition  for (' || m_pt_val || ',0)' ||
                        ' with table t';
        
        end loop;
end;
/

If you do go for a programmed loop you have to be really careful to consider what could go wrong at each step of the loop and how your program is going to report (and possibly attempt to recover) the situation. This is definitely a case where you don’t want code with “when others then null” appearing anywhere, and don’t be tempted to include code to truncate the exchange table.

 

Re-partitioning – 18

Fri, 2019-05-24 06:50

In yesterday’s note on the options for converting a range-partitioned table into a composite range/list partitioned table I mentioned that you could do this online with a single command in 18c, so here’s some code to demonstrate that claim:


rem
rem     Script:         pt_comp_from_pt_18.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

create table pt_range (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

insert into pt_range
select
        rownum - 1,
        mod(rownum,2),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 600 -- > comment to avoid WordPress format issue
;

commit;

alter table pt_range modify
partition by range(id) interval (200)
subpartition by list (grp) 
subpartition template (
        subpartition p_0 values (0),
        subpartition p_1 values (1),
        subpartition p_2 values (2),
        subpartition p_def values (default)
)
(
        partition p200 values less than (200)
)
-- online
;

execute dbms_stats.gather_table_stats(null, 'pt_range', granularity=>'all')

break on partition_name skip 1

select  partition_name, subpartition_name, num_rows 
from    user_tab_subpartitions 
where   table_name = 'PT_RANGE'
order by
        partition_name, subpartition_name
;

Run this (with or without the online option) and you’ll (probably) see the Oracle error “ORA-00604: error occurred at recursive SQL level 1” with one of two underlying errors:

    ORA-01950: no privileges on tablespace 'SYSTEM'
    ORA-01536: space quota exceeded for tablespace 'SYSTEM'

So what’s gone wrong – it ought to work.

After enabling a 10046 trace I repeated the “alter table” command, then searched the trace file for the text “err=1950” (that being the error I’d received on my first attempt) and scanned backwards for the “PARSING IN CURSOR” line with a matching cursor id:


ERROR #139721552722200:err=1950 tim=26541227462

PARSING IN CURSOR #139721552722200 len=182 dep=1 uid=104 oct=1 lid=0 tim=26541224560 hv=2451601965 ad='7f1377267890' sqlid='0wsjfgk920yjd'
create table  "TEST_USER"."SYS_RMTAB$$_H124028"  ( src_rowid rowid not null , tgt_rowid rowid not null) 
    segment creation immediate nologging 
    tablespace  "SYSTEM"  
    rowid_mapping_table
END OF STMT

The code is trying to create a “rowid_mapping_table” in the system tablespace and I have no quota for the tablespace. (The 124028 in the table name relates to the object_id of the table I was trying to modify, by the way.)

The source of the error offered a big clue about a possible workaround so I gave myself a quota (unlimited) on the system tablespace (alter user test_user quota unlimited on system) and that made it possible for the restructuring to take place. It’s not really an appropriate workaround for a production system though – especially if you’re using the online option and the table is subject to a lot of change.  (Note – this “rowid_mapping_table” and a “journal” table are created even if you haven’t selected the online option.)
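
For completeness, here’s the workaround in one place, using the test_user account from this test – and remembering, as the footnote below explains, that the best you can do afterwards is set the quota back to zero:

alter user test_user quota unlimited on system;

-- re-run the "alter table ... modify partition by ..." statement, then:

alter user test_user quota 0 on system;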

Footnotes
  • The problem has been fixed in 19c (tested on LiveSQL) and is listed on MoS as Bug 27580976 : INTERNAL RECURSIVE MAPPING TABLE MISTAKENLY PLACE IN SYSTEM FOR ONLINE OPS. There are no patches for 18c at present.
  • After I’d finished testing the quota workaround I tried to deprive myself of the quota on the system tablespace. I may have missed something in the manuals but it looks like the only way to do this is to give myself a zero quota (or, as I have done occasionally in the past, drop user cascade) because there’s no option for “quota denied” or “revoke quota”. This is why you may get one of two different messages after the ORA-00604. If you’ve never had a quota on the system tablespace you’ll get the “ORA-01950: no privileges” message; if you’ve had a quota at some time in the past and then had it set to zero you’ll get the “ORA-01536: space quota exceeded” message.

 

Re-partitioning

Thu, 2019-05-23 05:45

I wrote a short note a little while ago demonstrating how flexible Oracle 12.2 can be about physically rebuilding a table online to introduce or change the partitioning while discarding data, and so on.  But what do you do (as a recent question on ODC asked) if you want to upgrade a customer’s database to meet the requirements of a new release of your application by changing a partitioned table into a composite partitioned table, but don’t have enough room to do an online rebuild (which could require two copies of the data to exist at the same time)?

If you’ve got the down time (and not necessarily a lot is needed) you can fall back on “traditional methods” with some 12c enhancements. Let’s start with a range partitioned table:


rem
rem     Script:         pt_comp_from_pt.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem

create table pt_range (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id) (
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

insert into pt_range
select
        rownum-1,
        mod(rownum,2),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 600
;

commit;

So we’ve got a range-partitioned table with three partitions and some data in each partition. Let’s pretend we want to change this to range/list with the grp column as the subpartition key, allowing explicit use of values 0,1,2 and a bucket subpartition for anything else. First we create an empty version of the table with a suitable subpartition template, and a simple heap table to be used as an exchange medium:


create table pt_range_list (
        id              number(8,0)     not null,
        grp             varchar2(1)     not null,
        small_vc        varchar2(10),
        padding         varchar2(100)
)
partition by range(id)
subpartition by list (grp)
subpartition template (
        subpartition p_def      values(default)
)
(
        partition p200 values less than (200),
        partition p400 values less than (400),
        partition p600 values less than (600)
)
;

prompt  ===============================================
prompt  First nice 12.2 feature - "create for exchange"
prompt  ===============================================

create table t for exchange with table pt_range;

You’ll notice that our subpartition template identifies just a single subpartition that takes default values – i.e. anything for which no explicit subpartition has been identified. This means we have a one-to-one correspondence between the data segments of the original table and the copy table. So now we go through a tedious loop to do a double-exchange for each partition in turn (which we could code up with a PL/SQL “execute immediate” approach – see the sketch below).
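For the record, here’s a minimal sketch of that loop – it assumes the generated subpartition names follow the “{partition name}_P_DEF” pattern that our template dictates:


begin
        for p in (
                select  partition_name
                from    user_tab_partitions
                where   table_name = 'PT_RANGE'
                order by
                        partition_position
        ) loop
                execute immediate
                        'alter table pt_range exchange partition ' ||
                        p.partition_name || ' with table t';
                execute immediate
                        'alter table pt_range_list exchange subpartition ' ||
                        p.partition_name || '_p_def with table t';
        end loop;
end;
/

And here are the explicit commands, one pair per partition: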


alter table pt_range exchange partition p200 with table t;
alter table pt_range_list exchange subpartition p200_p_def with table t;

alter table pt_range exchange partition p400 with table t;
alter table pt_range_list exchange subpartition p400_p_def with table t;

alter table pt_range exchange partition p600 with table t;
alter table pt_range_list exchange subpartition p600_p_def with table t;

prompt  =====================================
prompt  Show that we've got the data in place
prompt  =====================================

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

break on partition_name skip 1

select  partition_name, subpartition_name, num_rows
from    user_tab_subpartitions
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;


PARTITION_NAME         SUBPARTITION_NAME        NUM_ROWS
---------------------- ---------------------- ----------
P200                   P200_P_DEF                    200

P400                   P400_P_DEF                    200

P600                   P600_P_DEF                    200


3 rows selected.

We now have to split the newly arrived subpartitions into the 4 pieces we want – but before we do that let’s make sure that any new partitions automatically have the correct subpartitions by changing the subpartition template:


alter table pt_range_list
set subpartition template(
        subpartition p_0 values (0),
        subpartition p_1 values (1),
        subpartition p_2 values (2),
        subpartition p_def values (default)
)
;

prompt  =========================================================
prompt  Second nice 12.2 feature - multiple splits in one command
prompt  We could do this online after allowing the users back on.
prompt  =========================================================

alter table pt_range_list split subpartition p200_p_def
        into (
                subpartition p200_p_0 values(0),
                subpartition p200_p_1 values(1),
                subpartition p200_p_2 values(2),
                subpartition p200_p_def
        )
;

alter table pt_range_list split subpartition p400_p_def
        into (
                subpartition p400_p_0 values(0),
                subpartition p400_p_1 values(1),
                subpartition p400_p_2 values(2),
                subpartition p400_p_def
        )
;

alter table pt_range_list split subpartition p600_p_def
        into (
                subpartition p600_p_0 values(0),
                subpartition p600_p_1 values(1),
                subpartition p600_p_2 values(2),
                subpartition p600_p_def
        )
;

Now, just to check that everything is behaving, let’s add a new partition, and check to see what partitions and subpartitions we end up with:


alter table pt_range_list add partition p800 values less than (800);

execute dbms_stats.gather_table_stats(user,'pt_range_list',granularity=>'ALL')

select  partition_name, subpartition_name, num_rows
from    user_tab_subpartitions
where   table_name = 'PT_RANGE_LIST'
order by
        partition_name, subpartition_name
;

PARTITION_NAME         SUBPARTITION_NAME        NUM_ROWS
---------------------- ---------------------- ----------
P200                   P200_P_0                      100
                       P200_P_1                      100
                       P200_P_2                        0
                       P200_P_DEF                      0

P400                   P400_P_0                      100
                       P400_P_1                      100
                       P400_P_2                        0
                       P400_P_DEF                      0

P600                   P600_P_0                      100
                       P600_P_1                      100
                       P600_P_2                        0
                       P600_P_DEF                      0

P800                   P800_P_0                        0
                       P800_P_1                        0
                       P800_P_2                        0
                       P800_P_DEF                      0


16 rows selected.

And as a final note – if we decide we want to put it all back we could merge four subpartitions down to one subpartition with a single command – then loop through every partition in turn:


alter table pt_range_list
        merge subpartitions  p200_p_0, p200_p_1, p200_p_2, p200_p_def
        into  subpartition  p200_p_def
;

And now I feel like I’m turning into Tim Hall – writing potentially useful demonstrations instead of trying to baffle people with rocket science. But I hope to get over that quite soon. Of course I have left out some important “real-world” details – particularly how you choose to handle indexes while doing the double-exchange. My view would be to take the local indexes with you on the exchange, bypass the global indexes on the exchange out, and be choosy about which global indexes to maintain on the exchange back in; but it all depends on how much downtime you have, how many indexes there are, and the state they’re likely to start or end in.
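For reference, a sketch of the syntax options involved – assuming you’ve created suitable matching indexes on the exchange table, and remembering that “update global indexes” maintains all the global indexes, so being selective means letting them go unusable on the exchange and rebuilding your chosen few afterwards:


--      exchange out: take the local indexes, skip validation, let global indexes go unusable

alter table pt_range exchange partition p200 with table t
        including indexes
        without validation
;

--      exchange back in: as before, but maintain global indexes during the operation

alter table pt_range_list exchange subpartition p200_p_def with table t
        including indexes
        without validation
        update global indexes
;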

As ever it’s possible to start with a simple idea like this, then discover there are real-world complications that have to be dealt with. So there’s another article in the pipeline to handle a slightly more complex case. I’ll also be publishing a short note about the easy way of getting the job done from 18c onwards – if you’ve got the spare resources.

 

Danger – Hints

Wed, 2019-05-22 08:56

It shouldn’t be possible to get the wrong results by using a hint – but hints are dangerous and the threat may be there if you don’t know exactly what a hint is supposed to do (and don’t check very carefully what has happened when you’ve used one that you’re not familiar with).

This post was inspired by a blog note from Connor McDonald titled “Being Generous to the Optimizer”. In his note Connor gives an example where the use of “flexible” SQL results in an execution plan that is always expensive to run when a more complex version of the query could produce a “conditional” plan which could be efficient some of the time and would be expensive only when there was no alternative. In his example he rewrote the first query below to produce the second query:


select data
from   address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;

select  data
from    address
where   ( :choice = 1 and street = :val )
union all
select  data
from    address
where   ( :choice = 2 and suburb = :val );

(We are assuming that bind variable :choice is constrained to be 1 or 2 and no other value.)

In its initial form the optimizer had to choose a tablescan for the query; in its final form the query can select which half of a UNION ALL plan to execute because the optimizer inserts a pair of FILTER operations that check the actual value of :choice at run-time.

When I started reading the example my first thought was to wonder why the optimizer hadn’t simply used “OR-expansion” (or concatenation if you’re running an older version), then I remembered that by the time the optimizer really gets going it has forgotten that “:choice” is the same bind variable in both cases, so doesn’t realise that it would use only one of two possible predicates. However, that doesn’t mean you can’t tell the optimizer to use concatenation. Here’s a model – modified slightly from Connor’s original:


drop table address purge;
create table address ( street int, suburb int, post_code int,  data char(100));

insert into address
select mod(rownum,1e4), mod(rownum,10), mod(rownum,1e2), rownum
from dual connect by level  <= 1e5 -- > comment to avoid WordPress format issue
;

commit;

exec dbms_stats.gather_table_stats('','ADDRESS')

create index ix1 on address ( street );
create index ix2 on address ( suburb );
create index ix3 on address ( post_code );

variable val number = 6
variable choice number = 1

alter session set statistics_level = all;
set serveroutput off
set linesize 180
set pagesize 60

select
        /*+ or_expand(@sel$1) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

I’ve added one more column to the table and indexed it – I’ll explain why later. I’ve also modified the query to show the output but restricted the result set to a count of the data column rather than a (long) list of rows.

Here’s the execution plan output when hinted:


SQL_ID  6zsh2w6d9mddy, child number 0
-------------------------------------
select  /*+ or_expand(@sel$1) */  count(data) from    address where  (
:choice = 1 and street = :val ) or     ( :choice = 2 and suburb = :val )

Plan hash value: 3986461375

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |      12 |     27 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |      12 |     27 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  10010 |     10 |00:00:00.01 |      12 |     27 |
|   3 |    UNION-ALL                            |                 |      1 |        |     10 |00:00:00.01 |      12 |     27 |
|*  4 |     FILTER                              |                 |      1 |        |     10 |00:00:00.01 |      12 |     27 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      1 |     10 |     10 |00:00:00.01 |      12 |     27 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      1 |     10 |     10 |00:00:00.01 |       2 |     27 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |      0 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |       0 |      0 |
------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$9162BF3C_2")
      OUTLINE_LEAF(@"SET$9162BF3C_1")
      OUTLINE_LEAF(@"SET$9162BF3C")
      OR_EXPAND(@"SEL$1" (1) (2))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1")
      FULL(@"SET$9162BF3C_2" "ADDRESS"@"SET$9162BF3C_2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))

As you can see we have a UNION ALL plan with two FILTER operations, and the filter operations allow one or other of the two branches of the UNION ALL to execute depending on the value for :choice. Since I’ve reported the rowsource execution statistics you can also see that the table access through index range scan (operations 5 and 6) has executed once (Starts = 1) but the tablescan (operation 8) has not been executed at all.

If you check the Predicate Information you will see that operation 8 has introduced two lnnvl() predicates. Since the optimizer has lost sight of the fact that :choice is the same variable in both cases it has to assume that sometimes both branches will be relevant for a single execution, so it has to add predicates to the second branch to eliminate data that might have been found in the first branch. This is the (small) penalty we pay for avoiding a “fully-informed” manual rewrite.
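If you need a reminder of the semantics: lnnvl(p) returns TRUE when its predicate evaluates to FALSE or UNKNOWN, making it a convenient way of saying “not provably true”. A quick demonstration:


select 'false -> row kept' from dual where lnnvl(1 = 2);        -- one row returned
select 'null  -> row kept' from dual where lnnvl(null = 2);     -- one row returned
select 'true  -> row lost' from dual where lnnvl(1 = 1);        -- no rows returned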

Take a look at the Outline Data – we can see our or_expand() hint repeated there, and we can discover that it’s been enhanced. The hint should have been or_expand(@sel$1 (1) (2)). This might prompt you to modify the original SQL to use the fully qualified hint rather than the bare-bones form we’ve got so far. So let’s assume we do that before shipping the code to production.

Now imagine that a couple of months later an enhancement request appears to allow queries on post_code and the front-end has been set up so that we can specify a post_code query by selecting choice number 3. The developer who happens to pick up the change request duly modifies the SQL as follows:


select
        /*+ or_expand(@sel$1 (1) (2)) */
        count(data)
from    address
where  ( :choice = 1 and street = :val )
or     ( :choice = 2 and suburb = :val )
or     ( :choice = 3 and post_code = :val)
;

Note that we’ve got the “complete” hint in place, but there’s now a 3rd predicate. Do you think the hint is still complete? What do you think will happen when we run the query? Here’s the execution plan when I set :choice to 3.


select  /*+ or_expand(@sel$1 (1) (2)) */  count(data) from    address
where  ( :choice = 1 and street = :val ) or     ( :choice = 2 and
suburb = :val ) or     ( :choice = 3 and post_code = :val)

Plan hash value: 3986461375

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  10010 |      0 |00:00:00.01 |
|   3 |    UNION-ALL                            |                 |      1 |        |      0 |00:00:00.01 |
|*  4 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      0 |     10 |      0 |00:00:00.01 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      0 |     10 |      0 |00:00:00.01 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |
-----------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$9162BF3C_2")
      OUTLINE_LEAF(@"SET$9162BF3C_1")
      OUTLINE_LEAF(@"SET$9162BF3C")
      OR_EXPAND(@"SEL$1" (1) (2))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$9162BF3C_1" "ADDRESS"@"SET$9162BF3C_1")
      FULL(@"SET$9162BF3C_2" "ADDRESS"@"SET$9162BF3C_2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))

We get a UNION ALL with two branches, one for :choice = 1, one for :choice = 2 and both of them show zero starts – and we don’t have any part of the plan to handle :choice = 3. The query returns no rows – and if you check the table creation code you’ll see it should have returned 1000 rows. An incorrect (historically adequate) hint has given us wrong results.

If we want the full hint for this new query we need to specify the 3rd predicate, adding (3) to the existing hint to get the following plan (and correct results):


select  /*+ or_expand(@sel$1 (1) (2) (3)) */  count(data) from
address where  ( :choice = 1 and street = :val ) or     ( :choice = 2
and suburb = :val ) or     ( :choice = 3 and post_code = :val)

Plan hash value: 2153173029

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |      1 |        |      1 |00:00:00.01 |    1639 |
|   1 |  SORT AGGREGATE                         |                 |      1 |      1 |      1 |00:00:00.01 |    1639 |
|   2 |   VIEW                                  | VW_ORE_B7380F92 |      1 |  11009 |   1000 |00:00:00.01 |    1639 |
|   3 |    UNION-ALL                            |                 |      1 |        |   1000 |00:00:00.01 |    1639 |
|*  4 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS         |      0 |     10 |      0 |00:00:00.01 |       0 |
|*  6 |       INDEX RANGE SCAN                  | IX1             |      0 |     10 |      0 |00:00:00.01 |       0 |
|*  7 |     FILTER                              |                 |      1 |        |      0 |00:00:00.01 |       0 |
|*  8 |      TABLE ACCESS FULL                  | ADDRESS         |      0 |  10000 |      0 |00:00:00.01 |       0 |
|*  9 |     FILTER                              |                 |      1 |        |   1000 |00:00:00.01 |    1639 |
|* 10 |      TABLE ACCESS FULL                  | ADDRESS         |      1 |    999 |   1000 |00:00:00.01 |    1639 |
---------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('18.1.0')
      DB_VERSION('18.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SET$49E1C21B_3")
      OUTLINE_LEAF(@"SET$49E1C21B_2")
      OUTLINE_LEAF(@"SET$49E1C21B_1")
      OUTLINE_LEAF(@"SET$49E1C21B")
      OR_EXPAND(@"SEL$1" (1) (2) (3))
      OUTLINE_LEAF(@"SEL$B7380F92")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$B7380F92" "VW_ORE_B7380F92"@"SEL$B7380F92")
      INDEX_RS_ASC(@"SET$49E1C21B_1" "ADDRESS"@"SET$49E1C21B_1" ("ADDRESS"."STREET"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SET$49E1C21B_1" "ADDRESS"@"SET$49E1C21B_1")
      FULL(@"SET$49E1C21B_2" "ADDRESS"@"SET$49E1C21B_2")
      FULL(@"SET$49E1C21B_3" "ADDRESS"@"SET$49E1C21B_3")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(:CHOICE=1)
   6 - access("STREET"=:VAL)
   7 - filter(:CHOICE=2)
   8 - filter(("SUBURB"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL))))
   9 - filter(:CHOICE=3)
  10 - filter(("POST_CODE"=:VAL AND (LNNVL(:CHOICE=1) OR LNNVL("STREET"=:VAL)) AND (LNNVL(:CHOICE=2) OR
              LNNVL("SUBURB"=:VAL))))


We now have three branches to the UNION ALL, and the final branch (:choice = 3) ran to show A-Rows = 1000 selected in the tablescan.

Conclusion

You shouldn’t mess about with hints unless you’re very confident that you know how they work and then test extremely carefully – especially if you’re modifying old code that already contains some hints.

 

Misleading Execution Plan

Tue, 2019-05-21 05:48

A couple of weeks ago I published a note about an execution plan which showed the details of a scalar subquery in the wrong place (as far as the typical strategies for interpreting execution plans are concerned). In a footnote to the article I commented that Andy Sayer had produced a simple reproducible example of the anomaly based around the key features of the query supplied in the original posting and had emailed it to me.  With his permission (and with some minor modifications) I’ve reproduced it below:


rem
rem     Script:         misplaced_subq_plan.sql
rem     Author:         Andrew Sayer
rem     Dated:          May 2019
rem

drop table recursive_table;
drop table lookup_t;
drop table join_t;

@@setup

set linesize 180
set pagesize 60

create table recursive_table (
        my_id           number constraint rt_pk primary key,
        parent_id       number,
        fk_col          number
);

insert into recursive_table 
select 
        rownum, 
        nullif(rownum-1,0)      parent_id, 
        mod(rownum,10) 
from 
        dual 
connect by 
        rownum <=100
;

prompt  ==================================================
prompt  Note that fk_col will be zero for 1/10 of the rows
prompt  ==================================================

create table lookup_t(
        pk_col number  constraint lt_pk primary key,
        value varchar2(30 char)
)
;

insert into lookup_t 
select 
        rownum, 
        rpad('x',30,'x') 
from 
        dual 
connect by 
        rownum <=100
;

create table join_t(
        pk_col number primary key,
        col_10 number,
        value varchar2(100 char)
);

insert into join_t 
select 
        rownum, mod(rownum,10), rpad('x',30,'x') 
from 
        dual 
connect by 
        rownum <=1000 --> comment to avoid WordPress format problem.
;

execute dbms_stats.gather_table_stats(null,'recursive_table')
execute dbms_stats.gather_table_stats(null,'lookup_t')
execute dbms_stats.gather_table_stats(null,'join_t')

prompt	================================
prompt	note that pk_col will never be 0
prompt	================================

set serverout off
alter session set statistics_level=all;

var x number
exec :x := 1

spool misplaced_subq_plan

select  /* distinct */ 
        r.my_id, j.value, r.ssq_value
from    (
	select 
		my_id, parent_id, fk_col, 
		(select l.value from lookup_t l where l.pk_col = r.parent_id) ssq_value 
        from 
		recursive_table r 
	connect by 
		prior my_id = parent_id 
	start with 
		my_id = :x
	) r
join    join_t j
on	r.fk_col = j.pk_col
/

select * from table(dbms_xplan.display_cursor(format=>'allstats projection'));

set serveroutput on

spool off

The code generates, populates, and queries three tables:

  • recursive_table is used in a “connect by” query to generate some data.
  • lookup_t is used in a scalar subquery in the select list of the “connect by” query.
  • join_t is then joined to the result of the “connect by” query to eliminate some rows.

The construction allows us to see a difference between the number of rows returned and the number of times the scalar subquery is executed, and makes it easy to detect an anomaly in the presentation of the execution plan. And here is the execution plan from an 18.3 instance:

--------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |      1 |        |     90 |00:00:00.01 |     170 |       |       |          |
|   1 |  TABLE ACCESS BY INDEX ROWID              | LOOKUP_T        |    100 |      1 |     99 |00:00:00.01 |     102 |       |       |          |
|*  2 |   INDEX UNIQUE SCAN                       | LT_PK           |    100 |      1 |     99 |00:00:00.01 |       3 |       |       |          |
|*  3 |  HASH JOIN                                |                 |      1 |      2 |     90 |00:00:00.01 |     170 |  1123K|  1123K|     1/0/0|
|   4 |   VIEW                                    |                 |      1 |      2 |    100 |00:00:00.01 |     125 |       |       |          |
|*  5 |    CONNECT BY NO FILTERING WITH START-WITH|                 |      1 |        |    100 |00:00:00.01 |      23 |  9216 |  9216 |     2/0/0|
|   6 |     TABLE ACCESS FULL                     | RECURSIVE_TABLE |      1 |    197 |    100 |00:00:00.01 |      23 |       |       |          |
|   7 |   TABLE ACCESS FULL                       | JOIN_T          |      1 |      1 |   1000 |00:00:00.01 |      45 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("L"."PK_COL"=:B1)
   3 - access("R"."FK_COL"="J"."PK_COL")
   5 - access("PARENT_ID"=PRIOR NULL)
       filter("MY_ID"=:X)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "L"."VALUE"[VARCHAR2,120]
   2 - "L".ROWID[ROWID,10]
   3 - (#keys=1) "R"."MY_ID"[NUMBER,22], "J"."VALUE"[VARCHAR2,400], "R"."SSQ_VALUE"[VARCHAR2,120], "J"."VALUE"[VARCHAR2,400]
   4 - "R"."MY_ID"[NUMBER,22], "R"."FK_COL"[NUMBER,22], "R"."SSQ_VALUE"[VARCHAR2,120]
   5 - "PARENT_ID"[NUMBER,22], "MY_ID"[NUMBER,22], "FK_COL"[NUMBER,22], "R"."PARENT_ID"[NUMBER,22], PRIOR NULL[22], LEVEL[4]
   6 - "MY_ID"[NUMBER,22], "PARENT_ID"[NUMBER,22], "FK_COL"[NUMBER,22]
   7 - "J"."PK_COL"[NUMBER,22], "J"."VALUE"[VARCHAR2,400]

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - this is an adaptive plan


In a typical execution plan with scalar subqueries in the select list, the sub-plans for the scalar subqueries appear in the plan before the main query – and in this plan you can see the scalar subquery here at operations 1 and 2.

But the scalar subquery is in the select list of a non-mergeable view (operations 4, 5, 6). We can see that this view generates 100 rows (A-rows of operation 4) and the scalar subquery starts 100 times (Starts of operation 1) – so we can infer that the subquery ran for each row generated by the view.

The problem, though, is that the result set from the view is joined to another table, eliminating some rows and reducing the size of the result set; so if we don’t look carefully at all the details of the plan we appear to have a driving query that produces a result set of 90 rows (at operation 3), but manages to execute the scalar subquery just above it in the plan more times than there are rows in the result set.

It’s easy to unpick what’s really happening in this very simple query with a very short plan – but much harder to do so in the original case where the scalar subquery appeared “outside” the hash join when it actually executed inside a complex subplan that generated the second input (probe table) for the hash join.

As a further little note – if you look at the Column Projection Information you’ll see that operation 4 is where Oracle first projects ‘r.ssq_value[varchar2,120]’ which is the column created by the execution of the sub-plan at operation 1.

Arguably the execution plan should have looked more like:


Plan hash value: 2557600799

--------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |      1 |        |     90 |00:00:00.01 |     170 |       |       |          |
|*  1 |  HASH JOIN                                |                 |      1 |      2 |     90 |00:00:00.01 |     170 |  1123K|  1123K|     1/0/0|
|   2 |   VIEW                                    |                 |      1 |      2 |    100 |00:00:00.01 |     125 |       |       |          |
|   3 |    TABLE ACCESS BY INDEX ROWID            | LOOKUP_T        |    100 |      1 |     99 |00:00:00.01 |     102 |       |       |          |
|*  4 |     INDEX UNIQUE SCAN                     | LT_PK           |    100 |      1 |     99 |00:00:00.01 |       3 |       |       |          |
|*  5 |    CONNECT BY NO FILTERING WITH START-WITH|                 |      1 |        |    100 |00:00:00.01 |      23 |  9216 |  9216 |     2/0/0|
|   6 |     TABLE ACCESS FULL                     | RECURSIVE_TABLE |      1 |    100 |    100 |00:00:00.01 |      23 |       |       |          |
|   7 |   TABLE ACCESS FULL                       | JOIN_T          |      1 |      1 |   1000 |00:00:00.01 |      45 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------

Inevitably, there are cases where the sub-plan for a scalar subquery appears much closer to its point of operation rather than being moved to the top of the execution plan. So any time you have scalar subqueries in select lists inside in-line views, keep a careful lookout for where they appear and how many times they run in the execution plan. And don’t forget that giving every query block a name will help you track down your migrating subqueries.
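As a sketch of the idea – this is the demonstration query again with qb_name() hints as the only addition; the “alias” format option of dbms_xplan will then label each operation in the plan with the query block it came from:


select
        /*+ qb_name(qb_main) */
        r.my_id, j.value, r.ssq_value
from    (
        select
                /*+ qb_name(qb_inline) */
                my_id, parent_id, fk_col,
                (select /*+ qb_name(qb_ssq) */ l.value from lookup_t l where l.pk_col = r.parent_id) ssq_value
        from
                recursive_table r
        connect by
                prior my_id = parent_id
        start with
                my_id = :x
        ) r
join    join_t j
on      r.fk_col = j.pk_col
/

select * from table(dbms_xplan.display_cursor(format=>'allstats last alias'));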

Footnote

If you’re wondering why the Column Projection Information reports r.ssq_value as varchar2(120) when I’ve declared the column as varchar2(30): my declaration is 30 CHAR, and the database (by default) is running with a multi-byte character set that allows a maximum of 4 bytes per character.
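You can confirm this from the data dictionary – in an AL32UTF8 database you would expect to see data_length = 120, char_length = 30, and char_used = ‘C’ (for character semantics):


select  column_name, data_type, data_length, char_length, char_used
from    user_tab_columns
where   table_name = 'LOOKUP_T'
and     column_name = 'VALUE'
;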

Update (22nd May 2019)

Following the comment from Iudith Mentzel below about clever optimisations, primary keys, and related inferences I thought it worth pointing out that it is possible to modify the demonstration query to get the same plan (shape) with different Start counts. We note that instead of putting the scalar subquery inside the inline view we would get the same result if we passed the parent_id to the outer query block and ran the scalar subquery there:

select  /* distinct */ 
        r.my_id, j.value,
        (select l.value from lookup_t l where l.pk_col = r.parent_id) ssq_value 
from    (
        select 
                my_id, parent_id, fk_col
        from 
                recursive_table r 
        connect by 
                prior my_id = parent_id 
        start with 
                my_id = :x
        ) r
join    join_t j
on      r.fk_col = j.pk_col
/

This gives us the following execution plan (with rowsource execution statistics):


Plan hash value: 2557600799

--------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |      1 |        |     90 |00:00:00.03 |      29 |       |       |          |
|   1 |  TABLE ACCESS BY INDEX ROWID              | LOOKUP_T        |     90 |      1 |     89 |00:00:00.01 |      97 |       |       |          |
|*  2 |   INDEX UNIQUE SCAN                       | LT_PK           |     90 |      1 |     89 |00:00:00.01 |       8 |       |       |          |
|*  3 |  HASH JOIN                                |                 |      1 |      2 |     90 |00:00:00.03 |      29 |  1695K|  1695K|     1/0/0|
|   4 |   VIEW                                    |                 |      1 |      2 |    100 |00:00:00.01 |       7 |       |       |          |
|*  5 |    CONNECT BY NO FILTERING WITH START-WITH|                 |      1 |        |    100 |00:00:00.01 |       7 |  6144 |  6144 |     2/0/0|
|   6 |     TABLE ACCESS FULL                     | RECURSIVE_TABLE |      1 |    100 |    100 |00:00:00.01 |       7 |       |       |          |
|   7 |   TABLE ACCESS FULL                       | JOIN_T          |      1 |      1 |   1000 |00:00:00.01 |      22 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------

Note that the plan hash values are the same even though (mechanically) the real order of activity is dramatically different. But now we can see that the scalar subquery (operations 1 and 2) starts 90 times – once for each row returned by the hash join at operation 3, and we have done slightly fewer buffer visits (97 compared to 102) for that part of the plan.

It’s a pity, though, that when you start poking at a plan and looking too closely there are always new buggy bits to see. With the scalar subquery now at its optimal position (and maybe it will eventually get there without a manual rewrite) the arithmetic of “summing up the plan” has gone wrong for (at least) the Buffers column. In the new plan the 97 buffer visits attributed to operation 1 (and its descendants) should have been added to the 29 buffer visits attributed to the hash join (and its descendants) at operation 3 to get a total of 126; instead the 97 have just disappeared from the query total.

By comparison, and reading the operations in the original plan in a suitable order, we see the view at operation 4 reporting 109 buffers, which comes from 7 for its “obvious” descendants plus the 102 from operation 1 that actually happen inside the view. Then the hash join at operation 3 reports 131 buffers, which is the 109 from the view plus the 22 from the tablescan at operation 7, and that 131 buffers is the final figure for the query.

So, for this particular example, it doesn’t matter what you do, the execution plan and its stats try to confuse you.

Execution Plan Puzzle

Tue, 2019-05-07 06:45

Here’s an execution plan that’s just been published on the ODC database forum. The plan comes from a call to dbms_xplan.display_cursor() with rowsource execution statistics enabled.
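For anyone who wants to generate this style of report, a typical recipe looks like this (a sketch – the session needs statistics_level set to all, or the query must include the /*+ gather_plan_statistics */ hint):


set serveroutput off
alter session set statistics_level = all;

--      run the query of interest, then immediately:

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));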

There’s something unusual about the execution statistics that I don’t think I’ve seen before – can anyone else see anything really odd, or (better still) anything which they would expect others to find odd but which they can easily explain?

A couple of hints:

  • It’s nothing to do with the fact that E-Rows and A-Rows don’t match – that’s never a surprise.
  • It’s not really about the fact that huge amounts of time seem to appear out of “nowhere” in the A-Time column.
  • It is something to do with the relationship between A-Rows and Starts.

I’ve inserted a few separator lines to break the plan into smaller pieces that can be examined in isolation. There are two “Load as Select” sections (presumably from “with” subqueries) and the main body of the query.

We don’t, as at the time of writing, have the SQL or the Oracle version number that produced this plan. [Update: version now reported as 12.1.0.2]


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                       | Name                         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                |                              |      1 |        |     50 |00:00:18.00 |     367K|     55 |     55 |       |       |          |
|   1 |  TEMP TABLE TRANSFORMATION                      |                              |      1 |        |     50 |00:00:18.00 |     367K|     55 |     55 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   2 |   LOAD AS SELECT                                |                              |      1 |        |      0 |00:00:00.55 |   56743 |      0 |     53 |  1040K|  1040K|          |
|   3 |    TABLE ACCESS BY INDEX ROWID                  | OBJECTS                      |   7785 |      1 |   7785 |00:00:00.03 |    8150 |      0 |      0 |       |       |          |
|*  4 |     INDEX UNIQUE SCAN                           | PK_OBJECTS                   |   7785 |      1 |   7785 |00:00:00.01 |     360 |      0 |      0 |       |       |          |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED          | ATTRIBUTES                   |   7785 |      1 |   5507 |00:00:00.05 |   12182 |      0 |      0 |       |       |          |
|*  6 |     INDEX RANGE SCAN                            | UK_ATTR                      |   7785 |      1 |   5507 |00:00:00.03 |    9621 |      0 |      0 |       |       |          |
|*  7 |      TABLE ACCESS FULL                          | ATTRIBUTE_TYPES              |      1 |      1 |      1 |00:00:00.01 |      38 |      0 |      0 |       |       |          |
|   8 |    TABLE ACCESS BY INDEX ROWID BATCHED          | ATTRIBUTES                   |   7785 |      1 |   5507 |00:00:00.03 |   12182 |      0 |      0 |       |       |          |
|*  9 |     INDEX RANGE SCAN                            | UK_ATTR                      |   7785 |      1 |   5507 |00:00:00.02 |    9621 |      0 |      0 |       |       |          |
|* 10 |      TABLE ACCESS FULL                          | ATTRIBUTE_TYPES              |      1 |      1 |      1 |00:00:00.01 |      38 |      0 |      0 |       |       |          |
|  11 |    TABLE ACCESS BY INDEX ROWID BATCHED          | ATTRIBUTES                   |   1366 |      1 |   1366 |00:00:00.02 |    4592 |      0 |      0 |       |       |          |
|* 12 |     INDEX RANGE SCAN                            | IDX_ATTR_NDC_OBJECT_VALUE    |   1366 |      1 |   1366 |00:00:00.01 |    3227 |      0 |      0 |       |       |          |
|* 13 |      INDEX RANGE SCAN                           | NCI_NODES_COVERING_IDX       |   1366 |      1 |   1366 |00:00:00.01 |     595 |      0 |      0 |       |       |          |
|* 14 |    VIEW                                         |                              |      1 |     12 |   7785 |00:00:00.41 |   24174 |      0 |      0 |       |       |          |
|* 15 |     FILTER                                      |                              |      1 |        |   7891 |00:00:00.39 |   19582 |      0 |      0 |       |       |          |
|* 16 |      CONNECT BY WITH FILTERING                  |                              |      1 |        |  66134 |00:00:00.37 |   19144 |      0 |      0 |  7069K|  1062K| 6283K (0)|
|  17 |       TABLE ACCESS BY INDEX ROWID               | NODES                        |      1 |      1 |      1 |00:00:00.01 |       4 |      0 |      0 |       |       |          |
|* 18 |        INDEX UNIQUE SCAN                        | PK_NODES                     |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  19 |       NESTED LOOPS                              |                              |      9 |     11 |  66133 |00:00:00.19 |   19137 |      0 |      0 |       |       |          |
|  20 |        CONNECT BY PUMP                          |                              |      9 |        |  66134 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|* 21 |        TABLE ACCESS BY INDEX ROWID BATCHED      | NODES                        |  66134 |     11 |  66133 |00:00:00.15 |   19137 |      0 |      0 |       |       |          |
|* 22 |         INDEX RANGE SCAN                        | NCI_NODES_PARENT_NODE_ID     |  66134 |     11 |  67807 |00:00:00.08 |   12139 |      0 |      0 |       |       |          |
|  23 |         TABLE ACCESS BY INDEX ROWID             | OBJECT_TYPES                 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|* 24 |          INDEX UNIQUE SCAN                      | UK_IDX_OBJECT_TYPE_NDC       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|  25 |       TABLE ACCESS BY INDEX ROWID               | OBJECT_TYPES                 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|* 26 |        INDEX UNIQUE SCAN                        | UK_IDX_OBJECT_TYPE_NDC       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|* 27 |      TABLE ACCESS BY INDEX ROWID                | OBJECT_TYPES                 |    219 |      1 |      1 |00:00:00.01 |     438 |      0 |      0 |       |       |          |
|* 28 |       INDEX UNIQUE SCAN                         | PK_OBJECT_TYPES              |    219 |      1 |    219 |00:00:00.01 |     219 |      0 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  29 |   LOAD AS SELECT                                |                              |      1 |        |      0 |00:00:02.86 |   37654 |     53 |      2 |  1040K|  1040K|          |
|  30 |    TABLE ACCESS BY INDEX ROWID                  | OBJECTS                      |    316 |      1 |    316 |00:00:00.01 |     603 |      0 |      0 |       |       |          |
|* 31 |     INDEX UNIQUE SCAN                           | PK_OBJECTS                   |    316 |      1 |    316 |00:00:00.01 |     287 |      0 |      0 |       |       |          |
|  32 |    TABLE ACCESS BY INDEX ROWID BATCHED          | ATTRIBUTES                   |    316 |      1 |    316 |00:00:00.01 |     950 |      0 |      0 |       |       |          |
|* 33 |     INDEX RANGE SCAN                            | UK_ATTR                      |    316 |      1 |    316 |00:00:00.01 |     666 |      0 |      0 |       |       |          |
|* 34 |      TABLE ACCESS FULL                          | ATTRIBUTE_TYPES              |      1 |      1 |      1 |00:00:00.01 |      38 |      0 |      0 |       |       |          |
|  35 |    HASH UNIQUE                                  |                              |      1 |    148 |    316 |00:00:02.86 |   37650 |     53 |      0 |  1041K|  1041K| 1371K (0)|
|* 36 |     FILTER                                      |                              |      1 |        |   5500 |00:00:02.85 |   36097 |     53 |      0 |       |       |          |
|  37 |      MERGE JOIN CARTESIAN                       |                              |      1 |    148 |   5114K|00:00:02.23 |   34073 |     53 |      0 |       |       |          |
|* 38 |       HASH JOIN                                 |                              |      1 |     12 |    657 |00:00:01.05 |   34016 |      0 |      0 |  1003K|  1003K|  728K (0)|
|  39 |        NESTED LOOPS                             |                              |      1 |     69 |    969 |00:00:00.36 |   20145 |      0 |      0 |       |       |          |
|  40 |         NESTED LOOPS                            |                              |      1 |    132 |    970 |00:00:00.36 |   19975 |      0 |      0 |       |       |          |
|  41 |          VIEW                                   |                              |      1 |     12 |    312 |00:00:00.35 |   19582 |      0 |      0 |       |       |          |
|* 42 |           FILTER                                |                              |      1 |        |    312 |00:00:00.35 |   19582 |      0 |      0 |       |       |          |
|* 43 |            CONNECT BY WITH FILTERING            |                              |      1 |        |  66134 |00:00:00.34 |   19144 |      0 |      0 |  6219K|  1010K| 5527K (0)|
|  44 |             TABLE ACCESS BY INDEX ROWID         | NODES                        |      1 |      1 |      1 |00:00:00.01 |       4 |      0 |      0 |       |       |          |
|* 45 |              INDEX UNIQUE SCAN                  | PK_NODES                     |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  46 |             NESTED LOOPS                        |                              |      9 |     11 |  66133 |00:00:00.18 |   19137 |      0 |      0 |       |       |          |
|  47 |              CONNECT BY PUMP                    |                              |      9 |        |  66134 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|* 48 |              TABLE ACCESS BY INDEX ROWID BATCHED| NODES                        |  66134 |     11 |  66133 |00:00:00.15 |   19137 |      0 |      0 |       |       |          |
|* 49 |               INDEX RANGE SCAN                  | NCI_NODES_PARENT_NODE_ID     |  66134 |     11 |  67807 |00:00:00.08 |   12139 |      0 |      0 |       |       |          |
|  50 |               TABLE ACCESS BY INDEX ROWID       | OBJECT_TYPES                 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|* 51 |                INDEX UNIQUE SCAN                | UK_IDX_OBJECT_TYPE_NDC       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|  52 |             TABLE ACCESS BY INDEX ROWID         | OBJECT_TYPES                 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|* 53 |              INDEX UNIQUE SCAN                  | UK_IDX_OBJECT_TYPE_NDC       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|* 54 |            TABLE ACCESS BY INDEX ROWID          | OBJECT_TYPES                 |    219 |      1 |      1 |00:00:00.01 |     438 |      0 |      0 |       |       |          |
|* 55 |             INDEX UNIQUE SCAN                   | PK_OBJECT_TYPES              |    219 |      1 |    219 |00:00:00.01 |     219 |      0 |      0 |       |       |          |
|* 56 |          INDEX RANGE SCAN                       | NCI_NODES_PARENT_NODE_ID     |    312 |     11 |    970 |00:00:00.01 |     393 |      0 |      0 |       |       |          |
|* 57 |         TABLE ACCESS BY INDEX ROWID             | NODES                        |    970 |      6 |    969 |00:00:00.01 |     170 |      0 |      0 |       |       |          |
|* 58 |        VIEW                                     | index$_join$_065             |      1 |     42 |      4 |00:00:00.69 |   13871 |      0 |      0 |       |       |          |
|* 59 |         HASH JOIN                               |                              |      1 |        |    434 |00:00:00.01 |      12 |      0 |      0 |  1519K|  1519K| 1491K (0)|
|  60 |          INDEX FAST FULL SCAN                   | PK_OBJECT_TYPES              |      1 |     42 |    434 |00:00:00.01 |       4 |      0 |      0 |       |       |          |
|  61 |          INDEX FAST FULL SCAN                   | UK_IDX_OBJECT_TYPE_NDC       |      1 |     42 |    434 |00:00:00.01 |       8 |      0 |      0 |       |       |          |
|  62 |       BUFFER SORT                               |                              |    657 |     12 |   5114K|00:00:00.63 |      57 |     53 |      0 |   372K|   372K|  330K (0)|
|  63 |        VIEW                                     |                              |      1 |     12 |   7785 |00:00:00.02 |      57 |     53 |      0 |       |       |          |
|  64 |         TABLE ACCESS FULL                       | SYS_TEMP_0FD9D761B_1445481D  |      1 |     12 |   7785 |00:00:00.02 |      57 |     53 |      0 |       |       |          |
|  65 |      TABLE ACCESS BY INDEX ROWID                | OBJECTS                      |    657 |      1 |    657 |00:00:00.01 |    1068 |      0 |      0 |       |       |          |
|* 66 |       INDEX UNIQUE SCAN                         | PK_OBJECTS                   |    657 |      1 |    657 |00:00:00.01 |     410 |      0 |      0 |       |       |          |
|  67 |      TABLE ACCESS BY INDEX ROWID BATCHED        | ATTRIBUTES                   |    318 |      1 |    318 |00:00:00.01 |     956 |      0 |      0 |       |       |          |
|* 68 |       INDEX RANGE SCAN                          | UK_ATTR                      |    318 |      1 |    318 |00:00:00.01 |     670 |      0 |      0 |       |       |          |
|* 69 |        TABLE ACCESS FULL                        | ATTRIBUTE_TYPES              |      1 |      1 |      1 |00:00:00.01 |      38 |      0 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  70 |   SORT GROUP BY                                 |                              |      1 |      1 |     50 |00:00:14.59 |     273K|      2 |      0 |   619K|   471K|  550K (0)|
|  71 |    VIEW                                         |                              |      1 |      1 |   4375 |00:00:13.31 |     273K|      2 |      0 |       |       |          |
|  72 |     HASH UNIQUE                                 |                              |      1 |      1 |   4375 |00:00:13.31 |     273K|      2 |      0 |  1186K|  1186K| 1400K (0)|
|  73 |      TABLE ACCESS BY INDEX ROWID                | OBJECTS                      |   4606 |      1 |   4606 |00:00:05.59 |   37088 |      0 |      0 |       |       |          |
|* 74 |       INDEX UNIQUE SCAN                         | PK_OBJECTS                   |   4606 |      1 |   4606 |00:00:05.56 |   32472 |      0 |      0 |       |       |          |
|* 75 |      HASH JOIN                                  |                              |      1 |      1 |   4375 |00:00:13.29 |     273K|      2 |      0 |  1410K|  1075K| 1423K (0)|
|  76 |       NESTED LOOPS                              |                              |      1 |      1 |   4375 |00:00:00.07 |   12952 |      2 |      0 |       |       |          |
|  77 |        NESTED LOOPS                             |                              |      1 |      2 |   4375 |00:00:00.06 |   12593 |      2 |      0 |       |       |          |
|  78 |         NESTED LOOPS                            |                              |      1 |      1 |   4375 |00:00:00.05 |   11761 |      2 |      0 |       |       |          |
|* 79 |          HASH JOIN                              |                              |      1 |      1 |   5500 |00:00:00.01 |      60 |      2 |      0 |  1321K|  1321K| 1775K (0)|
|  80 |           VIEW                                  |                              |      1 |     12 |   7785 |00:00:00.01 |      54 |      0 |      0 |       |       |          |
|  81 |            TABLE ACCESS FULL                    | SYS_TEMP_0FD9D761B_1445481D  |      1 |     12 |   7785 |00:00:00.01 |      54 |      0 |      0 |       |       |          |
|  82 |           VIEW                                  |                              |      1 |    148 |    316 |00:00:00.01 |       6 |      2 |      0 |       |       |          |
|  83 |            TABLE ACCESS FULL                    | SYS_TEMP_0FD9D761C_1445481D  |      1 |    148 |    316 |00:00:00.01 |       6 |      2 |      0 |       |       |          |
|  84 |          TABLE ACCESS BY INDEX ROWID BATCHED    | ATTRIBUTES                   |   5500 |      1 |   4375 |00:00:00.04 |   11701 |      0 |      0 |       |       |          |
|* 85 |           INDEX RANGE SCAN                      | IDX_ATTR_NDC_OBJECT_VALUE    |   5500 |      1 |   4375 |00:00:00.02 |    7353 |      0 |      0 |       |       |          |
|* 86 |         INDEX RANGE SCAN                        | NCI_ATTRIBUTE_VALUES_ATTR_ID |   4375 |      2 |   4375 |00:00:00.01 |     832 |      0 |      0 |       |       |          |
|  87 |        TABLE ACCESS BY INDEX ROWID              | ATTRIBUTE_VALUES             |   4375 |      2 |   4375 |00:00:00.01 |     359 |      0 |      0 |       |       |          |
|  88 |       VIEW                                      |                              |      1 |   1730 |   4606 |00:00:13.21 |     260K|      0 |      0 |       |       |          |
|* 89 |        FILTER                                   |                              |      1 |        |   4606 |00:00:00.06 |    2094 |      0 |      0 |       |       |          |
|* 90 |         CONNECT BY WITH FILTERING               |                              |      1 |        |   4922 |00:00:00.05 |    2037 |      0 |      0 |   478K|   448K|  424K (0)|
|  91 |          NESTED LOOPS                           |                              |      1 |    148 |    316 |00:00:00.01 |     953 |      0 |      0 |       |       |          |
|  92 |           NESTED LOOPS                          |                              |      1 |    148 |    316 |00:00:00.01 |     637 |      0 |      0 |       |       |          |
|  93 |            VIEW                                 | VW_NSO_1                     |      1 |    148 |    316 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  94 |             HASH UNIQUE                         |                              |      1 |    148 |    316 |00:00:00.01 |       3 |      0 |      0 |  2170K|  2170K| 2517K (0)|
|  95 |              VIEW                               |                              |      1 |    148 |    316 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  96 |               TABLE ACCESS FULL                 | SYS_TEMP_0FD9D761C_1445481D  |      1 |    148 |    316 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|* 97 |            INDEX UNIQUE SCAN                    | PK_NODES                     |    316 |      1 |    316 |00:00:00.01 |     634 |      0 |      0 |       |       |          |
|  98 |           TABLE ACCESS BY INDEX ROWID           | NODES                        |    316 |      1 |    316 |00:00:00.01 |     316 |      0 |      0 |       |       |          |
|  99 |          NESTED LOOPS                           |                              |      2 |   1582 |   4606 |00:00:00.01 |    1081 |      0 |      0 |       |       |          |
| 100 |           CONNECT BY PUMP                       |                              |      2 |        |   4922 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|*101 |           TABLE ACCESS BY INDEX ROWID BATCHED   | NODES                        |   4922 |     11 |   4606 |00:00:00.01 |    1081 |      0 |      0 |       |       |          |
|*102 |            INDEX RANGE SCAN                     | NCI_NODES_PARENT_NODE_ID     |   4922 |     11 |   4608 |00:00:00.01 |     950 |      0 |      0 |       |       |          |
| 103 |            TABLE ACCESS BY INDEX ROWID          | OBJECT_TYPES                 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|*104 |             INDEX UNIQUE SCAN                   | UK_IDX_OBJECT_TYPE_NDC       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
| 105 |          TABLE ACCESS BY INDEX ROWID            | OBJECT_TYPES                 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|*106 |           INDEX UNIQUE SCAN                     | UK_IDX_OBJECT_TYPE_NDC       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|*107 |         TABLE ACCESS BY INDEX ROWID             | OBJECT_TYPES                 |      3 |      1 |      1 |00:00:00.01 |      57 |      0 |      0 |       |       |          |
|*108 |          INDEX UNIQUE SCAN                      | PK_OBJECT_TYPES              |      3 |      1 |      3 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("O"."OBJECT_ID"=:B1)
   6 - access("A"."OBJECT_ID"=:B1 AND "A"."ATTRIBUTE_TYPE_ID"=)
   7 - filter("NAME_DISPLAY_CODE"='ATTRIBUTE_TYPE.MDR_CODELIST')
   9 - access("A"."OBJECT_ID"=:B1 AND "A"."ATTRIBUTE_TYPE_ID"=)
  10 - filter("NAME_DISPLAY_CODE"='ATTRIBUTE_TYPE.MDR_CODELIST')
  12 - access("A"."NAME_DISPLAY_CODE"='ATTRIBUTE_TYPE.MDR_ACTIVE_STATUS' AND "A"."OBJECT_ID"=)
  13 - access("NOD"."NODE_ID"=:B1)
  14 - filter("GROUP_ACTIVE_STATUS"='LOOKUP_VALUE.ACTIVE_STATUS_A')
  15 - filter(("DELETION_DATE"='01-Jan-1900' AND  IS NOT NULL AND "OBJECT_ID" IS NOT NULL))
  16 - access("N"."PARENT_NODE_ID"=PRIOR NULL)
  18 - access("N"."NODE_ID"=TO_NUMBER(:I_NODE_ID))
  21 - filter("N"."OBJECT_TYPE_ID"<>)
  22 - access("connect$_by$_pump$_029"."PRIOR n.node_id"="N"."PARENT_NODE_ID")
  24 - access("NAME_DISPLAY_CODE"='OBJECT_TYPE.ARCHIVE_CONTAINER')
  26 - access("NAME_DISPLAY_CODE"='OBJECT_TYPE.ARCHIVE_CONTAINER')
  27 - filter("NAME"=:I_SEARCH_OBJ_TYPE)
  28 - access("OBJECT_TYPE_ID"=:B1)
  31 - access("O"."OBJECT_ID"=:B1)
  33 - access("A"."OBJECT_ID"=:B1 AND "A"."ATTRIBUTE_TYPE_ID"=)
  34 - filter("AT"."NAME_DISPLAY_CODE"='ATTRIBUTE_TYPE.MDR_MASTER_UNIQUE_ITEM_ID')
  36 - filter(("CN"."CODELIST"= AND "CN"."CODELIST_MUI"=))
  38 - access("N"."OBJECT_TYPE_ID"="OBJECT_TYPE_ID")
  42 - filter(("DELETION_DATE"='01-Jan-1900' AND  IS NOT NULL))
  43 - access("PARENT_NODE_ID"=PRIOR NULL)
  45 - access("NODE_ID"=TO_NUMBER(:I_NODE_ID))
  48 - filter("OBJECT_TYPE_ID"<>)
  49 - access("connect$_by$_pump$_049"."PRIOR node_id "="PARENT_NODE_ID")
  51 - access("NAME_DISPLAY_CODE"='OBJECT_TYPE.ARCHIVE_CONTAINER')
  53 - access("NAME_DISPLAY_CODE"='OBJECT_TYPE.ARCHIVE_CONTAINER')
  54 - filter(("NAME_DISPLAY_CODE"='OBJECT_TYPE.MDR_CL_CONTAINER' OR "NAME_DISPLAY_CODE"='OBJECT_TYPE.MDR_STUDY_CL_PARENT_CONTAINER'))
  55 - access("OBJECT_TYPE_ID"=:B1)
  56 - access("N"."PARENT_NODE_ID"="NODE_ID")
  57 - filter("N"."DELETION_DATE"='01-Jan-1900')
  58 - filter(("NAME_DISPLAY_CODE"MEMBER OF"PKG_MDR_COMP_RPT_MGR"."F_GET_LOV_MAPPED_OBJTYPES"("VP40"."VARCHAR_TBL"('OBJECT_TYPE.MDR_CL_CODELIST')) OR
              "NAME_DISPLAY_CODE"MEMBER OF"PKG_MDR_COMP_RPT_MGR"."F_GET_LOV_MAPPED_OBJTYPES"("VP40"."VARCHAR_TBL"('OBJECT_TYPE.MDR_CL_SUBSET'))))
  59 - access(ROWID=ROWID)
  66 - access("O"."OBJECT_ID"=:B1)
  68 - access("A"."OBJECT_ID"=:B1 AND "A"."ATTRIBUTE_TYPE_ID"=)
  69 - filter("AT"."NAME_DISPLAY_CODE"='ATTRIBUTE_TYPE.MDR_MASTER_UNIQUE_ITEM_ID')
  74 - access("O"."OBJECT_ID"=NVL(:B1,"PKG_MDR_UTIL"."F_GET_REF_NODE_OBJECT_ID"(:B2,"REQ_INFO_TYPE"("USER_CREDENTIALS_TYPE"(1,NULL,'en-US'),
              "AUDIT_INFO_TYPE"(NULL,NULL,NULL),NULL,NULL,NULL))))
  75 - access("COD"."CL_NODE_ID"="CL"."NODE_ID" AND "AV"."ATTRIBUTE_VALUE"="COD"."OBJ_NAME")
  79 - access("CN"."CODELIST_MUI"="CL"."MUI_VALUE")
  85 - access("A"."NAME_DISPLAY_CODE"='ATTRIBUTE_TYPE.MDR_CODELIST_VALUE' AND "CN"."OBJECT_ID"="A"."OBJECT_ID")
  86 - access("A"."ATTRIBUTE_ID"="AV"."ATTRIBUTE_ID")
  89 - filter(("DELETION_DATE"='01-Jan-1900' AND  IS NOT NULL))
  90 - access("N"."PARENT_NODE_ID"=PRIOR NULL)
  97 - access("N"."NODE_ID"="NODE_ID")
 101 - filter("N"."OBJECT_TYPE_ID"<>)
 102 - access("connect$_by$_pump$_082"."PRIOR n.node_id "="N"."PARENT_NODE_ID")
 104 - access("NAME_DISPLAY_CODE"='OBJECT_TYPE.ARCHIVE_CONTAINER')
 106 - access("NAME_DISPLAY_CODE"='OBJECT_TYPE.ARCHIVE_CONTAINER')
 107 - filter("NAME_DISPLAY_CODE"MEMBER OF"PKG_MDR_COMP_RPT_MGR"."F_GET_LOV_MAPPED_OBJTYPES"("VP40"."VARCHAR_TBL"('OBJECT_TYPE.MDR_CL_CODE')))
 108 - access("OBJECT_TYPE_ID"=:B1)

Any observations welcome. I’m not expecting many people to see the anomaly I see (and there may be further anomalies I haven’t even looked for that others do see straight away), but it’s possible that the pattern is one that some people frequently see and find totally unsurprising.

Update – where’s the anomaly

The anomaly is the presence of operations 73 and 74.

There are two different observations that make these lines stand out. First, operation 72 is a hash unique which is a “single child” operation that calls its child to supply a rowsource and then reduces that rowsource to a distinct set using a hashing mechanism. But in this plan we can see that operation 72 appears to have two child operations – numbers 73 and 75 – so clearly the plan isn’t following the pure “standard” pattern.

Secondly, notice that operations 73 and 74 both report 4,606 Starts. An operation that reports “N” starts has to have a parent operation calling it N times, which means the parent operation must have reported (at least) N rows under the A-Rows heading. But we know that the hash unique operation will call its child operation exactly once – and we can see that the hash unique here has only been called once. So something else must be causing the 4,606 Starts.

Fortunately we remember that “scalar subqueries in the select list” will report their execution plans above the part of the plan that describes the main body of the query. In fact we can see this several times in the two “load as select” parts of this plan; operations (3,4), (5,6,7), (8,9,10), (11,12,13) describe 4 scalar subqueries that must be embedded in the select list of the first “with” subquery that is described by operations 14 – 28.

So we could assume, for the moment, that operations 73 and 74 are in some way an inline scalar subquery in a select list – and that leads to the next step in the problem. A scalar subquery will operate at most once for each row returned in the main rowsource – though the number of starts might be reduced by the effects of scalar subquery caching. Operations 73 and 74 start 4,606 times; the rowsource we feel they ought to be associated with is the hash join immediately below them (operation 75), which returns 4,375 rows – moreover the first child of the hash join also returns 4,375 rows – so we’re not seeing enough rows returned to justify our second attempt at interpreting the plan.

So where can we find something that returns 4,606 (or more) rows that would allow us to say “maybe that’s where the scalar subquery goes”?

Look further down the plan – operation 88 (the view operation that constitutes the second child of the hash join) reports A-Rows = 4,606. Operations 73 and 74 really ought to be reported between operations 88 and 89. There’s a filter at operation 89 that reduces the 4,922 rows produced by operation 90 to 4,606, and it’s after that that the scalar subquery runs to add a column to the rowsource before passing it upwards. (We might be able to see this if we had the projection information for the plan.)

Corroborating this claim we can look at the A-Time for operation 88: it’s 13.21 seconds and there’s nothing below it that accounts for that time; but if we insert operations 73 and 74 just below operation 88 we suddenly have 4,606 subquery calls which report 5.59 seconds – and that’s a step in the right direction for identifying the 13.21 seconds that appeared “from nowhere”, especially when you notice that the predicate for operation 74 (or 88a) calls a PL/SQL packaged procedure that is either calling three more PL/SQL procedures or constructing three user-defined types, and probably using a fair amount of unrecorded time.

Conclusion

Scalar subqueries in select lists can dump their execution plans in places you don’t expect. We know that the plan for a scalar subquery in the select list of a simple query will report itself above the main body of the query plan. Here we have an example of a scalar subquery that reports itself an extra step further out than we intuitively suspect and, quite possibly, if we hadn’t had the rowsource execution statistics to guide us, we wouldn’t have been able to work out what the plan was really trying to tell us.

Footnote

Since Andy Sayer had been commenting on the same ODC thread I emailed him a brief version of the above notes last night, and he created, and emailed to me, a very simple example to reproduce this behaviour which I’ve also tested on 11.2.0.4.
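I won’t reproduce his test here, but a minimal model along the following lines should be enough to let you experiment – this is my own sketch (the tables t_drive and t_sub, and the hints, are invented purely to force the shape of the plan), not Andy’s actual code:

create table t_drive as select rownum id from dual connect by level <= 100;
create table t_sub   as select rownum id from dual connect by level <= 100;

select
        /*+ leading(d) use_hash(v) no_swap_join_inputs(v) */
        d.id, v.ct
from
        t_drive d,
        (
        select  /*+ no_merge */
                t_sub.id,
                (select count(*) from t_drive where t_drive.id = t_sub.id) ct   -- scalar subquery in the view's select list
        from    t_sub
        )       v
where
        v.id = d.id
;

Run the query with rowsource execution statistics enabled (e.g. the gather_plan_statistics hint and dbms_xplan.display_cursor() with the ‘allstats last’ format option) and check whether the plan lines for the scalar subquery report themselves above the hash join rather than inside the view.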

Occurence

Fri, 2019-05-03 07:34

Before you comment – I do know that the title has a spelling mistake in it. That’s because the Oracle code uses exactly this spelling in one of the little-used features of tracing.

I wrote a note a few years ago about enabling sql_trace (and other tracing events) system-wide for a single SQL statement. In the note I suggested that you could enable tracing for a few minutes then disable it to minimise the impact on the system while still capturing a reasonable number of statement traces. A recent ODC thread, however, described a problem where a particular statement executed in the order of 1,000,000 times per hour – which is getting on for about 300 executions per second, and you probably don’t want to leave a system-wide trace running for any length of time when things are operating at that rate. Fortunately we can refine the method with the occurence filter to capture a small and limited number of executions, spread over as many sessions as are running. Here’s an example of the syntax:

rem
rem     Script: trace_occur.sql
rem     Author: Jonathan Lewis
rem     Dated:  April 2019
rem

define m_sql_id = 'gu1s28n6y73dg'
define m_sql_id = 'fu0ftxk6jcyya'

alter system set events 
        '
        sql_trace[SQL:&m_sql_id] 
        {occurence: start_after 101, end_after 496}
        bind=true,
        wait=true
        '
;

pause   Run the test script here and press return when it ends

alter system set events 
        '
        sql_trace[SQL:&m_sql_id]
        off
        '
;

All I’ve done, compared to the earlier note, is include in curly brackets, just after identifying the SQL ID, the text: “{occurence: start_after 101 , end_after 496}”. Roughly speaking this means that every session will start counting calls to the given statement and on the hundred and first it will start dumping the trace file, and for a total of 496 calls it will continue dumping the trace file. So it’s possible to make sure that a session does trace but doesn’t dump a huge volume of trace data. Of course I do still execute a call to switch tracing off for the statement otherwise every session that subsequently logs on will still start tracing and dump a few executions into their trace file.

There is, unfortunately, a catch. I don’t know what Oracle is actually counting for the start_after/end_after values – but it’s not executions of the statement: the count varies with the working environment, changes once tracing has been enabled, changes with version, is probably dependent on the session_cached_cursors parameter, and behaves differently when interacting with the PL/SQL cursor cache. It is perhaps easiest to show an example.

I have a table called test_lobs (id, bytes …) with a unique index on (id) for this test.


create table test_lobs (
        id        number(8,0) primary key,
        bytes     number(8,0)
);

insert into test_lobs values(-1,999);
commit;

execute dbms_stats.gather_table_stats(user,'test_lobs')

And one of my test scripts is as follows:

rem
rem     This generates a statement with SQL_ID = gu1s28n6y73dg
rem

declare
        m_result number;
begin
        for i in 1..1000 loop
                begin
                        select bytes into m_result from test_lobs where id = i;
                exception
                        when others then null;
                end;
        end loop;
end;
/

Running 18.3 I start the trace script from one session, then start the test script from another session. As it stands the SQL statement embedded in the PL/SQL loop will have the SQL_ID I am tracing, so the second session will start dumping a trace file. The big question is: which executions of the statement will it dump? Since I’ve enabled bind variable dumping and the bound value is a simple loop counter it will be easy (!) to find the answer to this question.

To stabilise the results I did the following:

  • Session 1: Create the table.
  • Session 1: Start the trace event
  • Session 2: Connect to the database and run the test
  • Session 1: End the trace event
  • Session 1: Start the trace event again
  • Session 2: Connect to the database again and run the test a second time
  • Session 1: End the trace event

I’ll explain the need for looking at the results of the second cycle in a moment.

The trace file I produced started with the first three lines below, then repeated the 10-line fragment that follows them a number of times:


PARSING IN CURSOR #140126713239784 len=43 dep=1 uid=104 oct=3 lid=104 tim=168304257545 hv=233016751 ad='63b8f0c0' sqlid='gu1s28n6y73dg'
SELECT BYTES FROM TEST_LOBS WHERE ID = :B1
END OF STMT

====================================================================================================
BINDS #140126713239784:

 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1206001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f71cb0c67c0  bln=22  avl=02  flg=05
  value=50
EXEC #140126713239784:c=0,e=57,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2054437130,tim=168304262695
FETCH #140126713239784:c=0,e=3,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=1,plh=2054437130,tim=168304262729
CLOSE #140126713239784:c=0,e=1,dep=1,type=3,tim=168304262772
====================================================================================================

Notice the bind value report. A key feature that we are interested in is the first “value=” reported and the last “value=”. In my case the low/high were 26 and 87, for a total of 62 executions. A little arithmetic (and a few corroborating checks) highlight the significance of the following:

  • I started tracing after the 25th execution, and 25 * 4 + 1 = 101, my start_after value.
  • I traced 62 executions and 62 * 8 = 496, my end_after value.

Oracle is counting something whenever it hits the SQL_ID we’ve specified, but (for SQL inside a PL/SQL loop) it’s counting something that happens 4 times for each execution; then, once it reaches the start_after value and tracing begins, whatever it’s counting happens twice as often each time around the loop.

My general response to this type of detail is: “Argh!!!” – by the way. Maybe a call to Frits or Stefan asking them to trace C functions is in order.

By this time you may have guessed why I examined the trace file from the second run of the test. The counting seems to include counts of database calls that take place in the recursive SQL needed to optimise / hard parse the query – anything that relates to the SQL_ID we specify may be included in the counts. So on the first test I got a load of garbage in the trace file then saw a partial dump of the trace data for value=2 and the trace file ended partway through the trace data for value=17.

As a further test, I had a go with pure SQL calls in a test script:


set serveroutput off
variable b1 number;

exec :b1 := 100
select bytes from test_lobs where id = :b1;

exec :b1 := 101
select bytes from test_lobs where id = :b1;

...

exec :b1 := 129
select bytes from test_lobs where id = :b1;

After getting a stable result, versions 12.1.0.2 and 18.3.0.0 behaved differently:

  • 18.3.0.0 – counted 5 for every execution, so start_after = 16 skipped the first 3 executions and started tracing for value = 103
  • 12.1.0.2 – counted 7 for the first execution and 5 thereafter, so start_after=8 skipped one execution, start_after=13 skipped two and so on.
  • Both versions counted 10 for every execution while tracing was enabled, so end_after = 30 traced 3 executions in both cases.

It’s possible, of course, that some differences in the way session_cached_cursors works would for the small difference – but I suspect I could have spent a couple of days trying to sort out minor variations due to slight parameter and implementation changes between versions. It’s also possible that some of my guesses are wrong and there is more method to the madness than I have spotted.

Conclusion

It is possible to enable tracing system-wide for a limited number of executions per session of a given statement; however the number of executions that might take place before tracing starts, and the number of executions actually traced, depend on a variety of details, some of which may be outside your control.

As a baseline, it looks as if the number of executions before tracing starts is going to be about one-fifth of the value you set for start_after, and the number of executions traced will be about one-tenth of the end_after; however recursive SQL (perhaps even including dynamic sampling) can get caught up in the counts, potentially reducing the number of executions of the target statement that you see.

LOB length

Thu, 2019-05-02 05:32

This note is a reminder combined with a warning about unexpected changes as you move from version to version. Since it involves LOBs (large objects) it may not be relevant for most people but since there’s a significant change in the default character set for the database as you move up to 18.3 (or maybe even as you move to 12.2) anyone using character LOBs may get a surprise.

Here’s a simple script that I’m first going to run on an instance of 11.2.0.4:

rem
rem     Script:         lob_length_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2019
rem

create table t1(
        n1      number,
        version varchar2(8),
        v1      varchar2(4000),
        c1      clob,
        nc1     nclob
)
lob (c1) store as securefile c1_lob(
        enable storage in row
        chunk 8K
        cache logging
)
lob (nc1) store as securefile nc1_lob(
        enable storage in row
        chunk 8K
        cache logging
)
;

insert into t1 values( 0,'General',  null,               empty_clob(),       empty_clob()      ) ;

insert into t1 values( 1,'11.2.0.4', rpad('x',1951,'x'), rpad('x',1951,'x'), rpad('x',1951,'x')) ;
insert into t1 values( 2,'11.2.0.4', rpad('x',1952,'x'), rpad('x',1952,'x'), rpad('x',1952,'x')) ;

insert into t1 values( 3,'12.1.0.2', rpad('x',1951,'x'), rpad('x',1951,'x'), rpad('x',1951,'x')) ;
insert into t1 values( 4,'12.1.0.2', rpad('x',1952,'x'), rpad('x',1952,'x'), rpad('x',1952,'x')) ;

insert into t1 values( 5,'12.2.0.1', rpad('x',1937,'x'), rpad('x',1937,'x'), rpad('x',1937,'x')) ;
insert into t1 values( 6,'12.2.0.1', rpad('x',1938,'x'), rpad('x',1938,'x'), rpad('x',1938,'x')) ;

insert into t1 values( 7,'18.3.0.0', rpad('x',1984,'x'), rpad('x',1984,'x'), rpad('x',1984,'x')) ;
insert into t1 values( 8,'18.3.0.0', rpad('x',1985,'x'), rpad('x',1985,'x'), rpad('x',1985,'x')) ;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

break on version skip 1 on report

compute avg of sys_op_opnsize(c1)  on report
compute avg of sys_op_opnsize(nc1) on report

select
        version, n1,
        length(v1), length(c1), length(nc1),
        sys_op_opnsize(v1), sys_op_opnsize(c1), sys_op_opnsize(nc1)
from
        t1
order by
        n1
;

select  avg_row_len
from    user_tables
where   table_name = 'T1'
;

select  column_name, avg_col_len
from    user_tab_cols
where   table_name = 'T1'
order by
        column_id
;

I’ve created a table holding a varchar2() a CLOB, and an NCLOB, then I’ve inserted some rows into that table, labelling the rows in pairs with what appear to be Oracle version numbers, with one extra row labelled “General” that holds the special “empty LOB value (note a NULL and an empty clob behave very differently). Then I’ve reported the lengths of the LOB columns in two different ways, once using the standard length() function (I could have used the dbms_lob.getlength() function) and once using the internal sys_op_opnsize() function that Oracle uses in its queries to gather table stats.

Here’s the output from the script. Since this first run is on 11.2.0.4 I want you to note, particularly, the numbers in the rows labelled 11.2.0.4

VERSION          N1 LENGTH(V1) LENGTH(C1) LENGTH(NC1) SYS_OP_OPNSIZE(V1) SYS_OP_OPNSIZE(C1) SYS_OP_OPNSIZE(NC1)
-------- ---------- ---------- ---------- ----------- ------------------ ------------------ -------------------
General           0                     0           0                                    96                  96

11.2.0.4          1       1951       1951        1951               1951               2048                3999
                  2       1952       1952        1952               1952               2049                  86

12.1.0.2          3       1951       1951        1951               1951               2048                3999
                  4       1952       1952        1952               1952               2049                  86

12.2.0.1          5       1937       1937        1937               1937               2034                3971
                  6       1938       1938        1938               1938               2035                3973

18.3.0.0          7       1984       1984        1984               1984               2081                  86
                  8       1985       1985        1985               1985               2082                  86

********                                                                 ------------------ -------------------
avg                                                                              1835.77778          1820.22222


AVG_ROW_LEN
-----------
       5410

COLUMN_NAME          AVG_COL_LEN
-------------------- -----------
N1                             3
VERSION                        9
V1                          1740
C1                          1837
NC1                         1822

The reminders:
  • A LOB that uses a multi-byte character set (and possibly, but I have not checked, an NCLOB that is using a single-byte character set) stores its content using a two-byte fixed-width character set. So when I insert a character string of 1,951 (multibyte) characters the internal representation uses twice as many bytes as there are characters – so the data content will be 3,902 bytes.
  • The length() and dbms_lob.getlength() functions report the number of characters, not the number of bytes. (There is a lengthb() function to report the number of bytes, but it doesn’t work with multibyte character set LOBs, raising the error “ORA-22998: CLOB or NCLOB in multibyte character set not supported” – see the quick demonstration after this list.) Even though our NCLOB needs 3,902 bytes for its content it is reported – just like the CLOB – with a length of 1,951.
  • The sys_op_opnsize() internal function tells you about the number of bytes a column takes up in the row – though not always totally accurately, as we shall soon see – so it will tell you about the 3,902 bytes stored for the data plus the “structural” data that helps to describe the LOB. So for the row holding strings of length 1,951 we can see that the CLOB seems to take 1,951 + 97 = 2,048 bytes while the NCLOB seems to take 2 * 1951 + 97 = 3,999 bytes.
  • When the total storage for a LOB column exceeds 4,000 bytes the LOB is stored “out of line” even when it has been declared, as mine have been, with “enable storage in row”. So when we insert strings of length 1,952 the CLOB column stays in line and reports a size of 1,952 + 97 = 2,049 bytes while the NCLOB exceeds the limit (2 × 1,952 + 97 = 4,001) and goes out of row, reporting an “in-row” size of 86 bytes which is (we assume) the LOB metadata.
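As a quick demonstration of the lengthb() limitation mentioned in the list above – using the t1 table from the script, where v1 is the varchar2() and nc1 is the NCLOB:

select lengthb(v1)  from t1 where n1 = 1;       -- works, v1 is a varchar2
select lengthb(nc1) from t1 where n1 = 1;       -- raises ORA-22998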

You might note, by the way, that the avg_col_len is the average (which I’ve reported for the CLOB and NCLOB columns) of the sys_op_opnsize() values rounded up plus 1 (for the “column count” byte); and – because the number of columns in the table is small – the avg_row_len is very similar to the sum of the avg_col_len.
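You can check that arithmetic directly against the table – a sketch, assuming “rounded up” means ceil():

select
        ceil(avg(sys_op_opnsize(c1)))  + 1      check_c1,       -- expect 1837 to match avg_col_len
        ceil(avg(sys_op_opnsize(nc1))) + 1      check_nc1       -- expect 1822 to match avg_col_len
from
        t1
;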

You should also note that the storage size of an “empty” LOB is (or seems to be) 96 bytes in this version of Oracle. That’s quite a lot of space to say that there’s no data – but using an empty_[n]clob() to represent “nothing” does mean that you don’t have to code for “is null or is empty” and it may help you to avoid incorrect results as a consequence. In fact a block dump shows that the actual space usage for the empty_[n]clob() is only 30 bytes – so it’s not quite as bad as it seems.
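If you want to verify that 30 bytes for yourself the block dump is easy to arrange – a sketch, where the datafile and block numbers in the dump command are placeholders to be replaced by the values the query returns:

select
        dbms_rowid.rowid_relative_fno(rowid)    file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no
from    t1
where   n1 = 0
;

alter system dump datafile 7 block 131;         -- substitute your own file_no / block_no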

The warning

It’s a minor variation of the standard warning: “odd little things change when you upgrade and some of them may have a big side effect”. Here are the results from 18.3 (which, amongst other details, defaults to a multi-byte character set):

SQL> select * from v$nls_parameters where parameter like '%CHARACTERSET%';

PARAMETER                        VALUE
-------------------------------- --------------------
NLS_CHARACTERSET                 AL32UTF8
NLS_NCHAR_CHARACTERSET           AL16UTF16

This affects the storage of CLOBs if you’ve previously been using a default single-byte character set – you’re suddenly going to find your LOB segments are twice as big as they used to be – even though any report of “length()” will be unchanged. But there are other changes. Here’s the output from the same script running on 18.3.0.0:



VERSION          N1 LENGTH(V1) LENGTH(C1) LENGTH(NC1) SYS_OP_OPNSIZE(V1) SYS_OP_OPNSIZE(C1) SYS_OP_OPNSIZE(NC1)
-------- ---------- ---------- ---------- ----------- ------------------ ------------------ -------------------
General           0                     0           0                                    30                  30

11.2.0.4          1       1951       1951        1951               1951               3933                3933
                  2       1952       1952        1952               1952               3935                3935

12.1.0.2          3       1951       1951        1951               1951               3933                3933
                  4       1952       1952        1952               1952               3935                3935

12.2.0.1          5       1937       1937        1937               1937               3905                3905
                  6       1938       1938        1938               1938               3907                3907

18.3.0.0          7       1984       1984        1984               1984               3999                3999
                  8       1985       1985        1985               1985                132                 132
********                                                                 ------------------ -------------------
avg                                                                              3078.77778          3078.77778

AVG_ROW_LEN
-----------
       7912

COLUMN_NAME          AVG_COL_LEN
-------------------- -----------
N1                             3
VERSION                        9
V1                          1740
C1                          3080
NC1                         3080

Looking at the details for “General” and “18.3.0.0” what can we see that’s changed?

First, the size of an empty_[n]clob is now reported correctly at 30 bytes. This might (if the value is also used internally) explain why a LOB can now be 33 characters (66 bytes) longer before it’s moved out of line – it’s a small difference, but you might be at a boundary condition where it makes a big difference (for good or for bad – who can tell) on your system.

Secondly, the LOB metadata for an out of line LOB seems to have jumped from 86 bytes to 132 bytes. Like the empty_[n]clob() issue, this is an error. The actual space usage in row was 38 bytes – consisting of the basic 30 bytes “empty” metadata, 4 extra bytes overhead, and a 4-byte block address linking to the stored LOB value.

Finally, you’re allowed to guess that the rows labelled 12.2.0.1 are there to give you a hint that in that version the NCLOB moves out of line at 1,938 characters – but you’ll probably want to run the test on 12.2 to confirm that claim. This does mean, though, that an upgrade to 12.2 from a lower version might be a bigger threat than any upgrade to 18.3. You’ll also find that in 12.2 the empty_[n]clob() is reported with a length of 124. (Again, this may explain the change in the break-point for going out of line: 124 – 96 = 28, and (1952 – 28/2) = 1,938. The arithmetic is consistent with the incorrect reporting of the size of the empty LOB metadata.

So the bottom line is this – if you do a lot of work with LOBs, do some careful checking of how much space you are using, how much space you will use on the next install, how changes in character sets can affect numbers, how the statistics might change – even when collected in exactly the same way – and what this might do to execution plans.
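As a generic starting point for the space check (my sketch, not part of the original test) you can compare the reported lengths with the actual LOB segment sizes:

select
        l.column_name, l.segment_name, s.bytes
from
        user_lobs       l,
        user_segments   s
where
        l.table_name   = 'T1'
and     s.segment_name = l.segment_name
order by
        l.column_name
;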

Footnote

This testing was all done using securefile LOBs – which has been the default for some time if you don’t specify the type of LOB you want to use. The numbers, particularly the break from in-row to out-of-row, are a little different if you switch to basicfile LOBs.

A quick test on LiveSQL (19.2) suggests that nothing has changed from 18.3 to 19.3.

For further reading on quirky details of LOBs – here’s the article that has a footnote reporting another interesting change in LOB handling in 18.3; and a very old note about analysing the contents of a LOB segment with a view to assessing the impact of deleting old LOBs and potentially moving the LOB segment to a tablespace of a different size.


Parse Calls

Tue, 2019-04-23 12:31

When dealing with the library cache / shared pool it’s always worth checking from time to time to see if a new version of Oracle has changed any of the statistics you rely on as indicators of potential problems. Today is also (coincidentally) a day when comments about “parses” and “parse calls” entered my field of vision from two different directions. I’ve tweeted out references to a couple of quirky little posts I did some years ago about counting parse calls and what a parse call may entail, but I thought I’d finish the day off with a little demo of what the session cursor cache does for you when your client code issues parse calls.

There are two bits of information I want to highlight – activity in the library cache and a number that shows up in the session statistics. Here’s the code to get things going:

rem
rem     Script:         12c_session_cursor_cache.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2019
rem
rem     Note:
rem     start_1.sql contains the one line
rem          select * from t1 where n1 = 0;
rem

create table t1 
as
select 99 n1 from dual
;

execute dbms_stats.gather_table_stats(user,'t1')

spool 12c_session_cursor_cache

prompt  =======================
prompt  No session cursor cache
prompt  =======================

alter session set session_cached_cursors = 0;

set serveroutput off
set feedback off

execute snap_libcache.start_snap
execute snap_my_stats.start_snap

execute snap_libcache.start_snap
execute snap_my_stats.start_snap

@start_1000

set feedback on
set serveroutput on

execute snap_my_stats.end_snap
execute snap_libcache.end_snap


prompt  ============================
prompt  Session cursor cache enabled
prompt  ============================


alter session set session_cached_cursors = 50;

set serveroutput off
set feedback off

execute snap_libcache.start_snap
execute snap_my_stats.start_snap

execute snap_libcache.start_snap
execute snap_my_stats.start_snap

@start_1000

set feedback on
set serveroutput on

execute snap_my_stats.end_snap
execute snap_libcache.end_snap

spool off

I’ve made use of a couple of little utilities I wrote years ago to take snapshots of my session statistics and the library cache (v$librarycache) stats. I’ve also used my “repetition” framework to execute a basic query 1,000 times. The statement is a simple “select from t1 where n1 = 0”, chosen to return no rows.

The purpose of the whole script is to show you the effect of running exactly the same SQL statement many times – first with the session cursor cache disabled (session_cached_cursors = 0) then with the cache enabled at its default size.

Here are some results from an instance of 12.2.0.1 – which I’ve edited down by eliminating most of the single-digit numbers.

=======================
No session cursor cache
=======================
---------------------------------
Session stats - 23-Apr 17:41:06
Interval:-  4 seconds
---------------------------------
Name                                                                         Value
----                                                                         -----
Requests to/from client                                                      1,002
opened cursors cumulative                                                    1,034
user calls                                                                   2,005
session logical reads                                                        9,115
non-idle wait count                                                          1,014
session uga memory                                                          65,488
db block gets                                                                2,007
db block gets from cache                                                     2,007
db block gets from cache (fastpath)                                          2,006
consistent gets                                                              7,108
consistent gets from cache                                                   7,108
consistent gets pin                                                          7,061
consistent gets pin (fastpath)                                               7,061
logical read bytes from cache                                           74,670,080
calls to kcmgcs                                                              5,005
calls to get snapshot scn: kcmgss                                            1,039
no work - consistent read gets                                               1,060
table scans (short tables)                                                   1,000
table scan rows gotten                                                       1,000
table scan disk non-IMC rows gotten                                          1,000
table scan blocks gotten                                                     1,000
buffer is pinned count                                                       2,000
buffer is not pinned count                                                   2,091
parse count (total)                                                          1,035
parse count (hard)                                                               8
execute count                                                                1,033
bytes sent via SQL*Net to client                                           338,878
bytes received via SQL*Net from client                                     380,923
SQL*Net roundtrips to/from client                                            1,003

PL/SQL procedure successfully completed.

---------------------------------
Library Cache - 23-Apr 17:41:06
Interval:-      4 seconds
---------------------------------
Type      Cache                           Gets        Hits Ratio        Pins        Hits Ratio   Invalid    Reload
----      -----                           ----        ---- -----        ----        ---- -----   -------    ------
NAMESPACE SQL AREA                       1,040       1,032   1.0       1,089       1,073   1.0         0         1
NAMESPACE TABLE/PROCEDURE                   17          16    .9         101          97   1.0         0         0
NAMESPACE BODY                               9           9   1.0          26          26   1.0         0         0
NAMESPACE SCHEDULER GLOBAL ATTRIBU          40          40   1.0          40          40   1.0         0         0

PL/SQL procedure successfully completed.

The thing to notice, of course, is the large number of statistics that are (close to) multiples of 1,000 – i.e. the number of executions of the SQL statement. In particular you can see the ~1,000 “parse count (total)” which is not reflected in the “parse count (hard)” because the statement only needed to be loaded into the library cache and optimized once.
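If you don’t have snapshot packages to hand, a cruder check of the relevant figures for your own session looks like this – the statistic names are exactly as they appear in the output above:

select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in (
                'parse count (total)',
                'parse count (hard)',
                'session cursor cache hits'
        )
;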

The other notable statistics come from the library cache where we do 1,000 gets and pins on the “SQL AREA” – the “get” creates a “KGL Lock” (the “breakable parse lock”) that is made visible as an entry in v$open_cursor (x$kgllk), and the “pin” creates a “KGL Pin” that makes it impossible for anything to flush the child cursor from memory while we’re executing it.

So what changes when we enabled the session cursor cache:


============================
Session cursor cache enabled
============================

Session altered.

---------------------------------
Session stats - 23-Apr 17:41:09
Interval:-  3 seconds
---------------------------------
Name                                                                         Value
----                                                                         -----
Requests to/from client                                                      1,002
opened cursors cumulative                                                    1,004
user calls                                                                   2,005
session logical reads                                                        9,003
non-idle wait count                                                          1,013
db block gets                                                                2,000
db block gets from cache                                                     2,000
db block gets from cache (fastpath)                                          2,000
consistent gets                                                              7,003
consistent gets from cache                                                   7,003
consistent gets pin                                                          7,000
consistent gets pin (fastpath)                                               7,000
logical read bytes from cache                                           73,752,576
calls to kcmgcs                                                              5,002
calls to get snapshot scn: kcmgss                                            1,002
no work - consistent read gets                                               1,000
table scans (short tables)                                                   1,000
table scan rows gotten                                                       1,000
table scan disk non-IMC rows gotten                                          1,000
table scan blocks gotten                                                     1,000
session cursor cache hits                                                    1,000
session cursor cache count                                                       3
buffer is pinned count                                                       2,000
buffer is not pinned count                                                   2,002
parse count (total)                                                          1,002
execute count                                                                1,003
bytes sent via SQL*Net to client                                           338,878
bytes received via SQL*Net from client                                     380,923
SQL*Net roundtrips to/from client                                            1,003

PL/SQL procedure successfully completed.

---------------------------------
Library Cache - 23-Apr 17:41:09
Interval:-      3 seconds
---------------------------------
Type      Cache                           Gets        Hits Ratio        Pins        Hits Ratio   Invalid    Reload
----      -----                           ----        ---- -----        ----        ---- -----   -------    ------
NAMESPACE SQL AREA                           5           5   1.0       1,014       1,014   1.0         0         0
NAMESPACE TABLE/PROCEDURE                    7           7   1.0          31          31   1.0         0         0
NAMESPACE BODY                               6           6   1.0          19          19   1.0         0         0

PL/SQL procedure successfully completed.

The first thing to note is that “parse count (total)” still shows 1,000 parse calls. However we also see the statistic “session cursor cache hits” at 1,000. Allowing for a little noise around the edges, virtually every parse call has turned into a short-cut that takes us through the session cursor cache directly to the correct cursor.

This difference shows up in the library cache activity where we still see 1,000 pins – we have to pin the cursor to execute it – but we no longer see 1,000 “gets”. In the absence of the session cursor cache the session has to keep searching for the statement then creating and holding a KGL Lock while we execute the statement – but when the cache is enabled the session will very rapidly recognise that the statement is one we are likely to re-use, so it will continue to hold the KGL lock after we have finished executing the statement and we can record the location of the KGL lock in a session state object. After the first couple of executions of the statement we no longer have to search for the statement and attach a spare lock to it, we can simply navigate from our session state object to the cursor.

As before, the KGL Lock will show up in v$open_cursor – though this time it will not disappear between executions of the statement. Over the history of Oracle versions the contents of v$open_cursor have become increasingly helpful, so I’ll just show you what the view held for my session by the end of the test:


SQL> select cursor_type, sql_text from V$open_cursor where sid = 250 order by cursor_type, sql_text;

CURSOR_TYPE                                                      SQL_TEXT
---------------------------------------------------------------- ------------------------------------------------------------
DICTIONARY LOOKUP CURSOR CACHED                                  BEGIN DBMS_OUTPUT.DISABLE; END;
DICTIONARY LOOKUP CURSOR CACHED                                  BEGIN snap_libcache.end_snap; END;
DICTIONARY LOOKUP CURSOR CACHED                                  BEGIN snap_my_stats.end_snap; END;
DICTIONARY LOOKUP CURSOR CACHED                                  SELECT DECODE('A','A','1','2') FROM SYS.DUAL
OPEN                                                             begin         dbms_application_info.set_module(
OPEN                                                             table_1_ff_2eb_0_0_0
OPEN-RECURSIVE                                                    SELECT VALUE$ FROM SYS.PROPS$ WHERE NAME = 'OGG_TRIGGER_OPT
OPEN-RECURSIVE                                                   select STAGING_LOG_OBJ# from sys.syncref$_table_info where t
OPEN-RECURSIVE                                                   update user$ set spare6=DECODE(to_char(:2, 'YYYY-MM-DD'), '0
PL/SQL CURSOR CACHED                                             SELECT INDX, KGLSTTYP LTYPE, KGLSTDSC NAME, KGLSTGET GETS, K
PL/SQL CURSOR CACHED                                             SELECT STATISTIC#, NAME, VALUE FROM V$MY_STATS WHERE VALUE !
SESSION CURSOR CACHED                                            BEGIN DBMS_OUTPUT.ENABLE(1000000); END;
SESSION CURSOR CACHED                                            BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;
SESSION CURSOR CACHED                                            BEGIN snap_libcache.start_snap; END;
SESSION CURSOR CACHED                                            BEGIN snap_my_stats.start_snap; END;
SESSION CURSOR CACHED                                            select * from t1 where n1 = 0
SESSION CURSOR CACHED                                            select /*+ no_parallel */ spare4 from sys.optstat_hist_contr

17 rows selected.

The only one of specific interest is the penultimate one in the output – its type is “SESSION CURSOR CACHED” and we can recognise our “select from t1” statement.

In-table predicates

Fri, 2019-04-12 07:49

This note was prompted by a recent email asking about the optimizer’s method for estimating the selectivity of a predicate which compared two columns in the same table – for example:  “where orders.amount_invoiced = orders.amount_paid”. It’s been about 14 years since I wrote “Cost Based Oracle – Fundamentals” so my memory of what I wrote (and whether I even mentioned this case) was rather hazy, so I sent off a quick reply and decided to do a little checking.

It turned out that I’d already written a blog note with a throwaway comment about the estimates and a general workaround for optimizer problems caused by examples of this kind. The comment I made about the estimate was that the selectivity seems to be the smaller of the selectivities of (using the example above) “amount_paid = :unpeekable_bind” and “amount_invoiced = :unpeekable_bind”. I’m fairly sure I’ve made similar comments several times in the past, but after replying to the email I started to wonder whether this would still be true if there were histograms on the columns. So I ran up a little test and here, to start things off, is the code to generate the data I used for testing:


rem
rem     Script:         column_equality_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2019
rem     Purpose:

create table t1(
        id      number(8,0),
        n1      number(6,0)
)
;

create table t2(
        id      number(8,0),
        n1      number(6,0)
)
;

create table t3(
        n1      number(6,0),
        n2      number(6,0),
        v1      varchar2(50)
)
;

execute dbms_random.seed(0)

insert into t1
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(10 * abs(dbms_random.normal))     n1
from
        generator       v1
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(10 * abs(dbms_random.normal))     n1
from
        generator       v1
;

insert into t3 (n1, n2, v1)
select
        t1.n1,
        t2.n1,
        rpad(rownum,50)
from
        t1, t2
where
        t1.id = t2.id
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T3',
                method_opt  => 'for all columns size 254 for columns v1 size 1'
        );
end;
/

select
        table_name, column_name, num_distinct, density, histogram,
        low_value, high_value
from
        user_tab_cols
where
        table_name in ('T1','T2','T3')
and     column_name in ('N1','N2')
order by
        table_name, column_name
;


TABLE_NAME      COLUMN_NAME     NUM_DISTINCT    DENSITY HISTOGRAM       LOW_VALUE  HIGH_VALUE
--------------- --------------- ------------ ---------- --------------- ---------- ----------
T1              N1                        38     .00005 FREQUENCY       80         C128

T2              N1                        38     .00005 FREQUENCY       80         C126

T3              N1                        38     .00005 FREQUENCY       80         C128
                N2                        38     .00005 FREQUENCY       80         C126


I’ve created two sets of 10,000 rows each of normally distributed data – but taken the absolute values so I’ve only got half the bell curve, and I’ve scaled up by a factor of 10 and truncated. This has given me two similar but slightly different sets of values which happen to cover 38 distinct values each.

I’ve then generated my test set by joining these two tables on the unique (though not declared as such) id column to give a table with the same number of rows and two skewed sets of data. The calls to dbms_stats create histograms on the skewed data sets, and I’ve reported a few significant numbers about the 4 relevant columns.

Looking at the column statistics we have num_distinct = 38 across the board – so my observation from paragraph 2 above would tend to suggest that the optimizer would report 10,000/38 = 263 as the cardinality estimate for the predicate “t3.n1 = t3.n2” (I’m fairly confident that in this case 1/num_distinct will be preferred over using the density from user_tab_cols). But here’s what we get from a call to explain plan:


explain plan for
select
        v1
from
        t3
where
        n1 = n2
;

select * from table(dbms_xplan.display);

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   564 | 32148 |    18   (6)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |   564 | 32148 |    18   (6)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"="N2")

The estimate is 564 – which is a pretty good estimate in this case (the actual result was 552) as the two columns were randomly generated and there’s no correlation between them. Unfortunately this is quite a long way off my assumption of 263, so where did the optimizer get that number from?

Here’s a query (with result set) that you may recognise from an earlier post.


break on report skip 1
compute count of value on report
compute sum of t1_frequency on report
compute sum of t2_frequency on report
compute sum of product on report

column product format 999,999,999

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N1'
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N2'
)
select
        f1.value,
        f1.frequency    t1_frequency,
        f2.frequency    t2_frequency,
        f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
order by
        f1.value
;



     VALUE T1_FREQUENCY T2_FREQUENCY      PRODUCT
---------- ------------ ------------ ------------
         0          777          768      596,736
         1          806          753      606,918
         2          794          779      618,526
         3          808          763      616,504
         4          752          749      563,248
         5          627          729      457,083
         6          623          628      391,244
         7          584          616      359,744
         8          544          597      324,768
         9          512          546      279,552
        10          441          439      193,599
        11          409          342      139,878
        12          345          370      127,650
        13          318          300       95,400
        14          257          282       72,474
        15          244          242       59,048
        16          214          206       44,084
        17          172          193       33,196
        18          161          140       22,540
        19          113          114       12,882
        20          108           93       10,044
        21           95           81        7,695
        22           72           55        3,960
        23           54           56        3,024
        24           43           36        1,548
        25           38           31        1,178
        26           23           18          414
        27           18           23          414
        28            7           14           98
        29            9           13          117
        30           14           11          154
        31            4            2            8
        32            5            3           15
        33            1            3            3
        35            4            1            4
        37            2            2            4
---------- ------------ ------------ ------------
        36
                   9998         9998    5,643,754


I’m querying the histoggram information for the two columns, and where t3.n1 and t3.n2 have a value in common I’ve reported the two frequencies for that value and the product of the frequencies. For convenience I’ve included a count and a couple of sums to show that there isn’t a perfect match in the set of values for the two columns. The most important number at the bottom of the page, though, is the sum of the products of frequencies of common values. Take that value and divide by 10,000 and you get 564.3754 – compare that with the cardinality estimate of the predicate “t3.n1 = t3.n2”, it’s a perfect match (allowing for rounding).

The query against user_tab_histograms is the query I used to calculate the cardinality of a join where there were frequency histograms on the columns at both ends of the join. The optimizer’s estimate for “intra-table” predicates is consistent with its estimate for joins (in the special cases of “no histograms” and “two frequency histograms”, at least). Viewing it from a slightly different angle: the selectivity of the predicate “n1 = n2” can be derived as “the cardinality estimate for joining t3 to itself” divided by “the cardinality of the cartesian join” (the latter being num_rows * num_rows, of course).
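In fact you can extend the histogram query to report the selectivity and cardinality figures directly – here’s a sketch, reusing the f1 and f2 factored subqueries from above and hard-coding the 10,000 rows of t3:

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N1'
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N2'
)
select
        sum(f1.frequency * f2.frequency)                    self_join_cardinality,
        sum(f1.frequency * f2.frequency) / (10000 * 10000)  predicate_selectivity,
        sum(f1.frequency * f2.frequency) / 10000            predicate_cardinality
from
        f1, f2
where
        f2.value = f1.value
;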

Just as a closing demo – let’s generate a plan for the appropriate self-join of t3 and check the cardinality estimate:


explain plan for
select
        t3a.v1, t3b.v1
from
        t3 t3a, t3 t3b
where
        t3a.n2 = t3b.n1
;

select * from table(dbms_xplan.display);


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  5643K|   581M|   138  (83)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  5643K|   581M|   138  (83)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T3   | 10000 |   527K|    13   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T3   | 10000 |   527K|    13   (8)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3A"."N2"="T3B"."N1")


As expected the (rounded) join cardinality is reported as 5,643K.

So the selectivity of the single table predicate “n1 = n2” will be 5,643,000 / (10,000 * 10,000) = 0.05643 and the cardinality estimate of the single table query will be 10,000 * 0.05643 = 564.3. QED.

I haven’t tested any other variations of types of histogram, degree of overlap of value ranges, etc. but I suspect that the general principle is probably going to give the selectivity as (or with the appearance of): “estimated cardinality of self-join” / “square of num_rows (allowing for nulls)”.

 

Describe Upgrade

Mon, 2019-04-08 05:02

Here’s an odd little change between Oracle versions that could have a stunning impact on the application performance if the thing that generates your client code happens to use an unlucky selection of constructs.  It’s possible to demonstrate the effect remarkably easily – you just have to describe a table, doing it lots of times to make it easy to see what’s happening.


create table t1 as select * from all_objects
where rownum = 1
;

set heading off
set feedback off
set pagesize 0
set linesize 156
set trimspool on
set termout off
set serveroutput on

execute snap_rowcache.start_snap
execute snap_libcache.start_snap

start start_10000
-- start_1 contains "describe t1"

set termout on
set serveroutput on

spool temp
execute snap_rowcache.end_snap
execute snap_libcache.end_snap

spool off

The start_10000 script is my mechanism for running a simple piece of code many times, and as the comment following it says, all I’m doing is repeating “describe t1”. The calls to “snap” something are examples of procedures I use to find the changes recorded in various dynamic performance views over short periods of time (there’s an example of the code for v$mystat here). In this case, as the names suggest, the snapshots record the changes in v$rowcache (the dictionary cache) and v$librarycache (the library cache). I’ve put a simple variant of the code at the end of the blog note so you don’t have to do all the preparation if you want to run a quick test for yourself.

Here are the results I get when running the test in Oracle 18.3.0.0

---------------------------------
Dictionary Cache - 05-Apr 19:00:00
Interval:-      27 seconds
---------------------------------
Parameter                 Sub# Usage Fixed    Gets  Misses   Scans  Misses    Comp    Mods Flushes
---------                ----- ----- -----    ----  ------   -----  ------    ----    ---- -------
dc_objects                         0     0 260,051       0       0       0       0       0       0
dc_users                           0     0 260,000       0       0       0       0       0       0
---------------------------------
Library Cache - 05-Apr 19:00:00
Interval:-      27 seconds
---------------------------------
Type      Description              Gets        Hits Ratio        Pins        Hits Ratio   Invalid    Reload
-----     -----                    ----        ---- -----        ----        ---- -----   -------    ------
NAMESPACE TABLE/PROCEDURE        10,003      10,003   1.0     280,028     280,028   1.0         0         0

Before showing you the corresponding figures from 12.2.0.1 I’ll just point out that in version 18.3.0.0 of Oracle the structure of the view all_objects gives me a table of 26 columns. Think about that and the 10,000 describes while looking at the numbers above, then look at the corresponding 12.2.0.1 results:

---------------------------------
Dictionary Cache - 05-Apr 19:00:00 
Interval:-      28 seconds
---------------------------------
Parameter                 Usage Fixed    Gets  Misses   Scans  Misses    Comp    Mods Flushes
---------                 ----- -----    ----  ------   -----  ------    ----    ---- -------
dc_users                      0     0       2       0       0       0       0       0       0
dc_objects                    0     0       9       0       0       0       0       0       0

---------------------------------
Library Cache - 05-Apr 19:04:17
Interval:-      28 seconds
---------------------------------
Type      Description              Gets        Hits Ratio        Pins        Hits Ratio   Invalid    Reload
-----     -----                    ----        ---- -----        ----        ---- -----   -------    ------
NAMESPACE TABLE/PROCEDURE        10,005      10,005   1.0      20,018      20,018   1.0         0         0

The internal mechanism of the “describe” call has changed between 12.2.0.1 and 18.3.0.0.

For each describe in 18.3 you see, for each column in the table, a “get” on dc_users and a “get” on dc_objects in v$rowcache, plus one “get” on the TABLE/PROCEDURE namespace in v$librarycache and (2 + number of columns) “pins”. In 12.2.0.1 there are no gets on the dictionary cache and only one get and two pins in the library cache for each describe.
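A quick bit of arithmetic (a sketch) ties the 18.3 figures to the 26 columns of all_objects and the 10,000 describes:

select
        26 * 10000              expected_rowcache_gets,         -- cf. 260,000 (dc_users) and 260,051 (dc_objects) above
        (26 + 2) * 10000        expected_librarycache_pins      -- cf. 280,028 pins above
from
        dual
;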

As a couple of extra checks I modified the test to query from a 12c client to an 18c server, then from an 18c client to a 12c server. The big numbers appeared in the former test (i.e. when the server is 18c) and the small numbers in the latter (when the server is 12c). I also tried a couple of other variations on the theme:

  • If the table t1 doesn’t exist when I run the test then there are no gets on the row cache, and I see 2 gets and pins (with hits) on the library cache per describe.
  • If I run the test using “describe other_schema.t1” the pins (and hits) on the library cache go up by 1 per describe
  • If I execute “alter session set current_schema = other_schema” so that “describe t1” is actually describing a table in another schema the pins (and hits) on the library cache go up by 1 per describe
  • If I run the test in the SYS schema, 18c behaves like 12c! But SYS is often a little weird compared to other schemas

Enabling the 10051 trace – I can see that both versions report an OPI call type = 119: “V8 Describe Any” for each call to “describe”. And that’s really where this blog started, and why lots of people might need to be aware (at least in the short term) of this change in behaviour across versions.

Welcome to the Real World.

My demonstration is clearly silly – no-one does hundreds of describes per second in a real application, surely. Well, not deliberately, and not necessarily with the intent to do a full describe, but sometimes n-tier development environments end up generating code that does things you might not expect. One such example is the way that JDBC can handle a requirement of the form:

insert into tableX( id, ......) values (my_sequence.nextval, ...) returning id into ?;

In the course of handling this requirement one of the coding strategies available to JDBC ends up executing the type 119 “V8 Describe Any” call. Imagine the effect this has when you have a couple of dozen concurrent sessions totalling a couple of million single row inserts per hour. The competition for library cache pins and row cache gets is massive – and the memory structures involved are all protected by mutexes. So when a client of mine recently upgraded their system from 11.2.0.4 to 18.3.0.0 they saw “library cache: mutex X” waits change from a few hundred seconds per hour to tens of thousands of seconds, and “row cache mutex” leaping up from nowhere in the “Top timed events” to report thousands more seconds of wait time per hour.

The actual impact of this issue will depend very strongly on how much use you (or your ORM) makes of this construct. The problem may be particularly bad for my client because of the very large number of concurrent executions of a very small number of distinct statements that all address the same table. For low concurrency, or for a wide range of different tables and statements, you may not see so much contention.

If you are seeing contention for “row cache mutex” and “library cache: mutex X”, then a quick corroborative test (if you are licensed for the performance and diagnostic packs) is to check the top_level_call# and top_level_call_name from v$active_session_history:

select
        top_level_call#, top_level_call_name, count(*)
from    v$active_session_history
group by
        top_level_call#, top_level_call_name
order by
        count(*)
;

If (119, ‘V8 Describe Any’) shows up as a significant fraction of the total then you’re probably looking at this issue.

Java is not my strong point – but here’s a trivial piece of standalone Java that you can use to demonstrate the issue if you’re familiar with running Java on the server. There are a few notes inline to explain necessary preparatory steps and code changes:


/*
        To create a class file, you need to execute
        javac temptest2.java

        This generates file temptest2.class
        If this is successful then execute
        java temptest2 {number of iterations} {commit frequency}

        e.g.
        java temptest2 10000 10

        To be able to compile, you need a CLASSPATH environment variable
        e.g. export CLASSPATH=$CLASSPATH:$ORACLE_HOME/jdbc/lib/ojdbc8.jar

        For java to see the created class the CLASSPATH must also include 
        the holding directory
        e.g. export CLASSPATH=$CLASSPATH:/mnt/working

        Example combined setting:
        export CLASSPATH=$CLASSPATH:$ORACLE_HOME/jdbc/lib/ojdbc8.jar:/mnt/working

        A schema will need to be created to hold two objects,
        And the connection string in the code below will need to be modified -
        and the listener has to be started and the database registered to it.

        Database objects:
        -----------------
        create sequence s1;

        create table test(
                id number, 
                code varchar2(32), 
                descr varchar2(32), 
                insert_user varchar2(32),
                insert_date date
        );

*/


import java.sql.*;
import oracle.jdbc.OracleDriver;
import java.util.Date;

public class temptest2
{
  public static void main (String arr[]) throws Exception
  {
    DriverManager.registerDriver(new oracle.jdbc.OracleDriver());
    Connection con = DriverManager.getConnection
          ("jdbc:oracle:thin:@localhost:1521:or18","test_user","test");

    Integer iters = new Integer(arr[0]);
    Integer commitCnt = new Integer(arr[1]);

    con.setAutoCommit(false);
    doInserts( con, iters.intValue(), commitCnt.intValue() );

    con.commit();
    con.close();
  }

  static void doInserts(Connection con, int count, int commitCount )
  throws Exception
  {

    int  rowcnt = 0;
    int  committed = 0;
    long start = new Date().getTime();

    for (int i = 0; i < count; i++ ) {
      PreparedStatement ps = con.prepareStatement(
           "insert into test (id, code, descr, insert_user, insert_date) " +
                     "values (s1.nextval,?,?, user, sysdate)",
           new String[]{"id"}
      );
      ps.setString(1,"PS - code" + i);
      ps.setString(2,"PS - desc" + i);
      ps.executeUpdate();

      ResultSet rs = ps.getGeneratedKeys();
      int x = rs.next() ? rs.getInt(1) : 0;
      System.out.println(x);
                
      rowcnt++;
      ps.close();

      if ( rowcnt == commitCount )
      {
        con.commit();
        rowcnt = 0;
        committed++;
      }
    }
    long end = new Date().getTime();
    con.commit();
    System.out.println
    ("pstatement " + count + " times in " + (end - start) + " milli seconds committed = "+committed);
  }
}

/*
 *
 * Sample from trace file after setting events 10046 and 10051:
 *
 * OPI CALL: type=119 argc= 7 cursor=  0 name=V8 Describe Any
 * OPI CALL: type=94 argc=38 cursor=  0 name=V8 Bundled Exec
 * PARSE #140693461998224:c=0,e=15,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3884345238,tim=1368184246
 * EXEC #140693461998224:c=0,e=135,p=0,cr=0,cu=4,mis=0,r=1,dep=0,og=1,plh=3884345238,tim=1368184411
 * OPI CALL: type=105 argc= 2 cursor=  0 name=Cursor close all
 * CLOSE #140693461998224:c=0,e=15,dep=0,type=3,tim=1368185231
 * OPI CALL: type=14 argc= 0 cursor=  0 name=COMMIT
 * XCTEND rlbk=0, rd_only=0, tim=1368185291
 * OPI CALL: type=119 argc= 7 cursor=  0 name=V8 Describe Any
 * OPI CALL: type=94 argc=38 cursor=  0 name=V8 Bundled Exec
 * PARSE #140693461998224:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3884345238,tim=1368187929
 * EXEC #140693461998224:c=0,e=162,p=0,cr=0,cu=4,mis=0,r=1,dep=0,og=1,plh=3884345238,tim=1368188141
 * OPI CALL: type=105 argc= 2 cursor=  0 name=Cursor close all
 * CLOSE #140693461998224:c=0,e=6,dep=0,type=3,tim=1368189336
 * OPI CALL: type=14 argc= 0 cursor=  0 name=COMMIT
 * XCTEND rlbk=0, rd_only=0, tim=1368189373
 *
*/


You’ll notice that I’ve prepared, executed and closed a statement inside a loop. The problem wouldn’t happen if I prepared the statement before the loop and closed it after the loop, doing nothing but the execute inside the loop; but the code is simply modelling the “single row processing” effect that typically appears through ORMs.

You’ll have to decide for yourself how to take snapshots of the dynamic performance views while this code is running, and how to enable tracing – but anyone who wants to fiddle with the code is probably better at coding Java than I am – so it’s left as an exercise to the reader (I used a logon trigger for the traces, and snap_rowcache and snap_libcache from another session).
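For the record, the logon trigger I used to set the events was of this shape – a sketch, with the trigger and schema names my own invention, and needing a suitably privileged account to create it:

create or replace trigger test_user_logon_trace
after logon on test_user.schema
begin
        execute immediate
                q'[alter session set events '10046 trace name context forever, level 4']';
        execute immediate
                q'[alter session set events '10051 trace name context forever, level 1']';
end;
/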

There is a certain cruel irony to this issue.  For years I have been telling people that

    insert into my_table(id, ...) values(my_sequence.nextval,...) returning id into :bind1;

is more efficient than:

    select my_sequence.nextval into :bind1 from dual;
    insert into my_table(id,.....) values(:bind1, ...);

If, at present, you’re using Hibernate as your ORM it generates code that does the (inefficient, bad practice) latter and you won’t see the “describe” problem.

Footnote

If you want a very simple SQL*Plus script to see the effect – and have privileges to query v$rowcache and v$librarycache – here’s a hundred describes with a little wrapper to show the effect:

rem
rem     The schema running this script must not be SYS
rem     but must be granted select on v_$rowcache and
rem     v_$librarycache. For the results to be clearly
rem     visible the test needs to be run while virtually
rem     nothing else is running on the instance.
rem
rem     In 18.3.0.0 every describe seems to access 
rem     dc_users, dc_objects, and pin the library cache
rem     once for every column in the table described
rem     (plus a fixed "overhead" of 2 pins and one get)
rem
rem     When run by SYS the counts fall back to the
rem     12.2 numbers - i.e. only two pins and one get
rem     on the library cache with no accesses to the
rem     dictionary cache
rem
rem     The excess gets and pins disappear in 19.2, 
rem     though the pin count on the library cache 
rem     goes up to 4 per describe.
rem

drop table t1 purge;
create table t1(
        n1 number,
        n2 number,
        n3 number,
        n4 number,
        n5 number,
        n6 number,
        n7 number
)
;


-- variant create table t1 as select * from all_objects where rownum = 1;


set serveroutput off
set linesize 167
set trimspool on


spool temp_desc
select namespace, gets, pins from v$librarycache where namespace = 'TABLE/PROCEDURE';
select parameter, gets from v$rowcache where parameter in ('dc_users','dc_objects') and subordinate# is null;
spool off

set termout off

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1
describe t1

set termout on
set serveroutput on

spool temp_desc append

select namespace, gets, pins from v$librarycache where namespace = 'TABLE/PROCEDURE';
select parameter, gets from v$rowcache where parameter in ('dc_users','dc_objects') and subordinate# is null;

spool off



set doc off
doc

Sample output from 18.3
=======================
NAMESPACE                                                              GETS       PINS
---------------------------------------------------------------- ---------- ----------
TABLE/PROCEDURE                                                       27449      71108


PARAMETER                              GETS
-------------------------------- ----------
dc_users                              17341
dc_objects                           115830


NAMESPACE                                                              GETS       PINS
---------------------------------------------------------------- ---------- ----------
TABLE/PROCEDURE                                                       27555      72017


PARAMETER                              GETS
-------------------------------- ----------
dc_users                              18041
dc_objects                           116533


Note change in rowcache gets - one per column per describe on each parameter.
Note change in library cache pins - (one per column + 2) per describe.

Sample output from 12.2
=======================
NAMESPACE                  GETS       PINS
-------------------- ---------- ----------
TABLE/PROCEDURE           13393      20318


PARAMETER                              GETS
-------------------------------- ----------
dc_users                               4889
dc_objects                            31413


NAMESPACE                  GETS       PINS
-------------------- ---------- ----------
TABLE/PROCEDURE           13504      20539


PARAMETER                              GETS
-------------------------------- ----------
dc_users                               4889
dc_objects                            31416


No change in v$rowcache
Only the same single get and the "+2" pins per describe in v$librarycache

#


The good news is that I sent this text to a colleague who has access to Oracle 19, and the problem goes away (almost completely) – there are just two extra pins on the library cache in Oracle 19 compared to Oracle 12, and no gets on the rowcache dc_users and dc_objects. This suggests that it’s a known issue (even though there’s no visible bug report, and the problem is still present in 18.5), so it may be possible to get a backport of the version 19 code for version 18 fairly quickly. If not, the best temporary workaround is probably to bypass the ORM and manually code for a function call that processes an anonymous PL/SQL block – but I haven’t tested this yet.

There is a JDBC cursor cache available – and if this were enabled then the prepared statement that was closed by the code would be kept open by the JDBC cache and Oracle would not receive the “describe” call. Unfortunately it seems that there’s (still) a cursor leak in the JDBC caching algorithm that will lead to sessions hitting Oracle error “ORA-01000: maximum open cursors exceeded.”

Acknowledgements.

I’d particularly like to thank Lasse Jenssen who spent a couple of hours with me (when he could have been attending some interesting sessions) at the OUG Ireland conference a few days ago, working through various tests and strategies to pin down the problem and attempt to circumvent it. (Any criticism of the Java code above should, nevertheless, be aimed at me).

 

Stats advisor

Fri, 2019-03-22 08:10

This is just a little shout-out about the Stats Advisor – if you decide to give it a go, what sort of things is it likely to tell you? The answer is in a dynamic performance view called v$stats_advisor_rules – which I’ve listed below from an instance running 18.3.0.0.


SQL> set linesize 180
SQL> set trimspool on
SQL> set pagesize 40
SQL> column description format a75
SQL> column name format a32
SQL> break on rule_type duplicate skip 1
SQL> select * from v$stats_advisor_rules;

  RULE_ID NAME                             RULE_TYPE DESCRIPTION                                                                     CON_ID
---------- -------------------------------- --------- --------------------------------------------------------------------------- ----------
         0                                  SYSTEM                                                                                         0
         1 UseAutoJob                       SYSTEM    Use Auto Job for Statistics Collection                                               0
         2 CompleteAutoJob                  SYSTEM    Auto Statistics Gather Job should complete successfully                              0
         3 MaintainStatsHistory             SYSTEM    Maintain Statistics History                                                          0
         4 UseConcurrent                    SYSTEM    Use Concurrent preference for Statistics Collection                                  0
         5 UseDefaultPreference             SYSTEM    Use Default Preference for Stats Collection                                          0
         6 TurnOnSQLPlanDirective           SYSTEM    SQL Plan Directives should not be disabled                                           0

         7 AvoidSetProcedures               OPERATION Avoid Set Statistics Procedures                                                      0
         8 UseDefaultParams                 OPERATION Use Default Parameters in Statistics Collection Procedures                           0
         9 UseGatherSchemaStats             OPERATION Use gather_schema_stats procedure                                                    0
        10 AvoidInefficientStatsOprSeq      OPERATION Avoid inefficient statistics operation sequences                                     0

        11 AvoidUnnecessaryStatsCollection  OBJECT    Avoid unnecessary statistics collection                                              0
        12 AvoidStaleStats                  OBJECT    Avoid objects with stale or no statistics                                            0
        13 GatherStatsAfterBulkDML          OBJECT    Do not gather statistics right before bulk DML                                       0
        14 LockVolatileTable                OBJECT    Statistics for objects with volatile data should be locked                           0
        15 UnlockNonVolatileTable           OBJECT    Statistics for objects with non-volatile should not be locked                        0
        16 MaintainStatsConsistency         OBJECT    Statistics of dependent objects should be consistent                                 0
        17 AvoidDropRecreate                OBJECT    Avoid drop and recreate object seqauences                                            0
        18 UseIncremental                   OBJECT    Statistics should be maintained incrementally when it is beneficial                  0
        19 NotUseIncremental                OBJECT    Statistics should not be maintained incrementally when it is not beneficial          0
        20 AvoidOutOfRange                  OBJECT    Avoid Out of Range Histogram endpoints                                               0
        21 UseAutoDegree                    OBJECT    Use Auto Degree for statistics collection                                            0
        22 UseDefaultObjectPreference       OBJECT    Use Default Object Preference for statistics collection                              0
        23 AvoidAnalyzeTable                OBJECT    Avoid using analyze table commands for statistics collection                         0

24 rows selected.

As you can see the rules fall into three groups: system, operation, and object – and you can’t help noticing at all three levels how commonly the theme is: “just stick with the defaults!”.
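If you want to see which of the rules your system is breaking, the minimal sequence of calls to run the advisor and print its report looks something like this – a sketch, with the task name my own choice:

set serveroutput on

declare
        m_task  varchar2(128);
        m_exec  varchar2(128);
        m_rep   clob;
begin
        m_task := dbms_stats.create_advisor_task('jpl_advisor_task');
        m_exec := dbms_stats.execute_advisor_task('jpl_advisor_task');
        m_rep  := dbms_stats.report_advisor_task('jpl_advisor_task');
        dbms_output.put_line(dbms_lob.substr(m_rep, 32000, 1));        -- just the first 32K of the report
end;
/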

As so often happens when I start writing a catch-up or “remind myself” note I found that Tim Hall has already written all about it.

Lost time

Thu, 2019-03-21 07:51

Here’s a little puzzle that came up in the ODC database forum yesterday – I’ve got a query that has been captured by SQL Monitor, and it’s taking much longer to run than it should but the monitoring report isn’t telling me what I need to know about the time.

Here’s a little model to demonstrate the problem – I’m going to join a table to itself (the self join isn’t a necessary feature of the demonstration, I’ve just been a bit lazy in preparing data). Here’s a (completely truthful) description of the table:

SQL> desc t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ID                                     NUMBER
 MOD_1000                               NUMBER
 V1                                     VARCHAR2(40)
 V2                                     VARCHAR2(40)
 PADDING                                VARCHAR2(100)

SQL> select num_rows, blocks from user_tables where table_name = 'T1';

  NUM_ROWS     BLOCKS
---------- ----------
    400000       7798

1 row selected.


And here’s the output I get from dbms_monitor.report_sql_monitor() for the query running on 18.3.0.0


Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  TEST_USER (261:36685)
 SQL ID              :  g6j671u7zc9mn
 SQL Execution ID    :  16777218
 Execution Started   :  03/21/2019 08:54:56
 First Refresh Time  :  03/21/2019 08:54:56
 Last Refresh Time   :  03/21/2019 08:55:17
 Duration            :  21s
 Module/Action       :  MyModule/MyAction
 Service             :  SYS$USERS
 Program             :  sqlplus@linux183.localdomain (TNS V1-V3)
 Fetch Calls         :  2

Global Stats
==========================================================================
| Elapsed |   Cpu   |    IO    | PL/SQL  | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Time(s) | Calls |  Gets  | Reqs | Bytes |
==========================================================================
|      21 |      21 |     0.04 |    0.07 |     2 |     3M |  215 | 180MB |
==========================================================================

SQL Plan Monitoring Details (Plan Hash Value=83896840)
==================================================================================================================================================
| Id |       Operation       | Name |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity | Activity Detail |
|    |                       |      | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |   (# samples)   |
==================================================================================================================================================
|  0 | SELECT STATEMENT      |      |         |      |        21 |     +1 |     1 |        1 |      |       |     . |          |                 |
|  1 |   FILTER              |      |         |      |        21 |     +1 |     1 |        1 |      |       |     . |          |                 |
|  2 |    HASH JOIN OUTER    |      |     400 | 2014 |        21 |     +1 |     1 |      400 |      |       |   2MB |          |                 |
|  3 |     TABLE ACCESS FULL | T1   |     400 | 1005 |         1 |     +1 |     1 |      400 |   62 |  60MB |     . |          |                 |
|  4 |     TABLE ACCESS FULL | T1   |    400K | 1001 |        21 |     +1 |     1 |     400K |      |       |     . |          |                 |
==================================================================================================================================================

As you can see the total elapsed time is 21 seconds, of which the CPU time is the whole 21 seconds.

This seems a little large for a simple hash join so we should look at the “Activity Detail” section of the plan because that will report any ASH samples that can be found for this execution of this query. And that’s where the problem lies: despite using 21 CPU seconds in 21 seconds there are no ASH samples for the execution! Moreover – looking for other numerical oddities – the plan says we read 60MB in 62 read requests (that’s the first tablescan of the 7,798 blocks of t1), but the summary says we read 180MB – where did we lose (or gain) 120MB?

It might help to see the query (which is why I didn’t show it in the output above) and it might help to see the predicate section (which SQL Monitor doesn’t report). So here’s the query and its plan, with the resulting predicate section, pulled from memory:


SQL_ID  g6j671u7zc9mn, child number 0
-------------------------------------
select  /*+ monitor */  t1a.mod_1000, t1b.mod_1000 from  t1 t1a,  t1
t1b where  t1a.mod_1000 = 500 and t1b.id(+) = t1a.id and
nvl(t1b.mod_1000,0) + f2(t1a.id) + f3(t1a.id) > 0

Plan hash value: 83896840

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |  2014 (100)|          |
|*  1 |  FILTER             |      |       |       |            |          |
|*  2 |   HASH JOIN OUTER   |      |   400 |  7200 |  2014   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |   400 |  3600 |  1005   (3)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T1   |   400K|  3515K|  1001   (3)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL("T1B"."MOD_1000",0)+"F2"("T1A"."ID")+"F3"("T1A"."ID")>0)
   2 - access("T1B"."ID"="T1A"."ID")
   3 - filter("T1A"."MOD_1000"=500)


Notice the two function calls that are part of the FILTER operation. That’s where the time is going – but I don’t think you can infer that from the SQL Monitor report (which rather suggests that the problem might be with the second full tablescan of t1 at operation 4).

Functions f2() and f3() both do a table scan of another table that is as large as t1 – and it looks as if they’re both going to be called 400 times, which is a lot of work. The 120MB of “lost” reads is the first table scan of each of the two tables, and the 3M buffer gets (in the summary, if you didn’t notice it) come from the repeated tablescans as each row comes out of the hash join and the filter is applied.

Why doesn’t the report show us the CPU (and disk) samples? Because it’s querying ASH (v$active_session_history) by the SQL_ID of the principal SQL statement – and the work is being done by two other statements with different SQL_IDs.
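If you hit this pattern a quick check – a sketch, and the usual diagnostic pack licence requirement applies – is to group the recent ASH samples by sql_id to expose the statements that are really consuming the time:

select
        sql_id, count(*)
from
        v$active_session_history
where
        sample_time > sysdate - 5/1440          -- last 5 minutes
group by
        sql_id
order by
        count(*) desc
;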

Since it’s PL/SQL functions doing all the work why isn’t the 21 CPU seconds showing up in the “PL/SQL time(s)” summary figure? Because the time isn’t being spent in PL/SQL, it’s being spent in the SQL being run by the PL/SQL.

Just to finish off, let’s try to find the SQL being run by the PL/SQL. I’ll post the script to recreate the example at the end of the posting but for the moment I’ll just show you the query against v$sql that I ran to extract the SQL that’s embedded in the functions:


select  sql_id, executions, disk_reads, buffer_gets, sql_text
from    V$sql 
where   sql_text like 'SELECT%T2%' 
or      sql_text like 'SELECT%T3%'
;

SQL_ID        EXECUTIONS DISK_READS BUFFER_GETS SQL_TEXT
------------- ---------- ---------- ----------- --------------------------------------------------
12ytf1rry45d9        400       7683     3072817 SELECT MAX(ID) FROM T2 WHERE MOD_1000 = :B1
85dmsgqg3bh4w          1       7680        7698 SELECT MAX(ID) FROM T3 WHERE MOD_1000 = :B1

As you can see, there are roughly 120MB of disk I/O and 3M buffer gets due to these two statements – and one of them has run the 400 times we expected. It looks as if Oracle has done a cute little optimisation with the other function, though. If you look at the predicate it says:

  • NVL(“T1B”.”MOD_1000″,0)+”F2″(“T1A”.”ID”)+”F3″(“T1A”.”ID”)>0

It looks as if Oracle has either cached the result of the f3() function call (in which case why didn’t it also cache the f2() result) or it’s stopped evaluating the predicate as soon as the running total exceeded zero (but that might be a problem since f3() could return a negative number!). I don’t know exactly why Oracle is doing what it’s doing – but if I reverse the order of the functions in the predicate the f3() query will run 400 times and the f2() query will run once.
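For completeness, here’s the arithmetic tying the v$sql figures back to the SQL Monitor summary – a sketch, assuming the default 8KB block size:

select
        round((7683 + 7680) * 8192 / 1048576)   mb_read,                -- the "missing" 120MB of reads
        3072817 + 7698                          function_buffer_gets    -- the bulk of the 3M buffer gets
from
        dual
;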

Appendix

If you want to re-run some tests here’s the code I used to create the demo:


create table t1
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum-1,1000)              mod_1000,
        lpad(rownum,10,'0')             v1,
        lpad(rownum,10,'0')             v2,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 4e5 -- > comment to avoid WordPress format issue
;

-- alter table t1 add constraint t1_pk primary key(id);

create table t2 nologging as select * from t1;
create table t3 nologging as select * from t1;

create function f2 (i_in number) return number
as
        m_ret number;
begin
        select max(id)
        into    m_ret
        from    t2
        where   mod_1000 = i_in
        ;

        return m_ret;
end;
/

create function f3 (i_in number) return number
as
        m_ret number;
begin
        select  max(id)
        into    m_ret
        from    t3
        where   mod_1000 = i_in
        ;

        return m_ret;
end;
/


begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T3',
                method_opt  => 'for all columns size 1'
        );
end;
/

alter system flush buffer_cache;

variable b1 number
variable b2 number

set timing on

select
        /*+ monitor */
        t1a.mod_1000, t1b.mod_1000
from
        t1      t1a,
        t1      t1b
where
        t1a.mod_1000 = 500
and     t1b.id(+) = t1a.id
and     nvl(t1b.mod_1000,0) + f3(t1a.id) + f2(t1a.id) > 0
;


One interesting little detail, if you try the tests, is that the join has to be an outer join for the FILTER operation to appear.

IM_DOMAIN$

Tue, 2019-03-19 07:05

A few months ago Franck Pachot wrote about a recursive SQL statement that kept appearing in the library cache. I discovered the note today because I had just found a client site where the following statement suddenly appeared near the top of the “SQL ordered by Executions” section of their AWR reports after they had upgraded to 18c.


select domain# from sys.im_domain$ where objn = :1 and col# = :2

I found Franck’s article by the simple expedient of typing the entire query into a Google search – his note was the first hit on the list, and he had a convenient example (based on the SCOTT schema) to demonstrate the effect, so I built the tables from the schema and ran a simple test with extended SQL tracing (event 10046) enabled.

Here’s an extract (with no deletions) from the resulting trace file:

PARSING IN CURSOR #139819795591784 len=110 dep=0 uid=104 oct=3 lid=104 tim=31306461773 hv=3572295767 ad='6bf8b8a0' sqlid='8n2bcc3aftu2r'
select /*+ leading(EMP DEPT) USE_HASH(DEPT) USE_HASH(BONUS) */ * from DEPT natural join EMP natural join BONUS
END OF STMT
PARSE #139819795591784:c=59,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4262704031,tim=31306461772

PARSING IN CURSOR #139819795585328 len=64 dep=1 uid=0 oct=3 lid=0 tim=31306461966 hv=1240924087 ad='69a8b760' sqlid='0b639nx4zdzxr'
select domain# from sys.im_domain$ where objn = :1 and col# = :2
END OF STMT
PARSE #139819795585328:c=37,e=37,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2321277860,tim=31306461965
EXEC #139819795585328:c=32,e=31,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2321277860,tim=31306462058
FETCH #139819795585328:c=17,e=17,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=4,plh=2321277860,tim=31306462098
STAT #139819795585328 id=1 cnt=0 pid=0 pos=1 obj=10422 op='TABLE ACCESS BY INDEX ROWID IM_DOMAIN$ (cr=1 pr=0 pw=0 str=1 time=21 us cost=0 size=39 card=1)'
STAT #139819795585328 id=2 cnt=0 pid=1 pos=1 obj=10423 op='INDEX UNIQUE SCAN IM_DOMAIN_UK (cr=1 pr=0 pw=0 str=1 time=18 us cost=0 size=0 card=1)'
CLOSE #139819795585328:c=5,e=5,dep=1,type=1,tim=31306462287

EXEC #139819795591784:c=484,e=484,p=0,cr=1,cu=0,mis=0,r=0,dep=0,og=1,plh=4262704031,tim=31306462316
FETCH #139819795591784:c=0,e=804,p=0,cr=44,cu=0,mis=0,r=0,dep=0,og=1,plh=4262704031,tim=31306463191
STAT #139819795591784 id=1 cnt=0 pid=0 pos=1 obj=0 op='HASH JOIN  (cr=45 pr=0 pw=0 str=1 time=1222 us cost=72 size=97 card=1)'
STAT #139819795591784 id=2 cnt=4 pid=1 pos=1 obj=0 op='HASH JOIN  (cr=45 pr=0 pw=0 str=1 time=1001 us cost=70 size=232 card=4)'
STAT #139819795591784 id=3 cnt=4 pid=2 pos=1 obj=117764 op='TABLE ACCESS FULL EMP (cr=22 pr=0 pw=0 str=1 time=259 us cost=35 size=152 card=4)'
STAT #139819795591784 id=4 cnt=4 pid=2 pos=2 obj=117765 op='TABLE ACCESS FULL DEPT (cr=22 pr=0 pw=0 str=1 time=81 us cost=35 size=80 card=4)'
STAT #139819795591784 id=5 cnt=0 pid=1 pos=2 obj=117766 op='TABLE ACCESS FULL BONUS (cr=0 pr=0 pw=0 str=1 time=4 us cost=2 size=39 card=1)'
CLOSE #139819795591784:c=24,e=24,dep=0,type=1,tim=31306508552

PARSE #139819795591784:c=41,e=42,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4262704031,tim=31306508798
PARSE #139819795585328:c=21,e=22,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2321277860,tim=31306509010
EXEC #139819795585328:c=132,e=132,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2321277860,tim=31306509220
FETCH #139819795585328:c=20,e=19,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=4,plh=2321277860,tim=31306509415
CLOSE #139819795585328:c=8,e=8,dep=1,type=3,tim=31306509494
EXEC #139819795591784:c=682,e=704,p=0,cr=1,cu=0,mis=0,r=0,dep=0,og=1,plh=4262704031,tim=31306509558
FETCH #139819795591784:c=588,e=1246,p=0,cr=44,cu=0,mis=0,r=0,dep=0,og=1,plh=4262704031,tim=31306511014
CLOSE #139819795591784:c=23,e=22,dep=0,type=3,ti

As you can see, every time I do a parse call for the query against the SCOTT schema (PARSE #139819795591784), Oracle does a parse/exec/fetch/close for the query against im_domain$ (PARSE #139819795585328) – and this happens even when the SCOTT query is in the session cursor cache!

As Franck points out, this looks like something to do with the In Memory option even though the option wasn’t enabled in his database, and wasn’t enabled in my client’s database. Once you’ve got a reproducible example of a problem, though, you can start fiddling to see if you can bypass it. In this case I decided to check all the parameters to do with the in-memory option – which is a bit like hard work because there are 208 parameters that include the word “inmemory”. After checking the descriptions of the first twenty or so I decided there was an easier option – if Oracle is asking about “domains” for columns possibly it’s something to do with the relatively new “join group” feature for in-memory columnar compression, so I ran a query to produce the list of parameter names and descriptions for parameters with the words “join” and “group” in their names – there are two:


_sqlexec_join_group_aware_hj_enabled              enable/disable join group aware hash join
_sqlexec_join_group_aware_hj_unencoded_rowset     minimum number of unencoded rowsets processed before adaptation 
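For reference, that list came from the usual (SYS-only) query against x$ksppi – a sketch:

select
        ksppinm         name,
        ksppdesc        description
from
        x$ksppi
where
        ksppinm like '%join%group%'
order by
        ksppinm
;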

The first one looks rather promising – and it has a default value of TRUE, and it can be changed by “alter session” or “alter system”. So I executed:


alter session set "_sqlexec_join_group_aware_hj_enabled" = false;
alter system flush shared_pool;

Then I ran my test again and voila! there it wasn’t. No more queries against im_domain$.

Problem solved (until the client decides they want to use the in-memory option, perhaps).

There may be other reasons why this recursive query appears which aren’t to do with hash joins, of course, but join groups are specifically intended to allow efficient hash joins with the in-memory option (it’s a trick to do with common encoding for compression, allowing Bloom filtering to eliminate CUs without decoding) so I’m hoping I won’t have to track down and eliminate another source of the query.

 

Hash Partitions

Wed, 2019-03-13 08:13

Here’s an important thought if you’ve got any large tables which are purely hash partitioned. As a general guideline you should not need partition level stats on those tables. The principle of hash partitioned tables is that the rows are distributed uniformly and randomly based on the hash key so, with the assumption that the number of different hash keys is “large” compared to the number of partitions, any one partition should look the same as any other partition.

Consider, as a thought experiment (and as a warning), a table of product_deliveries which is hash partitioned by product_id with ca. 65,000 distinct products that have been hashed across 64 partitions. (Hash partitioning should always use a power of 2 for the partition count if you want the number of rows per partition to be roughly the same across all partitions – if you don’t pick a power of two then some of the partitions will be roughly twice the size of others.)

Consider a query for “deliveries to Basingstoke” – in the absence of a histogram on the delivery location the optimizer will produce a cardinality estimate that is:

  • total rows in table / number of distinct delivery locations in table

Now consider a query for: “deliveries of product X to Basingstoke” – again in the absence of histograms. The optimizer could have two ways of calculating this cardinality:

  • total rows in table / (number of distinct products in table * number of distinct delivery locations in table)
  • total rows in relevant partition / (number of distinct products in relevant partition * number of distinct delivery locations in relevant partition)

But given the intent of hash partitioning to distribute data evenly we can make three further observations:

  1. the number of rows in any one partition should be very similar to the number of rows in the table divided by the number of partitions
  2. the number of distinct products in any one partition should be very similar to the number of products in the table divided by the number of partitions
  3. the number of distinct locations in any one partition should be very similar to the number of distinct locations in the whole table.

The second condition holds because product is the partition key; the third holds because location is not the partition key.

So we can rewrite the second, partition-oriented, formula as:

  • (total rows in table / number of partitions) / ((number of distinct products in table / number of partitions) * number of distinct locations in table)

which, re-arranging parentheses and cancelling common factors, reduces to:

  • total rows in table / (number of distinct products in table * number of distinct locations in table)

which matches the first formula. (Q.E.D.) In the absence of any statistics on hash partitions the optimizer can (ought to be able to) produce reasonable cardinality estimates based purely on table-level stats.
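Plugging in some invented numbers – 6,400,000 rows, 65,000 distinct products, 200 distinct delivery locations, and 64 partitions – shows the two calculations agreeing (a sketch):

select
        6400000 / (65000 * 200)                 table_level_estimate,
        (6400000/64) / ((65000/64) * 200)       partition_level_estimate
from
        dual
;

Both expressions evaluate to roughly 0.49 rows.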

In fact if you look back into the history of partitioning this observation is implicit in the early days of composite partitioning when the only option was for range/hash composite partitions – the optimizer never used sub-partition stats to calculate costs or cardinality, it used only partition-level statistics. (And it was several years before the optimizer caught up to the fact that (e.g.) range/list composite partitioning might actually need to do arithmetic based on subpartition stats.)

I did say that the example was also a warning. Hash partitioning is “expected” to have a large number of distinct key values compared to the number of partitions. (If you don’t meet this requirement then possibly you should be using list partitioning.) There’s also a “uniformity” assumption built into the arithmetic (both the basic arithmetic and the hand-waving discussion I produced above). Just imagine that your company supplies a handful of products that for some strange reason are incredibly popular in Basingstoke. If this is the case then the assumption that “all partitions look alike” is weakened, and you would have to consider the possibility that the skew would require a workaround to address the poor cardinality estimates it might produce.

A pattern of this type has two generic effects on the optimizer, of course. First is the simple skew in the data – to have a significant impact the number of rows for the problem products would have to be much larger than average, which suggests the need for a suitably crafted histogram; secondly there’s an implied correlation between a few products and Basingstoke, so you might even end up creating a column group and manually coding a histogram on it to capture the correlation.
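For the record, creating such a column group and gathering a histogram on it might look like this – a sketch, with the table and column names invented purely for the example:

select
        dbms_stats.create_extended_stats(
                ownname   => user,
                tabname   => 'PRODUCT_DELIVERIES',
                extension => '(PRODUCT_ID, LOCATION)'
        )
from
        dual
;

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'PRODUCT_DELIVERIES',
                method_opt => 'for columns (product_id, location) size 254'
        );
end;
/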

 

sys_op_lbid

Mon, 2019-03-11 08:23

I’ve made use of the function a few times in the past, for example in this posting on the dangers of using reverse key indexes, but every time I’ve mentioned it I’ve only been interested in the “leaf blocks per key” option. There are actually four different variations of the function, relevant to different types of index and controlled by setting a flag parameter to one of 4 different values.

The call to sys_op_lbid() takes 3 parameters: an index (or index [sub]partition) object id, a flag value, and a table “rowid”, where the flag value can be one of L, R, O, or G. The variations of the call are as follows:

  • L – the function will return the row directory address (i.e. something that looks like a rowid) of the first index entry in the leaf block that holds the index entry for the referenced table rowid. The effect of this is that the number of distinct values returned by calling the function for every row in the table is equal to the number of index leaf blocks which currently hold an active entry.
  • R – Relevant only to bitmap indexes; the function will return the row directory address of the bitmap index entry for the referenced table rowid. The effect of this is that the number of distinct values returned by calling the function for every row in the table is equal to the number of index entries in the bitmap index.
  • O – Relevant only to the primary key index of an index organized table with an overflow. The function is used with a non-key column instead of a rowid and returns a rowid that corresponds to the row directory entry in the overflow segment. An interesting detail of the overflow entries is that there is an “nrid” (next rowid) pointer in the primary key index entry that does not get deleted when all the columns in the related overflow entry are set null – so you can delete all the data from the overflow (set every overflow column in every row to null) and the primary key clustering factor would not change.
  • G – Relevant only to secondary indexes on an index organized table. Like the L and R options this function takes a rowid (which is a special case for IOTs) as one of its inputs and uses the block guess from the secondary index to construct a row directory entry for the first entry in the primary key leaf block that corresponds to that block guess. This serves two purposes – it allows Oracle to calculate the clustering factor of the secondary index (as you walk the secondary index in order, how much do you jump around the leaf blocks of the primary key), and it allows Oracle to produce the pct_direct_access figure for the secondary index by joining the secondary index to the primary key index on primary key, and comparing the ‘G’ result for the secondary with the ‘L’ result from the primary, which gives a count of the number of times the guess is correct.

These observations can be confirmed by gathering stats on different structures with trace enabled, and doing a couple of block dumps. For reference the following is just a simple script to create an index organized table with overflow and secondary index:


rem
rem     Script:         sys_op_lbid_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem

create table t1(
        id      constraint t1_pk primary key,
        v1      ,
        v2      ,
        v3      ,
        padding 
)
organization index
pctthreshold 2
overflow
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum,
        lpad(rownum,30),
        lpad(rownum,30),
        lpad(rownum,40),
        rpad('x',100,'x')
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(v3);

alter session set sql_trace true;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

alter session set sql_trace false;

select
        object_id, data_object_id, object_name
from
        user_objects
order by
        object_id
;

The significance of the query for object_id and data_object_id shows up in the trace file (and subsequent dumps) when Oracle uses one or other of the values in its SQL and rowid construction.

Here are the interesting SQL statements generated as the stats are gathered – but cosmetically altered to be reader-friendly. In order they are:

  1. Stats for primary key of IOT: using the ‘L’ option for counting leaf blocks and the ‘O’ option for the clustering factor into overflow segment.
  2. Stats for secondary index of IOT: using the ‘L’ option for counting leaf blocks and the ‘G’ option for the clustering factor into the primary key index
  3. Calculate pct_direct_access: the ‘L’ option gives the actual leaf block in the primary key index, the ‘G’ option gives the leaf block guessed by the secondary index

select 
        /*+ index(t,t1_pk) */ 
        count(*) as nrw,
        count(distinct sys_op_lbid(351334,'L',t.rowid)) as nlb,
        null as ndk,
        sys_op_countchg(sys_op_lbid(351334,'O',V1),1) as clf
from
        t1 t 
where 
        id is not null
;


select 
        /*+ index(t,t1_i1) */ 
        count(*) as nrw,
        count(distinct sys_op_lbid(351335,'L',t.rowid)) as nlb,
        null as ndk,
        sys_op_countchg(sys_op_lbid(351335,'G',t.rowid),1) as clf
from
        t1 t 
where 
        v3 is not null
;


select
        case when count(*) = 0
                then 100
                else round(
                        count(
                                case when substr(gdba,7,9)=substr(lbid,7,9)
                                        then 1
                                        else null
                                end
                        )/count(*)*100
                )
        end
from    (
        select
                /*+
                        ordered
                        use_hash(i.t1 t2)
                        index_ffs(t2,t1_pk)
                */
                sys_op_lbid(351334,'L',t2.rowid) lbid,
                gdba
        from (
                select
                        /*+ index_ffs(t1,t1_i1) */
                        sys_op_lbid(351335,'G',t1.rowid) gdba,
                        t1.ID
                from
                        t1 t1
                ) i,
                t1 t2
        where
                i.id = t2.id
        )
;

The strange substr(,7,9) that appears in the join between the primary key index and the secondary index is needed because the ‘G’ option uses the object_id of the table to turn an absolute block guess into a rowid while the ‘L’ option is using the data_object_id of the primary key index to turn its block address into a rowid. (This means there may be variants of this SQL for IOTs using partitioning.)
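Finally, tying this back to the “leaf blocks per key” usage I mentioned at the start: stripped of the stats-gathering wrapping, a standalone count of the active leaf blocks in the secondary index looks like this – a sketch, where 351335 was the object_id of t1_i1 in my run so you’d substitute your own (add a “group by v3” for a per-key breakdown):

select
        /*+ index(t1 t1_i1) */
        count(distinct sys_op_lbid(351335,'L',t1.rowid))        leaf_block_count
from
        t1
where
        v3 is not null
;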

 
