10gR2 New Feature: Case Insensitive Searches

James Koopmann's picture
articles: 

There are always a few topics in regards to writing SQL that always seem to come up more often than others. Querying data with case insensitivity is one of those topics. And Oracle has addressed this issue in many of their release. But in Oracle 10gR2 they have reached new levels. This article takes a dive into case insensitivity and how it is now handled in 10gR2. For the better!

Firstly for those that are new to case insensitivity I will give a very brief introduction here. Basically case insensitivity is the treating of upper case letters and lower case letters as equal when comparing or sorting them. So the comparison of the letters 'a' and 'A' in a case insensitive world would be equal. And if we had a table LETTERS(letter char(1)) and did the following query it would return the following two rows.

SQL> SELECT letter FROM letters WHERE letter = 'a';
LETTER
------
a
A

Up until the release of 10gR2 there have been many different approaches to accomplishing pattern matching such as using UPPER, NLS_UPPER, different NLS_SORT and NLS_COMP combinations such as GENERIC_BASELETTER for NLS_SORT, and you could even use Oracle's regular expression functions REGEXP%. But prior to 10gR2 writing a standard SQL statement that used a LIKE comparison operator and was able to take case insensitivity into consideration was non-existent. The only option was to use REGEXP_LIKE.

Before we get into how this is now solved in 10gR2 lets just take a quick look at the pieces of the solution and how they have worked in the past and how they currently work in 10gR2.

Table 1: NLS_COMP & NLS_SORT Visited

PARAMETER

Description
NLS_COMP
The sole purpose of this variable is to determine how the predicates in SQL statements will evaluate to each other when a comparison is required. The typical comparison operator is the WHERE clause but also includes such comparisons as ORDER BY, START WITH, HAVING, etc.

Prior to Oracle 10gR2 there were two settings.

BINARY : All comparisons will be done in accordance to the binary value of the characters.

ANSI : All comparisons will be done linguistically in accordance to the setting of the variable NLS_SORT.

After Oracle 10gR2.

BINARY : Still available.

ANSI : Still available but only for backward compatibility and should not be used.

LINGUISTIC : New setting and fully honors the setting of NLS_SORT variable. You will see this latter but basically when you had set NLS_SORT to be case insensitive it didn't always hold true. In 10gR2 it now does!

How to set


  1. Set with an operating system environment variable:
    SET(export) NLS_COMP=<binary|linguistic>

  2. Set inside your database session:
    ALTER SESSION SET NLS_COMP=<binary|linguistic>;

NLS_SORT NLS_SORT specifies the collating sequence for ORDER BY queries.

If the value is BINARY, then the collating sequence for ORDER BY queries is based on the numeric value of characters.

If the value is a named linguistic sort, sorting is based on the order of the defined linguistic sort. Most (but not all) languages supported by the NLS_LANGUAGE parameter also support a linguistic sort with the same name.

There are many values.

BINARY : All sorting is done in accordance to the binary value of the characters.

V$NLS_VALID_VALUES : All sorting is done in accordance to a named linguistic definition. This means that Oracle will sort in accordance to a particular locale (GERMAN, POLISH, FRENCH, etc.).These valid named definitions may be obtained by querying the V$NLS_VALID_VALUES where parameter = `SORT'. Just be aware that all the values in this view also have a hybrid definition by adding a suffix of _CI (case insensitivity)

How to set


  1. Set with an operating system environment variable:
    SET(export) NLS_SORT=<valid_value>
  2. ALTER SESSION SET NLS_SORT=<valid_value>;

Examples in Oracle 10g

In prior versions of Oracle (pre 10gR2) if we had a table called NAMES and have the following entries in that table:

SQL> select name from names;
NAME
-------------------
Johny Jacobson
johny jacobson
JOHNY JACOBSON

And we wanted to perform a search on 'Johny Jacobson', under normal setting of case sensitive (NLS_SORT=BINARY) searching we would get the following. This is exactly how you would assume an equality to work.

SQL> alter session set NLS_SORT=BINARY;
Session altered.

SQL> select name from names where name = 'Johny Jacobson';
NAME
--------------------------------------------------------------
Johny Jacobson

But often times we don't necessarily know how someone might have entered data and thus we want to find all occurrences of a name, email address, or some other information. In this case we turn on case insensitivity. If we set NLS_SORT=BINARY_CI we would get the following results:

SQL>  alter session set NLS_COMP=ANSI;
Session altered.
SQL> alter session set NLS_SORT=BINARY_CI;
Session altered.

SQL> select name from names where name = 'Johny Jacobson';
NAME
--------------------------------------------------------------------
Johny Jacobson
johny jacobson
JOHNY JACOBSON

These last two examples used strict equality (where name = 'Johny Jacobson'). Often times, more often than not, we don't know much about the column contents and want to perform a search where we can supply only a portion of a name. So under normal settings of case sensitivity NLS_SORT=BINARY we may perform searches such where name LIKE '%John%', 'Johny%', '%ohn%' and so forth until we get a result set we are comfortable with. Using the normal setting of case sensitive (NLS_SORT=BINARY) searching we might do the following. Obviously if we are looking for all occurrences of case insensitivity this doesn't work.

SQL> alter session set NLS_SORT=BINARY;
Session altered.

SQL> select name from names where name like 'Johny%';
NAME
------------------------------------------------------------
Johny Jacobson

So we quickly, in prior versions of Oracle, tried to switch to case insensitivity by setting the NLS_SORT parameter. If we are still looking for the simple BINARY case insensitivity we might try the following. But as you can see in prior versions of Oracle we still do not get true case insensitivity!

SQL> alter session set NLS_COMP=ANSI;
Session altered.
SQL> alter session set NLS_SORT=BINARY_CI;
Session altered.

SQL> select name from names where name like 'Johny%';
NAME
------------------------------------------------------------
Johny Jacobson

This is where the new setting of NLS_COMP comes into play. Setting NLS_COMP to LINGUISTIC tells Oracle to perform strict case insensitivity operations and looks like the following two examples. As you can see there is complete case insensitivity for both equality and now LIKE operations.

SQL> alter session set NLS_SORT=BINARY_CI;
Session altered.
SQL> alter session set NLS_COMP=LINGUISTIC;
Session altered.

SQL>  select name from names where name = 'Johny Jacobson';
NAME
------------------------------------------------------------------
Johny Jacobson
johny jacobson
JOHNY JACOBSON

SQL> select name from names where name like 'Johny%';
NAME
------------------------------------------------------------------
Johny Jacobson
johny jacobson
JOHNY JACOBSON

There are literally dozens of ways in Oracle you can pattern match and find the information you need. The problem was, for case insensitivity you couldn't rely on strict case insensitivity in its purest form. You needed to hook in other functions or expressions. Now with Oracle 10gR2 you can set the NLS_COMP parameter and get true case insensitivity without having to change your application. Nice! Especially if you needed to port to another database vendor. Just so long as their case insensitivity is the same as Oracle's.

Comments

In fact, I think Oracle still used a UPPER or else on it. Coz from the execution plan, we can see there is FTS, instead used the index on the field:

SQL> alter session set NLS_SORT=BINARY_CI;

Session altered.

SQL> alter session set NLS_COMP=LINGUISTIC;

Session altered.

SQL> select * from t2 where f1 = 'a';

F1                AAA
---------- ----------
A                   1
a                   2


Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     5 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |     1 |     5 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

SQL> alter session set NLS_SORT=BINARY;

Session altered.

SQL> alter session set NLS_COMP=ANSI;

Session altered.

SQL> select * from t2 where f1 = 'a';

F1                AAA
---------- ----------
a                   2


Execution Plan
----------------------------------------------------------
Plan hash value: 2238318762

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     1 |     5 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T2      |     1 |     5 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T2_IDX1 |     2 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

If you set autotrace on and re-run the query, you'll see something like this:

Execution Plan
----------------------------------------------------------
Plan hash value: 3979868219

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     1 |     3 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_TABLE |     1 |     3 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(NLSSORT("TEXT_STRING",'nls_sort=''BINARY_CI''')=HEXTORAW('
              61626300') )

Note the filter. Now create an index on NLS_SORT("TEXT_STRING",'nls_sort=''BINARY_CI''') and re-run the query:

Execution Plan
----------------------------------------------------------
Plan hash value: 1565380963

----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                            |     1 |     3 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST_TABLE                 |     1 |     3 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_TT_NLSSORT_TEXT_STRING |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(NLSSORT("TEXT_STRING",'nls_sort=''BINARY_CI''')=HEXTORAW('61626300') )

I'm looking for a way to do case-insensitive searches without needed to issue "Alter session" statements and without needing to alter the SQL I run and without needing to create special indexes. I would like a database option in Oracle to declare that all queries in the database are case-insensitive. I have an application that runs on Microsoft Access OR Oracle (customer choice). Microsoft Access is powerful enough to do case-insensitive searches; to do it on Oracle with have to jump through all these hoops on every database connection. Is there hope for the future that Oracle could provide case-insensitive out-of-the-box?

gamyers's picture

"Is there hope for the future that Oracle could provide case-insensitive out-of-the-box?"
How about using ALTER SYSTEM....SCOPE=SPFILE (plus a restart).
Bear in mind you may have to rebuild indexes to make them usable.

I can't see it ever becoming the default, as it would break existing applications.

James,

One last question. I am writing Java code to select records from a table and I want the query to be case insensitive. Can I use these commands enclosed in my SELECT statement? Without taking up your time, is this a respectable (good practice) way to accomplish a case insensitive query? I do not have access to the OS to make these settings at the OS level.

alter session set NLS_SORT=BINARY_CI;
alter session set NLS_COMP=LINGUISTIC;

Thanks.

Phil

Hi there,

This is pretty old stuff but still up to date, as far as know. Still I have an issue with that with my Oracle 11g instance.

If I set those parameter using ALTER SESSION, it works fine (except in SQL Developper due to a bug specific to that SQL client). What I'd like to do is having those parameter set for all sessions (I have a Business Objects connecting to that DB), so I've applied a ALTER SYSTEM ... SCOPE=SPFILE and restarted the DB. It seems OK, I can see that the paramters value are correctly set by querying v$parameter. But still it has no effect on the queries, still case sensitive whatever SQL client I'm using (BO, SQL*D or SQLPlus), except for one, SQLPlus when ran directly on my Oracle server. Any ideas?