Tags

, , , , ,

Oracle enhanced it’s case-insensitive sorts and compares feature with 10g Release 2. Version 11g has the following types of sorts:

• Binary sort
• Monolingual linguistic sort
• Multilingual linguistic sort

NLSSORT

NLSSORT returns the string of bytes used to sort char.
Both char and ‘nlsparam’ can be any of the datatypes CHAR, VARCHAR2, NCHAR, or NVARCHAR2. The string returned is of RAW datatype.
The value of ‘nlsparam’ can have the form :
‘NLS_SORT = sort’
where sort is a linguistic sort sequence or BINARY. If you omit ‘nlsparam’, then this function uses the default sort sequence for your session. If you specify BINARY, then this function returns char.
If you specify ‘nlsparam’, then you can append to the linguistic sort name the suffix _ai to request an accent-insensitive sort or _ci to request a case-insensitive sort.
This function does not support CLOB data directly. However, CLOBs can be passed in as arguments through implicit data conversion.

/–Test Cases–/
Sample data and table used for examples throughout the scope of the document:-
To get us started, here’s a simple sample table with a few rows in it.


Connected to Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 
Connected as Lalit

SQL> CREATE TABLE NAMES AS SELECT * FROM (
  2  WITH DATA AS(
  3  SELECT 'Lalit'    NAME from dual UNION ALL
  4  SELECT 'Srikanth' NAME from dual UNION ALL
  5  SELECT 'Rahul'    NAME from dual UNION ALL
  6  SELECT 'Praveen'  NAME from dual UNION ALL
  7  SELECT 'Swetha'   NAME from dual UNION ALL
  8  SELECT 'Karthik'  NAME from dual UNION ALL
  9  SELECT 'Suresh'   NAME from dual)
 10  SELECT * from DATA);
 
Table created
 
SQL> select * from names;
 
NAME
--------
Lalit
Srikanth
Rahul
Praveen
Swetha
Karthik
Suresh
 
7 rows selected

The default: Binary sort

The default sort order is always binary, meaning that the characters are sorted in the order of their character number in the character set.

SQL> SHOW PARAMETER NLS_SORT;
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
nls_sort                             string      BINARY
 
SQL> SHOW PARAMETER NLS_COMP;
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
nls_comp                             string      BINARY

So this is how the default sorting happens –

SQL> SELECT name, ascii(substr(name,1,1)) FROM names order  by name;
 
NAME     ASCII(SUBSTR(NAME,1,1))
-------- -----------------------
Karthik                       75
Lalit                         76
Praveen                       80
Rahul                         82
Srikanth                      83
Suresh                        83
Swetha                        83
 
7 rows selected
 
SQL> SELECT name, ascii(name) FROM names order  by name;
 
NAME     ASCII(NAME)
-------- -----------
Karthik           75
Lalit             76
Praveen           80
Rahul             82
Srikanth          83
Suresh            83
Swetha            83
 
7 rows selected

What happens if the data is case-sensitive, the sorting of the characters changes as per the ASCII value returned. However, internally everything happens in binary format. Let’s see this happening when we alter the case sensitivity-


SQL> update names set name='kARTHIK' where name='Karthik';
 
1 row updated
 
SQL> SELECT name, ascii(name) FROM names order  by NLSSORT(name,'NLS_SORT=BINARY');
 
NAME     ASCII(NAME)
-------- -----------
Lalit             76
Praveen           80
Rahul             82
Srikanth          83
Suresh            83
Swetha            83
kARTHIK          107
 
7 rows selected
 
SQL> SELECT name, ascii(name) FROM names order  by NLSSORT(name,'NLS_SORT=BINARY_CI');
 
NAME     ASCII(NAME)
-------- -----------
kARTHIK          107
Lalit             76
Praveen           80
Rahul             82
Srikanth          83
Suresh            83
Swetha            83
 
7 rows selected
 
SQL> rollback;
 
Rollback complete

Let’s see the difference in the binary values when case-sensitivity matters-

SQL> select name, NLSSORT(name,'NLS_SORT=BINARY') BINARY ,NLSSORT(name,'NLS_SORT=BINARY_CI') BINARY_CI FROM names ORDER BY name;
 
NAME     BINARY                      BINARY_CI
-------- -------------------         --------------------
Karthik  4B61727468696B00            6B61727468696B00
Lalit    4C616C697400                6C616C697400
Praveen  5072617665656E00            7072617665656E00
Rahul    526168756C00                726168756C00
Srikanth 5372696B616E746800          7372696B616E746800
Suresh   53757265736800              73757265736800
Swetha   53776574686100              73776574686100
 
7 rows selected

Now let’s see the use of these session parameters.


SQL> select * from names where name = 'Rahul';
 
NAME
--------
Rahul
 
SQL> select * from names where name = 'rahul'; --case does not match with actual data
NAME
--------

--no rows returned in this case

So, in such a case usually we tend to use UPPER/LOWER function on both sides of the operator.

SQL> select * from names where UPPER(NAME) = UPPER('rahul');
 
NAME
--------
Rahul

OR


SQL> select * from names where LOWER(NAME) = LOWER('RAHUL');
 
NAME
--------
Rahul

Now we got our output, but, the question is did we do it optimally?
What if the NAMES table is a MASTER TABLE of the national data for AADHAR CARD/VOTER ID. It will have billions of rows making the table size to some Gigabytes.
Usually, if the top SQLs have the name column in their predicate, then it is must to have a Balanced-tree index. Or even a bitmap-index if the mapping criteria fulfils for not null and unique conditions. In such a case, using UPPER or LOWER function will skip the Index scan and we hit a FTS case.

Let’s create a normal balance-tree index.

SQL> CREATE INDEX names_indx ON names(NAME);
 
Index created

Now let’s check the execution plan for the query using the case function (UPPER/LOWER).

SQL> explain plan for SELECT * FROM names WHERE UPPER(name)=UPPER('kArtHIk');
 
Explained

SQL> select * from table(dbms_xplan.display);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 4048250084
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     6 |    15   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| NAMES |     1 |     6 |    15   (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NLSSORT(UPPER("NAME"),'nls_sort=''BINARY_CI''')=HEXTORAW('
              6B61727468696B00') )
Note
-----
   - dynamic sampling used for this statement (level=2)
 
18 rows selected

So, we just ended up with a FTS. A very obvious solution that would strike in the mind is to create a function-based index.

SQL> create index names_fb_indx on names(upper(name));
 
Index created
 
SQL> alter session set nls_comp='BINARY';
 
Session altered
 
SQL> alter session set nls_sort='BINARY';
 
Session altered
 
SQL> explain plan for SELECT * FROM names WHERE UPPER(name)=UPPER('kArtHIk');
 
Explained
 
SQL> select * from table(dbms_xplan.display);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3644882080
--------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     1 |    12 |     2   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| NAMES         |     1 |    12 |     2   (0
|*  2 |   INDEX RANGE SCAN          | NAMES_FB_INDX |     1 |       |     1   (0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(UPPER("NAME")='KARTHIK')
Note
-----
   - dynamic sampling used for this statement (level=2)
 
18 rows selected

Although creating a virtual column in 11g and up would also suffice. But it is similar to a function-based index. But doing either of them is nothing but adding an overhead to the existing design. We need to maintain the indexes in an index table space. It will maintain the rowid, value of the data, but in the same case that of the original data in table.
So, let’s see for a better approach.

SQL> alter session set nls_comp='LINGUISTIC';
 
Session altered
 
SQL> alter session set nls_sort='BINARY_CI';
 
Session altered
SQL> SELECT * FROM names WHERE name='kArtHIk';
 
NAME
--------
Karthik
SQL> explain plan for SELECT * FROM names WHERE name='kArtHIk';
 
Explained
 
SQL> select * from table(dbms_xplan.display);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 4048250084
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     6 |    15   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| NAMES |     1 |     6 |    15   (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NLSSORT("NAME",'nls_sort=''BINARY_CI''')=HEXTORAW('6B61727
              468696B00') )
Note
-----
   - dynamic sampling used for this statement (level=2)
 
18 rows selected

Everything seems fine. Task is achieved, but still we can tune it to next level.

Case Insensitive Indexes

A normal index uses the default sort order, by default, and so it’s useless in a case-insensitive search – and Oracle won’t use it. For large tables, the resulting full table scan can be quite a performance penalty. Fortunately, it’s easy to create an index that uses a specific sort order. You simply create a function-based index that uses the NLSSORT function we saw above.

SQL> create index names_ci_indx on names(NLSSORT(name,'NLS_SORT=BINARY_CI'));
 
Index created
 
SQL> explain plan for SELECT * FROM names WHERE name='kArtHIk';
 
Explained
 
SQL> select * from table(dbms_xplan.display);
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2561608627
--------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     1 |    45 |     2   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| NAMES         |     1 |    45 |     2   (0
|*  2 |   INDEX RANGE SCAN          | NAMES_CI_INDX |     1 |       |     1   (0
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(NLSSORT("NAME",'nls_sort=''BINARY_CI''')=HEXTORAW('6B61727468696B0
Note
-----
   - dynamic sampling used for this statement (level=2)
 
18 rows selected