How does Oracle csscan utility working?

Links: http://www.dbatools.net/experience/how_oracle_csscan_working.html

    I create the following table in an UTF8 character set Oracle database.

create table t_charset(col1 varchar2(10), col2 varchar2(20));

    The I insert two rows under different NLS_LANG settings. The first column is the client NLS_LANG setting, the second column contains a Chinese word (same value for both two rows, two chars in GBK encoding), and then check the physical storage with Oracle DUMP function.

SQL> col col2 format a40
SQL> select col1, dump(col2) col2 from t_charset;

COL1       COL2
---------- ----------------------------------------
ZHS16GBK   Typ=1 Len=6: 228,184,173,229,155,189
UTF8       Typ=1 Len=4: 214,208,185,250

    Be cause we want to identify out which rows are not stored as UTF8 encoding, Oracle csscan can do this work, but I will use "CONVERT" function here. We will try to convert the string value from UTF8 encoding to UTFE encoding, because of UTFE is a super set of UTF8, so the conversion should go smoothly. So the source character set is UTF8 and target character set is UTFE. Let's see what happened. For the first row (UTF8), it's not stored as UTF8 encoding, so I get the following error.

SQL> SELECT CONVERT(COL2,'UTF8','UTFE') FROM T_CHARSET WHERE COL1='UTF8';
                                        *
ERROR at line 1:
ORA-12703: this character set conversion is not supported

    For the second row, every thing is ok.

SQL> SELECT CONVERT(COL2,'UTF8','UTFE') FROM T_CHARSET WHERE COL1='ZHS16GBK';

CONVERT(COL2,'UTF8','UTFE')
-----------------------------------------------
......

    So Oracle csscan just scan all the character values and try to do the conversion, if error then record the table name and ROWID into it's repository tables. After csscan run, you can get all the bad rows and do a fix job. If I use csscan in this case, it will identify out the second row for me.

Post a comment

« Previous | Main | Next »

Powered by
Movable Type 3.36