SQLULDR2 : Native Parallel Support! Parallel Inside!

Links: http://www.dbatools.net/mytools/parallel-inside-sqluldr2.html

    Supposed that we want to unload a huge table (100gb+) to text file, how to do it quickly? The answer is parallel unloading. I will suggest you perform a manual parallel by split the big table into multiple pieces with different where clause. But now I think the manual parallel is too complex, and it may perform multiple times full table scan, which brings extra IO overhead to the storage system, so I add the native parallel feature.

    In parallel SQLULDR2, we can split huge table into multiple piece by rowid range, each parallel slave just take piece of the data with extra IO overhead to the storage system. You just need to tell the parallel degree and the big table name in the SQL to do the parallel. Supposed that we have the following SQL query.

SELECT E.EMPNO, E.ENAME, D.DNAME
  FROM EMP E, DEPT D
WHERE E.DEPTNO = D.DEPTNO

    Supposed that the EMP is a huge table, we want to parallize the unload process. I will introduce two new options for parallel unloading.

  split  = table name for automatically parallelization.
  degree  = parallelize data copy degree (2-128).

    We also need to change the SQL query for parallel unloading, you can use two bind variables in the SQL, "MINRID" for the lower rowid range and "MAXRID" for the upper rowid range. So I rewrite the SQL as following.

SELECT E.EMPNO, E.ENAME, D.DNAME
  FROM EMP E, DEPT D
WHERE E.DEPTNO = D.DEPTNO AND
  E.ROWID >= :MINRID AND E.ROWID < :MAXRID

    If we want to unload the EMP table only, we just need pass the following options to SQLULDR2 for parallel unloading.

user=scott/tiger
query=select * from emp where rowid >= :minrid and rowid < :maxrid
split=emp
degree=4
file=uldrdata.%p.txt
log=log.%p.log

    The "%p" will be replaced by the parallel slave id (start from 1). With the parallel unloading feature, We can unload data very quickly. In a PC server with local disk only, I can unload data to text file with 32MB per second.

Comments (4)

How to get the MinRowID and MaxRowID?

The min rowid and the max rowid will generated by SQLULDR2 automatically.

Perfect! You rule!

Great work, really. Thanks!

I must say that those you are working on should be a standard feature of oracle database. But those guys don't any data OUT, all they want is IN which make sense for them.

I have several questions/suggestions as an ETL developer working with one of the leading ETL tool which does the extractions at an amazing speeds.

1. The source code of sqluldr2 is now available, is it possible to share it also. Available source code is pretty old and does not support these features.

2. Did you consider to host this project at a open source project portal like sourceforge.net? BTW is this software an open source project? If it is what is the licence of it?

3. How are you handling the parallesim? Are you using separate proceses, or separate threads? Did other processes or threads succeed to fail upon one of them fails?

I'm sorry to bother you with those questions yet I don't have the current source code.

One more thing I want to share: You can achive three different types extraction parallesim in oracle database. One is rowid approach which you've done already. But if a table has only subpartitions, it's very efficient to go with the subpartitions approach. And with composite partitioned tables like range-hash partitions(which has very high dependency to SQL statement) you would consider to range parallelism if available.

I think we can work on an intellegent extraction engine for oracle database which should decide what type of parallesim to use, or even for small tables not to use at all - running on serial.

Thanks for all this stuff.

Best regards,
Onur

Post a comment

Remember Me?

« Previous | Main | Next »

Powered by
Movable Type 5.01