ORACLE JOIN METHODS

Lets have a look at join methods in this article.

If you run an SQL query with multiple tables, then you would have to join all the tables to get the matching rows out of all those tables with the help of matching columns with same datatypes. All the equality relationships of table join are specified in ‘where clause’ of query. Here optimizer also should play major role in choosing the best access methods of join and order in which the tables are joined.

The join methods has two children

  • Driving table or Outer table
  • Inner table or driven-to table

Optimizer chooses the small table as driving table based on the statistics like blocks,rows and size of table

There are multiple factors which come into picture during join operation

  • If table has unique or primary key constraint, then only one row is returned. Optimizer decide to choose these tables first in the order of join
  • Tables with outer join comes after the table to which it should be joined
  • Selectivity values based out of tables,columns and index

Basically there are two types of join

  • Physical
  • Logical

There are four types of physical join methods in oracle at present

  • Nested loops
  • Hash join
  • Sort merge join
  • Cartesian join

Physical joins:

Nested loops: (loop (with in a (loop))(Smaller datasets)

In nested loop joins, each of the outer row source are read and joined based on the matching rows with the inner row source. Nested loop attracts index scan as always index and nested loop joins are friends

  • This join occur during the processing of rows with small table size
  • If joined columns are indexed
  • This join use less memory because of the processing of one row at a time
  • Here result sets should be smaller
  • Nested loops are always available when below operator are used
  • If optimizer mode is first_rows_n

<>,like,not like,not,lnnvl

Nested loops are used when cost of reading rows into 2nd table from first table result set is cheaper

FOR X IN (SELECT * FROM OUTER_TABLE WHERE...) LOOP
   FOR Y IN (SELECT * FROM INNER_TABLE WHERE X.C=Y.C) LOOP
      IF MATCH THEN SEND THE ROWS 
      IF INNER JOIN AND NO MATCH THEN DISCARD THE ROW
      IF OUTER JOIN AND NO MATCH SET INNER COLUMN VALUES TO NULL AND      SEND THE ROWS
   END LOOP
END LOOP

Hash joins:(Larger datasets,equijoin)

Hash joins attract full table scan and it require two tables to be joined independently and combines both the result set in where clause. The table with the less number of rows will be hashed into memory. This hash table contains all the row data for the table and is loaded into hash buckets based on randomizing function that converts the join key to hash value. If memory can offer to accomodate the small table,the hash table will reside in memory and if it is not fit in memory, then the table row data spill on the temporary disk.

select * from t1 x 
inner join t2 y on (x.c1 = y.c2)

Small table – Build table or hash table

Large table – Probe table

Hash joins should be used when,

cost of reading row source once < cost of reading rows from 1st table to scan rows from second table

If the build table fits into the memory(PGA) , then we call that as optimal hash join. Each hash join has its own dedicated server process serving sort and hash requests

If the table doesnot fit into memory, then table chunks will be partitioned into memory and unfit chunks will be processed into temporary tablespace in disk which can also be called as single pass operation

If the partitions are huge in size and they dont fit into memory, the partitions chunks of table should be poured into temporary tablespace and scan the result set one by one into memory(PGA)

If two different tables or row sources are joined ,for example build table x is scanned and the resultant rows are filtered pass with a hashing algorithm and assigned with a number which is distributed to individual hash bucket. Then we compare the rows to the probe table to search the rows from hash bucket

Left deep tree

hash join 
 hash join 
  hash join  
   table 1
   table 2
  table 3
 table 4

Right deep tree(hungry for memory)

hash join
 table 4
 hash join
  table 3
  hash join
    table 2
    table 1

Build table is always considered for small table because build table determines the optimal hash join.Less precision and cheaper algorithm

Tuning hash joins:

  • select less columns and rows in build table
  • Use partition for two tables on the same join keys
  • PGA_aggregate_target should be optimum size for hash join to avoid single or multipass operations on disk

Sort merge join:(Not an equijoin)

Sort merge joins read the tables as per condition in where clause independently, sort the rows from each table in order as per join key and finally merge the sorted result sets.

In this join, sorting the data in PGA is the costly operation which requires temporary disk space in case of insufficient PGA size. But after sorting the rows, merging happens faster.

The second row source is always sorted in PGA even if the columns are indexed.

>,<,>=,<=,=

select * from t1 x 
inner join t2 y on (x.c1 >= y.c2)

Adaptive nested loops and hash joins:(optimizer_adaptive_features)

Statistics collector has a predefined inflection point to choose between nested loops+index scan(less number of rows) and hash join+full table scan(more number of rows). This happens during adaptive ns or hj and bitmap pruning

Counts records – if rows less than inflection point,then nested loop is selected and if higher than inflection point,then hash joins are selected

Nested loop outer:

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s