3.3 Types of Joins


There are several types of joins to be aware of:


Cross joins


Cross joins are joins without a join condition. Each row of one table is combined with each row of another table. The result is referred to as a Cartesian product.


Inner joins


Inner joins are the regular joins. An inner join returns the rows that satisfy the join condition. Each row returned by an inner join contains data from all the tables involved in the join.


Outer joins


Outer joins are an extension to inner joins. An outer join returns the rows that satisfy the join condition and also the rows from one table for which no corresponding rows (i.e., that satisfy the join condition) exist in the other table.


Equi- and non-equi-joins


An equi-join is a join where the join condition uses the equal to (=) operator to relate the rows of two tables. When a join condition uses any other operator to relate the rows of two tables, the join is called a non-equi-join.


Self joins


A self join is a join of a table to itself.


Partition outer joins


A new type of join introduced in Oracle Database 10g that is slated to be part of the next ANSI/ISO SQL standard after SQL:2003. A partition outer join divides your result set into groups, or partitions, and repeats the same outer join for each of these groups. Such joins are extremely handy for generating missing rows.

The following sections discuss each of these joins in detail, and with examples.

3.3.1 Cross Joins/Cartesian Products

If you don't specify a join condition when joining two tables, Oracle combines each row from the first table with each row from the second table. This type of result set is called a cross join or a Cartesian product; either term is acceptable. The number of rows in a cross join is the product of the number of rows in each table. Here's an example of a cross join:
SELECT e.lname, d.name

FROM employee e CROSS JOIN department d;



LNAME NAME

---------- --------------

SMITH ACCOUNTING

ALLEN ACCOUNTING

WARD ACCOUNTING

JONES ACCOUNTING

MARTIN ACCOUNTING

BLAKE ACCOUNTING

. . .

. . .

. . .

SCOTT OPERATIONS

KING OPERATIONS

TURNER OPERATIONS

ADAMS OPERATIONS

JAMES OPERATIONS

FORD OPERATIONS

MILLER OPERATIONS



56 rows selected.



Since the query didn't specify a join condition, each row from the employee table is combined with each row from the department table. Needless to say, this result set is of little use. More often than not, a cross join produces a result set containing misleading rows. Therefore, unless you are sure that you want a Cartesian product, don't use a cross join.

Notice the use of the keyword CROSS before the JOIN keyword in the previous example. If you omit the CROSS keyword, and don't specify a join condition, Oracle will throw an error, because it thinks that you are attempting a regular join and have inadvertently omitted the join condition. For example:
SELECT e.lname, d.name

FROM employee e JOIN department d;

FROM employee e JOIN department d

*

ERROR at line 2:

ORA-00905: missing keyword


What happens when you specify the CROSS keyword as well as a join condition through an ON or USING clause? Oracle rejects your query with an error, and rightly so, because cross joins are joins without join conditions. For example:
SELECT e.lname, d.name

FROM employee e CROSS JOIN department d

ON e.dept_id = d.dept_id;

ON e.dept_id = d.dept_id

*

ERROR at line 3:

ORA-00933: SQL command not properly ended


Be aware that it's easily possible to inadvertently specify a cross join when using the old join syntax described in the Appendix A. Using that syntax, a cross join occurs when you list two tables in the FROM clause separated by commas, and you forget to write a join condition into the query's WHERE clause.
3.3.2 Inner Joins

Inner joins are the most commonly used joins. When people refer simply to a "join," they most likely mean an "inner join." An inner join relates the rows from the source tables based on the join condition, and returns the rows that satisfy it. For example, to list the name and department for each employee, you would use the following SQL statement:
SELECT e.lname, d.name

FROM employee e JOIN department d

ON e.dept_id = d.dept_id;



LNAME NAME

-------------------- --------------------

CLARK ACCOUNTING

KING ACCOUNTING

MILLER ACCOUNTING

SMITH RESEARCH

ADAMS RESEARCH

FORD RESEARCH

SCOTT RESEARCH

JONES RESEARCH

ALLEN SALES

BLAKE SALES

MARTIN SALES

JAMES SALES

TURNER SALES

WARD SALES



14 rows selected.


In this example, each row of the employee table is combined with each row of the department table, and if the combination satisfies the join condition (dept_id in the employee table matches the dept_id in the department table), then it is included in the result set.

The JOIN keyword, unless prefixed with another keyword, means an inner join. Optionally, you can use the INNER keyword before the JOIN keyword to explicitly indicate an inner join, as in the following example:
SELECT e.lname, d.name

FROM employee e INNER JOIN department d

ON e.dept_id = d.dept_id;


Let's look at another example to go a bit deeper in the concept behind an inner join:
SELECT * FROM department;



DEPT_ID NAME LOCATION_ID

---------- -------------------- -----------

10 ACCOUNTING 122

20 RESEARCH 124

30 SALES

40 OPERATIONS 167



SELECT * FROM location;



LOCATION_ID REGIONAL_GROUP

----------- --------------------

122 NEW YORK

124 DALLAS

123 CHICAGO

167 BOSTON

144 SAN FRANCISCO



SELECT d.name, l.regional_group

FROM department d JOIN location l

ON d.location_id = l.location_id;



NAME REGIONAL_GROUP

-------------------- --------------------

ACCOUNTING NEW YORK

RESEARCH DALLAS

OPERATIONS BOSTON


Our department table has four rows, and our location table has five rows. However, the inner join returns only three rows. The inner join returns only those rows from the two tables that satisfy the join condition. What this means, with respect to this example, is that only those departments that have a corresponding location, and only those locations that have a corresponding department, are returned by the inner join query. The "SALES" department doesn't have a location_id, and therefore has no corresponding entry in the location table, so it is not included in the result set of the inner join. Similarly, the locations "CHICAGO" and "SAN FRANCISCO" don't have corresponding entries in the department table, and are not included in the result set of the inner join.

The concept of an inner join is easier to understand in terms of the Cartesian product (or cross join). While performing a join of the department and location tables, a Cartesian product is first formed (conceptually, Oracle doesn't physically materialize this Cartesian product), and then the join conditions in the ON (or USING) clause restrict the results to only those rows for which the location_id values match.





The most important concept to understand about joins, and especially about inner joins, is that a join is all about matching rows from one table with corresponding rows in another table.

3.3.3 Outer Joins

Sometimes, while performing a join between two tables, you need to return all the rows from one table even when there are no corresponding rows in the other table. For example, you may want to see all the departments even if they are not related to any particular location. Oracle provides a special type of join to include rows from one table that don't have matching rows from the other table. This type of join is known as an outer join.

The syntax of an outer join is:
FROM table1 { LEFT | RIGHT | FULL } [OUTER] JOIN table2


The syntax elements are:


table1, table2


Specifies the tables between which you are performing the outer join.


LEFT


Specifies that the results be generated using all rows from table1. For those rows in table1 that don't have corresponding rows in table2, NULLs are returned in the result set for the table2 columns.


RIGHT


Specifies that the results be generated using all rows from table2. For those rows in table2 that don't have corresponding rows in table1, NULLs are returned in the result set for the table1 columns.


FULL


Specifies that the results be generated using all rows from table1 and table2. For those rows in table1 that don't have corresponding rows in table2, NULLs are returned in the result set for the table2 columns. Additionally, for those rows in table2 that don't have corresponding rows in table1, NULLs are returned in the result set for the table1 columns.


OUTER


Specifies that you are performing an OUTER join. This keyword is optional. If you use LEFT, RIGHT, or FULL, Oracle automatically assumes an outer join. The OUTER keyword is for completeness' sake, and complements the INNER keyword.
3.3.3.1 Left outer joins

To list all departments even if they are not related to any particular location, you can perform a LEFT OUTER JOIN between the department and the location tables. For example:
SELECT d.dept_id, d.name, l.regional_group

FROM department d LEFT OUTER JOIN location l

ON d.location_id = l.location_id;



DEPT_ID NAME REGIONAL_GROUP

---------- -------------------- --------------

10 ACCOUNTING NEW YORK

20 RESEARCH DALLAS

30 SALES

40 OPERATIONS BOSTON


This query lists all the rows from the department table together with their corresponding locations from the location table. For the rows from department with no corresponding rows in location, NULLs are returned for the l.regional_group column in the result set.
3.3.3.2 Right outer joins

Likewise, to list all the locations even if they are not related to any particular department, you can perform a RIGHT OUTER JOIN between the location and the department tables. For example:
SELECT d.dept_id, d.name, l.regional_group

FROM department d RIGHT OUTER JOIN location l

ON d.location_id = l.location_id;



DEPT_ID NAME REGIONAL_GROUP

---------- -------------------- ---------------

10 ACCOUNTING NEW YORK

CHICAGO

20 RESEARCH DALLAS

SAN FRANCISCO

40 OPERATIONS BOSTON


This query lists all the rows from the location table, and their corresponding departments from the department table. For the rows from location that don't have corresponding rows in department, NULLs are returned for the d.dept_id and d.name columns in the result set.

The LEFT and RIGHT keywords in an outer join query are relative to the position of the tables in the FROM clause. The same result can be achieved using either a LEFT OUTER JOIN or a RIGHT OUTER JOIN, by switching the position of the tables. For example, the following two queries are equivalent:
SELECT d.dept_id, d.name, l.regional_group

FROM department d LEFT OUTER JOIN location l

ON d.location_id = l.location_id;



SELECT d.dept_id, d.name, l.regional_group

FROM location l RIGHT OUTER JOIN department d

ON d.location_id = l.location_id;


In each case, the directional word, either LEFT or RIGHT, points toward the anchor table, the table that is required. The other table is then the optional table in the join.

3.3.3.3 Full outer joins

Ocassionally, you may need the effect of an outer join in both directions, which you can think of as a combination of LEFT and RIGHT outer joins. For example, you may need to list all the departments (with or without a location), as well as all the locations (with or without a department). Use a FULL OUTER JOIN to generate such a result set:
SELECT d.dept_id, d.name, l.regional_group

FROM department d FULL OUTER JOIN location l

ON d.location_id = l.location_id;



DEPT_ID NAME REGIONAL_GROUP

---------- -------------------- ----------------

10 ACCOUNTING NEW YORK

20 RESEARCH DALLAS

30 SALES

40 OPERATIONS BOSTON

CHICAGO

SAN FRANCISCO



6 rows selected.


This query performs a FULL OUTER JOIN between the two tables, and lists:


All the rows that satisfy the join condition


The rows in the department table that don't have a corresponding location


The rows in the location table that don't have a corresponding department

A full outer join is bidirectional in the sense that the result is the same irrespective of the position of the tables in the FROM clause. In mathematical terms, you would consider the FULL OUTER JOIN operator to be "commutative."
3.3.4 Equi-Joins Versus Non-Equi-Joins

The join condition determines whether a join is an equi-join or a non-equi-join. When a join condition relates two tables by equating the columns from the tables, it is an equi-join. When a join condition relates two tables by an operator other than equality, it is a non-equi-join. A query may contain equi-joins as well as non-equi-joins.

Equi-joins are the most common join type. For example, if you want to list all the parts supplied by all the suppliers, you can join the supplier table with the part table by equating the supplier_id from one table to that of the other:
SELECT s.name supplier_name, p.name part_name

FROM supplier s JOIN part p

ON s.supplier_id = p.supplier_id;



SUPPLIER_NAME PART_NAME

------------------------------ -----------------------

Acme Industries Acme Part AI5-4557

Tilton Enterprises Tilton Part TZ50828

Eastern Importers Eastern Part EI-T5-001


However, there are situations in which you need non-equi-joins to get the required information. For example, if you want to list the inventory_class of each part, and the inventory_class is based on a range of unit costs, you need to execute the following query:
SELECT p.name part_name, c.inv_class inv_class

FROM part p JOIN inventory_class c

ON p.unit_cost BETWEEN c.low_cost AND c.high_cost;



PART_NAME INV

------------------------------ ---

Acme Part AI5-4557 A

Tilton Part TZ50828 B

Eastern Part EI-T5-001 B


The use of the BETWEEN operator to relate the unit_cost column from the part table to the low_cost and high_cost columns of the inventory_class table makes this query a non-equi-join. (You can achieve the same result by using the combination of >= and <= operators instead of BETWEEN. Try this as an exercise.)
3.3.5 Self Joins

There are situations in which one row of a table is related to another row of the same table. The employee table is a good example. The manager of one employee is also an employee. The rows for both are in the same employee table. This relationship is indicated in the manager_emp_id column:
CREATE TABLE employee (

emp_id NUMBER (5) NOT NULL PRIMARY KEY,

fname VARCHAR2 (20),

lname VARCHAR2 (20),

dept_id NUMBER (5),

manager_emp_id NUMBER (5) REFERENCES employee(emp_id),

salary NUMBER (5),

hire_date DATE,

job_id NUMBER (3));


To get information about an employee and his manager, you have to join the employee table with itself. You can do that by specifying the employee table twice in the FROM clause and using two different table aliases, thereby treating employee as if it were two separate tables. The following example lists the name of each employee and his manager:
SELECT e.lname employee, m.lname manager

FROM employee e JOIN employee m

ON e.manager_emp_id = m.emp_id;



EMPLOYEE MANAGER

-------------------- --------------------

SCOTT JONES

FORD JONES

ALLEN BLAKE

WARD BLAKE

JAMES BLAKE

TURNER BLAKE

MARTIN BLAKE

MILLER CLARK

ADAMS SCOTT

JONES KING

CLARK KING

BLAKE KING

SMITH FORD



13 rows selected.


Note that the employee table is used twice in the FROM clause with two different aliases. Also note the join condition that reads as: "Where the employee's manager_emp_id is the same as his manager's emp_id."
3.3.5.1 Self outer joins

Even though the employee table has 14 rows, the previous query returned only 13 rows. This is because there is an employee without a manager_emp_id. Oracle excludes that employee's row from the result set while performing the self inner join. To include employees without manager_emp_id values, in other words, without managers, you need an outer join:
SELECT e.lname employee, m.lname manager

FROM employee e LEFT OUTER JOIN employee m

ON e.manager_emp_id = m.emp_id;



EMPLOYEE MANAGER

-------------------- --------------------

FORD JONES

SCOTT JONES

JAMES BLAKE

TURNER BLAKE

MARTIN BLAKE

WARD BLAKE

ALLEN BLAKE

MILLER CLARK

ADAMS SCOTT

CLARK KING

BLAKE KING

JONES KING

SMITH FORD

KING



14 rows selected.


Be careful when using a LEFT or RIGHT outer join to join a table to itself. If you choose the wrong direction, you may get an absurd result set that makes no sense. In this case, we want to list all the employees irrespective of whether they have a manager or not. Therefore, the employee table we need to make optional is the one from which we are drawing manager names.
3.3.5.2 Self non-equi-joins

The previous example showed self equi-joins. However, there are situations when you need to perform self non-equi-joins. We will illustrate this by an example. Let's assume that you are in charge of organizing interdepartmental basketball competition within your organization. It is your responsibility to draw the teams and schedule the competition. You query the department table and get the following result:
SELECT name FROM department;



NAME

--------------

ACCOUNTING

RESEARCH

SALES

OPERATIONS


You find that there are four departments, and to make a fair competition, you decide that each department plays against the other three departments once, and at the end, the department with the maximum wins is declared the winner. You have been to an Oracle SQL training class recently, and decide to apply the concept of self join you learned there. You execute the following query:
SELECT d1.name team1, d2.name team2

FROM department d1 CROSS JOIN department d2;



TEAM1 TEAM2

-------------------- --------------------

ACCOUNTING ACCOUNTING

ACCOUNTING RESEARCH

ACCOUNTING SALES

ACCOUNTING OPERATIONS

RESEARCH ACCOUNTING

RESEARCH RESEARCH

RESEARCH SALES

RESEARCH OPERATIONS

SALES ACCOUNTING

SALES RESEARCH

SALES SALES

SALES OPERATIONS

OPERATIONS ACCOUNTING

OPERATIONS RESEARCH

OPERATIONS SALES

OPERATIONS OPERATIONS



16 rows selected.


Disappointing results. From your knowledge of high school mathematics, you know that four teams each playing once with the other three makes six combinations. However, your SQL query returned 16 rows. Now you realize that since you used a cross join (or didn't specify any join condition), you got a Cartesian product from your query. You put in a join condition, and your query and results now look as follows:
SELECT d1.name team1, d2.name team2

FROM department d1 JOIN department d2

ON d1.dept_id = d2.dept_id;



TEAM1 TEAM2

-------------- --------------

ACCOUNTING ACCOUNTING

RESEARCH RESEARCH

SALES SALES

OPERATIONS OPERATIONS


Oops! The equi-join returned a very unwanted result. A team can't play against itself. You realize your mistake, and this sparks the idea that you can use non-equi-joins in this situation. You rewrite the query as a non-equi-join. You don't want a team to play against itself, and therefore replace the = operator in the join condition with !=. Let's look at the results:
SELECT d1.name team1, d2.name team2

FROM department d1 JOIN department d2

ON d1.dept_id != d2.dept_id;



TEAM1 TEAM2

-------------- --------------

RESEARCH ACCOUNTING

SALES ACCOUNTING

OPERATIONS ACCOUNTING

ACCOUNTING RESEARCH

SALES RESEARCH

OPERATIONS RESEARCH

ACCOUNTING SALES

RESEARCH SALES

OPERATIONS SALES

ACCOUNTING OPERATIONS

RESEARCH OPERATIONS

SALES OPERATIONS



12 rows selected.


Still not done. In this result set, you have permutations such as (RESEARCH, ACCOUNTING) and (ACCOUNTING, RESEARCH), and so on. Therefore, each team plays against the others twice. You need to remove these permutations, which you rightly consider to be duplicates. You think about using DISTINCT. DISTINCT will not help here, because the row (RESEARCH, ACCOUNTING) is different from the row (ACCOUNTING, RESEARCH) from the viewpoint of DISTINCT; but not from the viewpoint of your requirement. After some thought, you want to try out an inequality operator other than !=. You decide to go with the less-than (<) operator. Here are the results you get:
SELECT d1.name team1, d2.name team2

FROM department d1 JOIN department d2

ON D1.DEPT_ID < D2.DEPT_ID;



TEAM1 TEAM2

-------------------- -----------

ACCOUNTING RESEARCH

ACCOUNTING SALES

ACCOUNTING OPERATIONS

RESEARCH SALES

RESEARCH OPERATIONS

SALES OPERATIONS



6 rows selected.


That's it! Now you have six combinations: each team plays against the other three just once. Let's examine why this version of the query works. Conceptually, when Oracle executes this query, a Cartesian product is first formed with 16 rows. Then the less-than (<) operator in the join condition restricts the result set to those rows in which the dept_id of Team 1 is less than the dept_id of Team 2. The less-than (<) operator eliminates the duplicates, because for any given permutation of two departments this condition is satisfied for only one. Using greater-than (>) instead of less-than (<) will also give you the required result, but the team1 and team2 values will be reversed:
SELECT d1.name team1, d2.name team2

FROM department d1 JOIN department d2

ON d1.dept_id > d2.dept_id;



TEAM1 TEAM2

-------------------- -----------

OPERATIONS SALES

OPERATIONS RESEARCH

OPERATIONS ACCOUNTING

SALES RESEARCH

SALES ACCOUNTING

RESEARCH ACCOUNTING



6 rows selected.


Don't be disheartened by the painful process you had to go through to get this result. Sometimes you have to go through an agonizing experience to get simple results such as these. That's life. Now that you have the team combinations right, go a bit further and assign a date for each match. Use "tomorrow" as the starting date:
SELECT d1.name team1, d2.name team2, SYSDATE + ROWNUM match_date

FROM department d1 JOIN department d2

ON d1.dept_id < d2.dept_id;



TEAM1 TEAM2 MATCH_DAT

-------------------- -------------------- ---------

ACCOUNTING RESEARCH 10-NOV-03

ACCOUNTING SALES 11-NOV-03

ACCOUNTING OPERATIONS 12-NOV-03

RESEARCH SALES 13-NOV-03

RESEARCH OPERATIONS 14-NOV-03

SALES OPERATIONS 15-NOV-03



6 rows selected.


Now publish these results on the corporate intranet along with the rules and regulations for the competition, and you are done.
3.3.6 Partition Outer Joins

Partition outer joins are an interesting new feature in Oracle Database 10g. They're useful for data densification, which is a fancy way of saying that they are useful for filling in rows that do not exist. This is a common requirement for data warehousing queries.

Part of our example database is an employee_expense table with summary information from employee expense reports. The data in that table looks as follows:
SELECT * FROM employee_expense;



EMP_ID YEAR MONTH EXPENSE_CLAIM APPROVED_AMT PAID_DATE

---------- ---------- ---------- ------------- ------------ ---------

7369 2002 2 3072.43 3072.43 03-MAR-02

7369 2002 4 30 30 01-JUN-02

7369 2002 5 235.03 35.03 01-JUN-02

7369 2002 9 5095.98 5095.08 31-OCT-02

7369 2002 12 1001.01 1001.01 01-FEB-03

7782 2002 1 111.09 111.09 01-FEB-02

7782 2002 3 9.85 9.85 01-APR-02

7782 2002 7 3987.32 3987.32 01-AUG-02

7782 2002 9 1200 1200 01-OCT-02


Management wants to review expenses, and you've been asked to generate a data extract of employee expense claims for the year 2002. Furthermore, to facilitate some analysis that management wishes to do using Excel, you've been asked to ensure that your extract contains one row per month per employee, but only for those employees who actively submit expense reports. Most do not.

You think about these requirements for a while, and realize that you have a months table containing one row for each month in 2002:
SELECT *

FROM months

WHERE year = 2002;



YEAR MONTH

---------- ----------

2002 1

2002 2

2002 3

2002 4

2002 5

2002 6

2002 7

2002 8

2002 9

2002 10

2002 11

2002 12


You can use this months table in an outer join to generate 12 rows for each employee. For example, to generate one row per month for employee 7782, you can write the following query:
SELECT NVL(ee.emp_id, 7782), m.year, m.month, NVL(ee.expense_claim,0)

FROM (SELECT * FROM months WHERE year = 2002) m

LEFT OUTER JOIN (SELECT *

FROM employee_expense

WHERE emp_id = 7782) ee

ON m.year = ee.year AND m.month = ee.month

ORDER BY m.month;



NVL(EE.EMP_ID,7782) YEAR MONTH NVL(EE.EXPENSE_CLAIM,0)

------------------- ---------- ---------- -----------------------

7782 2002 1 111.09

7782 2002 2 0

7782 2002 3 9.85

7782 2002 4 0

7782 2002 5 0

7782 2002 6 0

7782 2002 7 3987.32

7782 2002 8 0

7782 2002 9 1200

7782 2002 10 0

7782 2002 11 0



NVL(EE.EMP_ID,7782) YEAR MONTH NVL(EE.EXPENSE_CLAIM,0)

------------------- ---------- ---------- -----------------------

7782 2002 12 0


The query in this example is a bit intense. It performs a join of two nested SELECT statements, called subqueries. The first subquery returns the 12 rows from months for the year 2002. Those 12 rows are the mandatory rows in the outer join. The second subquery returns the actual expense rows for employee #7782. The outer join ensures that 12 rows are returned, and the two NVL functions ensure that each of those 12 rows has a value for the potentially NULL employee_expense fields. Expense claims for months in which no report was filed are simply set to zero.

Subqueries are discussed in detail in Chapter 5. We hated to have to bring them up now, but we felt the preceding example was necessary to help you understand what a partition outer join is all about.





The previous query is all well and good, but to generate your report you'd need to execute the preceding query many times, once for each employee who has submitted at least one expense report in the year 2002. This is where partition outer joins come into play. They make it trivial to do the equivalent of executing the preceding query once per employee. Here's how:
SELECT ee.emp_id, m.year, m.month, NVL(ee.expense_claim,0)

FROM (SELECT * FROM months WHERE year = 2002) m

LEFT OUTER JOIN employee_expense ee

PARTITION BY (ee.emp_id)

ON m.year = ee.year AND m.month = ee.month

ORDER BY ee.emp_id, m.month;



EMP_ID YEAR MONTH NVL(EE.EXPENSE_CLAIM,0)

---------- ---------- ---------- -----------------------

7369 2002 1 0

7369 2002 2 3072.43

7369 2002 3 0

7369 2002 4 30

7369 2002 5 235.03

7369 2002 6 0

7369 2002 7 0

7369 2002 8 0

7369 2002 9 5095.98

7369 2002 10 0

7369 2002 11 0

7369 2002 12 1001.01

7782 2002 1 111.09

7782 2002 2 0

7782 2002 3 9.85

7782 2002 4 0

7782 2002 5 0

7782 2002 6 0

7782 2002 7 3987.32

7782 2002 8 0

7782 2002 9 1200

7782 2002 10 0

7782 2002 11 0

7782 2002 12 0


Notice the PARTITION BY clause in this query. That clause is new in Oracle Database 10g, and in this example it causes the database engine to conceptually perform the following steps:


Divide the rows from employee_expense into groups based on their emp_id values, one group per value.


Outer join each group to the months table as a separate operation.

The key here is that rather than one outer join, you are getting the equivalent of many outer joins, but with a much simpler syntax, and from one query. The preceding query is logically equivalent to the following UNION ALL query:
SELECT NVL(ee.emp_id, 7369), m.year, m.month, NVL(ee.expense_claim,0)

FROM (SELECT * FROM months WHERE year = 2002) m

LEFT OUTER JOIN (SELECT *

FROM employee_expense

WHERE emp_id = 7369) ee

ON m.year = ee.year AND m.month = ee.month

ORDER BY m.month

UNION ALL

SELECT NVL(ee.emp_id, 7782), m.year, m.month, NVL(ee.expense_claim,0)

FROM (SELECT * FROM months WHERE year = 2002) m

LEFT OUTER JOIN (SELECT *

FROM employee_expense

WHERE emp_id = 7782) ee

ON m.year = ee.year AND m.month = ee.month

ORDER BY m.month;


You'll learn more about UNION queries in Chapter 7, so don't worry if you don't fully understand this example now. Our point here is to illustrate that, given the two employees represented in our example employee_expense table, our partition outer join query performs the equivalent of two outer joins, one for each employee.

Unlike the case with our UNION ALL code, you do not need to apply NVL to the partition columns when doing a partition outer join. The correct emp_id values were filled in automatically, for all new rows generated in each partition.





Because they make it easy to fill in gaps in your data, partition outer joins are particularly helpful when writing lag and lead queries, which are a type of query particularly sensitive to gaps in data. You'll learn more about lag and lead queries in Chapter 14.

Comments

Popular Posts