Wednesday, January 26, 2011

SQL Joins Tutorial: Joins are easy. Lets play cricket with them


The Different Types of Joins in SQL Server

Inner join or Equi join
Outer Join
Cross join
Let's suppose we have two tables Emp and Dept whose description is given below:-



CREATE TABLE [Emp](
[Empid] [Int] IDENTITY (1, 1) NOT NULL Primary key,
[EmpNumber] [nvarchar](50) NOT NULL,
[EmpFirstName] [nvarchar](150) NOT NULL,
[EmpLastName] [nvarchar](150) NULL,
[EmpEmail] [nvarchar](150) NULL,
[Managerid] [int] NULL,
[Deptid] [INT]
)
CREATE TABLE [Dept](
[Deptid] [int] IDENTITY (1, 1) NOT NULL primary key,
[DeptName] [nvarchar](255) NOT NULL
)

After the creation of the tables we need to insert the data into these tables. To insert the data the following queries are used:-

insert into Emp (EmpNumber,EmpFirstName,EmpLastName,EmpEmail,Managerid,Deptid)
values('E1','Sachin','Tendulkar','Sachin@mycomp.com',2,2)
insert into Emp (EmpNumber,EmpFirstName,EmpLastName,EmpEmail,Managerid,Deptid)
values('E2','Zaheer','Khan','Zaheer@mycomp.com',1,1)
insert into Emp(EmpNumber,EmpFirstName,EmpLastName,EmpEmail,Managerid,Deptid)
values('E3','Gambhir','Gautam','Gambhir@mycomp.com',1,2)
insert into Emp (EmpNumber,EmpFirstName,EmpLastName,EmpEmail,Managerid,Deptid)
values('E4','Dhoni','MS','Dhoni@mycomp.com',1,NULL)

insert into Dept(DeptName)
values('Bowling')
insert into Dept(DeptName)
values('Batting')
insert into Dept(DeptName)
values('Coach')
insert into Dept(DeptName)
values('Allrounder')

Inner Join

This type of join is also known as the Equi join.
This join returns all the rows from both tables where there is a match.
This type of join can be used in the situation where we need to select only those rows
which have values common in the columns which are specified in the ON clause.

Now, if we want to get Emp id, Emp first name and their Dept name
for those Emps which belongs to at least one Dept, then we can use the inner join.

Query for Inner Join

 SELECT Emp.Empid, Emp.EmpFirstName, Emp.EmpLastName, Dept.DeptName 
 FROM  Emp 
  INNER JOIN dept 
     ON Emp.Deptid=Dept.Deptid

Result

Empid EmpFirstName  DeptName
1     Sachin                  Batting
2     Zaheer                 Bowling
3     Gambhir              Batting

Explanation

In this query, we used the inner join based on the column "Deptid" which is common in both the tables "Emp" and "Dept".
This query will give all the rows from both the tables which have common values in the column "Deptid".
Gambhir Guatam and Sachin Tendulkar has the value "2" in the Deptid column of the table Emp.
In the Dept table, the Dept "Batting" has the value "2" in the Deptid column.
Therefore the above query returns two rows for the Dept "Batting", one for Gambhir Guatam and another for Sachin Tendulkar.

Lets understand more joins



Self Join


Sometime we need to join a table to itself.
This type of join is called Self join.
It is one of the type of inner join where both the columns belong to the same table.
In this Join, we need to open two copies of a same table in the memory.
Since the table name is the same for both instances, we use the table aliases to make identical copies of the same table
to be open in different memory locations.

For example if we need to get the Emp name and their manager name we need to use the self join,
since the managerid for an Emp is also stored in the same table as the Emp.

Query for the Self Join

SELECT Emp1.Empid, 
       Emp1.EmpFirstName+' '+Emp1.EmpLastName as EmpName, 
    Emp2.EmpFirstName+' '+Emp2.EmpLastName as ManagerName 
  FROM Emp Emp1 
     INNER JOIN Emp Emp2 
    ON Emp1.Managerid=Emp2.Empid

Result

Empid   EmpName            ManagerName
1       Sachin Tendulkar    Zaheer Khan
2       Zaheer Khan         Sachin Tendulkar
3       Gambhir Guatam     Sachin Tendulkar
4       Dhoni MS           Sachin Tendulkar

Explanation

Since the Emp and the manager information is contained in the same table (Emp, since both are Emps),
we have to use the Self Join. In the self join query,
we make two copies of the table Emp by using the aliases Emp1 and Emp2 and then use Inner join between them
by using the managerid column of the Emp1 and Empid column of the table Emp2.
In this example, we use managerid and empid columns of the Emp table since the Emp id
of the manager of an Emp is stored in the managerid of the Emp table.

Outer Join

This type of join is needed when we need to select all the rows from the table on the left
(or right or both) regardless of whether the other table has common values or not and it
usually enter null values for the data which is missing.

The Outer join can be of three types

Left Outer Join
Right Outer Join
Full Outer Join

Left Outer Join

If we want to get Emp id, Emp first name, and their Dept name for all the Emps regardless
of whether they belong to any Dept or not,then we can use the left outer join.
In this case we keep the Emp table on the left side of the join clause.
It will insert NULL values for the data which is missing in the right table.

Query for Left Outer Join

SELECT Emp.Empid, 
       Emp.EmpFirstName, 
       Dept.DeptName
  FROM Emp 
     LEFT OUTER JOIN Dept dept 
     ON Emp.Deptid=Dept.Depttid

Result

Empid EmpFirstName  DeptName
1     Sachin         Batting
2     Zaheer         Bowling
3     Gambhir        Batting
4     Dhoni           NULL

Explanation

Since we have use the Left Outer Join, this query will give the information
(Emp id, Emp first name, Emp last name and their Dept name)
for all the Emp from the Emp table and it insert NULL value in the DeptName column where the Emp does not belong to any Dept.
In the table Emp, since Sachin Tendulkar, Zaheer Khan and Gambhir Guatam have values in their Deptid column,
therefore the above query will display their Dept name under the heading DeptName.
But since Dhoni MS doesn't belongs to any Dept and has null value in the column Deptid
therefore the above query will Display the NULL value under the column heading DeptName.

Right Outer Join

If we want to get all the Depts name and Emp id, Emp first name, all the Emps belonging to the Dept
of whether a Dept have Emps or not, then we can use the right outer join.
In this case we keep the Dept table on the right side of the join clause.
It will insert NULL values for the data which is missing in the left table (Emp).

Query for Right Outer Join

SELECT Dept.DeptName, 
       Emp.Empid
    Emp.EmpFristName 
  FROM Emp 
    RIGHT OUTER JOIN Dept dept 
   ON Emp.Deptid=Dept.Deptid

Result

DeptName Empid EmpFirstName
Bowling    2     Zaheer  
Batting    1     Sachin
Batting    3     Gambhir
Coach    NULL  NULL  
Allrounder NULL  NULL  

Explanation

Since we have use the Right Outer Join, this query will join the two tables Emp
and Dept on the basis of the values contains in the column Depttid.
It will give the Dept name from the Dept table and the Emp id, Emp first name, and Emp last name of all the Emps that belong to that Dept.
If any Dept does not contain any Emp then it insert NULL value in the columns coming from the Emp table.
Since no Emp is connected to the Depts Coach and Allrounder,
this query will display NULL values under the columns Empid, EmpFirstName and EmpLastName for the Depts Coach and Allrounder.
Since the Dept Batting and Bowling contains the Emps therefore the columns Empid, EmpFirstName and EmpLastName contains the information,
Emp id, Emp first name respectively.

Full Outer Join

If we want to get all the Depts name and the Emp id, Emp first name,
all the Emps regardless of whether a Dept have Emps or not, or whether a Emp belong to a Dept or not,
then we can use the full outer join.
It will insert null values for the data which is missing in both the tables.

Query for Full Outer Join

SELECT Emp.Empid, 
       Emp.EmpFirstName, 
       Dept.DeptName 
  FROM Emp 
     FULL OUTER JOIN Dept dept 
    ON Emp.Deptid=Dept.Depttid

Result


Empid EmpFirstName  DeptName
1     Sachin         Batting
2     Zaheer         Bowling
3     Gambhir        Batting
4     Dhoni            NULL
NULL  NULL           Coach
NULL  NULL           Allrounder

Explanation

Since we have used the Full Outer Join, this query will give the name of all the Depts from the Dept table and the Emp id, Emp first name,of all the Emps from the Emp table.
If any Dept does not contain any Emp, then it insert NULL value in the columns Empid, EmpFirstName, EmpLastName columns and if any Emp doesn't belong to any Dept
then it insert NULL value in the column DeptName.
Here since Dhoni MS doesn't belong to any Dept, the result displays NULL value under the column DeptName.
Since the Depts Coach and Bowling don't contain any Emps, the result of the above query displays NULL values
under the columns Empid, EmpFirstName for the Depts Coach and Allrounder..

Cross Join

This join combines all the rows from the left table with every row from the right table.
This type of join is needed when we need to select all the possible combinations of rows and columns from both the tables.
 This type of join is generally not preferred as it takes lot of time and gives a huge result that is not often useful.

Query for the Cross Join

SELECT Emp.Empid,
       Emp.EmpFirstName,
       Dept.DeptName 
 FROM Emp Emp 
   CROSS JOIN Dept dept

Results

Empid EmpFirstName DeptName
1     Sachin        Bowling
2     Zaheer        Bowling
3     Gambhir       Bowling
4     Dhoni         Bowling
1     Sachin        Batting
2     Zaheer        Batting
3     Gambhir       Batting
4     Dhoni         Batting
1     Sachin        Coach
2     Zaheer        Coach
3     Gambhir       Coach
4     Dhoni         Coach
1     Sachin       Allrounder
2     Zaheer       Allrounder
3     Gambhir      Allrounder
4     Dhoni        Allrounder

Explanation

This Cross Join query will give combines all the rows from the Emp table with every row of the Dept table.
Since the Emp table contains 4 rows and the Dept table contains 4 rows, therefore this result will returns 4*4=16 rows.
This query doesn't contain any ON clause.

Conclusion

The above discussion can be summarized as joins are used to select data from more than one table in a single query.
The inner join is used to select only those rows that have common values in the column on which join is based.
The Left Outer Join is used to select the rows from the left hand side table regardless of whether the table on
the right hand side has common values or not.
Similarly the Right Outer join is used to select rows from the table on the right hand side regardless of whether
the table on the left hand side has common values or not. The Cross join is used to get rows from all the possible combinations of rows and columns from both the table. If should be used when it the only way left since it may run  for a very long time and returns a huge result set which may not be useful.

 Few Important Tips :

Joining of the tables should be avoided if it is based on the columns that have very few unique values.
To increase the JOIN performance it is better to limits the number of rows needed to be joined,
by including a WHERE clause in your query.
Join performance can also beincreased if the columns used for joining the tables have their own indexes.

No comments:

Post a Comment

subversion video