What is Join in SQL?
SQL Joins are simply a way to combine data from two or more tables based on a common field between them. SQL joins are of two types. We will discuss the types of SQL in detail.
Let’s proceed with an example demonstrating the process of Inner Join and Outer Join.
Student Table
EnrollNo | StudentName | Address |
1001 | geek1 | geeksquiz1 |
1002 | geek2 | geeksquiz2 |
1003 | geek3 | geeksquiz3 |
1004 | geek4 | geeksquiz4 |
StudentCourse Table
CourseID | EnrollNo |
1001 | |
1001 | |
1001 | |
1002 | |
1003 |
How To Use Outer Join?
Outer Join is performed by the selection of the records from all the tables given there. Left Outer Join works by selecting all records from the left table and matching records from the right table. Similarly, Right Outer Join works by selecting all records from the right table and matching records from the left table and the Full Outer Join returns all records if a match occurs in the left or the right table.
Conclusion
- Joins are used to Join two or more tables in the Database.
- There are mainly three types of Join – Inner Join, Natural Join, Outer Join.
- Inner joins are of two types – Theta Join and Equi Join.
- Outer joins are of Three types – Left Outer Join, Right Outer Join and Full Outer Join.
- Natural Join is performed only when there is at least one matching attribute in both the tables.
- Left Outer join always returns all the rows of left table irrespective of the Join condition.
- Right Outer Join always returns all the rows of right table irrespective of the Join condition.
- Full Outer Join always returns all the Rows of both the table irrespective of the join condition.
A NATURAL JOIN is a JOIN operation that creates an implicit join clause for you based on the common columns in the two tables being joined. Common columns are columns that have the same name in both tables.
A NATURAL JOIN can be an INNER join, a LEFT OUTER join, or a RIGHT OUTER join. The default is INNER join.
If the SELECT statement in which the NATURAL JOIN operation appears has an asterisk (*) in the select list, the asterisk will be expanded to the following list of columns (in this order):
An asterisk qualified by a table name (for example, COUNTRIES.*) will be expanded to every column of that table that is not a common column.
If a common column is referenced without being qualified by a table name, the column reference points to the column in the first (left) table if the join is an INNER JOIN or a LEFT OUTER JOIN. If it is a RIGHT OUTER JOIN, unqualified references to a common column point to the column in the second (right) table.
TableExpression NATURAL [ { LEFT | RIGHT } [ OUTER ] | INNER ] JOIN { TableViewOrFunctionExpression | ( TableExpression ) }
If the tables COUNTRIES and CITIES have two common columns named COUNTRY and COUNTRY_ISO_CODE, the following two SELECT statements are equivalent:
SELECT * FROM COUNTRIES NATURAL JOIN CITIES
SELECT * FROM COUNTRIES JOIN CITIES USING (COUNTRY, COUNTRY_ISO_CODE)
The following example is similar to the one above, but it also preserves unmatched rows from the first (left) table:
SELECT * FROM COUNTRIES NATURAL LEFT JOIN CITIES
Relational algebra
In database theory, relational algebra is a theory that uses algebraic structures for modeling data, and defining queries on it with a well founded semantics. The theory was introduced by Edgar F. Codd.
The main application of relational algebra is to provide a theoretical foundation for relational databases, particularly query languages for such databases, chief among which is SQL. Relational databases store tabular data represented as relations. Queries over relational databases often likewise return tabular data represented as relations.
The main purpose of relational algebra is to define operators that transform one or more input relations to an output relation. Given that these operators accept relations as input and produce relations as output, they can be combined and used to express complex queries that transform multiple input relations (whose data are stored in the database) into a single output relation (the query results).
Unary operators accept a single relation as input. Examples include operators to filter certain attributes (columns) or tuples (rows) from an input relation. Binary operators accept two relations as input and combine them into a single output relation. For example, taking all tuples found in either relation (union), removing tuples from the first relation found in the second relation (difference), extending the tuples of the first relation with tuples in the second relation matching certain conditions, and so forth.
Other more advanced operators can also be included, where the inclusion or exclusion of certain operators gives rise to a family of algebras.
Implementations[edit]
The first query language to be based on Codd’s algebra was Alpha, developed by Dr. Codd himself. Subsequently, ISBL was created, and this pioneering work has been acclaimed by many authorities[8] as having shown the way to make Codd’s idea into a useful language. Business System 12 was a short-lived industry-strength relational DBMS that followed the ISBL example.
In 1998 Chris Date and Hugh Darwen proposed a language called Tutorial D intended for use in teaching relational database theory, and its query language also draws on ISBL’s ideas. Rel is an implementation of Tutorial D.
Even the query language of SQL is loosely based on a relational algebra, though the operands in SQL (tables) are not exactly relations and several useful theorems about the relational algebra do not hold in the SQL counterpart (arguably to the detriment of optimisers and/or users). The SQL table model is a bag (multiset), rather than a set. For example, the expression is a theorem for relational algebra on sets, but not for relational algebra on bags; for a treatment of relational algebra on bags see chapter 5 of the “Complete” textbook by Garcia-Molina, Ullman and Widom.[6]
Joins and join-like operators[edit]
It has been suggested that this section be split out into another article titled Join (relational algebra). (Discuss) (March 2023) |
Natural join (⋈)[edit]
Natural join (⋈) is a binary operator that is written as (R ⋈ S) where R and S are relations.[a] The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. For an example consider the tables Employee and Dept and their natural join:[citation needed]
Note that neither the employee named Mary nor the Production department appear in the result.
This can also be used to define composition of relations. For example, the composition of Employee and Dept is their join as shown above, projected on all but the common attribute DeptName. In category theory, the join is precisely the fiber product.
The natural join is arguably one of the most important operators since it is the relational counterpart of the logical AND operator. Note that if the same variable appears in each of two predicates that are connected by AND, then that variable stands for the same thing and both appearances must always be substituted by the same value (this is a consequence of the idempotence of the logical AND). In particular, natural join allows the combination of relations that are associated by a foreign key. For example, in the above example a foreign key probably holds from Employee.DeptName to Dept.DeptName and then the natural join of Employee and Dept combines all employees with their departments. This works because the foreign key holds between attributes with the same name. If this is not the case such as in the foreign key from Dept.Manager to Employee.Name then these columns must be renamed before taking the natural join. Such a join is sometimes also referred to as an equijoin (see θ-join).
More formally the semantics of the natural join are defined as follows:
-
(1)
where Fun(t) is a predicate that is true for a relation t (in the mathematical sense) iff t is a function (that is, t does not map any attribute to multiple values). It is usually required that R and S must have at least one common attribute, but if this constraint is omitted, and R and S have no common attributes, then the natural join becomes exactly the Cartesian product.
The natural join can be simulated with Codd’s primitives as follows. Assume that c1,…,cm are the attribute names common to R and S, r1,…,rn are the attribute names unique to R and s1,…,sk are the attribute names unique to S. Furthermore, assume that the attribute names x1,…,xm are neither in R nor in S. In a first step the common attribute names in S can be renamed:
-
(2)
Then we take the Cartesian product and select the tuples that are to be joined:
-
(3)
Finally we take a projection to get rid of the renamed attributes:
-
(4)
θ-join and equijoin[edit]
Consider tables Car and Boat which list models of cars and boats and their respective prices. Suppose a customer wants to buy a car and a boat, but she does not want to spend more money for the boat than for the car. The θ-join (⋈θ) on the predicate CarPrice ≥ BoatPrice produces the flattened pairs of rows which satisfy the predicate. When using a condition where the attributes are equal, for example Price, then the condition may be specified as Price=Price or alternatively (Price) itself.
In order to combine tuples from two relations where the combination condition is not simply the equality of shared attributes it is convenient to have a more general form of join operator, which is the θ-join (or theta-join). The θ-join is a binary operator that is written as or where a and b are attribute names, θ is a binary relational operator in the set {<, ≤, =, ≠, >, ≥}, υ is a value constant, and R and S are relations. The result of this operation consists of all combinations of tuples in R and S that satisfy θ. The result of the θ-join is defined only if the headers of S and R are disjoint, that is, do not contain a common attribute.
The simulation of this operation in the fundamental operations is therefore as follows:
- R ⋈θ S = σθ(R × S)
In case the operator θ is the equality operator (=) then this join is also called an equijoin.
Note, however, that a computer language that supports the natural join and selection operators does not need θ-join as well, as this can be achieved by selection from the result of a natural join (which degenerates to Cartesian product when there are no shared attributes).
In SQL implementations, joining on a predicate is usually called an inner join, and the on keyword allows one to specify the predicate used to filter the rows. It is important to note: forming the flattened Cartesian product then filtering the rows is conceptually correct, but an implementation would use more sophisticated data structures to speed up the join query.
Semijoin (⋉ and ⋊)[edit]
The left semijoin is a joining similar to the natural join and written as where and are relations.[b] The result is the set of all tuples in for which there is a tuple in that is equal on their common attribute names. The difference from a natural join is that other columns of do not appear. For example, consider the tables Employee and Dept and their semijoin:[citation needed]
More formally the semantics of the semijoin can be defined as follows:
where is as in the definition of natural join.
The semijoin can be simulated using the natural join as follows. If are the attribute names of , then
Since we can simulate the natural join with the basic operators it follows that this also holds for the semijoin.
In Codd’s 1970 paper, semijoin is called restriction.[2]
Antijoin (▷)[edit]
The antijoin, written as R ▷ S where R and S are relations,[c] is similar to the semijoin, but the result of an antijoin is only those tuples in R for which there is no tuple in S that is equal on their common attribute names.[citation needed]
For an example consider the tables Employee and Dept and their antijoin:
The antijoin is formally defined as follows:
- R ▷ S = { t : t ∈ R ∧ ¬∃s ∈ S(Fun (t ∪ s))}
or
- R ▷ S = { t : t ∈ R, there is no tuple s of S that satisfies Fun (t ∪ s)}
where Fun (t ∪ s) is as in the definition of natural join.
The antijoin can also be defined as the complement of the semijoin, as follows:
-
R ▷ S = R − R ⋉ S
(5)
Given this, the antijoin is sometimes called the anti-semijoin, and the antijoin operator is sometimes written as semijoin symbol with a bar above it, instead of ▷.
Division (÷)[edit]
The division is a binary operation that is written as R ÷ S. Division is not implemented directly in SQL. The result consists of the restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their combinations with tuples in S are present in R. For an example see the tables Completed, DBProject and their division:
If DBProject contains all the tasks of the Database project, then the result of the division above contains exactly the students who have completed both of the tasks in the Database project. More formally the semantics of the division is defined as follows:
-
R ÷ S = { t[a1,…,an] : t ∈ R ∧ ∀s ∈ S ( (t[a1,…,an] ∪ s) ∈ R) }
(6)
where {a1,…,an} is the set of attribute names unique to R and t[a1,…,an] is the restriction of t to this set. It is usually required that the attribute names in the header of S are a subset of those of R because otherwise the result of the operation will always be empty.
The simulation of the division with the basic operations is as follows. We assume that a1,…,an are the attribute names unique to R and b1,…,bm are the attribute names of S. In the first step we project R on its unique attribute names and construct all combinations with tuples in S:
- T := πa1,…,an(R) × S
In the prior example, T would represent a table such that every Student (because Student is the unique key / attribute of the Completed table) is combined with every given Task. So Eugene, for instance, would have two rows, Eugene → Database1 and Eugene → Database2 in T.
-
- EG: First, let’s pretend that “Completed” has a third attribute called “grade”. It’s unwanted baggage here, so we must project it off always. In fact in this step we can drop “Task” from R as well; the multiply puts it back on.
- T := πStudent(R) × S // This gives us every possible desired combination, including those that don’t actually exist in R, and excluding others (eg Fred | compiler1, which is not a desired combination)
In the next step we subtract R from T
- U := T − R
In U we have the possible combinations that “could have” been in R, but weren’t.
-
- EG: Again with projections — T and R need to have identical attribute names/headers.
- U := T − πStudent,Task(R) // This gives us a “what’s missing” list.
So if we now take the projection on the attribute names unique to R
then we have the restrictions of the tuples in R for which not all combinations with tuples in S were present in R:
-
V := πa1,…,an(U)
- EG: Project U down to just the attribute(s) in question (Student)
- V := πStudent(U)
So what remains to be done is take the projection of R on its unique attribute names and subtract those in V:
-
W := πa1,…,an(R) − V
- EG: W := πStudent(R) − V.
Common extensions[edit]
In practice the classical relational algebra described above is extended with various operations such as outer joins, aggregate functions and even transitive closure.[3]
Outer joins[edit]
Whereas the result of a join (or inner join) consists of tuples formed by combining matching tuples in the two operands, an outer join contains those tuples and additionally some tuples formed by extending an unmatched tuple in one of the operands by “fill” values for each of the attributes of the other operand. Outer joins are not considered part of the classical relational algebra discussed so far.[4]
The operators defined in this section assume the existence of a null value, ω, which we do not define, to be used for the fill values; in practice this corresponds to the NULL in SQL. In order to make subsequent selection operations on the resulting table meaningful, a semantic meaning needs to be assigned to nulls; in Codd’s approach the propositional logic used by the selection is extended to a three-valued logic, although we elide those details in this article.
Three outer join operators are defined: left outer join, right outer join, and full outer join. (The word “outer” is sometimes omitted.)
Left outer join (⟕)[edit]
The left outer join is written as R ⟕ S where R and S are relations.[d] The result of the left outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition (loosely speaking) to tuples in R that have no matching tuples in S.[citation needed]
For an example consider the tables Employee and Dept and their left outer join:
In the resulting relation, tuples in S which have no common values in common attribute names with tuples in R take a null value, ω.
Since there are no tuples in Dept with a DeptName of Finance or Executive, ωs occur in the resulting relation where tuples in Employee have a DeptName of Finance or Executive.
Let r1, r2, …, rn be the attributes of the relation R and let {(ω, …, ω)} be the singleton relation on the attributes that are unique to the relation S (those that are not attributes of R). Then the left outer join can be described in terms of the natural join (and hence using basic operators) as follows:
Right outer join (⟖)[edit]
The right outer join behaves almost identically to the left outer join, but the roles of the tables are switched.
The right outer join of relations R and S is written as R ⟖ S.[e] The result of the right outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R.[citation needed]
For example, consider the tables Employee and Dept and their right outer join:
In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value, ω.
Since there are no tuples in Employee with a DeptName of Production, ωs occur in the Name and EmpId attributes of the resulting relation where tuples in Dept had DeptName of Production.
Let s1, s2, …, sn be the attributes of the relation S and let {(ω, …, ω)} be the singleton relation on the attributes that are unique to the relation R (those that are not attributes of S). Then, as with the left outer join, the right outer join can be simulated using the natural join as follows:
Full outer join (⟗)[edit]
The outer join or full outer join in effect combines the results of the left and right outer joins.
The full outer join is written as R ⟗ S where R and S are relations.[f] The result of the full outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R and tuples in R that have no matching tuples in S in their common attribute names.[citation needed]
For an example consider the tables Employee and Dept and their full outer join:
In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value, ω. Tuples in S which have no common values in common attribute names with tuples in R also take a null value, ω.
The full outer join can be simulated using the left and right outer joins (and hence the natural join and set union) as follows:
- R ⟗ S = (R ⟕ S) ∪ (R ⟖ S)
Operations for domain computations[edit]
There is nothing in relational algebra introduced so far that would allow computations on the data domains (other than evaluation of propositional expressions involving equality). For example, it is not possible using only the algebra introduced so far to write an expression that would multiply the numbers from two columns, e.g. a unit price with a quantity to obtain a total price. Practical query languages have such facilities, e.g. the SQL SELECT allows arithmetic operations to define new columns in the result
SELECT unit_price * quantity AS total_price FROM t
, and a similar facility is provided more explicitly by Tutorial D’s
EXTEND
keyword.[5] In database theory, this is called extended projection.[6]: 213
Aggregation[edit]
Furthermore, computing various functions on a column, like the summing up of its elements, is also not possible using the relational algebra introduced so far. There are five aggregate functions that are included with most relational database systems. These operations are Sum, Count, Average, Maximum and Minimum. In relational algebra the aggregation operation over a schema (A1, A2, … An) is written as follows:
where each Aj’, 1 ≤ j ≤ k, is one of the original attributes Ai, 1 ≤ i ≤ n.
The attributes preceding the g are grouping attributes, which function like a “group by” clause in SQL. Then there are an arbitrary number of aggregation functions applied to individual attributes. The operation is applied to an arbitrary relation r. The grouping attributes are optional, and if they are not supplied, the aggregation functions are applied across the entire relation to which the operation is applied.
Let’s assume that we have a table named Account with three columns, namely Account_Number, Branch_Name and Balance. We wish to find the maximum balance of each branch. This is accomplished by Branch_NameGMax(Balance)(Account). To find the highest balance of all accounts regardless of branch, we could simply write GMax(Balance)(Account).
Grouping is often written as Branch_NameɣMax(Balance)(Account) instead.[6]
Transitive closure[edit]
Although relational algebra seems powerful enough for most practical purposes, there are some simple and natural operators on relations that cannot be expressed by relational algebra. One of them is the transitive closure of a binary relation. Given a domain D, let binary relation R be a subset of D×D. The transitive closure R+ of R is the smallest subset of D×D that contains R and satisfies the following condition:
It can be proved using the fact that there is no relational algebra expression E(R) taking R as a variable argument that produces R+.[7]
SQL however officially supports such fixpoint queries since 1999, and it had vendor-specific extensions in this direction well before that.
SELECT (σ)
The SELECT operation is used for selecting a subset of the tuples according to a given selection condition. Sigma(σ)Symbol denotes it. It is used as an expression to choose tuples which meet the selection condition. Select operator selects tuples that satisfy a given predicate.
σp(r)
is the predicate
stands for relation which is the name of the table
is prepositional logic
Example 1
σ topic = “Database” (Tutorials)
Output – Selects tuples from Tutorials where topic = ‘Database’.
Example 2
σ topic = “Database” and author = “guru99″( Tutorials)
Output – Selects tuples from Tutorials where the topic is ‘Database’ and ‘author’ is guru99.
Example 3
σ sales > 50000 (Customers)
Output – Selects tuples from Customers where sales is greater than 50000
Implementations[edit]
The first query language to be based on Codd’s algebra was Alpha, developed by Dr. Codd himself. Subsequently, ISBL was created, and this pioneering work has been acclaimed by many authorities[8] as having shown the way to make Codd’s idea into a useful language. Business System 12 was a short-lived industry-strength relational DBMS that followed the ISBL example.
In 1998 Chris Date and Hugh Darwen proposed a language called Tutorial D intended for use in teaching relational database theory, and its query language also draws on ISBL’s ideas. Rel is an implementation of Tutorial D.
Even the query language of SQL is loosely based on a relational algebra, though the operands in SQL (tables) are not exactly relations and several useful theorems about the relational algebra do not hold in the SQL counterpart (arguably to the detriment of optimisers and/or users). The SQL table model is a bag (multiset), rather than a set. For example, the expression is a theorem for relational algebra on sets, but not for relational algebra on bags; for a treatment of relational algebra on bags see chapter 5 of the “Complete” textbook by Garcia-Molina, Ullman and Widom.[6]
Types of Joins
There can be more than one way to join the database tables. So different types of Joins are:-
- Inner Join
- Natural Join
- Outer Join
Inner Join
It selects the values present in both the Table performing Inner join.
-
Inner Join is further classified into
- Theta Join
- Equi Join
Theta Join
Theta Join is used to join two tables based on some conditions. The condition can be on any attributes of the tables performing Theta join. Any comparison operator can be used in the condition.
A ⋈θ B where θ is the condition for join.
Let’s understand Theta Join with the Boys and Interest tables used above :
What if we want to find all the boys student in class 12th who like chess and have percentage greater than 70%. How can we find it out with the help of Theta join?
Theta Join – Boys ⋈(Boys.ID = Interest.ID and Interest.Sport = Chess and Boys.Percentage > 70 ) Interest So the condition here is Boys.ID = Interest.ID and Interest.Sport = Chess , so while performing join, we will have to check this condition every time two rows are joined.
The result of Theta Join will be:-
ID | Name | Percentage | Gender | Sport |
Tejan | 84 | Chess | ||
Ravi | 79 | Chess |
Equi Join
Equi join is same as Theta Join, but the only condition is it only uses equivalence condition while performing join between two tables.
A ⋈(… = …) B, where (… = … ) is the equivalence condition on any of the attributes of the joining table.
In the above example, what if we are told to find out all the students of class 12th who have interest in chess only?
We can perform Equi join as :
Equi join: Boys ⋈(Boys.ID = Interset.ID and Interest.Sport = Chess) Interest
Result after performing Equi join:
ID | Name | Percentage | Gender | Sport |
Tejan | 84 | Chess | ||
Rohan | 56 | Chess | ||
Ravi | 79 | Chess |
Natural Join
Natural join is also considered a type of inner join but it does not use any comparison operator for join condition. It joins the table only when the two tables have at least one common attribute with same name and domain.
In the result of the Natural Join the common attribute only appears once.
It will be more clear with help of an example :
What if we are told to find all the Students of class 12th and their sports interest we can apply Natural Join as :
Natural Join: Boys ⋈ Interest
So when we perform Natural Join on table Boys and table Interest they both have a common attribute ID and have the same domain.
So, the Result of Natural Join will be:
ID | Name | Percentage | Gender | Sport |
Amit | 75 | Chess | ||
Saiz | 65 | Cricket | ||
Tejan | 84 | Chess | ||
Rohit | 85 | Cricket | ||
Rohan | 56 | Chess | ||
Ravi | 79 | Chess |
In the table the common attribute ID is only displayed once in the result.
Outer Join
Outer Join in Relational algebra returns all the attributes of both the table depending on the condition. If some attribute value is not present for any one of the tables it returns NULL in the respective row of the table attribute.
-
It is further classified as:
- Left Outer Join
- Right Outer Join
- Full Outer Join
Let’s see how these Joins are performed.
Left Outer Join
It returns all the rows of the left table even if there is no matching row for it in the right table performing Left Outer Join.
A Left Outer Join B
Let’s perform Left Outer Join on table Boys and Interest and find out all the boys of class 12th and their sports interest.
If we perform Left Outer Join on table Boys and table Interest such that Boys.ID = Interest.ID . Then Result of the Join will be:
Boys.ID | Boys.Name | Boys.Percentage | Interest.ID | Interest.Name | Interest.Gender | Interest.Sport |
Rohan | 56 | Rohan | Chess | |||
Rohit | 85 | Rohan | Chess | |||
Amit | 75 | Rohan | Chess | |||
Ravi | 79 | Rohan | Chess | |||
Saiz | 65 | Rohan | Chess | |||
Tejan | 84 | Rohan | Chess | |||
Rishabh | 75 | NUll | NULL | NULL | NULL |
Clearly, we can observe that all the rows of the left table, i.e., table Boys is present in the result.
Right Outer Join
It returns all the rows of the second table even if there is no matching row for it in the first table performing Right Outer Join.
ARight Outer Join B
Let’s perform Right Outer Join on table Boys and Interest and find out all the boys of class 12th and their sports interest.
If we perform Right Outer Join on table Boys and table Interest such that Boys.ID = Interest.ID . Then Result of the join will be:
Boys.ID | Boys.Name | Boys.Percentage | Interest.ID | Interest.Name | Interest.Gender | Interest.Sport |
Rohan | 56 | Rohan | Chess | |||
Rohit | 85 | Rohan | Chess | |||
Amit | 75 | Rohan | Chess | |||
Ravi | 79 | Rohan | Chess | |||
Saiz | 65 | Rohan | Chess | |||
Tejan | 84 | Rohan | Chess | |||
NULL | NULL | NULL | 23 | Aman | Chess | |
NULL | NULL | NULL | 10 | Shreya | Badminton | |
NULL | NULL | NULL | 15 | Sakshi | Chess | |
NULL | NULL | NULL | 16 | Tejan | Chess | |
NULL | NULL | NULL | 35 | Shubhi | Cricket |
Clearly, we can observe that all the rows of the right table, i.e., table Interest is present in the result.
Full Outer Join
It returns all the rows of the first and second Table.
A Full Outer Join B
Let’s perform Full Outer Join on table Boys and Interest and find out all the boys of class 12th and their sports interest.
If we perform Full Outer Join on Table Boys and Table Interest such that Boys.ID = Interest.ID . Then the result of the join will be:
Boys.ID | Boys.Name | Boys.Percentage | Interest.ID | Interest.Name | Interest.Gender | Interest.Sport |
Rohan | 56 | Rohan | Chess | |||
Rohit | 85 | Rohan | Chess | |||
Amit | 75 | Rohan | Chess | |||
Ravi | 79 | Rohan | Chess | |||
Saiz | 65 | Rohan | Chess | |||
Tejan | 84 | Rohan | Chess | |||
Rishabh | 75 | NUll | NULL | NULL | NULL | |
NULL | NULL | NULL | 23 | Aman | Chess | |
NULL | NULL | NULL | 10 | Shreya | Badminton | |
NULL | NULL | NULL | 15 | Sakshi | Chess | |
NULL | NULL | NULL | 16 | Tejan | Chess | |
NULL | NULL | NULL | 35 | Shubhi | Cricket |
Clearly, we can observe that all the rows of the right table and left Table, i.e., Table B and A are present in the result.
Disadvantages of Inner Join
- Data Loss: Since inner joins only return rows that have matching values in both tables, some data may be lost if there are no matching values.
- Query Complexity: Inner joins can become complex and difficult to write and understand when working with multiple tables.
- Overlapping Data: In some cases, inner joins may return overlapping data that needs to be deduplicated in post-processing.
For more, refer to Inner Join in SQL.
Natural Join (⋈)
Natural join does not use any comparison operator. It does not concatenate the way a Cartesian product does. We can perform a Natural Join only if there is at least one common attribute that exists between two relations. In addition, the attributes must have the same name and domain.
Natural join acts on those matching attributes where the values of attributes in both the relations are same.
Courses | ||
CID | Course | Dept |
CS01 | Database | CS |
ME01 | Mechanics | ME |
EE01 | Electronics | EE |
HoD | |
Dept | Head |
CS | Alex |
ME | Maya |
EE | Mira |
Courses ⋈ HoD | |||
Dept | CID | Course | Head |
CS | CS01 | Database | Alex |
ME | ME01 | Mechanics | Maya |
EE | EE01 | Electronics | Mira |
Set Difference (-)
– Symbol denotes it. The result of A – B, is a relation which includes all tuples that are in A but not in B.
- The attribute name of A has to match with the attribute name in B.
- The two-operand relations A and B should be either compatible or Union compatible.
- It should be defined relation consisting of the tuples that are in relation A, but not in B.
Example
A-B
column 1 | column 2 |
What is Join in DBMS?
Joins in relational algebra are simply cartesian products followed by selection.
In the above example, if we combine both the Boys table and Interest table such that the ID of students in the Boys table is same as the IDs of students in Interest table, then it will be easy for us to filter out the desired result of all the boys student of class 12th who are intrested in Cricket.
If we perform Inner Join on both tables with one condition as :
Boys ⋈(Boys.ID = Interest.ID and Interest.Sport=Cricket) Interest
The join condition (Boys.ID = Interest.ID and Interest.Sport=Cricket) first performs Cartesian product on both tables and then makes selection to give only those class 12th boys who are interested in Cricket.
The Result of the above Relational algebra query will be :
ID | Name | Gender | Sport |
Amit | Cricket | ||
Saiz | Cricket | ||
Rohit | Cricket |
External links[edit]
This article’s use of external links may not follow Wikipedia’s policies or guidelines. (January 2017) |
- RAT. Software Relational Algebra Translator to SQL
- Lecture Videos: Relational Algebra Processing – An introduction to how database systems process relational algebra
- Lecture Notes: Relational Algebra – A quick tutorial to adapt SQL queries into relational algebra
- Relational – A graphic implementation of the relational algebra
-
Query Optimization
(Page deleted; Closest alternatives: Standford Query Optimization 2, Microsoft research Query Optimization in relational systems, Stanford paper: Query Optimization)This paper is an introduction into the use of the relational algebra in optimizing queries, and includes numerous citations for more in-depth study. - Relational Algebra System for Oracle and Microsoft SQL Server
- Pireal – An experimental educational tool for working with Relational Algebra
- DES – An educational tool for working with Relational Algebra and other formal languages
- RelaX – Relational Algebra Calculator (open-source software available as an online service without registration)
- RA: A Relational Algebra Interpreter
- Translating SQL to Relational Algebra
Relational algebra
In database theory, relational algebra is a theory that uses algebraic structures for modeling data, and defining queries on it with a well founded semantics. The theory was introduced by Edgar F. Codd.
The main application of relational algebra is to provide a theoretical foundation for relational databases, particularly query languages for such databases, chief among which is SQL. Relational databases store tabular data represented as relations. Queries over relational databases often likewise return tabular data represented as relations.
The main purpose of relational algebra is to define operators that transform one or more input relations to an output relation. Given that these operators accept relations as input and produce relations as output, they can be combined and used to express complex queries that transform multiple input relations (whose data are stored in the database) into a single output relation (the query results).
Unary operators accept a single relation as input. Examples include operators to filter certain attributes (columns) or tuples (rows) from an input relation. Binary operators accept two relations as input and combine them into a single output relation. For example, taking all tuples found in either relation (union), removing tuples from the first relation found in the second relation (difference), extending the tuples of the first relation with tuples in the second relation matching certain conditions, and so forth.
Other more advanced operators can also be included, where the inclusion or exclusion of certain operators gives rise to a family of algebras.
Left Outer Join(R S)
All the tuples from the Left relation, R, are included in the resulting relation. If there are tuples in R without any matching tuple in the Right relation S, then the S-attributes of the resulting relation are made NULL.
Left | |
100 | Database |
101 | Mechanics |
102 | Electronics |
Right | |
100 | Alex |
102 | Maya |
104 | Mira |
Courses | |||
100 | Database | 100 | Alex |
101 | Mechanics | — | — |
102 | Electronics | 102 | Maya |
Projection(π)
The projection eliminates all attributes of the input relation but those mentioned in the projection list. The projection method defines a relation that contains a vertical subset of Relation.
This helps to extract the values of specified attributes to eliminates duplicate values. (pi) symbol is used to choose attributes from a relation. This operator helps you to keep specific columns from a relation and discards the other columns.
Example of Projection:
Consider the following table
CustomerID | CustomerName | Status |
Active | ||
Amazon | Active | |
Apple | Inactive | |
Alibaba | Active |
Here, the projection of CustomerName and status will give
Π CustomerName, Status (Customers)
CustomerName | Status |
Active | |
Amazon | Active |
Apple | Inactive |
Alibaba | Active |
Disadvantages of Outer Join
- Slow Query Execution: Outer joins can be slower to execute than inner joins, especially when dealing with large data sets.
- Data Duplication: Outer joins can return duplicate data when the secondary table has multiple matching records.
- Data Quality: Outer joins may return inaccurate or incomplete data if the tables being joined have incomplete or inconsistent data.
Unlock the Power of Placement Preparation!Feeling lost in OS, DBMS, CN, SQL, and DSA chaos? Our Complete Interview Preparation Course is the ultimate guide to conquer placements. Trusted by over 100,000+ geeks, this course is your roadmap to interview triumph.Ready to dive in? Explore our Free Demo Content and join our Complete Interview Preparation course.
Last Updated :
08 Jun, 2023
Like Article
Save Article
Share your thoughts in the comments
Please Login to comment…
Joins in DBMS
This article delves into the pivotal concept of ‘join’ in Database Management Systems (DBMS). A ‘join’ merges rows from multiple tables based on related columns, facilitating simultaneous data retrieval from interconnected tables. Exploring SQL’s JOIN clause, readers will grasp its syntax, various types, practical examples, and frequently asked questions. Essentially, JOIN in SQL encapsulates the act of amalgamating tables, enhancing data querying capabilities across databases.
Summary
Operation(Symbols) | Purpose |
Select(σ) | The SELECT operation is used for selecting a subset of the tuples according to a given selection condition |
Projection(π) | The projection eliminates all attributes of the input relation but those mentioned in the projection list. |
Union Operation(∪) | UNION is symbolized by symbol. It includes all tuples that are in tables A or in B. |
Set Difference(-) | – Symbol denotes it. The result of A – B, is a relation which includes all tuples that are in A but not in B. |
Intersection(∩) | Intersection defines a relation consisting of a set of all tuple that are in both A and B. |
Cartesian Product(X) | Cartesian operation is helpful to merge columns from two relations. |
Inner Join | Inner join, includes only those tuples that satisfy the matching criteria. |
Theta Join(θ) | The general case of JOIN operation is called a Theta join. It is denoted by symbol θ. |
EQUI Join | When a theta join uses only equivalence condition, it becomes a equi join. |
Natural Join(⋈) | Natural join can only be performed if there is a common attribute (column) between the relations. |
Outer Join | In an outer join, along with tuples that satisfy the matching criteria. |
Left Outer Join( | In the left outer join, operation allows keeping all tuple in the left relation. |
Right Outer join( | In the right outer join, operation allows keeping all tuple in the right relation. |
Full Outer Join( | In a full outer join, all tuples from both relations are included in the result irrespective of the matching condition. |
Inner Join and Outer Join are the types of join. The inner join has the work to return the common rows between the two tables, whereas the Outer Join has the work of returning the work of the inner join in addition to the rows which are not matched.
Let’s discuss both of them in detail in this article. Before moving ahead, let’s discuss what is Join in SQL.
In a relational database management system (RDBMS), there are different types of joins that can be used to combine data from two or more tables in a database. The two most common types of joins are Inner Join and Outer Join.
Basic SQL Relational Algebra Operations
Relational Algebra devided in various groups
Unary Relational Operations
- SELECT (symbol: σ)
- PROJECT (symbol: π)
- RENAME (symbol: ρ)
Relational Algebra Operations From Set Theory
- UNION (υ)
- INTERSECTION ( ),
- DIFFERENCE (-)
- CARTESIAN PRODUCT ( x )
Binary Relational Operations
- JOIN
- DIVISION
Let’s study them in detail with solutions:
Inner Join/Simple Join
In an INNER join, it allows retrieving data from two tables with the same ID.
An Inner Join returns only the matching rows between the two tables based on a specified condition. It combines data from two tables based on a common column between them, which is specified using the ON keyword in SQL. Only the rows that meet the join condition from both tables are returned. If a row in one table does not have a matching row in the other table, that row will not be included in the result set.
Inner Join
Syntax:
SELECT COLUMN1, COLUMN2 FROM
[TABLE 1] INNER JOIN [TABLE 2]
ON Condition;
The following is a join query that shows the names of students enrolled in different courses.
Query:
SELECT StudentCourse.CourseID,Student.StudentName
FROM Student
INNER JOIN StudentCourse
ON StudentCourse.EnrollNo = Student.EnrollNo
ORDER BY StudentCourse.CourseID;
Note: Inner is optional above. Simple JOIN is also considered as INNER JOIN The above query would produce the following result.
CourseID | StudentName |
geek1 | |
geek2 | |
geek1 | |
geek3 | |
geek1 |
Example:
For example, let’s say we have two tables, Table1 and Table2, with the following data:
Table 1
Table 2
ID | Address |
123 Main St. | |
456 Elm St. | |
789 Oak St. |
If we perform an Inner Join on these tables using the ID column, the result set would only include the matching rows from both tables, which are the rows with ID values of 1 and 2:
Query:
SELECT Table1.ID,
Table 1. Name, Table 2.Address
FROM Table1
INNER JOIN Table2
ON Table1.ID = Table2.ID
Output:
ID | Name | Address |
John | 123 Main St. | |
Sarah | 456 Elm St. |
Introduction[edit]
Relational algebra received little attention outside of pure mathematics until the publication of E.F. Codd’s relational model of data in 1970. Codd proposed such an algebra as a basis for database query languages. (See section Implementations.)
Relational algebra operates on homogeneous sets of tuples where we commonly interpret m to be the number of rows in a table and n to be the number of columns. All entries in each column have the same type. Five primitive operators of Codd’s algebra are the selection, the projection, the Cartesian product (also called the cross product or cross join), the set union, and the set difference.
Set operators[edit]
The relational algebra uses set union, set difference, and Cartesian product from set theory, but adds additional constraints to these operators.
For set union and set difference, the two relations involved must be union-compatible—that is, the two relations must have the same set of attributes. Because set intersection is defined in terms of set union and set difference, the two relations involved in set intersection must also be union-compatible.
For the Cartesian product to be defined, the two relations involved must have disjoint headers—that is, they must not have a common attribute name.
In addition, the Cartesian product is defined differently from the one in set theory in the sense that tuples are considered to be “shallow” for the purposes of the operation. That is, the Cartesian product of a set of n-tuples with a set of m-tuples yields a set of “flattened” (n + m)-tuples (whereas basic set theory would have prescribed a set of 2-tuples, each containing an n-tuple and an m-tuple). More formally, R × S is defined as follows:
The cardinality of the Cartesian product is the product of the cardinalities of its factors, that is, |R × S| = |R| × |S|.
Projection (Π)[edit]
A projection is a unary operation written as where is a set of attribute names. The result of such projection is defined as the set that is obtained when all tuples in R are restricted to the set .
Note: when implemented in SQL standard the “default projection” returns a multiset instead of a set, and the Π projection to eliminate duplicate data is obtained by the addition of the
DISTINCT
keyword.
Selection (σ)[edit]
A generalized selection is a unary operation written as where φ is a propositional formula that consists of atoms as allowed in the normal selection and the logical operators (and), (or) and (negation). This selection selects all those tuples in R for which φ holds.
To obtain a listing of all friends or business associates in an address book, the selection might be written as . The result would be a relation containing every attribute of every unique record where isFriend is true or where isBusinessContact is true.
Rename (ρ)[edit]
A rename is a unary operation written as where the result is identical to R except that the b attribute in all tuples is renamed to an a attribute. This is simply used to rename the attribute of a relation or the relation itself.
To rename the “isFriend” attribute to “isBusinessContact” in a relation, might be used.
There is also the notation, where R is renamed to x and the attributes are renamed to .[1]
Theta (θ) Join
Theta join combines tuples from different relations provided they satisfy the theta condition. The join condition is denoted by the symbol θ.
Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that the attributes don’t have anything in common, that is R1 ∩ R2 = Φ.
Theta join can use all kinds of comparison operators.
Student | ||
SID | Name | Std |
101 | Alex | 10 |
102 | Maria | 11 |
Subjects | |
Class | Subject |
10 | Math |
10 | English |
11 | Music |
11 | Sports |
Student_Detail −
STUDENT ⋈Student.Std = Subject.Class SUBJECT
Student_detail | ||||
SID | Name | Std | Class | Subject |
101 | Alex | 10 | 10 | Math |
101 | Alex | 10 | 10 | English |
102 | Maria | 11 | 11 | Music |
102 | Maria | 11 | 11 | Sports |
Joins and join-like operators[edit]
It has been suggested that this section be split out into another article titled Join (relational algebra). (Discuss) (March 2023) |
Natural join (⋈)[edit]
Natural join (⋈) is a binary operator that is written as (R ⋈ S) where R and S are relations.[a] The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. For an example consider the tables Employee and Dept and their natural join:[citation needed]
Note that neither the employee named Mary nor the Production department appear in the result.
This can also be used to define composition of relations. For example, the composition of Employee and Dept is their join as shown above, projected on all but the common attribute DeptName. In category theory, the join is precisely the fiber product.
The natural join is arguably one of the most important operators since it is the relational counterpart of the logical AND operator. Note that if the same variable appears in each of two predicates that are connected by AND, then that variable stands for the same thing and both appearances must always be substituted by the same value (this is a consequence of the idempotence of the logical AND). In particular, natural join allows the combination of relations that are associated by a foreign key. For example, in the above example a foreign key probably holds from Employee.DeptName to Dept.DeptName and then the natural join of Employee and Dept combines all employees with their departments. This works because the foreign key holds between attributes with the same name. If this is not the case such as in the foreign key from Dept.Manager to Employee.Name then these columns must be renamed before taking the natural join. Such a join is sometimes also referred to as an equijoin (see θ-join).
More formally the semantics of the natural join are defined as follows:
-
(1)
where Fun(t) is a predicate that is true for a relation t (in the mathematical sense) iff t is a function (that is, t does not map any attribute to multiple values). It is usually required that R and S must have at least one common attribute, but if this constraint is omitted, and R and S have no common attributes, then the natural join becomes exactly the Cartesian product.
The natural join can be simulated with Codd’s primitives as follows. Assume that c1,…,cm are the attribute names common to R and S, r1,…,rn are the attribute names unique to R and s1,…,sk are the attribute names unique to S. Furthermore, assume that the attribute names x1,…,xm are neither in R nor in S. In a first step the common attribute names in S can be renamed:
-
(2)
Then we take the Cartesian product and select the tuples that are to be joined:
-
(3)
Finally we take a projection to get rid of the renamed attributes:
-
(4)
θ-join and equijoin[edit]
Consider tables Car and Boat which list models of cars and boats and their respective prices. Suppose a customer wants to buy a car and a boat, but she does not want to spend more money for the boat than for the car. The θ-join (⋈θ) on the predicate CarPrice ≥ BoatPrice produces the flattened pairs of rows which satisfy the predicate. When using a condition where the attributes are equal, for example Price, then the condition may be specified as Price=Price or alternatively (Price) itself.
In order to combine tuples from two relations where the combination condition is not simply the equality of shared attributes it is convenient to have a more general form of join operator, which is the θ-join (or theta-join). The θ-join is a binary operator that is written as or where a and b are attribute names, θ is a binary relational operator in the set {<, ≤, =, ≠, >, ≥}, υ is a value constant, and R and S are relations. The result of this operation consists of all combinations of tuples in R and S that satisfy θ. The result of the θ-join is defined only if the headers of S and R are disjoint, that is, do not contain a common attribute.
The simulation of this operation in the fundamental operations is therefore as follows:
- R ⋈θ S = σθ(R × S)
In case the operator θ is the equality operator (=) then this join is also called an equijoin.
Note, however, that a computer language that supports the natural join and selection operators does not need θ-join as well, as this can be achieved by selection from the result of a natural join (which degenerates to Cartesian product when there are no shared attributes).
In SQL implementations, joining on a predicate is usually called an inner join, and the on keyword allows one to specify the predicate used to filter the rows. It is important to note: forming the flattened Cartesian product then filtering the rows is conceptually correct, but an implementation would use more sophisticated data structures to speed up the join query.
Semijoin (⋉ and ⋊)[edit]
The left semijoin is a joining similar to the natural join and written as where and are relations.[b] The result is the set of all tuples in for which there is a tuple in that is equal on their common attribute names. The difference from a natural join is that other columns of do not appear. For example, consider the tables Employee and Dept and their semijoin:[citation needed]
More formally the semantics of the semijoin can be defined as follows:
where is as in the definition of natural join.
The semijoin can be simulated using the natural join as follows. If are the attribute names of , then
Since we can simulate the natural join with the basic operators it follows that this also holds for the semijoin.
In Codd’s 1970 paper, semijoin is called restriction.[2]
Antijoin (▷)[edit]
The antijoin, written as R ▷ S where R and S are relations,[c] is similar to the semijoin, but the result of an antijoin is only those tuples in R for which there is no tuple in S that is equal on their common attribute names.[citation needed]
For an example consider the tables Employee and Dept and their antijoin:
The antijoin is formally defined as follows:
- R ▷ S = { t : t ∈ R ∧ ¬∃s ∈ S(Fun (t ∪ s))}
or
- R ▷ S = { t : t ∈ R, there is no tuple s of S that satisfies Fun (t ∪ s)}
where Fun (t ∪ s) is as in the definition of natural join.
The antijoin can also be defined as the complement of the semijoin, as follows:
-
R ▷ S = R − R ⋉ S
(5)
Given this, the antijoin is sometimes called the anti-semijoin, and the antijoin operator is sometimes written as semijoin symbol with a bar above it, instead of ▷.
Division (÷)[edit]
The division is a binary operation that is written as R ÷ S. Division is not implemented directly in SQL. The result consists of the restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their combinations with tuples in S are present in R. For an example see the tables Completed, DBProject and their division:
If DBProject contains all the tasks of the Database project, then the result of the division above contains exactly the students who have completed both of the tasks in the Database project. More formally the semantics of the division is defined as follows:
-
R ÷ S = { t[a1,…,an] : t ∈ R ∧ ∀s ∈ S ( (t[a1,…,an] ∪ s) ∈ R) }
(6)
where {a1,…,an} is the set of attribute names unique to R and t[a1,…,an] is the restriction of t to this set. It is usually required that the attribute names in the header of S are a subset of those of R because otherwise the result of the operation will always be empty.
The simulation of the division with the basic operations is as follows. We assume that a1,…,an are the attribute names unique to R and b1,…,bm are the attribute names of S. In the first step we project R on its unique attribute names and construct all combinations with tuples in S:
- T := πa1,…,an(R) × S
In the prior example, T would represent a table such that every Student (because Student is the unique key / attribute of the Completed table) is combined with every given Task. So Eugene, for instance, would have two rows, Eugene → Database1 and Eugene → Database2 in T.
-
- EG: First, let’s pretend that “Completed” has a third attribute called “grade”. It’s unwanted baggage here, so we must project it off always. In fact in this step we can drop “Task” from R as well; the multiply puts it back on.
- T := πStudent(R) × S // This gives us every possible desired combination, including those that don’t actually exist in R, and excluding others (eg Fred | compiler1, which is not a desired combination)
In the next step we subtract R from T
- U := T − R
In U we have the possible combinations that “could have” been in R, but weren’t.
-
- EG: Again with projections — T and R need to have identical attribute names/headers.
- U := T − πStudent,Task(R) // This gives us a “what’s missing” list.
So if we now take the projection on the attribute names unique to R
then we have the restrictions of the tuples in R for which not all combinations with tuples in S were present in R:
-
V := πa1,…,an(U)
- EG: Project U down to just the attribute(s) in question (Student)
- V := πStudent(U)
So what remains to be done is take the projection of R on its unique attribute names and subtract those in V:
-
W := πa1,…,an(R) − V
- EG: W := πStudent(R) − V.
Right Outer Join ( A ⟖ B )
In the right outer join, operation allows keeping all tuple in the right relation. However, if there is no matching tuple is found in the left relation, then the attributes of the left relation in the join result are filled with null values.
AB
Num | Cube | Square |
18 | ||
75 |
Common extensions[edit]
In practice the classical relational algebra described above is extended with various operations such as outer joins, aggregate functions and even transitive closure.[3]
Outer joins[edit]
Whereas the result of a join (or inner join) consists of tuples formed by combining matching tuples in the two operands, an outer join contains those tuples and additionally some tuples formed by extending an unmatched tuple in one of the operands by “fill” values for each of the attributes of the other operand. Outer joins are not considered part of the classical relational algebra discussed so far.[4]
The operators defined in this section assume the existence of a null value, ω, which we do not define, to be used for the fill values; in practice this corresponds to the NULL in SQL. In order to make subsequent selection operations on the resulting table meaningful, a semantic meaning needs to be assigned to nulls; in Codd’s approach the propositional logic used by the selection is extended to a three-valued logic, although we elide those details in this article.
Three outer join operators are defined: left outer join, right outer join, and full outer join. (The word “outer” is sometimes omitted.)
Left outer join (⟕)[edit]
The left outer join is written as R ⟕ S where R and S are relations.[d] The result of the left outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition (loosely speaking) to tuples in R that have no matching tuples in S.[citation needed]
For an example consider the tables Employee and Dept and their left outer join:
In the resulting relation, tuples in S which have no common values in common attribute names with tuples in R take a null value, ω.
Since there are no tuples in Dept with a DeptName of Finance or Executive, ωs occur in the resulting relation where tuples in Employee have a DeptName of Finance or Executive.
Let r1, r2, …, rn be the attributes of the relation R and let {(ω, …, ω)} be the singleton relation on the attributes that are unique to the relation S (those that are not attributes of R). Then the left outer join can be described in terms of the natural join (and hence using basic operators) as follows:
Right outer join (⟖)[edit]
The right outer join behaves almost identically to the left outer join, but the roles of the tables are switched.
The right outer join of relations R and S is written as R ⟖ S.[e] The result of the right outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R.[citation needed]
For example, consider the tables Employee and Dept and their right outer join:
In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value, ω.
Since there are no tuples in Employee with a DeptName of Production, ωs occur in the Name and EmpId attributes of the resulting relation where tuples in Dept had DeptName of Production.
Let s1, s2, …, sn be the attributes of the relation S and let {(ω, …, ω)} be the singleton relation on the attributes that are unique to the relation R (those that are not attributes of S). Then, as with the left outer join, the right outer join can be simulated using the natural join as follows:
Full outer join (⟗)[edit]
The outer join or full outer join in effect combines the results of the left and right outer joins.
The full outer join is written as R ⟗ S where R and S are relations.[f] The result of the full outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R and tuples in R that have no matching tuples in S in their common attribute names.[citation needed]
For an example consider the tables Employee and Dept and their full outer join:
In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value, ω. Tuples in S which have no common values in common attribute names with tuples in R also take a null value, ω.
The full outer join can be simulated using the left and right outer joins (and hence the natural join and set union) as follows:
- R ⟗ S = (R ⟕ S) ∪ (R ⟖ S)
Operations for domain computations[edit]
There is nothing in relational algebra introduced so far that would allow computations on the data domains (other than evaluation of propositional expressions involving equality). For example, it is not possible using only the algebra introduced so far to write an expression that would multiply the numbers from two columns, e.g. a unit price with a quantity to obtain a total price. Practical query languages have such facilities, e.g. the SQL SELECT allows arithmetic operations to define new columns in the result
SELECT unit_price * quantity AS total_price FROM t
, and a similar facility is provided more explicitly by Tutorial D’s
EXTEND
keyword.[5] In database theory, this is called extended projection.[6]: 213
Aggregation[edit]
Furthermore, computing various functions on a column, like the summing up of its elements, is also not possible using the relational algebra introduced so far. There are five aggregate functions that are included with most relational database systems. These operations are Sum, Count, Average, Maximum and Minimum. In relational algebra the aggregation operation over a schema (A1, A2, … An) is written as follows:
where each Aj’, 1 ≤ j ≤ k, is one of the original attributes Ai, 1 ≤ i ≤ n.
The attributes preceding the g are grouping attributes, which function like a “group by” clause in SQL. Then there are an arbitrary number of aggregation functions applied to individual attributes. The operation is applied to an arbitrary relation r. The grouping attributes are optional, and if they are not supplied, the aggregation functions are applied across the entire relation to which the operation is applied.
Let’s assume that we have a table named Account with three columns, namely Account_Number, Branch_Name and Balance. We wish to find the maximum balance of each branch. This is accomplished by Branch_NameGMax(Balance)(Account). To find the highest balance of all accounts regardless of branch, we could simply write GMax(Balance)(Account).
Grouping is often written as Branch_NameɣMax(Balance)(Account) instead.[6]
Transitive closure[edit]
Although relational algebra seems powerful enough for most practical purposes, there are some simple and natural operators on relations that cannot be expressed by relational algebra. One of them is the transitive closure of a binary relation. Given a domain D, let binary relation R be a subset of D×D. The transitive closure R+ of R is the smallest subset of D×D that contains R and satisfies the following condition:
It can be proved using the fact that there is no relational algebra expression E(R) taking R as a variable argument that produces R+.[7]
SQL however officially supports such fixpoint queries since 1999, and it had vendor-specific extensions in this direction well before that.
Join Operations
Join operation is essentially a cartesian product followed by a selection criterion.
Join operation denoted by ⋈.
JOIN operation also allows joining variously related tuples from different relations.
Types of JOIN:
Various forms of join operation are:
Inner Joins:
- Theta join
- EQUI join
- Natural join
Outer join:
- Left Outer Join
- Right Outer Join
- Full Outer Join
Further reading[edit]
Practically any academic textbook on databases has a detailed treatment of the classic relational algebra.
- Imieliński, T.; Lipski, W. (1984). “The relational model of data and cylindric algebras”. Journal of Computer and System Sciences. 28: 80–102. doi:10.1016/0022-0000(84)90077-1. (For relationship with cylindric algebras).
How To Use Inner Join?
Inner Join is basically performed by just selecting the records having the common values or the matching values in both tables. In case of no common values, no data is shown in the output.
Syntax:
Select Table1.Col_Name, Table2.Col_Name….
From Table1
Inner Join Table2
on Table1.Common_Col = Table2.Common_Col;
If there are 3 Tables present in the database, then the Inner Join works as follows:
Select Table1.Col_Name, Table2.Col_Name, Table3.Col_Name….
From ((Table1
Inner Join Table2
on Table1.Common_Col = Table2.Common_Col)
Inner Join Table3
on Table1.Common_Col = Table3.Common_Col);
Union operation (υ)
UNION is symbolized by ∪ symbol. It includes all tuples that are in tables A or in B. It also eliminates duplicate tuples. So, set A UNION set B would be expressed as:
The result <- A ∪ B
For a union operation to be valid, the following conditions must hold –
- R and S must be the same number of attributes.
- Attribute domains need to be compatible.
- Duplicate tuples should be automatically removed.
Example
Consider the following tables.
column 1 | column 2 | column 1 | column 2 |
A ∪ B gives
column 1 | column 2 |
Conclusion
- Joins are used to Join two or more tables in the Database.
- There are mainly three types of Join – Inner Join, Natural Join, Outer Join.
- Inner joins are of two types – Theta Join and Equi Join.
- Outer joins are of Three types – Left Outer Join, Right Outer Join and Full Outer Join.
- Natural Join is performed only when there is at least one matching attribute in both the tables.
- Left Outer join always returns all the rows of left table irrespective of the Join condition.
- Right Outer Join always returns all the rows of right table irrespective of the Join condition.
- Full Outer Join always returns all the Rows of both the table irrespective of the join condition.
Basic idea about relational model and basic operators in Relational Algebra: Relational Model Basic Operators in Relational Algebra Extended operators are those operators which can be derived from basic operators. There are mainly three types of extended operators in Relational Algebra:
The relations used to understand extended operators are STUDENT, STUDENT_SPORTS, ALL_SPORTS and EMPLOYEE which are shown in Table 1, Table 2, Table 3 and Table 4 respectively. STUDENT
ROLL_NO | NAME | ADDRESS | PHONE | AGE |
RAM | DELHI | 9455123451 | 18 | |
RAMESH | GURGAON | 9652431543 | 18 | |
SUJIT | ROHTAK | 9156253131 | 20 | |
SURESH | DELHI | 9156768971 | 18 |
Table 1
STUDENT_SPORTS
ROLL_NO | SPORTS |
Badminton | |
Cricket | |
Badminton | |
Badminton |
Table 2
ALL_SPORTS
Table 3
EMPLOYEE
EMP_NO | NAME | ADDRESS | PHONE | AGE |
RAM | DELHI | 9455123451 | 18 | |
NARESH | HISAR | 9782918192 | 22 | |
SWETA | RANCHI | 9852617621 | 21 | |
SURESH | DELHI | 9156768971 | 18 |
Table 4
Intersection (?): Intersection on two relations R1 and R2 can only be computed if R1 and R2 are union compatible (These two relation should have same number of attributes and corresponding attributes in two relations have same domain). Intersection operator when applied on two relations as R1?R2 will give a relation with tuples which are in R1 as well as R2. Syntax:
Relation1 ? Relation2
Example: Find a person who is student as well as employee- STUDENT ? EMPLOYEE
In terms of basic operators (union and minus) :
STUDENT ? EMPLOYEE = STUDENT + EMPLOYEE – (STUDENT U EMPLOYEE)
RESULT:
ROLL_NO | NAME | ADDRESS | PHONE | AGE |
RAM | DELHI | 9455123451 | 18 | |
SURESH | DELHI | 9156768971 | 18 |
Conditional Join(?c): Conditional Join is used when you want to join two or more relation based on some conditions. Example: Select students whose ROLL_NO is greater than EMP_NO of employees
STUDENT?c STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
In terms of basic operators (cross product and selection) :
? (STUDENT.ROLL_NO>EMPLOYEE.EMP_NO)(STUDENT×EMPLOYEE)
RESULT:
ROLL_NO | NAME | ADDRESS | PHONE | AGE | EMP_NO | NAME | ADDRESS | PHONE | AGE |
RAMESH | GURGAON | 9652431543 | 18 | RAM | DELHI | 9455123451 | 18 | ||
SUJIT | ROHTAK | 9156253131 | 20 | RAM | DELHI | 9455123451 | 18 | ||
SURESH | DELHI | 9156768971 | 18 | RAM | DELHI | 9455123451 | 18 |
Equijoin(?): Equijoin is a special case of conditional join where only equality condition holds between a pair of attributes. As values of two attributes will be equal in result of equijoin, only one attribute will be appeared in result. Example: Select students whose ROLL_NO is equal to EMP_NO of employees.
STUDENT?STUDENT.ROLL_NO=EMPLOYEE.EMP_NOEMPLOYEE
In terms? of basic operators (cross product, selection and projection) :
?(STUDENT.ROLL_NO, STUDENT.NAME, STUDENT.ADDRESS, STUDENT.PHONE, STUDENT.AGE EMPLOYEE.NAME, EMPLOYEE.ADDRESS, EMPLOYEE.PHONE, EMPLOYEE>AGE)(? (STUDENT.ROLL_NO=EMPLOYEE.EMP_NO) (STUDENT×EMPLOYEE))
RESULT:
ROLL_NO | NAME | ADDRESS | PHONE | AGE | NAME | ADDRESS | PHONE | AGE |
RAM | DELHI | 9455123451 | 18 | RAM | DELHI | 9455123451 | 18 | |
SURESH | DELHI | 9156768971 | 18 | SURESH | DELHI | 9156768971 | 18 |
Natural Join(?): It is a special case of equijoin in which equality condition hold on all attributes which have same name in relations R and S (relations on which join operation is applied). While applying natural join on two relations, there is no need to write equality condition explicitly. Natural Join will also return the similar attributes only once as their value will be same in resulting relation. Example: Select students whose ROLL_NO is equal to ROLL_NO of STUDENT_SPORTS as:
STUDENT?STUDENT_SPORTS
In terms of basic operators (cross product, selection and projection) :
?(STUDENT.ROLL_NO, STUDENT.NAME, STUDENT.ADDRESS, STUDENT.PHONE, STUDENT.AGE STUDENT_SPORTS.SPORTS)(? (STUDENT.ROLL_NO=STUDENT_SPORTS.ROLL_NO) (STUDENT×STUDENT_SPORTS))
RESULT:
ROLL_NO | NAME | ADDRESS | PHONE | AGE | SPORTS |
RAM | DELHI | 9455123451 | 18 | Badminton | |
RAMESH | GURGAON | 9652431543 | 18 | Cricket | |
RAMESH | GURGAON | 9652431543 | 18 | Badminton | |
SURESH | DELHI | 9156768971 | 18 | Badminton |
Natural Join is by default inner join because the tuples which does not satisfy the conditions of join does not appear in result set. e.g.; The tuple having ROLL_NO 3 in STUDENT does not match with any tuple in STUDENT_SPORTS, so it has not been a part of result set.
Left Outer Join(?): When applying join on two relations R and S, some tuples of R or S does not appear in result set which does not satisfy the join conditions. But Left Outer Joins gives all tuples of R in the result set. The tuples of R which do not satisfy join condition will have values as NULL for attributes of S. Example:Select students whose ROLL_NO is greater than EMP_NO of employees and details of other students as well
STUDENT?STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
RESULT
ROLL_NO | NAME | ADDRESS | PHONE | AGE | EMP_NO | NAME | ADDRESS | PHONE | AGE |
RAMESH | GURGAON | 9652431543 | 18 | RAM | DELHI | 9455123451 | 18 | ||
SUJIT | ROHTAK | 9156253131 | 20 | RAM | DELHI | 9455123451 | 18 | ||
SURESH | DELHI | 9156768971 | 18 | RAM | DELHI | 9455123451 | 18 | ||
RAM | DELHI | 9455123451 | 18 | NULL | NULL | NULL | NULL | NULL |
Right Outer Join(?): When applying join on two relations R and S, some tuples of R or S does not appear in result set which does not satisfy the join conditions. But Right Outer Joins gives all tuples of S in the result set. The tuples of S which do not satisfy join condition will have values as NULL for attributes of R. Example: Select students whose ROLL_NO is greater than EMP_NO of employees and details of other Employees as well
STUDENT?STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
RESULT:
ROLL_NO | NAME | ADDRESS | PHONE | AGE | EMP_NO | NAME | ADDRESS | PHONE | AGE |
RAMESH | GURGAON | 9652431543 | 18 | RAM | DELHI | 9455123451 | 18 | ||
SUJIT | ROHTAK | 9156253131 | 20 | RAM | DELHI | 9455123451 | 18 | ||
SURESH | DELHI | 9156768971 | 18 | RAM | DELHI | 9455123451 | 18 | ||
NULL | NULL | NULL | NULL | NULL | NARESH | HISAR | 9782918192 | 22 | |
NULL | NULL | NULL | NULL | NULL | SWETA | RANCHI | 9852617621 | 21 | |
NULL | NULL | NULL | NULL | NULL | SURESH | DELHI | 9156768971 | 18 |
Full Outer Join(?): When applying join on two relations R and S, some tuples of R or S does not appear in result set which does not satisfy the join conditions. But Full Outer Joins gives all tuples of S and all tuples of R in the result set. The tuples of S which do not satisfy join condition will have values as NULL for attributes of R and vice versa. Example:Select students whose ROLL_NO is greater than EMP_NO of employees and details of other Employees as well and other Students as well
STUDENT?STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
RESULT:
ROLL_NO | NAME | ADDRESS | PHONE | AGE | EMP_NO | NAME | ADDRESS | PHONE | AGE |
RAMESH | GURGAON | 9652431543 | 18 | RAM | DELHI | 9455123451 | 18 | ||
SUJIT | ROHTAK | 9156253131 | 20 | RAM | DELHI | 9455123451 | 18 | ||
SURESH | DELHI | 9156768971 | 18 | RAM | DELHI | 9455123451 | 18 | ||
NULL | NULL | NULL | NULL | NULL | NARESH | HISAR | 9782918192 | 22 | |
NULL | NULL | NULL | NULL | NULL | SWETA | RANCHI | 9852617621 | 21 | |
NULL | NULL | NULL | NULL | NULL | SURESH | DELHI | 9156768971 | 18 | |
RAM | DELHI | 9455123451 | 18 | NULL | NULL | NULL | NULL | NULL |
Division Operator (÷): Division operator A÷B or A/B can be applied if and only if:
- Attributes of B is proper subset of Attributes of A.
- The relation returned by division operator will have attributes = (All attributes of A – All Attributes of B)
- The relation returned by division operator will return those tuples from relation A which are associated to every B’s tuple.
÷?
The resultant of A/B is
A ÷ B
Division can be expressed in terms of Cross Product , Set Difference and Projection.
In the above example , for A/B , compute all x values that are not disqualified by some y in B.
x value is disqualified if attaching y value from B, we obtain xy tuple that is not in A.
Disqualified x values: ?x(( ?x(A) × B ) – A)
So A/B = ?x( A ) ? all disqualified tuples
A/B = ?x( A ) ? ?x(( ?x(A) × B ) – A)
In the above example , disqualified tuples are
So, the resultant is
Advantages:
Expressive Power: Extended operators allow for more complex queries and transformations that cannot be easily expressed using basic relational algebra operations.
Data Reduction: Aggregation operators, such as SUM, AVG, COUNT, and MAX, can reduce the amount of data that needs to be processed and displayed.
Data Transformation: Extended operators can be used to transform data into different formats, such as pivoting rows into columns or vice versa.
More Efficient: Extended operators can be more efficient than expressing the same query in terms of basic relational algebra operations, since they can take advantage of specialized algorithms and optimizations.
Disadvantages:
Complexity: Extended operators can be more difficult to understand and use than basic relational algebra operations. They require a deeper understanding of the underlying data and the operators themselves.
Performance: Some extended operators, such as outer joins, can be expensive in terms of performance, especially when dealing with large data sets.
Non-standardized: There is no universal set of extended operators, and different relational database management systems may implement them differently or not at all.
Data Integrity: Some extended operators, such as aggregate functions, can introduce potential problems with data integrity if not used properly. For example, using AVG on a column that contains null values can result in unexpected or incorrect results.
Overview of Relational Algebra Operators Previous Year Gate Questions https://www.geeksforgeeks.org/gate-gate-cs-2012-question-50/ https://www.geeksforgeeks.org/gate-gate-cs-2012-question-43/ Article contributed by Sonal Tuteja.
Unlock the Power of Placement Preparation!Feeling lost in OS, DBMS, CN, SQL, and DSA chaos? Our Complete Interview Preparation Course is the ultimate guide to conquer placements. Trusted by over 100,000+ geeks, this course is your roadmap to interview triumph.Ready to dive in? Explore our Free Demo Content and join our Complete Interview Preparation course.
Last Updated :
08 May, 2023
Like Article
Save Article
Share your thoughts in the comments
Please Login to comment…
Relational Algebra
RELATIONAL ALGEBRA is a widely used procedural query language. It collects instances of relations as input and gives occurrences of relations as output. It uses various operations to perform this action. SQL Relational algebra query operations are performed recursively on a relation. The output of these operations is a new relation, which might be formed from one or more input relations.
See also[edit]
- Cartesian product
- D4 (programming language) (an implementation of D)
- Database
- Logic of relatives
- Object-role modeling
- Projection (mathematics)
- Projection (relational algebra)
- Projection (set theory)
- Relation
- Relation (database)
- Relation algebra
- Relation composition
- Relation construction
- Relational calculus
- Relational database
- Relational model
- Theory of relations
- Triadic relation
- Tuple relational calculus
- SQL
- Datalog
- Codd’s theorem
Use of algebraic properties for query optimization[edit]
Relational database management systems often include a query optimizer which attempts to determine the most efficient way to execute a given query. Query optimizers enumerate possible query plans, estimate their cost, and pick the plan with the lowest estimated cost. If queries are represented by operators from relational algebra, the query optimizer can enumerate possible query plans by rewriting the initial query using the algebraic properties of these operators.
Queries can be represented as a tree, where
- the internal nodes are operators,
- leaves are relations,
- subtrees are subexpressions.
The primary goal of the query optimizer is to transform expression trees into equivalent expression trees, where the average size of the relations yielded by subexpressions in the tree is smaller than it was before the optimization. The secondary goal is to try to form common subexpressions within a single query, or if there is more than one query being evaluated at the same time, in all of those queries. The rationale behind the second goal is that it is enough to compute common subexpressions once, and the results can be used in all queries that contain that subexpression.
Here are a set of rules that can be used in such transformations.
Selection[edit]
Rules about selection operators play the most important role in query optimization. Selection is an operator that very effectively decreases the number of rows in its operand, so if the selections in an expression tree are moved towards the leaves, the internal relations (yielded by subexpressions) will likely shrink.
Basic selection properties[edit]
Selection is idempotent (multiple applications of the same selection have no additional effect beyond the first one), and commutative (the order selections are applied in has no effect on the eventual result).
Breaking up selections with complex conditions[edit]
A selection whose condition is a conjunction of simpler conditions is equivalent to a sequence of selections with those same individual conditions, and selection whose condition is a disjunction is equivalent to a union of selections. These identities can be used to merge selections so that fewer selections need to be evaluated, or to split them so that the component selections may be moved or optimized separately.
Selection and cross product[edit]
Cross product is the costliest operator to evaluate. If the input relations have N and M rows, the result will contain rows. Therefore, it is important to decrease the size of both operands before applying the cross product operator.
This can be effectively done if the cross product is followed by a selection operator, e.g. . Considering the definition of join, this is the most likely case. If the cross product is not followed by a selection operator, we can try to push down a selection from higher levels of the expression tree using the other selection rules.
In the above case the condition A is broken up in to conditions B, C and D using the split rules about complex selection conditions, so that and B contains attributes only from R, C contains attributes only from P, and D contains the part of A that contains attributes from both R and P. Note, that B, C or D are possibly empty. Then the following holds:
Selection and set operators[edit]
Selection is distributive over the set difference, intersection, and union operators. The following three rules are used to push selection below set operations in the expression tree. For the set difference and the intersection operators, it is possible to apply the selection operator to just one of the operands following the transformation. This can be beneficial where one of the operands is small, and the overhead of evaluating the selection operator outweighs the benefits of using a smaller relation as an operand.
Selection and projection[edit]
Selection commutes with projection if and only if the fields referenced in the selection condition are a subset of the fields in the projection. Performing selection before projection may be useful if the operand is a cross product or join. In other cases, if the selection condition is relatively expensive to compute, moving selection outside the projection may reduce the number of tuples which must be tested (since projection may produce fewer tuples due to the elimination of duplicates resulting from omitted fields).
Projection[edit]
Basic projection properties[edit]
Projection is idempotent, so that a series of (valid) projections is equivalent to the outermost projection.
Projection and set operators[edit]
Projection is distributive over set union.
Projection does not distribute over intersection and set difference. Counterexamples are given by:
and
where b is assumed to be distinct from b’.
Rename[edit]
Basic rename properties[edit]
Successive renames of a variable can be collapsed into a single rename. Rename operations which have no variables in common can be arbitrarily reordered with respect to one another, which can be exploited to make successive renames adjacent so that they can be collapsed.
Rename and set operators[edit]
Rename is distributive over set difference, union, and intersection.
Product and union[edit]
Cartesian product is distributive over union.
Further reading[edit]
Practically any academic textbook on databases has a detailed treatment of the classic relational algebra.
- Imieliński, T.; Lipski, W. (1984). “The relational model of data and cylindric algebras”. Journal of Computer and System Sciences. 28: 80–102. doi:10.1016/0022-0000(84)90077-1. (For relationship with cylindric algebras).
Cartesian Product(X) in DBMS
Cartesian Product in DBMS is an operation used to merge columns from two relations. Generally, a cartesian product is never a meaningful operation when it performs alone. However, it becomes meaningful when it is followed by other operations. It is also called Cross Product or Cross Join.
Example – Cartesian product
σ column 2 = ‘1’ (A X B)
Output – The above example shows all rows from relation A and B whose column 2 has value 1
column 1 | column 2 |
Full Outer Join ( A ⟗ B)
In a full outer join, all tuples from both relations are included in the result, irrespective of the matching condition.
AB
Num | Cube | Square |
18 | ||
16 | ||
75 |
Notes[edit]
- ^ In Unicode, the bowtie symbol is ⋈ (U+22C8).
- ^ In Unicode, the ltimes symbol is ⋉ (U+22C9). The rtimes symbol is ⋊ (U+22CA)
- ^ In Unicode, the Antijoin symbol is ▷ (U+25B7).
- ^ In Unicode, the Left outer join symbol is ⟕ (U+27D5).
- ^ In Unicode, the Right outer join symbol is ⟖ (U+27D6).
- ^ In Unicode, the Full Outer join symbol is ⟗ (U+27D7).
See also[edit]
- Cartesian product
- D4 (programming language) (an implementation of D)
- Database
- Logic of relatives
- Object-role modeling
- Projection (mathematics)
- Projection (relational algebra)
- Projection (set theory)
- Relation
- Relation (database)
- Relation algebra
- Relation composition
- Relation construction
- Relational calculus
- Relational database
- Relational model
- Theory of relations
- Triadic relation
- Tuple relational calculus
- SQL
- Datalog
- Codd’s theorem
Full Outer Join: ( R S)
All the tuples from both participating relations are included in the resulting relation. If there are no matching tuples for both relations, their respective unmatched attributes are made NULL.
Courses | |||
100 | Database | 100 | Alex |
101 | Mechanics | — | — |
102 | Electronics | 102 | Maya |
— | — | 104 | Mira |
- Trending Categories
- Data Structure
- Networking
- RDBMS
- Operating System
- Java
- MS Excel
- iOS
- HTML
- CSS
- Android
- Python
- C Programming
- C++
- C#
- MongoDB
- MySQL
- Javascript
- PHP
- Physics
- Chemistry
- Biology
- Mathematics
- English
- Economics
- Psychology
- Social Studies
- Fashion Studies
- Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer’s Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
What is join operation in relational algebra (DBMS)?
Join operation combines the relation R1 and R2 with respect to a condition. It is denoted by ⋈.
The different types of join operation are as follows −
-
Theta join
-
Natural join
-
Outer join − It is further classified into following types −
-
Left outer join.
-
Right outer join.
-
Full outer join.
-
What is Join in DBMS?
Joins in relational algebra are simply cartesian products followed by selection.
In the above example, if we combine both the Boys table and Interest table such that the ID of students in the Boys table is same as the IDs of students in Interest table, then it will be easy for us to filter out the desired result of all the boys student of class 12th who are intrested in Cricket.
If we perform Inner Join on both tables with one condition as :
Boys ⋈(Boys.ID = Interest.ID and Interest.Sport=Cricket) Interest
The join condition (Boys.ID = Interest.ID and Interest.Sport=Cricket) first performs Cartesian product on both tables and then makes selection to give only those class 12th boys who are interested in Cricket.
The Result of the above Relational algebra query will be :
ID | Name | Gender | Sport |
Amit | Cricket | ||
Saiz | Cricket | ||
Rohit | Cricket |
Theta Join
The general case of JOIN operation is called a Theta join. It is denoted by symbol θ
Example
A ⋈θ B
Theta join can use any conditions in the selection criteria.
For example:
A ⋈ A.column 2 > B.column 2 (B)
column 1 | column 2 |
External links[edit]
This article’s use of external links may not follow Wikipedia’s policies or guidelines. (January 2017) |
- RAT. Software Relational Algebra Translator to SQL
- Lecture Videos: Relational Algebra Processing – An introduction to how database systems process relational algebra
- Lecture Notes: Relational Algebra – A quick tutorial to adapt SQL queries into relational algebra
- Relational – A graphic implementation of the relational algebra
-
Query Optimization
(Page deleted; Closest alternatives: Standford Query Optimization 2, Microsoft research Query Optimization in relational systems, Stanford paper: Query Optimization)This paper is an introduction into the use of the relational algebra in optimizing queries, and includes numerous citations for more in-depth study. - Relational Algebra System for Oracle and Microsoft SQL Server
- Pireal – An experimental educational tool for working with Relational Algebra
- DES – An educational tool for working with Relational Algebra and other formal languages
- RelaX – Relational Algebra Calculator (open-source software available as an online service without registration)
- RA: A Relational Algebra Interpreter
- Translating SQL to Relational Algebra
- DBMS Tutorial
- DBMS – Home
- DBMS – Overview
- DBMS – Architecture
- DBMS – Data Models
- DBMS – Data Schemas
- DBMS – Data Independence
- Entity Relationship Model
- DBMS – ER Model Basic Concepts
- DBMS – ER Diagram Representation
- DBMS – Generalization, Aggregation
- Relational Model
- DBMS – Codd’s Rules
- DBMS – Relational Data Model
- DBMS – Relational Algebra
- DBMS – ER to Relational Model
- Relational Database Design
- DBMS – Database Normalization
- DBMS – Database Joins
- Storage and File Structure
- DBMS – Storage System
- DBMS – File Structure
- Indexing and Hashing
- DBMS – Indexing
- DBMS – Hashing
- Transaction And Concurrency
- DBMS – Transaction
- DBMS – Concurrency Control
- DBMS – Deadlock
- Backup and Recovery
- DBMS – Data Backup
- DBMS – Data Recovery
- DBMS Useful Resources
- DBMS – Quick Guide
- DBMS – Useful Resources
- DBMS – Discussion
DBMS – Joins
We understand the benefits of taking a Cartesian product of two relations, which gives us all the possible tuples that are paired together. But it might not be feasible for us in certain cases to take a Cartesian product where we encounter huge relations with thousands of tuples having a considerable large number of attributes.
Join is a combination of a Cartesian product followed by a selection process. A Join operation pairs two tuples from different relations, if and only if a given join condition is satisfied.
We will briefly describe various join types in the following sections.
Right Outer Join: ( R S )
All the tuples from the Right relation, S, are included in the resulting relation. If there are tuples in S without any matching tuple in R, then the R-attributes of resulting relation are made NULL.
Courses | |||
100 | Database | 100 | Alex |
102 | Electronics | 102 | Maya |
— | — | 104 | Mira |
Left Outer Join(A ⟕ B)
In the left outer join, operation allows keeping all tuple in the left relation. However, if there is no matching tuple is found in right relation, then the attributes of right relation in the join result are filled with null values.
Consider the following 2 Tables
Num | Square |
16 |
Num | Cube |
18 | |
75 |
AB
Num | Square | Cube |
18 | ||
16 |
Advantages of Outer Join
- Increased Data Retrieval: Outer joins can retrieve more data than inner joins since they include non-matching rows in the result set.
- Data Integrity: Outer joins can help maintain data integrity by ensuring that all records in the primary table are returned, even if there are no corresponding records in the secondary table.
- Data Analysis: Outer joins can be useful for data analysis, especially when exploring relationships between data sets and identifying trends and patterns.
NATURAL JOIN (⋈)
Natural join can only be performed if there is a common attribute (column) between the relations. The name and type of the attribute must be same.
Example
Consider the following two tables
Num | Square |
Num | Cube |
27 |
C ⋈ D
Num | Square | Cube |
27 |
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only those tuples with matching attributes and the rest are discarded in the resulting relation. Therefore, we need to use outer joins to include all the tuples from the participating relations in the resulting relation. There are three kinds of outer joins − left outer join, right outer join, and full outer join.
Notes[edit]
- ^ In Unicode, the bowtie symbol is ⋈ (U+22C8).
- ^ In Unicode, the ltimes symbol is ⋉ (U+22C9). The rtimes symbol is ⋊ (U+22CA)
- ^ In Unicode, the Antijoin symbol is ▷ (U+25B7).
- ^ In Unicode, the Left outer join symbol is ⟕ (U+27D5).
- ^ In Unicode, the Right outer join symbol is ⟖ (U+27D6).
- ^ In Unicode, the Full Outer join symbol is ⟗ (U+27D7).
EQUI join
When a theta join uses only equivalence condition, it becomes a equi join.
For example:
A ⋈ A.column 2 = B.column 2 (B)
column 1 | column 2 |
EQUI join is the most difficult operations to implement efficiently using SQL in an RDBMS and one reason why RDBMS have essential performance problems.
Types of Joins
There can be more than one way to join the database tables. So different types of Joins are:-
- Inner Join
- Natural Join
- Outer Join
Inner Join
It selects the values present in both the Table performing Inner join.
-
Inner Join is further classified into
- Theta Join
- Equi Join
Theta Join
Theta Join is used to join two tables based on some conditions. The condition can be on any attributes of the tables performing Theta join. Any comparison operator can be used in the condition.
A ⋈θ B where θ is the condition for join.
Let’s understand Theta Join with the Boys and Interest tables used above :
What if we want to find all the boys student in class 12th who like chess and have percentage greater than 70%. How can we find it out with the help of Theta join?
Theta Join – Boys ⋈(Boys.ID = Interest.ID and Interest.Sport = Chess and Boys.Percentage > 70 ) Interest So the condition here is Boys.ID = Interest.ID and Interest.Sport = Chess , so while performing join, we will have to check this condition every time two rows are joined.
The result of Theta Join will be:-
ID | Name | Percentage | Gender | Sport |
Tejan | 84 | Chess | ||
Ravi | 79 | Chess |
Equi Join
Equi join is same as Theta Join, but the only condition is it only uses equivalence condition while performing join between two tables.
A ⋈(… = …) B, where (… = … ) is the equivalence condition on any of the attributes of the joining table.
In the above example, what if we are told to find out all the students of class 12th who have interest in chess only?
We can perform Equi join as :
Equi join: Boys ⋈(Boys.ID = Interset.ID and Interest.Sport = Chess) Interest
Result after performing Equi join:
ID | Name | Percentage | Gender | Sport |
Tejan | 84 | Chess | ||
Rohan | 56 | Chess | ||
Ravi | 79 | Chess |
Natural Join
Natural join is also considered a type of inner join but it does not use any comparison operator for join condition. It joins the table only when the two tables have at least one common attribute with same name and domain.
In the result of the Natural Join the common attribute only appears once.
It will be more clear with help of an example :
What if we are told to find all the Students of class 12th and their sports interest we can apply Natural Join as :
Natural Join: Boys ⋈ Interest
So when we perform Natural Join on table Boys and table Interest they both have a common attribute ID and have the same domain.
So, the Result of Natural Join will be:
ID | Name | Percentage | Gender | Sport |
Amit | 75 | Chess | ||
Saiz | 65 | Cricket | ||
Tejan | 84 | Chess | ||
Rohit | 85 | Cricket | ||
Rohan | 56 | Chess | ||
Ravi | 79 | Chess |
In the table the common attribute ID is only displayed once in the result.
Outer Join
Outer Join in Relational algebra returns all the attributes of both the table depending on the condition. If some attribute value is not present for any one of the tables it returns NULL in the respective row of the table attribute.
-
It is further classified as:
- Left Outer Join
- Right Outer Join
- Full Outer Join
Let’s see how these Joins are performed.
Left Outer Join
It returns all the rows of the left table even if there is no matching row for it in the right table performing Left Outer Join.
A Left Outer Join B
Let’s perform Left Outer Join on table Boys and Interest and find out all the boys of class 12th and their sports interest.
If we perform Left Outer Join on table Boys and table Interest such that Boys.ID = Interest.ID . Then Result of the Join will be:
Boys.ID | Boys.Name | Boys.Percentage | Interest.ID | Interest.Name | Interest.Gender | Interest.Sport |
Rohan | 56 | Rohan | Chess | |||
Rohit | 85 | Rohan | Chess | |||
Amit | 75 | Rohan | Chess | |||
Ravi | 79 | Rohan | Chess | |||
Saiz | 65 | Rohan | Chess | |||
Tejan | 84 | Rohan | Chess | |||
Rishabh | 75 | NUll | NULL | NULL | NULL |
Clearly, we can observe that all the rows of the left table, i.e., table Boys is present in the result.
Right Outer Join
It returns all the rows of the second table even if there is no matching row for it in the first table performing Right Outer Join.
ARight Outer Join B
Let’s perform Right Outer Join on table Boys and Interest and find out all the boys of class 12th and their sports interest.
If we perform Right Outer Join on table Boys and table Interest such that Boys.ID = Interest.ID . Then Result of the join will be:
Boys.ID | Boys.Name | Boys.Percentage | Interest.ID | Interest.Name | Interest.Gender | Interest.Sport |
Rohan | 56 | Rohan | Chess | |||
Rohit | 85 | Rohan | Chess | |||
Amit | 75 | Rohan | Chess | |||
Ravi | 79 | Rohan | Chess | |||
Saiz | 65 | Rohan | Chess | |||
Tejan | 84 | Rohan | Chess | |||
NULL | NULL | NULL | 23 | Aman | Chess | |
NULL | NULL | NULL | 10 | Shreya | Badminton | |
NULL | NULL | NULL | 15 | Sakshi | Chess | |
NULL | NULL | NULL | 16 | Tejan | Chess | |
NULL | NULL | NULL | 35 | Shubhi | Cricket |
Clearly, we can observe that all the rows of the right table, i.e., table Interest is present in the result.
Full Outer Join
It returns all the rows of the first and second Table.
A Full Outer Join B
Let’s perform Full Outer Join on table Boys and Interest and find out all the boys of class 12th and their sports interest.
If we perform Full Outer Join on Table Boys and Table Interest such that Boys.ID = Interest.ID . Then the result of the join will be:
Boys.ID | Boys.Name | Boys.Percentage | Interest.ID | Interest.Name | Interest.Gender | Interest.Sport |
Rohan | 56 | Rohan | Chess | |||
Rohit | 85 | Rohan | Chess | |||
Amit | 75 | Rohan | Chess | |||
Ravi | 79 | Rohan | Chess | |||
Saiz | 65 | Rohan | Chess | |||
Tejan | 84 | Rohan | Chess | |||
Rishabh | 75 | NUll | NULL | NULL | NULL | |
NULL | NULL | NULL | 23 | Aman | Chess | |
NULL | NULL | NULL | 10 | Shreya | Badminton | |
NULL | NULL | NULL | 15 | Sakshi | Chess | |
NULL | NULL | NULL | 16 | Tejan | Chess | |
NULL | NULL | NULL | 35 | Shubhi | Cricket |
Clearly, we can observe that all the rows of the right table and left Table, i.e., Table B and A are present in the result.
Theta join
If we join R1 and R2 other than the equal to condition then it is called theta join/ non-equi join.
Example
Consider R1 table
RegNo | Branch | Section |
CSE | ||
ECE | ||
CIVIL | ||
IT | ||
IT |
Table R2
Name | RegNo |
Bhanu | |
Priya |
R1 ⋈ R2 with condition R1.regno > R2.regno
RegNo | Branch | Section | Name | Regno |
CIVIL | Bhanu | |||
IT | Bhanu | |||
IT | Bhanu | |||
IT | Priya |
In the join operation, we select those rows from the cartesian product where R1.regno>R2.regno.
Join operation = select operation + cartesian product operation
Natural join
If we join R1 and R2 on equal condition then it is called natural join or equi join. Generally, join is referred to as natural join.
Natural join of R1 and R2 is −
{ we select those tuples from cartesian product where R1.regno=R2.regno}
R1 ⋈ R2
Regno | Branch | Section | Name |
Bhanu | |||
priya |
Outer join
It is an extension of natural join to deal with missing values of relation.
Consider R1 and R2 shown below −
Table R1
RegNo | Branch | Section |
CSE | ||
ECE | ||
CIVIL | ||
IT | ||
IT |
Table R2
Name | Regno |
Bhanu | |
Priya | |
Hari |
Outer join is of three types. These are explained below −
Left outer join
It is denoted by R1 ⋈ R2.
RegNo | Branch | Section | Name | Regno |
Bhanu | ||||
Priya | ||||
NULL | NULL | |||
NULL | NULL | |||
NULL | NULL |
Here all the tuples of R1(left table) appear in output.
The mismatching values of R2 are filled with NULL.
Left outer join = natural join + mismatch / extra tuple of R1
Right outer join
It is denoted by R1 ⋈ R2
Here all the tuples of R2 (right table) appear in output. The mismatching values of R1 are filled with NULL.
RegNo | Branch | Section | Name | Regno |
Bhanu | ||||
Priya | ||||
NULL | NULL | NULL | Hari |
Right outer join = natural join+ mismatch/extra tuple of R2.
Full outer join
It is denoted by R1 ⋈ R2.
Full outer join=left outer join U right outer join.
RegNo | Branch | Section | Name | Regno |
Bhanu | ||||
Priya | ||||
NULL | NULL | |||
NULL | NULL | |||
NULL | NULL | |||
NULL | NULL | NULL | Hari |
Example
Given below is the picture of full outer join −
Kickstart Your Career
Get certified by completing the course
Get Started
Joins in DBMS
This article delves into the pivotal concept of ‘join’ in Database Management Systems (DBMS). A ‘join’ merges rows from multiple tables based on related columns, facilitating simultaneous data retrieval from interconnected tables. Exploring SQL’s JOIN clause, readers will grasp its syntax, various types, practical examples, and frequently asked questions. Essentially, JOIN in SQL encapsulates the act of amalgamating tables, enhancing data querying capabilities across databases.
Introduction[edit]
Relational algebra received little attention outside of pure mathematics until the publication of E.F. Codd’s relational model of data in 1970. Codd proposed such an algebra as a basis for database query languages. (See section Implementations.)
Relational algebra operates on homogeneous sets of tuples where we commonly interpret m to be the number of rows in a table and n to be the number of columns. All entries in each column have the same type. Five primitive operators of Codd’s algebra are the selection, the projection, the Cartesian product (also called the cross product or cross join), the set union, and the set difference.
Set operators[edit]
The relational algebra uses set union, set difference, and Cartesian product from set theory, but adds additional constraints to these operators.
For set union and set difference, the two relations involved must be union-compatible—that is, the two relations must have the same set of attributes. Because set intersection is defined in terms of set union and set difference, the two relations involved in set intersection must also be union-compatible.
For the Cartesian product to be defined, the two relations involved must have disjoint headers—that is, they must not have a common attribute name.
In addition, the Cartesian product is defined differently from the one in set theory in the sense that tuples are considered to be “shallow” for the purposes of the operation. That is, the Cartesian product of a set of n-tuples with a set of m-tuples yields a set of “flattened” (n + m)-tuples (whereas basic set theory would have prescribed a set of 2-tuples, each containing an n-tuple and an m-tuple). More formally, R × S is defined as follows:
The cardinality of the Cartesian product is the product of the cardinalities of its factors, that is, |R × S| = |R| × |S|.
Projection (Π)[edit]
A projection is a unary operation written as where is a set of attribute names. The result of such projection is defined as the set that is obtained when all tuples in R are restricted to the set .
Note: when implemented in SQL standard the “default projection” returns a multiset instead of a set, and the Π projection to eliminate duplicate data is obtained by the addition of the
DISTINCT
keyword.
Selection (σ)[edit]
A generalized selection is a unary operation written as where φ is a propositional formula that consists of atoms as allowed in the normal selection and the logical operators (and), (or) and (negation). This selection selects all those tuples in R for which φ holds.
To obtain a listing of all friends or business associates in an address book, the selection might be written as . The result would be a relation containing every attribute of every unique record where isFriend is true or where isBusinessContact is true.
Rename (ρ)[edit]
A rename is a unary operation written as where the result is identical to R except that the b attribute in all tuples is renamed to an a attribute. This is simply used to rename the attribute of a relation or the relation itself.
To rename the “isFriend” attribute to “isBusinessContact” in a relation, might be used.
There is also the notation, where R is renamed to x and the attributes are renamed to .[1]
References[edit]
-
^ Silberschatz, Abraham; Henry F. Korth; S. Sudarshan (2020). Database system concepts (Seventh ed.). New York. p. 56. ISBN 978-0-07-802215-9. OCLC 1080554130.
{{cite book}}
: CS1 maint: location missing publisher (link) - ^ Codd, E.F. (June 1970). “A Relational Model of Data for Large Shared Data Banks”. Communications of the ACM. 13 (6): 377–387. doi:10.1145/362384.362685. S2CID 207549016.
- ^ M. Tamer Özsu; Patrick Valduriez (2011). Principles of Distributed Database Systems (3rd ed.). Springer. p. 46. ISBN 978-1-4419-8833-1.
- ^ Patrick O’Neil; Elizabeth O’Neil (2001). Database: Principles, Programming, and Performance, Second Edition. Morgan Kaufmann. p. 120. ISBN 978-1-55860-438-4.
- ^ C. J. Date (2011). SQL and Relational Theory: How to Write Accurate SQL Code. O’Reilly Media, Inc. pp. 133–135. ISBN 978-1-4493-1974-8.
- ^ a b c Hector Garcia-Molina; Jeffrey D. Ullman; Jennifer Widom (2009). Database systems: the complete book (2nd ed.). Pearson Prentice Hall. ISBN 978-0-13-187325-4.
- ^ Aho, Alfred V.; Jeffrey D. Ullman (1979). “Universality of data retrieval languages”. Proceedings of the 6th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages: 110–119. doi:10.1145/567752.567763. S2CID 3242505.
- ^ C. J. Date. “Edgar F. Codd – A.M. Turing Award Laureate”. amturing.acm.org. Retrieved 2020-12-27.
References[edit]
-
^ Silberschatz, Abraham; Henry F. Korth; S. Sudarshan (2020). Database system concepts (Seventh ed.). New York. p. 56. ISBN 978-0-07-802215-9. OCLC 1080554130.
{{cite book}}
: CS1 maint: location missing publisher (link) - ^ Codd, E.F. (June 1970). “A Relational Model of Data for Large Shared Data Banks”. Communications of the ACM. 13 (6): 377–387. doi:10.1145/362384.362685. S2CID 207549016.
- ^ M. Tamer Özsu; Patrick Valduriez (2011). Principles of Distributed Database Systems (3rd ed.). Springer. p. 46. ISBN 978-1-4419-8833-1.
- ^ Patrick O’Neil; Elizabeth O’Neil (2001). Database: Principles, Programming, and Performance, Second Edition. Morgan Kaufmann. p. 120. ISBN 978-1-55860-438-4.
- ^ C. J. Date (2011). SQL and Relational Theory: How to Write Accurate SQL Code. O’Reilly Media, Inc. pp. 133–135. ISBN 978-1-4493-1974-8.
- ^ a b c Hector Garcia-Molina; Jeffrey D. Ullman; Jennifer Widom (2009). Database systems: the complete book (2nd ed.). Pearson Prentice Hall. ISBN 978-0-13-187325-4.
- ^ Aho, Alfred V.; Jeffrey D. Ullman (1979). “Universality of data retrieval languages”. Proceedings of the 6th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages: 110–119. doi:10.1145/567752.567763. S2CID 3242505.
- ^ C. J. Date. “Edgar F. Codd – A.M. Turing Award Laureate”. amturing.acm.org. Retrieved 2020-12-27.
Advantages of Inner Join
- Reduced Data Duplication: Inner joins only return rows that have matching values in both tables being joined, which can reduce the amount of duplicate data returned in the result set.
- Efficient Query Execution: Since inner joins only involve rows that match in both tables, they can be more efficient in terms of query execution time compared to outer joins.
- Data Accuracy: Inner joins only return rows that have matching values in both tables, which can improve data accuracy by excluding irrelevant or mismatched data.
Outer Join
An Outer Join returns all the rows from one table and matching rows from the other table based on a specified condition. It combines data from two tables based on a common column between them, which is also specified using the ON keyword in SQL. In addition to the matching rows, it also includes rows from one table that do not have matching rows in the other table.
Outer Join is of three types:
- Left outer join
- Right outer join
- Full Join
Left Outer join
Left Outer Join returns all rows of a table on the left side of the join. For the rows for which there is no matching row on the right side, the result contains NULL on the right side.
Left Outer Join returns all the rows from the left table and matching rows from the right table. If a row in the left table does not have a matching row in the right table, the result set will include NULL values for the columns in the right table.
Left Outer Join
Syntax:
SELECT T1.C1, T2.C2
FROM TABLE T1
LEFT JOIN TABLE T2
ON T1.C1= T2.C1;
Query:
SELECT Student.StudentName,StudentCourse.CourseID
FROM Student
LEFT OUTER JOIN StudentCourse
ON StudentCourse.EnrollNo = Student.EnrollNo
ORDER BY StudentCourse.CourseID;
Note: OUTER is optional above. Simple LEFT JOIN is also considered as LEFT OUTER JOIN
StudentName | CourseID |
geek4 | NULL |
geek2 | |
geek1 | |
geek1 | |
geek3 | |
geek1 |
For more, refer to Inner Join in SQL.
Right Outer Join
Right Outer Join is similar to Left Outer Join (Right replaces Left everywhere). Right Outer Join returns all the rows from the right table and matching rows from the left table. If a row in the right table does not have a matching row in the left table, the result set will include NULL values for the columns in the left table.
Right Outer Join
Syntax:
SELECT T1.C1, T2.C2
FROM TABLE T1
RIGHT JOIN TABLE T2
ON T1.C1= T2.C1;
Example:
Table Record
Roll_Number | Name | Age |
Harsh | 18 | |
Ankesh | 19 | |
Rupesh | 18 | |
Vaibhav | 15 | |
Naveen | 13 | |
Shubham | 15 | |
Ankit | 19 | |
Bhupesh | 18 |
Table Course
Course_ID | Roll_Number |
10 | |
11 |
Query:
SELECT Record.NAME,Course.COURSE_ID
FROM Record
RIGHT JOIN Course
ON Course.Roll_Number = Record.Roll_Number;
Output:
Name | Course_ID |
Harsh | |
Ankesh | |
Rupesh | |
Vaibhav | |
Naveen | |
NULL | |
NULL | |
NULL |
For more, refer to Right Join in SQL.
Full Outer Join
Full Outer Join contains the results of both the Left and Right outer joins. It is also known as cross-join. It will provide a mixture of two tables.
Full Outer Join returns all the rows from both tables, including matching and non-matching rows. If a row in one table does not have a matching row in the other table, the result set will include NULL values for the columns in the table that do not have a match.
Full Outer Join
Syntax:
SELECT * FROM T1
CROSS-JOIN T2;
Query:
SELECT Record.NAME,Course.COURSE_ID
FROM Record
FULL JOIN Course
ON Course.Roll_Number = Record.Roll_Number;
Output:
Name | Course_ID |
Harsh | |
Ankesh | |
Rupesh | |
Vaibhav | |
Naveen | |
Shubham | NULL |
Ankit | NULL |
Bhupesh | NULL |
NULL | |
NULL | |
NULL |
For more, refer to Outer Join in SQL.
Use of algebraic properties for query optimization[edit]
Relational database management systems often include a query optimizer which attempts to determine the most efficient way to execute a given query. Query optimizers enumerate possible query plans, estimate their cost, and pick the plan with the lowest estimated cost. If queries are represented by operators from relational algebra, the query optimizer can enumerate possible query plans by rewriting the initial query using the algebraic properties of these operators.
Queries can be represented as a tree, where
- the internal nodes are operators,
- leaves are relations,
- subtrees are subexpressions.
The primary goal of the query optimizer is to transform expression trees into equivalent expression trees, where the average size of the relations yielded by subexpressions in the tree is smaller than it was before the optimization. The secondary goal is to try to form common subexpressions within a single query, or if there is more than one query being evaluated at the same time, in all of those queries. The rationale behind the second goal is that it is enough to compute common subexpressions once, and the results can be used in all queries that contain that subexpression.
Here are a set of rules that can be used in such transformations.
Selection[edit]
Rules about selection operators play the most important role in query optimization. Selection is an operator that very effectively decreases the number of rows in its operand, so if the selections in an expression tree are moved towards the leaves, the internal relations (yielded by subexpressions) will likely shrink.
Basic selection properties[edit]
Selection is idempotent (multiple applications of the same selection have no additional effect beyond the first one), and commutative (the order selections are applied in has no effect on the eventual result).
Breaking up selections with complex conditions[edit]
A selection whose condition is a conjunction of simpler conditions is equivalent to a sequence of selections with those same individual conditions, and selection whose condition is a disjunction is equivalent to a union of selections. These identities can be used to merge selections so that fewer selections need to be evaluated, or to split them so that the component selections may be moved or optimized separately.
Selection and cross product[edit]
Cross product is the costliest operator to evaluate. If the input relations have N and M rows, the result will contain rows. Therefore, it is important to decrease the size of both operands before applying the cross product operator.
This can be effectively done if the cross product is followed by a selection operator, e.g. . Considering the definition of join, this is the most likely case. If the cross product is not followed by a selection operator, we can try to push down a selection from higher levels of the expression tree using the other selection rules.
In the above case the condition A is broken up in to conditions B, C and D using the split rules about complex selection conditions, so that and B contains attributes only from R, C contains attributes only from P, and D contains the part of A that contains attributes from both R and P. Note, that B, C or D are possibly empty. Then the following holds:
Selection and set operators[edit]
Selection is distributive over the set difference, intersection, and union operators. The following three rules are used to push selection below set operations in the expression tree. For the set difference and the intersection operators, it is possible to apply the selection operator to just one of the operands following the transformation. This can be beneficial where one of the operands is small, and the overhead of evaluating the selection operator outweighs the benefits of using a smaller relation as an operand.
Selection and projection[edit]
Selection commutes with projection if and only if the fields referenced in the selection condition are a subset of the fields in the projection. Performing selection before projection may be useful if the operand is a cross product or join. In other cases, if the selection condition is relatively expensive to compute, moving selection outside the projection may reduce the number of tuples which must be tested (since projection may produce fewer tuples due to the elimination of duplicates resulting from omitted fields).
Projection[edit]
Basic projection properties[edit]
Projection is idempotent, so that a series of (valid) projections is equivalent to the outermost projection.
Projection and set operators[edit]
Projection is distributive over set union.
Projection does not distribute over intersection and set difference. Counterexamples are given by:
and
where b is assumed to be distinct from b’.
Rename[edit]
Basic rename properties[edit]
Successive renames of a variable can be collapsed into a single rename. Rename operations which have no variables in common can be arbitrarily reordered with respect to one another, which can be exploited to make successive renames adjacent so that they can be collapsed.
Rename and set operators[edit]
Rename is distributive over set difference, union, and intersection.
Product and union[edit]
Cartesian product is distributive over union.
Keywords searched by users: inner join relational algebra
Categories: Chi tiết 13 Inner Join Relational Algebra
See more here: kientrucannam.vn
See more: https://kientrucannam.vn/vn/