SQL Subqueries | The Most Detailed Explanation and Guide Ever
Hello nerds! Today I want to tell you about another important element in the SQL query language, namely subqueries. Subqueries are a very important thing for those who want to learn how to pull more complex data from the database and analyze it in more detail. I will try to write the most detailed and understandable guide on how to figure it out and how you can understand it. (by the way, a very detailed explanation of the logic of the join operator in SQL was recently published on our website. I advise you to read it).
In SQL, a subquery is a query nested within another query. The subquery is used to retrieve data that will be used by the main query as a filter condition or a value in a comparison. In simpler terms, a subquery allows us to filter data by adding just a few rows, instead of using the with statement, which newbies are so easily confused with. Let's look at the example of a small database. In the future, all examples of subqueries will be based on this database.
Now let's set ourselves the task of finding out all the information about employees whose wages are above average. To get started, let's just write a query that calculates the average salary for all employees.
SELECT AVG(salary) FROM employees;
Now that such a simple query is built, we can use it as a condition for comparison, a condition for inequality. SQL makes it easy for us to do this. It is enough to put the above query in brackets.
SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);
What have we done? In fact, we combined two requests - we asked: to calculate the average salary for employees and display those who have this salary above the average. For simplicity, I would recommend that you always read such queries from a subquery. Thus, you will immediately understand which information you filter out first, and which is just general.
Someone could already guess that subqueries can and should be used in another way. You can use them in two parts of the query - after the WHERE argument and after the SELECT argument. Let's take them in order.
It is not necessary to memorize their names. The main thing is to understand their essence and how the two types of subqueries differ from each other. In a correlated subquery, the subquery references a column from the outer query. The subquery is executed once for each row in the outer query. Correlated subqueries can be slower than non-correlated subqueries because they have to be executed multiple times. In simple terms, this kind of subquery is used after the SELECT argument and can easily be replaced with a GROUP BY argument.
SELECT name, (SELECT COUNT(*) FROM employees WHERE name = e.name) AS names_count FROM employees e;
This query counts the number of employees with the same name. As I said, it could well be replaced by a GROUP BY. Here everyone decides what is more convenient. Personally, subqueries have always seemed more convenient and understandable to me. Here's what the solution to this problem would look like if you wanted to use GROUP BY:
SELECT name, AVG(salary) AS avg_salary FROM employees GROUP BY name;
GROUP BY is faster and more capacious, but again, everyone chooses what to use and in what specific case.
This type of subquery is not so easy to replace with an equivalent. In a non-correlated subquery, the subquery does not reference any column from the outer query. The subquery is executed once and its result is used by the main query.
In simple words, we use this type of subquery after the statement WHERE or FROM and this opens up huge opportunities for us to filter a huge array of data.
For example, we can find out employees whose salary is higher than the average, as at the very beginning of our article, or calculate what is the biggest difference between salaries, that is, which employees have the largest delta:
SELECT MAX(salary_delta) AS largest_salary_delta FROM ( SELECT MAX(salary) - MIN(salary) AS salary_delta FROM employees GROUP BY id ) subquery;
In this query, we first calculate the difference of all charges (we read the query from the subquery, as I recommended), and then from all the differences we select the maximum.
I will give another query example, but this time we will build a subquery after the keyword WHERE.
SELECT name, AVG(salary) WHERE (SELECT COUNT(name) FROM employees) > 1 FROM employees;
So, the entire query will return the names of all employees and their average salary, but only if there is more than one employee in the "employees" table. I'll explain in as much detail as possible:
SELECT name, AVG(salary)
This part of the code specifies the columns to be selected in the query result. It selects the "name" column and the average salary of all employees.
WHERE (SELECT COUNT(name) FROM employees) > 1
This is a WHERE condition that checks if the number of employees in the "employees" table is greater than one. It does this by using a subquery that counts the number of rows in the "employees" table. This condition is used to ensure that the AVG function is not called when there is only one employee in the table, which would result in a division by zero error.
This specifies the table from which the data is being selected. In this case, it is the "employees" table.
Feel free to understand the code, even if everyone says that it is very simple. Ask someone who is busy or, if you are still embarrassed to seem dull (which I do not advise you), seek help from AI, who will explain any code to you in great detail and clearly (but there may be errors, understand this).
You know that I love explaining technical stuff with the Euler diagram. Now, as another example, I will try to convey the received material to you visually, through these very diagrams.
Imagine we have two sets. In one set, there are numbers that are a multiple of three, that is, they are divisible by 3 without a remainder, and in the other set, there are numbers that are a multiple of 5, that is, they are divisible by 5 without a remainder. It is logical to assume that the intersection of these sets will be numbers that are multiples of both 5 and 3, that is, 15, 30, 45, and so on, as shown in my figure:
Subqueries - Euler Diagrams
This problem can be easily solved using subqueries.
SELECT * FROM B WHERE (SELECT * FROM A WHERE number%5==0);
It is a SQL query that selects all columns and rows from table B, where the result of a subquery is true. This subquery selects all columns and rows from table A where the value in the "number" column is divisible by 5 without any remainder. The main query then filters the results from table B based on the result of the subquery. Specifically, only rows from table B are returned where the subquery returns one or more rows.
I really hope that my explanation seemed to you much clearer than most on the net. First of all, I tried to explain to you the logic and principle by which subqueries are built, and not the syntax itself - you can see this for yourself. I always adhere to the idea - first understand the logic, and then write with your hands. Good luck!