PostgreSQL provides some powerful grouping functions for complex datasets. Finding the first row in each group based on an order by clause is a common task in group queries. We group the data based on certain criteria to meet our requirements.
Two methods are provided by PostgreSQL to accomplish this task:
- Using DISTINCT ON
- Using RANK function
Table of Contents
Preparing data for the Queries
Let’s take an example. We have an employee table with four columns:
- Employee Id
- Name
- Salary
- Department Id
A foreign key called Department ID refers to the Department table. The office is divided into four departments, to which each employee is assigned. Below is the data.
Now, we need details of the one employee from each department who has the maximum salary in their department. To get these details, we need to group by department ID and apply the maximum function to the salary column.
When we apply the group by salary column, it will only show the maximum salary from each department, not the details of the first row in each group.
To get the first row in each group, we can use the DISTINCT ON or RANK functions.
Get first row in each group using DISTINCT ON in PostgreSQL
PostgreSQL provides a special keyword, DISTINCT ON, which looks like an extension of DISTINCT.
DISTINCT ON will show only the first row in each group. It is mainly used for removing duplicate rows, but it can also be used to get the first row in each group.
DISTINCT ON Example
We need to pass a parameter to the DISTINCT ON function of the desired column name for which grouping is based. In this example, we pass the department_id column as a parameter to a DISTINCT ON function. Also, we need to sort the salary column in descending order.
Don’t add comma after the closing parenthesis of distinct on function.
SELECT DISTINCT ON (department_id) employee_id, name, salary, department_id
FROM employee ORDER BY department_id, salary desc
In the above screenshot, DISTINCT ON provides the details of an employee in each department, which has a maximum salary in that department.
Get the first row from each group using RANK function in PostgreSQL
The rank function internally generates a virtual table and gives a numerical rank for each partition value based on the ORDER BY clause. For example, if we apply order by descending clause to the salary column and partition by department id, then rank function group the department id, sort the data in descending order based on salary, and generate a maximum salary employee in the first row and give 1st rank to that employee, and give 2nd rank to the second employee whose salary is lower than the 1st rank employee.
Read more on the RANK function in the official PostgreSQL documentation here.
After partition, we can apply the where clause to the rank column to get the desired row. To get the first row, we can apply rank=1 in the where clause.
SELECT * FROM
(SELECT employee_id, name, salary,department_id,
RANK() OVER (PARTITION BY department_id ORDER BY salary desc) as rank
from employee) as pos where rank=1
Summary
PostgreSQL provides two methods to get the first row in each group: the DISTINCT ON and RANK functions. DISTINCT ON shows the first row from each group, and RANK assigns a rank based on a specific ordering.
Read More: