Sorting out and removing duplicates¶
Objectives¶
Write queries that display results in a particular order.
Write queries that eliminate duplicate values from data.
Key Points¶
The records in a database table are not intrinsically ordered: if we want to display them in some order, we must specify that explicitly with
ORDER BY
.The values in a database are not guaranteed to be unique: if we want to eliminate duplicates, we must specify that explicitly as well using
DISTINCT
.
Distinct values¶
Let’s start by selecting the quantities that have been measured
from the Survey
table.
SELECT quant FROM Survey;
quant
-----
rad
sal
rad
sal
rad
sal
temp
rad
sal
temp
rad
temp
sal
rad
sal
temp
sal
rad
sal
sal
rad
This result makes it difficult to see all the different types of quant in the table.
Let’s eliminate redundant outputs using DISTINCT
.
SELECT DISTINCT quant FROM Survey;
quant
-----
rad
sal
temp
If we select more than one column, distinct sets of values are returned.
SELECT DISTINCT taken, quant FROM Survey;
taken quant
----- -----
619 rad
619 sal
622 rad
622 sal
734 rad
734 sal
734 temp
735 rad
735 sal
735 temp
751 rad
751 temp
751 sal
752 rad
752 sal
752 temp
837 rad
837 sal
844 rad
Sorting output¶
Now, let’s identify scientists using Person
table. We’ll add ORDER BY
to sort our data alphabetically.
SELECT * FROM Person ORDER BY id;
id personal family
-------- --------- --------
danforth Frank Danforth
dyer William Dyer
lake Anderson Lake
pb Frank Pabodie
roe Valentina Roerich
We can sort in DESC
for descending or ASC
for ascending order.
SELECT * FROM Person ORDER BY id DESC;
id personal family
-------- --------- --------
roe Valentina Roerich
pb Frank Pabodie
lake Anderson Lake
dyer William Dyer
danforth Frank Danforth
If we want to look at which scientists measured quantities during each visit, let’s look at the Survey table, and sort on several fields at once.
SELECT taken, person, quant FROM Survey
ORDER BY taken ASC, person DESC;
taken person quant
----- ------ -----
619 dyer rad
619 dyer sal
622 dyer rad
622 dyer sal
734 pb rad
734 pb temp
734 lake sal
735 pb rad
735 sal
735 temp
751 pb rad
751 pb temp
751 lake sal
752 roe sal
752 lake rad
752 lake sal
752 lake temp
837 roe sal
837 lake rad
837 lake sal
844 roe rad
Putting it together¶
It seems some scientists specialize in certain types of measurements. Let’s remove duplicates to check.
SELECT DISTINCT quant, person FROM Survey ORDER BY quant ASC;
quant person
----- ------
rad dyer
rad pb
rad lake
rad roe
sal dyer
sal lake
sal
sal roe
temp pb
temp
temp lake
Practice: Finding distinct dates¶
Write a query that selects distinct dates from the Visited
table.
Solution
SELECT DISTINCT dated FROM Visited;
dated
----------
1927-02-08
1927-02-10
1930-01-07
1930-01-12
1930-02-26
1932-01-14
1932-03-22