Filtering¶
Objectives¶
Write queries that select records to satisfy our conditions.
Explain the order in which the clauses in a query are executed.
Key Points¶
Use
WHERE
to specify conditions that records must meet in order to be included in a query’s results.Use
AND
,OR
, andNOT
to combine tests.Filtering is done on whole records, so conditions can use fields that are not actually displayed.
Write queries incrementally.
Wildcard: A character used in pattern matching. In SQL’s like operator, the wildcard “%” matches zero or more characters, so that
%able%
matches “fixable” and “tablets”.
Filtering with WHERE¶
We want to see when a particular site was visited.
We can select these records from Visited
table using a
WHERE
clause in our query.
SELECT * FROM Visited WHERE site = 'DR-1';
id site dated
--- ---- ----------
619 DR-1 1927-02-08
622 DR-1 1927-02-10
844 DR-1 1932-03-22
To explain what SQL is doing above:
Check each row in
Visited
to see what satisfiesWHERE
then,It uses the column names following
SELECT
to determine which columns to display.
This processing order means we can filter records using WHERE
based on values in
columns that aren’t displayed.
SELECT id FROM Visited WHERE site = 'DR-1';
id
---
619
622
844
Combining filters¶
We can use other boolean operators (=
, >
, <
) to filter our data.
We can combine these conditional tests with AND
or OR
.
For example, we can get information on DR-1 site collected before 1930.
SELECT * FROM Visited WHERE site='DR-1' AND dated < '1930-01-01';
id site dated
--- ---- ----------
619 DR-1 1927-02-08
622 DR-1 1927-02-10
AND
means both conditions must be true.
OR
means at least one condition has to be true.
SELECT * FROM Survey WHERE person = 'lake' OR person = 'roe';
taken person quant reading
----- ------ ----- -------
734 lake sal 0.05
751 lake sal 0.1
752 lake rad 2.19
752 lake sal 0.09
752 lake temp -16.0
752 roe sal 41.6
837 lake rad 1.46
837 lake sal 0.21
837 roe sal 22.5
844 roe rad 11.25
Parentheses¶
Another way to write this is to see if a value is in a set.
SELECT * FROM Survey WHERE person IN ('lake','roe');
taken person quant reading
----- ------ ----- -------
734 lake sal 0.05
751 lake sal 0.1
752 lake rad 2.19
752 lake sal 0.09
752 lake temp -16.0
752 roe sal 41.6
837 lake rad 1.46
837 lake sal 0.21
837 roe sal 22.5
844 roe rad 11.25
And be careful about parentheses if you are putting together a lot of tests!
SELECT * FROM Survey
WHERE quant = 'sal'
AND person = 'lake' OR person='roe';
taken person quant reading
----- ------ ----- -------
734 lake sal 0.05
751 lake sal 0.1
752 lake sal 0.09
752 roe sal 41.6
837 lake sal 0.21
837 roe sal 22.5
844 roe rad 11.25
This gives us all the measurements by 'roe'
.
What we probably meant is this:
SELECT * FROM Survey WHERE quant = 'sal' AND (person = 'lake' OR person='roe');
taken person quant reading
----- ------ ----- -------
734 lake sal 0.05
751 lake sal 0.1
752 lake sal 0.09
752 roe sal 41.6
837 lake sal 0.21
837 roe sal 22.5
Partial matches¶
We can filter by partial matches using LIKE
keyword.
The percent (%
) acts like a wildcard, matching any characters in that place:
SELECT * FROM Visited WHERE site LIKE 'DR%';
id site dated
--- ---- ----------
619 DR-1 1927-02-08
622 DR-1 1927-02-10
734 DR-3 1930-01-07
735 DR-3 1930-01-12
751 DR-3 1930-02-26
752 DR-3
844 DR-1 1932-03-22
DISTINCT and WHERE¶
Finally, we can use DISTINCT
and WHERE
to give a second level of filtering.
SELECT DISTINCT person, quant FROM Survey
WHERE person='lake' OR person = 'roe';
person quant
------ -----
lake sal
lake rad
lake temp
roe sal
roe rad
But remember, DISTINCT
is applied to the values displayed in the chosen columns,
not to all the rows being processed.
Remember, when trying to write queries, start simple and add more clauses as you go!
Practice: Fix this query¶
Suppose we want to select all sites that lie within 48 degrees of the equator. Our first query is:
SELECT * FROM Site WHERE (lat > -48) OR (lat < 48);
Explain why this is wrong, and rewrite the query so that it is correct.
Solution
Because we used OR
, a site on the South Pole for example
will still meet the second criteria and thus be included.
Instead, we want to restrict this to sites that meet both criteria:
SELECT * FROM Site WHERE (lat > -48) AND (lat < 48);name lat long ---- ------ ------- DR-3 -47.15 -126.72
Practice: Matching patterns¶
Which of these expressions are true?
'a' LIKE 'a'
'a' LIKE '%a'
'beta' LIKE '%a'
'alpha' LIKE 'a%%'
'alpha' LIKE 'a%p%'
Solution
'a' LIKE 'a'
: True because these are the same character.'a' LIKE '%a'
: True because the wildcard can match zero or more characters.'beta' LIKE '%a'
: True because the%
matchesbet
and thea
matches thea
.'alpha' LIKE 'a%%'
: True because the first wildcard matcheslpha
and the second wildcard matches zero characters (or vice versa).'alpha' LIKE 'a%p%'
: True because the first wildcard matchesl
and the second wildcard matchesha
.