Aggregates: sums, averages, and other summary values¶
Objectives¶
Write queries to compute aggregated VALUES.
Explain how missing data is handled.
Key Points¶
Use aggregation functions to combine multiple values:
min
andmax
,avg
,count
, andsum
.Aggregation functions ignore
null
values.Aggregation happens after filtering.
Use
GROUP BY
to combine subsets separately.If no aggregation function is specified for a field, the query may return an arbitrary value for that field.
Minimum and maximum¶
We want to calculate ranges and averages.
SELECT dated FROM Visited;
dated
----------
1927-02-08
1927-02-10
1930-01-07
1930-01-12
1930-02-26
-null-
1932-01-14
1932-03-22
To combine them, we will use an aggregation function like min
or max
.
SELECT min(dated) FROM Visited;
min(dated)
----------
1927-02-08
SELECT max(dated) FROM Visited;
max(dated)
----------
1932-03-22
Average, count, and sum¶
We can also use avg
, count
, and sum
.
SELECT avg(reading) FROM Survey WHERE quant='sal';
avg(reading)
----------------
7.20333333333333
SELECT count(reading) FROM Survey WHERE quant='sal';
count(reading)
--------------
9
SELECT sum(reading) FROM Survey WHERE quant='sal';
sum(reading)
------------
64.83
Multiple aggregations¶
We can make multiple aggregations in the same query, as well.
SELECT min(reading), max(reading) FROM Survey
WHERE quant='sal' AND READING <=1.0;
min(reading) max(reading)
------------ ------------
0.05 0.21
SELECT min(reading), max(reading), max(reading)-min(reading)
FROM Survey WHERE quant='sal' AND READING <=1.0;
min(reading) max(reading) max(reading)-min(reading)
------------ ------------ -------------------------
0.05 0.21 0.16
We can also combine aggregated results with raw results, though results may look strange.
SELECT person, count(*) FROM Survey WHERE quant='sal' AND reading <=1.0;
person count(*)
------ --------
dyer 7
Aggregations and NULL¶
If we try to run an aggregation when looking for information that
is not in our tables, we will get a NULL
output.
SELECT person, max(reading) FROM Survey where quant = 'missing';
person max(reading)
------ ------------
-null- -null-
When aggregating over a field with a NULL
value, the NULL
value gets skipped over. The default is to filter it out.
SELECT min(dated) FROM Visited;
min(dated)
----------
1927-02-08
Grouping results¶
GROUP BY
groups records with the same value together so that
aggregation processes each batch separately.
SELECT person, count(reading), round(avg(reading), 2)
From Survey
Where quant='rad'
GROUP BY person;
person count(reading) round(avg(reading), 2)
------ -------------- ----------------------
dyer 2 8.81
lake 2 1.82
pb 3 6.66
roe 1 11.25
We can also group by multiple fields at once.
SELECT person, quant, count(reading), round(avg(reading),2)
FROM Survey
GROUP BY person, quant;
person quant count(reading) round(avg(reading), 2)
------ ----- -------------- ----------------------
-null- sal 1 0.06
-null- temp 1 -26.0
dyer rad 2 8.81
dyer sal 2 0.11
lake rad 2 1.82
lake sal 4 0.11
lake temp 1 -16.0
pb rad 3 6.66
pb temp 2 -20.0
roe rad 1 11.25
roe sal 2 32.05
We can finish this up by removing measurements from unknown people and ordering our output.
SELECT person, quant, count(reading), round(avg(reading),2)
FROM Survey
WHERE person IS NOT NULL
GROUP BY person, quant
ORDER BY person, quant;
person quant count(reading) round(avg(reading), 2)
------ ----- -------------- ----------------------
dyer rad 2 8.81
dyer sal 2 0.11
lake rad 2 1.82
lake sal 4 0.11
lake temp 1 -16.0
pb rad 3 6.66
pb temp 2 -20.0
roe rad 1 11.25
roe sal 2 32.05
Practice: Counting temperature readings¶
How many temperature readings did Frank Pabodie record, and what was their average value?
Solution
SELECT count(reading), avg(reading) FROM Survey WHERE quant = 'temp'
AND person = 'pb';
count(reading) avg(reading)
-------------- ------------
2 -20.0