Dealing with missing data

Detecting missing values

  • When you have no data, the empty database field contains the word NULL

  • Because NULL is not a number, it is not possible to use = , < or > to find or compare missing values.

  • To determine if a column contains a NULL value , use IS NULL and IS NOT NULL

Removing missing values

SELECT *  
FROM Incidents
WHERE Shape IS NOT NULL

Imputing missing values (I)

What if you want to replace the missing values with another value instead of omitting them? You can do this using the ISNULL() function. Here we replace all the missing values in the Shape column using the word 'Saucer'

SELECT  Shape, ISNULL(Shape, 'Saucer') AS NewShape
FROM Incidents

You can also use ISNULL() to replace values from a different column instead of a specified word.

-- Check the IncidentState column for missing values and replace them with the City column
SELECT IncidentState, ISNULL(IncidentState,City) AS Location
FROM Incidents
-- Filter to only return missing values from IncidentState
WHERE IncidentState IS NULL

Imputing missing values (II)

What if you want to replace missing values in one column with another and want to check the replacement column to make sure it doesn't have any missing values? To do that you need to use the COALESCE statement.

SELECT Shape, City, COALESCE(Shape, City, 'Unknown') as NewShape
FROM Incidents

Replace missing values in Country with the first non-missing value from IncidentState or City, in that order. Name the new column Location

-- Replace missing values 
SELECT Country, COALESCE(Country, IncidentState, City) AS Location
FROM Incidents
WHERE Country IS NULL