Missing Data

Missing Data

In Julia, missing values in data are represented using the special object missing, which is the single instance of the type Missing.

julia> missing
missing

julia> typeof(missing)
Missing

The Missing type lets users create Vectors and DataFrame columns with missing values. Here we create a vector with a missing value and the element-type of the returned vector is Union{Missing, Int64}.

julia> x = [1, 2, missing]
3-element Array{Union{Missing, Int64},1}:
 1
 2
  missing

julia> eltype(x)
Union{Missing, Int64}

julia> Union{Missing, Int}
Union{Missing, Int64}

julia> eltype(x) == Union{Missing, Int}
true

missing values can be excluded when performing operations by using skipmissing, which returns a memory-efficient iterator.

julia> skipmissing(x)
Base.SkipMissing{Array{Union{Missing, Int64},1}}(Union{Missing, Int64}[1, 2, missing])

The output of skipmissing can be passed directly into functions as an argument. For example, we can find the sum of all non-missing values or collect the non-missing values into a new missing-free vector.

julia> sum(skipmissing(x))
3

julia> collect(skipmissing(x))
2-element Array{Int64,1}:
 1
 2

The function coalesce can be used to replace missing values with another value (note the dot, indicating that the replacement should be applied to all entries in x):

julia> coalesce.(x, 0)
3-element Array{Int64,1}:
 1
 2
 0

The Missings.jl package provides a few convenience functions to work with missing values.

The function Missings.replace returns an iterator which replaces missing elements with another value:

julia> using Missings

julia> Missings.replace(x, 1)
Missings.EachReplaceMissing{Array{Union{Missing, Int64},1},Int64}(Union{Missing, Int64}[1, 2, missing], 1)

julia> collect(Missings.replace(x, 1))
3-element Array{Int64,1}:
 1
 2
 1

julia> collect(Missings.replace(x, 1)) == coalesce.(x, 1)
true

The function Missings.T returns the element-type T in Union{T, Missing}.

julia> eltype(x)
Union{Int64, Missing}

julia> Missings.T(eltype(x))
Int64

The missings function constructs Vectors and Arrays supporting missing values, using the optional first argument to specify the element-type.

julia> missings(1)
1-element Array{Missing,1}:
 missing

julia> missings(3)
3-element Array{Missing,1}:
 missing
 missing
 missing

julia> missings(1, 3)
1×3 Array{Missing,2}:
 missing  missing  missing

julia> missings(Int, 1, 3)
1×3 Array{Union{Missing, Int64},2}:
 missing  missing  missing

See the Julia manual for more information about missing values.