Sorting
Sorting is a fundamental component of data analysis. Basic sorting is trivial: just calling sort! will sort all columns, in place:
julia> using DataFrames, CSV
julia> iris = DataFrame(CSV.File(joinpath(dirname(pathof(DataFrames)), "../docs/src/assets/iris.csv")))
150×5 DataFrame
Row │ SepalLength SepalWidth PetalLength PetalWidth Species
│ Float64 Float64 Float64 Float64 String
─────┼──────────────────────────────────────────────────────────────────
1 │ 5.1 3.5 1.4 0.2 Iris-setosa
2 │ 4.9 3.0 1.4 0.2 Iris-setosa
3 │ 4.7 3.2 1.3 0.2 Iris-setosa
4 │ 4.6 3.1 1.5 0.2 Iris-setosa
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
148 │ 6.5 3.0 5.2 2.0 Iris-virginica
149 │ 6.2 3.4 5.4 2.3 Iris-virginica
150 │ 5.9 3.0 5.1 1.8 Iris-virginica
143 rows omitted
julia> sort!(iris)
150×5 DataFrame
Row │ SepalLength SepalWidth PetalLength PetalWidth Species
│ Float64 Float64 Float64 Float64 String
─────┼──────────────────────────────────────────────────────────────────
1 │ 4.3 3.0 1.1 0.1 Iris-setosa
2 │ 4.4 2.9 1.4 0.2 Iris-setosa
3 │ 4.4 3.0 1.3 0.2 Iris-setosa
4 │ 4.4 3.2 1.3 0.2 Iris-setosa
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
148 │ 7.7 3.0 6.1 2.3 Iris-virginica
149 │ 7.7 3.8 6.7 2.2 Iris-virginica
150 │ 7.9 3.8 6.4 2.0 Iris-virginica
143 rows omittedObserve that all columns are taken into account lexicographically when sorting the DataFrame.
You can also call the sort function to create a new DataFrame with freshly allocated sorted vectors.
In sorting DataFrames, you may want to sort different columns with different options. Here are some examples showing most of the possible options:
julia> sort!(iris, rev = true)
150×5 DataFrame
Row │ SepalLength SepalWidth PetalLength PetalWidth Species
│ Float64 Float64 Float64 Float64 String
─────┼──────────────────────────────────────────────────────────────────
1 │ 7.9 3.8 6.4 2.0 Iris-virginica
2 │ 7.7 3.8 6.7 2.2 Iris-virginica
3 │ 7.7 3.0 6.1 2.3 Iris-virginica
4 │ 7.7 2.8 6.7 2.0 Iris-virginica
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
148 │ 4.4 3.0 1.3 0.2 Iris-setosa
149 │ 4.4 2.9 1.4 0.2 Iris-setosa
150 │ 4.3 3.0 1.1 0.1 Iris-setosa
143 rows omitted
julia> sort!(iris, [:Species, :SepalWidth])
150×5 DataFrame
Row │ SepalLength SepalWidth PetalLength PetalWidth Species
│ Float64 Float64 Float64 Float64 String
─────┼──────────────────────────────────────────────────────────────────
1 │ 4.5 2.3 1.3 0.3 Iris-setosa
2 │ 4.4 2.9 1.4 0.2 Iris-setosa
3 │ 5.0 3.0 1.6 0.2 Iris-setosa
4 │ 4.9 3.0 1.4 0.2 Iris-setosa
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
148 │ 7.2 3.6 6.1 2.5 Iris-virginica
149 │ 7.9 3.8 6.4 2.0 Iris-virginica
150 │ 7.7 3.8 6.7 2.2 Iris-virginica
143 rows omitted
julia> sort!(iris, [order(:Species, by=length), order(:SepalLength, rev=true)])
150×5 DataFrame
Row │ SepalLength SepalWidth PetalLength PetalWidth Species
│ Float64 Float64 Float64 Float64 String
─────┼───────────────────────────────────────────────────────────────────
1 │ 5.8 4.0 1.2 0.2 Iris-setosa
2 │ 5.7 3.8 1.7 0.3 Iris-setosa
3 │ 5.7 4.4 1.5 0.4 Iris-setosa
4 │ 5.5 3.5 1.3 0.2 Iris-setosa
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
148 │ 5.0 2.0 3.5 1.0 Iris-versicolor
149 │ 5.0 2.3 3.3 1.0 Iris-versicolor
150 │ 4.9 2.4 3.3 1.0 Iris-versicolor
143 rows omittedKeywords used above include rev (to sort in reverse), and by (to apply a function to values before comparing them). Each keyword can either be a single value, a vector with values corresponding to individual columns, or a selector: :, Cols, All, Not, Between, or Regex.
As an alternative to using a vector values you can use order to specify an ordering for a particular column within a set of columns.
The following two examples show two ways to sort the iris dataset with the same result: :Species will be ordered in reverse order, and within groups, rows will be sorted by increasing :PetalLength:
julia> sort!(iris, [:Species, :PetalLength], rev=(true, false))
150×5 DataFrame
Row │ SepalLength SepalWidth PetalLength PetalWidth Species
│ Float64 Float64 Float64 Float64 String
─────┼──────────────────────────────────────────────────────────────────
1 │ 4.9 2.5 4.5 1.7 Iris-virginica
2 │ 6.2 2.8 4.8 1.8 Iris-virginica
3 │ 6.0 3.0 4.8 1.8 Iris-virginica
4 │ 6.3 2.7 4.9 1.8 Iris-virginica
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
148 │ 5.1 3.3 1.7 0.5 Iris-setosa
149 │ 5.1 3.8 1.9 0.4 Iris-setosa
150 │ 4.8 3.4 1.9 0.2 Iris-setosa
143 rows omitted
julia> sort!(iris, [order(:Species, rev=true), :PetalLength])
150×5 DataFrame
Row │ SepalLength SepalWidth PetalLength PetalWidth Species
│ Float64 Float64 Float64 Float64 String
─────┼──────────────────────────────────────────────────────────────────
1 │ 4.9 2.5 4.5 1.7 Iris-virginica
2 │ 6.2 2.8 4.8 1.8 Iris-virginica
3 │ 6.0 3.0 4.8 1.8 Iris-virginica
4 │ 6.3 2.7 4.9 1.8 Iris-virginica
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
148 │ 5.1 3.3 1.7 0.5 Iris-setosa
149 │ 5.1 3.8 1.9 0.4 Iris-setosa
150 │ 4.8 3.4 1.9 0.2 Iris-setosa
143 rows omitted