Importing and Exporting Data (I/O)
CSV Files
For reading and writing tabular data from CSV and other delimited text files, use the CSV.jl package.
If you have not used the CSV.jl package before then you may need to install it first:
using Pkg
Pkg.add("CSV")The CSV.jl functions are not loaded automatically and must be imported into the session.
using CSVA dataset can now be read from a CSV file at path input using
DataFrame(CSV.File(input))A DataFrame can be written to a CSV file at path output using
df = DataFrame(x=1, y=2)
CSV.write(output, df)The behavior of CSV functions can be adapted via keyword arguments. For more information, see ?CSV.File, ?CSV.read and ?CSV.write, or checkout the online CSV.jl documentation.
In simple cases, when compilation latency of CSV.jl might be an issue, using the DelimitedFiles module from the Julia standard library can be considered. Here is an example showing how to read in the data and perform its post-processing:
julia> using DelimitedFiles, DataFrames
julia> path = joinpath(pkgdir(DataFrames), "docs", "src", "assets", "iris.csv");
julia> data, header = readdlm(path, ',', header=true);
julia> iris_raw = DataFrame(data, vec(header))
150×5 DataFrame
Row │ SepalLength SepalWidth PetalLength PetalWidth Species
│ Any Any Any Any Any
─────┼──────────────────────────────────────────────────────────────────
1 │ 5.1 3.5 1.4 0.2 Iris-setosa
2 │ 4.9 3.0 1.4 0.2 Iris-setosa
3 │ 4.7 3.2 1.3 0.2 Iris-setosa
4 │ 4.6 3.1 1.5 0.2 Iris-setosa
5 │ 5.0 3.6 1.4 0.2 Iris-setosa
6 │ 5.4 3.9 1.7 0.4 Iris-setosa
7 │ 4.6 3.4 1.4 0.3 Iris-setosa
8 │ 5.0 3.4 1.5 0.2 Iris-setosa
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
144 │ 6.8 3.2 5.9 2.3 Iris-virginica
145 │ 6.7 3.3 5.7 2.5 Iris-virginica
146 │ 6.7 3.0 5.2 2.3 Iris-virginica
147 │ 6.3 2.5 5.0 1.9 Iris-virginica
148 │ 6.5 3.0 5.2 2.0 Iris-virginica
149 │ 6.2 3.4 5.4 2.3 Iris-virginica
150 │ 5.9 3.0 5.1 1.8 Iris-virginica
135 rows omitted
julia> iris = identity.(iris_raw)
150×5 DataFrame
Row │ SepalLength SepalWidth PetalLength PetalWidth Species
│ Float64 Float64 Float64 Float64 SubStrin…
─────┼──────────────────────────────────────────────────────────────────
1 │ 5.1 3.5 1.4 0.2 Iris-setosa
2 │ 4.9 3.0 1.4 0.2 Iris-setosa
3 │ 4.7 3.2 1.3 0.2 Iris-setosa
4 │ 4.6 3.1 1.5 0.2 Iris-setosa
5 │ 5.0 3.6 1.4 0.2 Iris-setosa
6 │ 5.4 3.9 1.7 0.4 Iris-setosa
7 │ 4.6 3.4 1.4 0.3 Iris-setosa
8 │ 5.0 3.4 1.5 0.2 Iris-setosa
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
144 │ 6.8 3.2 5.9 2.3 Iris-virginica
145 │ 6.7 3.3 5.7 2.5 Iris-virginica
146 │ 6.7 3.0 5.2 2.3 Iris-virginica
147 │ 6.3 2.5 5.0 1.9 Iris-virginica
148 │ 6.5 3.0 5.2 2.0 Iris-virginica
149 │ 6.2 3.4 5.4 2.3 Iris-virginica
150 │ 5.9 3.0 5.1 1.8 Iris-virginica
135 rows omittedObserve that in our example:
headeris aMatrixtherefore we had to passvec(header)to theDataFrameconstructor;- we broadcasted the
identityfunction over theiris_rawdata frame to perform narrowing ofeltypeof columns ofiris_raw; the reason is that read in by thereaddlmfunction is stored into adataMatrixso all columns iniris_rawinitially have the sameeltype– in this case it had to beAnyas some of the columns are numeric and some are string.
All such operations (and many more) are automatically handled by CSV.jl.
Similarly, you can use the writedlm function from the DelimitedFiles module to save a data frame like this:
writedlm("test.csv", Iterators.flatten(([names(iris)], eachrow(iris))), ',')As you can see the code required to transform iris into a proper input to the writedlm function so that you can create the CSV file having the expected format is not easy. Therefore CSV.jl is the preferred package to write CSV files for data stored in data frames.
DataFrame Definition in Source Code
Sometimes, you might want to define a DataFrame with data directly in the source code. For better readability you probably want to define it in a way which resembles a table. This can be done with CSV.jl like this:
df = CSV.read("""
name,age,children
John,54.0,0
Sally,34.0,2
Roger,79.0,4
""" |> IOBuffer, DataFrame)This will result in the following DataFrame:
3×3 DataFrame
Row │ name age children
│ String7 Float64 Int64
─────┼────────────────────────────
1 │ John 54.0 0
2 │ Sally 34.0 2
3 │ Roger 79.0 4As the readability in the source code of the above format is not great, you can make use of CSV.jl's options to find a format which works for you, e.g. using Unicode box drawing characters
data = """
┌───────┬──────┬──────────┐
│ name │ age │ children │
├───────┼──────┼──────────┤
│ John │ 54.0 │ 0 │
│ Sally │ 34.0 │ 2 │
│ Roger │ 79.0 │ 4 │
└───────┴──────┴──────────┘
"""You can then create the above DataFrame object with
df = CSV.read(data |> IOBuffer, DataFrame, normalizenames=true,
header=2, skipto=4, footerskip=1, delim='│',
drop=(i, name) -> startswith(name |> String, "Column")
) .|> x -> x isa AbstractString ? strip(x) : xOther formats
Other data formats are supported for reading and writing in the following packages (non exhaustive list):
- Apache Arrow (including Feather v2): Arrow.jl
- Apache Feather (v1): Feather.jl
- Apache Avro: Avro.jl
- JSON: JSONTables.jl
- Parquet: Parquet2.jl
- Stata, SAS and SPSS: ReadStatTables.jl (alternatively Queryverse users can choose StatFiles.jl)
- reading R data files (.rda, .RData): RData.jl
- Microsoft Excel (XLSX): XLSX.jl
- Copying/pasting to clipboard, for sending data to and from spreadsheets: ClipData.jl