Importing and Exporting Data (I/O)
CSV Files
For reading and writing tabular data from CSV and other delimited text files, use the CSV.jl package.
If you have not used the CSV.jl package before then you may need to install it first:
using Pkg
Pkg.add("CSV")
The CSV.jl functions are not loaded automatically and must be imported into the session.
using CSV
A dataset can now be read from a CSV file at path input using
DataFrame(CSV.File(input))
A DataFrame can be written to a CSV file at path output using
df = DataFrame(x=1, y=2)
CSV.write(output, df)
The behavior of CSV functions can be adapted via keyword arguments. For more information, see ?CSV.File, ?CSV.read and ?CSV.write, or check out the online CSV.jl documentation.
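As a minimal sketch of how keyword arguments change the behavior, the round trip below writes a semicolon-delimited file that marks missing values as NA and then reads it back. The sample data, the temporary path and the chosen delim and missingstring values are illustrative assumptions, not part of the example above.

```julia
using CSV, DataFrames

# Hypothetical sample data with one missing value.
df = DataFrame(id = [1, 2, 3], score = [0.5, missing, 1.5])

# Write to a temporary location (assumed path, for illustration only).
path = joinpath(mktempdir(), "scores.csv")

# Use ';' as the delimiter and write missing values as "NA".
CSV.write(path, df; delim = ';', missingstring = "NA")

# Read the file back with matching settings so "NA" is parsed as missing.
df2 = CSV.read(path, DataFrame; delim = ';', missingstring = "NA")
```

Passing the same delim and missingstring on both sides is what keeps the round trip lossless here; reading with the defaults would treat each line as a single column.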
In simple cases, when the compilation latency of CSV.jl might be an issue, consider using the DelimitedFiles module from the Julia standard library. Here is an example showing how to read in the data and post-process it:
julia> using DelimitedFiles, DataFrames
julia> path = joinpath(pkgdir(DataFrames), "docs", "src", "assets", "iris.csv");
julia> data, header = readdlm(path, ',', header=true);
julia> iris_raw = DataFrame(data, vec(header))
150×5 DataFrame
Row │ SepalLength SepalWidth PetalLength PetalWidth Species
│ Any Any Any Any Any
─────┼──────────────────────────────────────────────────────────────────
1 │ 5.1 3.5 1.4 0.2 Iris-setosa
2 │ 4.9 3.0 1.4 0.2 Iris-setosa
3 │ 4.7 3.2 1.3 0.2 Iris-setosa
4 │ 4.6 3.1 1.5 0.2 Iris-setosa
5 │ 5.0 3.6 1.4 0.2 Iris-setosa
6 │ 5.4 3.9 1.7 0.4 Iris-setosa
7 │ 4.6 3.4 1.4 0.3 Iris-setosa
8 │ 5.0 3.4 1.5 0.2 Iris-setosa
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
144 │ 6.8 3.2 5.9 2.3 Iris-virginica
145 │ 6.7 3.3 5.7 2.5 Iris-virginica
146 │ 6.7 3.0 5.2 2.3 Iris-virginica
147 │ 6.3 2.5 5.0 1.9 Iris-virginica
148 │ 6.5 3.0 5.2 2.0 Iris-virginica
149 │ 6.2 3.4 5.4 2.3 Iris-virginica
150 │ 5.9 3.0 5.1 1.8 Iris-virginica
135 rows omitted
julia> iris = identity.(iris_raw)
150×5 DataFrame
Row │ SepalLength SepalWidth PetalLength PetalWidth Species
│ Float64 Float64 Float64 Float64 SubStrin…
─────┼──────────────────────────────────────────────────────────────────
1 │ 5.1 3.5 1.4 0.2 Iris-setosa
2 │ 4.9 3.0 1.4 0.2 Iris-setosa
3 │ 4.7 3.2 1.3 0.2 Iris-setosa
4 │ 4.6 3.1 1.5 0.2 Iris-setosa
5 │ 5.0 3.6 1.4 0.2 Iris-setosa
6 │ 5.4 3.9 1.7 0.4 Iris-setosa
7 │ 4.6 3.4 1.4 0.3 Iris-setosa
8 │ 5.0 3.4 1.5 0.2 Iris-setosa
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
144 │ 6.8 3.2 5.9 2.3 Iris-virginica
145 │ 6.7 3.3 5.7 2.5 Iris-virginica
146 │ 6.7 3.0 5.2 2.3 Iris-virginica
147 │ 6.3 2.5 5.0 1.9 Iris-virginica
148 │ 6.5 3.0 5.2 2.0 Iris-virginica
149 │ 6.2 3.4 5.4 2.3 Iris-virginica
150 │ 5.9 3.0 5.1 1.8 Iris-virginica
135 rows omitted
Observe that in our example:
- header is a Matrix, therefore we had to pass vec(header) to the DataFrame constructor;
- we broadcasted the identity function over the iris_raw data frame to narrow the eltype of its columns; the reason is that the data read in by the readdlm function is stored in data, which is a Matrix, so all columns in iris_raw initially have the same eltype. In this case it had to be Any, as some of the columns are numeric and some are strings.
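The two observations above can be reproduced on a small hand-made matrix without reading any file. This is only a sketch: the matrix contents and the column names x and y are made up for illustration.

```julia
using DataFrames

# A mixed matrix has to use Any as its element type,
# just like the data returned by readdlm for the iris file.
data = Any[1.0 "a"; 2.0 "b"]
header = ["x" "y"]                    # a 1×2 Matrix, like readdlm's header

# vec turns the 1×2 header Matrix into the Vector the constructor expects.
raw = DataFrame(data, vec(header))
raw_types = eltype.(eachcol(raw))     # every column has eltype Any

# Broadcasting identity allocates new columns with the narrowest eltype.
narrowed = identity.(raw)
narrowed_types = eltype.(eachcol(narrowed))
```

After the broadcast, x holds Float64 values and y holds String values, while raw itself is left unchanged.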
All such operations (and many more) are automatically handled by CSV.jl.
Similarly, you can use the writedlm function from the DelimitedFiles module to save a data frame like this:
writedlm("test.csv", Iterators.flatten(([names(iris)], eachrow(iris))), ',')
As you can see, the code required to transform iris into a proper input for the writedlm function, so that the resulting CSV file has the expected format, is not simple. Therefore CSV.jl is the preferred package for writing data frames to CSV files.
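To make the comparison concrete, here is a small sketch that writes the same data frame with both approaches. The data frame and the temporary file names are hypothetical; the writedlm call follows the pattern shown above.

```julia
using CSV, DataFrames, DelimitedFiles

# Hypothetical two-row data frame used only for this comparison.
df = DataFrame(x = [1, 2], y = ["a", "b"])
dir = mktempdir()
dlm_path = joinpath(dir, "dlm.csv")
csv_path = joinpath(dir, "csv.csv")

# DelimitedFiles: the header row and the data rows must be chained by hand.
writedlm(dlm_path, Iterators.flatten(([names(df)], eachrow(df))), ',')

# CSV.jl: the header is derived from the column names automatically.
CSV.write(csv_path, df)

dlm_lines = readlines(dlm_path)
csv_lines = readlines(csv_path)
```

For simple data like this both files should contain the same lines; the difference is in how much manual preparation each call needs, and CSV.jl additionally takes care of details such as quoting.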
Other formats
Other data formats are supported for reading and writing by the following packages (non-exhaustive list):
- Apache Arrow (including Feather v2): Arrow.jl
- Apache Feather (v1): Feather.jl
- Apache Avro: Avro.jl
- JSON: JSONTables.jl
- Parquet: Parquet2.jl
- Stata, SAS and SPSS: ReadStatTables.jl (alternatively Queryverse users can choose StatFiles.jl)
- reading R data files (.rda, .RData): RData.jl
- Microsoft Excel (XLSX): XLSX.jl
- Copying/pasting to clipboard, for sending data to and from spreadsheets: ClipData.jl