Importing and Exporting (I/O)
Importing data from tabular data files
To read data from a CSV-like file, use the readtable function:
DataFrames.readtable (function)
Read data from a tabular-file format (CSV, TSV, ...)
readtable(filename, [keyword options])
Arguments
filename::AbstractString: the filename to be read
Keyword Arguments
header::Bool – Use the information from the file's header line to determine column names. Defaults to true.
separator::Char – Assume that fields are split by the separator character. If not specified, it will be guessed from the filename: .csv defaults to ',', .tsv defaults to '\t', .wsv defaults to ' '.
quotemark::Vector{Char} – Assume that fields contained inside of two quotemark characters are quoted, which disables processing of separators and linebreaks. Set to Char[] to disable this feature and slightly improve performance. Defaults to ['"'].
decimal::Char – Assume that the decimal place in numbers is written using the decimal character. Defaults to '.'.
nastrings::Vector{String} – Translate any of the strings in this vector into an NA. Defaults to ["", "NA"].
truestrings::Vector{String} – Translate any of the strings in this vector into a Boolean true. Defaults to ["T", "t", "TRUE", "true"].
falsestrings::Vector{String} – Translate any of the strings in this vector into a Boolean false. Defaults to ["F", "f", "FALSE", "false"].
makefactors::Bool – Convert string columns into PooledDataVectors for use as factors. Defaults to false.
nrows::Int – Read only nrows rows from the file. Defaults to -1, which indicates that the entire file should be read.
names::Vector{Symbol} – Use the values in this array as the names for all columns instead of the names in the file's header. Defaults to [], which indicates that the header should be used if present or that numeric names should be invented if there is no header.
eltypes::Vector – Specify the types of all columns. Defaults to [].
allowcomments::Bool – Ignore all text inside comments. Defaults to false.
commentmark::Char – Specify the character that starts comments. Defaults to '#'.
ignorepadding::Bool – Ignore all whitespace on left and right sides of a field. Defaults to true.
skipstart::Int – Specify the number of initial rows to skip. Defaults to 0.
skiprows::Vector{Int} – Specify the indices of lines in the input to ignore. Defaults to [].
skipblanks::Bool – Skip any blank lines in input. Defaults to true.
encoding::Symbol – Specify the file's encoding as either :utf8 or :latin1. Defaults to :utf8.
normalizenames::Bool – Ensure that column names are valid Julia identifiers. For instance, this renames a column named "a b" to "a_b", which can then be accessed with :a_b instead of Symbol("a b"). Defaults to true.
Result
::DataFrame
Examples
df = readtable("data.csv")
df = readtable("data.tsv")
df = readtable("data.wsv")
df = readtable("data.txt", separator = '\t')
df = readtable("data.txt", header = false)
readtable expects the path of the file that you would like to read, given as a String. To read data from a non-file source, you may supply an IO object instead. readtable also supports many additional keyword arguments: these are documented in the section on advanced I/O operations.
Exporting data to a tabular data file
To write data to a CSV file, use the writetable function:
DataFrames.writetable (function)
Write data to a tabular-file format (CSV, TSV, ...)
writetable(filename, df, [keyword options])
Arguments
filename::AbstractString: the filename to be created
df::AbstractDataFrame: the AbstractDataFrame to be written
Keyword Arguments
separator::Char – The separator character that you would like to use. Defaults to the output of getseparator(filename), which uses commas for files that end in .csv, tabs for files that end in .tsv and a single space for files that end in .wsv.
quotemark::Char – The character used to delimit string fields. Defaults to '"'.
header::Bool – Whether the file should contain a header line that specifies the column names from df. Defaults to true.
nastring::AbstractString – What to write in place of missing data. Defaults to "NA".
Result
::DataFrame
Examples
df = DataFrame(A = 1:10)
writetable("output.csv", df)
writetable("output.dat", df, separator = ',', header = false)
writetable("output.dat", df, quotemark = '\'', separator = ',')
writetable("output.dat", df, header = false)
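To make the effect of the keyword arguments concrete, here is a minimal sketch of a delimited writer in plain Julia. It is not how writetable is implemented: the function write_delimited and its row representation (a vector of tuples) are hypothetical, missing stands in for missing data, and embedded quote characters are not escaped. It assumes that header names are quoted like string fields.

```julia
# Hypothetical sketch for illustration; not the DataFrames implementation.
function write_delimited(io::IO, names::Vector{String}, rows;
                         separator::Char = ',', quotemark::Char = '"',
                         header::Bool = true, nastring::AbstractString = "NA")
    # Render one value: nastring for missing data, quoted text for strings,
    # and the plain printed form for everything else.
    function render(x)
        if x === missing
            return String(nastring)
        elseif x isa AbstractString
            return string(quotemark, x, quotemark)
        else
            return string(x)
        end
    end
    if header
        println(io, join([render(n) for n in names], separator))
    end
    for row in rows
        println(io, join([render(x) for x in row], separator))
    end
    return nothing
end

buf = IOBuffer()
write_delimited(buf, ["name", "age"], [("Alice", 36), ("Bob", missing)])
print(String(take!(buf)))
# "name","age"
# "Alice",36
# "Bob",NA
```

Changing separator to '\t' or header to false in the call above shows how each option alters the output independently of the others.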
Supplying DataFrames inline with non-standard string literals
You can also provide CSV-like tabular data in a non-standard string literal to construct a new DataFrame, as in the following:
df = csv"""
    name,  age, squidPerWeek
    Alice,  36,         3.14
    Bob,    24,         0
    Carol,  58,         2.71
    Eve,    49,         7.77
    """
The csv string literal prefix indicates that the data are supplied in standard comma-separated value format. Common alternative formats are also available as string literals. For semicolon-separated values, with comma as a decimal, use csv2:
df = csv2"""
    name;  age; squidPerWeek
    Alice;  36;         3,14
    Bob;    24;         0
    Carol;  58;         2,71
    Eve;    49;         7,77
    """
For whitespace-separated values, use wsv:
df = wsv"""
    name  age squidPerWeek
    Alice  36         3.14
    Bob    24         0
    Carol  58         2.71
    Eve    49         7.77
    """
And for tab-separated values, use tsv:
df = tsv"""
    name	age	squidPerWeek
    Alice	36	3.14
    Bob	24	0
    Carol	58	2.71
    Eve	49	7.77
    """
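For readers unfamiliar with non-standard string literals, the mechanism can be sketched in plain Julia: a prefix such as csv on a string literal invokes a macro named csv_str. The example below defines a hypothetical rows"..." literal that returns the fields as nested vectors of strings rather than a DataFrame, plus a hypothetical parse_decimal helper showing how a comma decimal, as in csv2, could be read. None of these names come from DataFrames, and the real literals may be implemented differently.

```julia
# Hypothetical names for illustration; DataFrames provides its own literals.

# Split text into non-blank lines, then split each line on the separator
# and trim the padding around every field.
function parse_rows(text::AbstractString; separator::Char = ',')
    lines = filter(l -> !isempty(strip(l)), split(text, '\n'))
    return [String.(strip.(split(line, separator))) for line in lines]
end

# Writing rows"..." calls this macro with the literal's text.
macro rows_str(text)
    return :(parse_rows($text))
end

# Read a number whose decimal mark is not '.', as in the csv2 format.
parse_decimal(s::AbstractString; decimal::Char = ',') =
    parse(Float64, replace(s, decimal => '.'))

rows = rows"""
    name,  age, squidPerWeek
    Alice,  36,         3.14
    Bob,    24,         0
    """

rows[1]                # ["name", "age", "squidPerWeek"]
parse_decimal("3,14")  # 3.14
```

Because the parsing runs on the literal's text, the same data can be kept next to the code that uses it, which is convenient for small examples and tests.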