Importing and Exporting (I/O)
Importing data from tabular data files
To read data from a CSV-like file, use the readtable function:
DataFrames.readtable (function)
Read data from a tabular-file format (CSV, TSV, ...)
readtable(filename, [keyword options])
Arguments
filename::AbstractString: the filename to be read
Keyword Arguments
header::Bool – Use the information from the file's header line to determine column names. Defaults to true.
separator::Char – Assume that fields are split by the separator character. If not specified, it is guessed from the filename: .csv defaults to ',', .tsv defaults to '\t', .wsv defaults to ' '.
quotemark::Vector{Char} – Assume that fields enclosed between two quotemark characters are quoted, which disables processing of separators and linebreaks. Set to Char[] to disable this feature and slightly improve performance. Defaults to ['"'].
decimal::Char – Assume that the decimal place in numbers is written using the decimal character. Defaults to '.'.
nastrings::Vector{String} – Translate any of the strings in this vector into NA. Defaults to ["", "NA"].
truestrings::Vector{String} – Translate any of the strings in this vector into a Boolean true. Defaults to ["T", "t", "TRUE", "true"].
falsestrings::Vector{String} – Translate any of the strings in this vector into a Boolean false. Defaults to ["F", "f", "FALSE", "false"].
makefactors::Bool – Convert string columns into PooledDataVectors for use as factors. Defaults to false.
nrows::Int – Read only nrows rows from the file. Defaults to -1, which indicates that the entire file should be read.
names::Vector{Symbol} – Use the values in this array as the names for all columns instead of the names in the file's header. Defaults to [], which indicates that the header should be used if present or that numeric names should be invented if there is no header.
eltypes::Vector – Specify the types of all columns. Defaults to [].
allowcomments::Bool – Ignore all text inside comments. Defaults to false.
commentmark::Char – Specify the character that starts comments. Defaults to '#'.
ignorepadding::Bool – Ignore all whitespace on the left and right sides of a field. Defaults to true.
skipstart::Int – Specify the number of initial rows to skip. Defaults to 0.
skiprows::Vector{Int} – Specify the indices of lines in the input to ignore. Defaults to [].
skipblanks::Bool – Skip any blank lines in the input. Defaults to true.
encoding::Symbol – Specify the file's encoding as either :utf8 or :latin1. Defaults to :utf8.
normalizenames::Bool – Ensure that column names are valid Julia identifiers. For instance, this renames a column named "a b" to "a_b", which can then be accessed with :a_b instead of Symbol("a b"). Defaults to true.
Result
::DataFrame
Examples
df = readtable("data.csv")
df = readtable("data.tsv")
df = readtable("data.wsv")
df = readtable("data.txt", separator = '\t')
df = readtable("data.txt", header = false)
readtable takes the path of the file to read as a String. To read data from a non-file source, you can supply an IO object instead. It also supports many additional keyword arguments, which are documented in the section on advanced I/O operations.
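As a rough sketch of the IO form combined with a few of the keyword arguments, the snippet below parses semicolon-separated text held in memory. It assumes a DataFrames release that still provides readtable, and it passes the separator explicitly because there is no filename to guess it from. The sample data, the "n/a" marker, and the variable names are made up for illustration.

```julia
using DataFrames  # assumption: a release that still ships readtable

# Hypothetical in-memory data: ';' separates fields, ',' marks the decimal
# place, and "n/a" stands for a missing value.
raw = "name;age;squidPerWeek\nAlice;36;3,14\nBob;n/a;0\nCarol;58;2,71\n"

df = readtable(IOBuffer(raw),
               separator = ';',                 # no filename, so state it explicitly
               decimal = ',',                   # read "3,14" as a number
               nastrings = ["", "NA", "n/a"])   # extend the default NA markers

println(size(df))   # rows and columns that were parsed
println(names(df))  # column names taken from the header line
```

Wrapping a string in an IOBuffer is also a convenient way to try out keyword combinations before pointing readtable at a real file.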
Exporting data to a tabular data file
To write data to a CSV file, use the writetable function:
DataFrames.writetable (function)
Write data to a tabular-file format (CSV, TSV, ...)
writetable(filename, df, [keyword options])
Arguments
filename::AbstractString: the filename to be created
df::AbstractDataFrame: the AbstractDataFrame to be written
Keyword Arguments
separator::Char – The separator character that you would like to use. Defaults to the output of getseparator(filename), which uses commas for files that end in .csv, tabs for files that end in .tsv, and a single space for files that end in .wsv.
quotemark::Char – The character used to delimit string fields. Defaults to '"'.
header::Bool – Whether the file should contain a header line with the column names from df. Defaults to true.
nastring::AbstractString – What to write in place of missing data. Defaults to "NA".
Result
::DataFrame
Examples
df = DataFrame(A = 1:10)
writetable("output.csv", df)
writetable("output.dat", df, separator = ',', header = false)
writetable("output.dat", df, quotemark = '\'', separator = ',')
writetable("output.dat", df, header = false)
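Writing a table and reading it straight back is a quick way to check that the chosen options agree with each other. The following is a minimal sketch, assuming a DataFrames release that still provides writetable and readtable; the temporary path and the column contents are illustrative only.

```julia
using DataFrames  # assumption: a release that still ships writetable/readtable

df = DataFrame(A = 1:3, B = ["x", "y", "z"])

# The .tsv extension makes both functions default to a tab separator,
# so no separator keyword is needed on either side.
path = joinpath(mktempdir(), "roundtrip.tsv")
writetable(path, df)

df2 = readtable(path)
println(size(df2) == size(df))  # compares the shapes of the two tables
```

If the file used an extension other than .csv, .tsv, or .wsv, passing the same separator to both calls would keep the round trip consistent.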
Supplying DataFrames inline with non-standard string literals
You can also provide CSV-like tabular data in a non-standard string literal to construct a new DataFrame, as in the following:
df = csv"""
    name,  age, squidPerWeek
    Alice,  36,         3.14
    Bob,    24,         0
    Carol,  58,         2.71
    Eve,    49,         7.77
    """
The csv string literal prefix indicates that the data are supplied in standard comma-separated value format. Common alternative formats are also available as string literals. For semicolon-separated values, with comma as a decimal, use csv2:
df = csv2"""
    name;  age; squidPerWeek
    Alice;  36;         3,14
    Bob;    24;         0
    Carol;  58;         2,71
    Eve;    49;         7,77
    """
For whitespace-separated values, use wsv:
df = wsv"""
    name  age squidPerWeek
    Alice  36         3.14
    Bob    24         0
    Carol  58         2.71
    Eve    49         7.77
    """
And for tab-separated values, use tsv:
df = tsv"""
    name	age	squidPerWeek
    Alice	36	3.14
    Bob	24	0
    Carol	58	2.71
    Eve	49	7.77
    """