edit

Data Manipulation

Joins

#Base.joinFunction.

Join two DataFrames

join(df1::AbstractDataFrame,df2::AbstractDataFrame;on::Union{Symbol,Vector{Symbol}}=Symbol[],kind::Symbol=:inner)

Arguments

  • df1, df2 : the two AbstractDataFrames to be joined

Keyword Arguments

  • on : a Symbol or Vector{Symbol}, the column(s) used as keys when joining; required argument except for kind = :cross
  • kind : the type of join, options include:

    • :inner : only include rows with keys that match in both df1 and df2, the default
    • :outer : include all rows from df1 and df2
    • :left : include all rows from df1
    • :right : include all rows from df2
    • :semi : return rows of df1 that match with the keys in df2
    • :anti : return rows of df1 that do not match with the keys in df2
    • :cross : a full Cartesian product of the key combinations; every row of df1 is matched with every row of df2

NAs are filled in where needed to complete joins.

Result

  • ::DataFrame : the joined DataFrame

Examples

name=DataFrame(ID=[1,2,3],Name=["John Doe","Jane Doe","Joe Blogs"])job=DataFrame(ID=[1,2,4],Job=["Lawyer","Doctor","Farmer"])join(name,job,on=:ID)join(name,job,on=:ID,kind=:outer)join(name,job,on=:ID,kind=:left)join(name,job,on=:ID,kind=:right)join(name,job,on=:ID,kind=:semi)join(name,job,on=:ID,kind=:anti)join(name,job,kind=:cross)

source

Reshaping

#DataFrames.meltFunction.

Stacks a DataFrame; convert from a wide to long format; see stack.

source

#DataFrames.stackFunction.

Stacks a DataFrame; convert from a wide to long format

stack(df::AbstractDataFrame,measure_vars,id_vars)stack(df::AbstractDataFrame,measure_vars)stack(df::AbstractDataFrame)melt(df::AbstractDataFrame,id_vars,measure_vars)melt(df::AbstractDataFrame,id_vars)

Arguments

  • df : the AbstractDataFrame to be stacked
  • measure_vars : the columns to be stacked (the measurement variables), a normal column indexing type, like a Symbol, Vector{Symbol}, Int, etc.; for melt, defaults to all variables that are not id_vars
  • id_vars : the identifier columns that are repeated during stacking, a normal column indexing type; for stack defaults to all variables that are not measure_vars

If neither measure_vars or id_vars are given, measure_vars defaults to all floating point columns.

Result

  • ::DataFrame : the long-format dataframe with column :value holding the values of the stacked columns (measure_vars), with column :variable a Vector of Symbols with the measure_vars name, and with columns for each of the id_vars.

See also stackdf and meltdf for stacking methods that return a view into the original DataFrame. See unstack for converting from long to wide format.

Examples

d1=DataFrame(a=repeat([1:3;],inner=[4]),b=repeat([1:4;],inner=[3]),c=randn(12),d=randn(12),e=map(string,'a':'l'))d1s=stack(d1,[:c,:d])d1s2=stack(d1,[:c,:d],[:a])d1m=melt(d1,[:a,:b,:e])

source

#DataFrames.unstackFunction.

Unstacks a DataFrame; convert from a long to wide format

unstack(df::AbstractDataFrame,rowkey,colkey,value)unstack(df::AbstractDataFrame,colkey,value)unstack(df::AbstractDataFrame)

Arguments

  • df : the AbstractDataFrame to be unstacked
  • rowkey : the column with a unique key for each row, if not given, find a key by grouping on anything not a colkey or value
  • colkey : the column holding the column names in wide format, defaults to :variable
  • value : the value column, defaults to :value

Result

  • ::DataFrame : the wide-format dataframe

Examples

wide=DataFrame(id=1:12,a=repeat([1:3;],inner=[4]),b=repeat([1:4;],inner=[3]),c=randn(12),d=randn(12))long=stack(wide)wide0=unstack(long)wide1=unstack(long,:variable,:value)wide2=unstack(long,:id,:variable,:value)

Note that there are some differences between the widened results above.

source

#DataFrames.stackdfFunction.

A stacked view of a DataFrame (long format)

Like stack and melt, but a view is returned rather than data copies.

stackdf(df::AbstractDataFrame,measure_vars,id_vars)stackdf(df::AbstractDataFrame,measure_vars)meltdf(df::AbstractDataFrame,id_vars,measure_vars)meltdf(df::AbstractDataFrame,id_vars)

Arguments

  • df : the wide AbstractDataFrame
  • measure_vars : the columns to be stacked (the measurement variables), a normal column indexing type, like a Symbol, Vector{Symbol}, Int, etc.; for melt, defaults to all variables that are not id_vars
  • id_vars : the identifier columns that are repeated during stacking, a normal column indexing type; for stack defaults to all variables that are not measure_vars

Result

  • ::DataFrame : the long-format dataframe with column :value holding the values of the stacked columns (measure_vars), with column :variable a Vector of Symbols with the measure_vars name, and with columns for each of the id_vars.

The result is a view because the columns are special AbstractVectors that return indexed views into the original DataFrame.

Examples

d1=DataFrame(a=repeat([1:3;],inner=[4]),b=repeat([1:4;],inner=[3]),c=randn(12),d=randn(12),e=map(string,'a':'l'))d1s=stackdf(d1,[:c,:d])d1s2=stackdf(d1,[:c,:d],[:a])d1m=meltdf(d1,[:a,:b,:e])

source

#DataFrames.meltdfFunction.

A stacked view of a DataFrame (long format); see stackdf

source