# The Formula, ModelFrame and ModelMatrix Types

In regression analysis, we often want to describe the relationship between a response variable and one or more input variables in terms of main effects and interactions. To facilitate the specification of a regression model in terms of the columns of a `DataFrame`, the DataFrames package provides a `Formula` type, which is created using the `@formula` macro in Julia:

```julia
using DataFrames

fm = @formula(Z ~ X + Y)
```

A `Formula` object can be used to transform a `DataFrame` into a `ModelFrame` object:

```julia
df = DataFrame(X = randn(10), Y = randn(10), Z = randn(10))
mf = ModelFrame(@formula(Z ~ X + Y), df)
```

A `ModelFrame` object is a thin wrapper around a `DataFrame`. For modeling purposes, one generally wants to construct a `ModelMatrix`, which holds a `Matrix{Float64}` that can be used directly to fit a statistical model:

```julia
mm = ModelMatrix(ModelFrame(@formula(Z ~ X + Y), df))
```

Note that `mm` contains an additional column consisting entirely of `1.0` values; it is used to fit an intercept term in a regression model.
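To make the intercept column concrete, here is a minimal sketch in plain Julia (no DataFrames required) of the matrix that `Z ~ X + Y` corresponds to: a column of ones followed by the `X` and `Y` columns. The small vectors `X`, `Y` and `Z` are made-up illustrative data, chosen so that `Z` is exactly `1 + 2X + 3Y`; solving the least-squares problem with `\` should then recover the intercept and both slopes.

```julia
# Illustrative data (assumed, not from the text): Z is exactly 1 + 2X + 3Y.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 1.0, 0.0, 3.0]
Z = 1 .+ 2 .* X .+ 3 .* Y

# Design matrix for Z ~ X + Y: a column of 1.0 values (the intercept),
# followed by the X and Y columns.
M = hcat(ones(length(X)), X, Y)

# Least-squares fit; β holds (intercept, coefficient on X, coefficient on Y).
β = M \ Z
```

Without the column of ones, the fitted plane would be forced through the origin, which is why `ModelMatrix` adds it by default.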

In addition to specifying main effects, it is possible to specify interactions using the `&` operator inside a `Formula`:

```julia
mm = ModelMatrix(ModelFrame(@formula(Z ~ X + Y + X&Y), df))
```
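For two continuous columns, the interaction term `X&Y` contributes a single column equal to the elementwise product of `X` and `Y`. The sketch below builds the matrix for `Z ~ X + Y + X&Y` by hand in plain Julia, with small made-up vectors standing in for the columns of `df`; the column order (intercept, then the terms as written) is an assumption made for illustration.

```julia
# Illustrative continuous columns (assumed data standing in for df's X and Y).
X = [1.0, 2.0, 3.0]
Y = [4.0, 5.0, 6.0]

# Z ~ X + Y + X&Y: intercept, the two main effects, and the interaction,
# which for continuous variables is the elementwise product X .* Y.
M = hcat(ones(length(X)), X, Y, X .* Y)
```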

If you would like to specify both main effects and an interaction term at once, use the `*` operator inside a `Formula`:

```julia
mm = ModelMatrix(ModelFrame(@formula(Z ~ X * Y), df))
```
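The `*` operator is shorthand: `X*Y` expands to `X + Y + X&Y`, and a product of several variables expands to every main effect plus every interaction among them. The hypothetical helper `expand_star` below (not part of DataFrames) sketches that expansion at the level of term names, listing terms by interaction order.

```julia
# Hypothetical helper (not part of DataFrames): expand the `*` operator
# applied to `vars` into all main effects and interactions, lowest order first.
function expand_star(vars::Vector{String})
    n = length(vars)
    terms = String[]
    for k in 1:n                    # interaction order: 1 = main effects
        for mask in 1:(2^n - 1)     # each bitmask encodes a nonempty subset
            if count_ones(mask) == k
                members = [vars[i] for i in 1:n if ((mask >> (i - 1)) & 1) == 1]
                push!(terms, join(members, "&"))
            end
        end
    end
    return terms
end
```

For instance, `expand_star(["X", "Y"])` returns `["X", "Y", "X&Y"]`, matching the terms of the previous example.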

You can control how categorical variables (e.g., `PooledDataArray` columns) are converted to `ModelMatrix` columns by specifying *contrasts* when you construct a `ModelFrame`:

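```julia
# Contrasts only affect categorical columns (e.g. a `PooledDataArray`);
# for this to have an effect, `X` must be categorical rather than the
# continuous column created above.
mm = ModelMatrix(ModelFrame(@formula(Z ~ X * Y), df, contrasts = Dict(:X => HelmertCoding())))
```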
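As an illustration of what a contrast does, here is a sketch of the usual Helmert scheme in plain Julia: a factor with `k` levels becomes `k - 1` columns, where column `j` compares level `j + 1` with the levels before it. The helper `helmert`, the level names and the data are hypothetical; the layout is meant to mirror, not reproduce, what `HelmertCoding()` generates.

```julia
# Hypothetical helper: the k × (k - 1) Helmert contrast matrix.
# Column j assigns -1 to levels 1..j, j to level j + 1, and 0 to later levels.
function helmert(k::Int)
    C = zeros(Int, k, k - 1)
    for j in 1:(k - 1)
        C[1:j, j] .= -1
        C[j + 1, j] = j
    end
    return C
end

# Encode a categorical vector: each observation gets the row of C for its level.
levels_x = ["a", "b", "c"]
x = ["a", "c", "b", "a"]
C = helmert(length(levels_x))
idx = [findfirst(isequal(v), levels_x) for v in x]
Xcols = C[idx, :]   # 4 × 2 block of model-matrix columns for x
```

Each Helmert column sums to zero over the levels, so the columns are orthogonal to the intercept column when the levels are balanced.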

Contrasts can also be modified in an existing `ModelFrame`:

```julia
mf = ModelFrame(@formula(Z ~ X * Y), df)
contrasts!(mf, X = HelmertCoding())
```

The construction of model matrices makes it easy to formulate complex statistical models. The GLM package, for example, uses them to fit linear and generalized linear models.