edit

Querying frameworks

Query.jl

The Query.jl package provides advanced data manipulation capabilities for DataFrames (and many other data structures). This section provides a short introduction to the package, the Query.jl documentation has a more comprehensive documentation of the package.

To get started, install the Query.jl package:

Pkg.add("Query")

A query is started with the @from macro and consists of a series of query commands. Query.jl provides commands that can filter, project, join, group, flatten and group data from a DataFrame. A query can return an iterator, or one can materialize the results of a query into a variety of data structures, including a new DataFrame.

A simple example of a query looks like this:

```@example 1 using DataFrames, Query

df = DataFrame(name=["John", "Sally", "Roger"], age=[54., 34., 79.], children=[0, 2, 4])

q1 = @from i in df begin @where i.age > 40 @select {number_of_children=i.children, i.name} @collect DataFrame end

Thequerystartswiththe`@from`macro.Thefirstargument`i`isthenameoftherangevariablethatwillbeusedtorefertoanindividualrowinlaterquerycommands.Thenextargument`df`isthedatasourcethatonewantstoquery.The`@where`commandinthisquerywillfilterthesourcedatabyapplyingthefiltercondition`i.age>40`.Thisfiltersoutanyrowsinwhichthe`age`columnisnotlargerthan40.The`@select`commandthenprojectsthecolumnsofthesourcedataontoanewcolumnstructure.Theexamplehereappliesthreespecificmodifications:1)itonlykeepsasubsetofthecolumnsinthesource`DataFrame`,i.e.the`age`columnwillnotbepartofthetransformeddata;2)itchangestheorderofthetwocolumnsthatareselected;and3)itrenamesoneofthecolumnsthatisselectedfrom`children`to`number_of_children`.Theexamplequeryusesthe`{}`syntaxtoachievethis.A`{}`inaQuery.jlexpressioninstantiatesanew[NamedTuple](https://github.com/blackrock/NamedTuples.jl),i.e.itisashortcutforwriting`@NT(number_of_children=>i.children,name=>i.name)`.The`@collect`statementdeterminesthedatastructurethatthequeryreturns.Inthisexampletheresultsarereturnedasa`DataFrame`.Aquerywithouta`@collect`statementreturnsastandardjuliaiteratorthatcanbeusedwithanynormaljulialanguageconstructthatcandealwithiterators.Thefollowingcodereturnsajuliaiteratorforthequeryresults:```@example1q2=@fromiindfbegin@wherei.age>40@select{number_of_children=i.children,i.name}endnothing#hide

One can loop over the results using a standard julia for statement:

```@example 1 total_children = 0 for i in q2 total_children += i.number_of_children end

println("Total number of children: $(get(total_children))")

Or one can use a comprehension to extract the name of a subset of rows:


```@example 1
y = [i.name for i in q2 if i.number_of_children > 0]

The last example (extracting only the name and applying a second filter) could of course be completely expressed as a query expression:

@example 1 q3 = @from i in df begin @where i.age > 40 && i.children > 0 @select i.name @collect end

A query that ends with a @collect statement without a specific type will materialize the query results into an array. Note also the difference in the @select statement: The previous queries all used the {} syntax in the @select statement to project results into a tabular format. The last query instead just selects a single value from each row in the @select statement.

These examples only scratch the surface of what one can do with Query.jl, and the interested reader is referred to the Query.jl documentation for more information.