Internal API

The functions, methods and types listed on this page are internal to DataFrames and are not considered to be part of the public API.

compacttype(T::Type, maxwidth::Int=8, initial::Bool=true)

Return a compact string representation of type T.

When displaying a data frame, the string representation of a type should not be longer than maxwidth. This function implements the rules for cropping type names that exceed maxwidth.


Generate standardized names for columns of a DataFrame. The first name will be :x1, the second :x2, etc.
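The naming scheme described above can be sketched in a few lines. This is only an illustration under the assumption that the count of names is passed as an integer; sketch_gennames is a hypothetical helper, not the DataFrames.jl implementation.

```julia
# Hypothetical helper (not part of DataFrames.jl): build n standardized
# column names :x1, :x2, ..., :xn.
sketch_gennames(n::Integer) = [Symbol(:x, i) for i in 1:n]

# Names for a three-column data frame.
names3 = sketch_gennames(3)
```

Symbol(:x, i) concatenates the printed forms of its arguments, so no intermediate string has to be built by hand.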

                        rowid::Union{Integer, Nothing},

Calculate, for each column of an AbstractDataFrame, the maximum string width used to render the name of that column, its type, and the longest entry in that column among the rows of the data frame that will be rendered to IO. The widths for all columns are returned as a vector.

Return a Vector{Int} giving the maximum string widths required to render each column, including that column's name and type.

NOTE: The last entry of the result vector is the string width of the implicit row ID column contained in every AbstractDataFrame.


  • df::AbstractDataFrame: The data frame whose columns will be printed.
  • io::IO: The IO to which df is to be printed.
  • rowindices1::AbstractVector{Int}: A set of indices of the first chunk of the AbstractDataFrame that would be rendered to IO.
  • rowindices2::AbstractVector{Int}: A set of indices of the second chunk of the AbstractDataFrame that would be rendered to IO. Can be empty if the AbstractDataFrame would be printed without any ellipses.
  • rowlabel::AbstractString: The label that will be used when rendering the numeric IDs of each row. Typically, this will be set to "Row".
  • rowid: Used to handle showing DataFrameRow.
  • show_eltype: Whether to print the column type under the column name in the heading.
  • buffer: A buffer passed around to avoid reallocations in ourstrwidth.

DataFrames.ourshow(io::IO, x::Any, truncstring::Int)

Render a value to an IO object compactly using print. truncstring gives the approximate width, in text characters, at which the output is truncated (if it is non-positive, no truncation is applied).

DataFrames.ourstrwidth(io::IO, x::Any, buffer::IOBuffer, truncstring::Int)

Determine the number of characters that would be used to print a value.
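A minimal sketch of the idea: print the value into a reusable buffer and measure the result. It assumes that the width is measured with textwidth and it ignores truncation and the io context entirely; sketch_strwidth is a hypothetical name, not the DataFrames.jl function.

```julia
# Hypothetical helper (not the DataFrames.jl implementation): measure how
# wide a value is when printed, reusing one IOBuffer across calls.
function sketch_strwidth(x, buffer::IOBuffer)
    print(buffer, x)
    # take! returns the written bytes and resets the buffer for the next call.
    return textwidth(String(take!(buffer)))
end

buffer = IOBuffer()
widths = [sketch_strwidth(v, buffer) for v in (12345, "abc", missing)]
```

Passing the buffer in, rather than creating one per call, mirrors the reason the buffer argument is threaded through the functions on this page.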

@spawn_for_chunks basesize for i in range ... end

Parallelize a for loop by spawning separate tasks iterating each over a chunk of at least basesize elements in range.

A number of tasks higher than Threads.nthreads() may be spawned, since that can allow for more efficient load balancing when some threads are busy (nested parallelism).
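The chunking idea can be sketched with plain functions instead of a macro. This is an illustration under stated assumptions, not the DataFrames.jl implementation: sketch_chunks and sketch_spawn_for_chunks are hypothetical names, and the way the remainder is spread over chunks is one possible choice.

```julia
# Hypothetical helper: split `range` into chunks that each hold at least
# `basesize` elements (a single chunk if the range is shorter than that).
function sketch_chunks(range::UnitRange{Int}, basesize::Integer)
    basesize > 0 || throw(ArgumentError("basesize must be positive"))
    len = length(range)
    nchunks = max(1, len ÷ basesize)
    q, r = divrem(len, nchunks)      # the first r chunks get one extra element
    chunks = UnitRange{Int}[]
    start = first(range)
    for k in 1:nchunks
        n = q + (k <= r ? 1 : 0)
        push!(chunks, start:(start + n - 1))
        start += n
    end
    return chunks
end

# Hypothetical function-based counterpart of the macro: run f(i) for every
# i in range, one spawned task per chunk, and wait for all tasks to finish.
function sketch_spawn_for_chunks(f, range::UnitRange{Int}, basesize::Integer)
    tasks = map(sketch_chunks(range, basesize)) do chunk
        Threads.@spawn foreach(f, chunk)
    end
    foreach(wait, tasks)
    return nothing
end

# Usage: accumulate into an atomic counter so concurrent tasks do not race.
total = Threads.Atomic{Int}(0)
sketch_spawn_for_chunks(i -> Threads.atomic_add!(total, i), 1:100, 10)
```

The number of tasks here depends only on the range length and basesize, not on Threads.nthreads(), which is what lets the scheduler balance load when some threads are already busy.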

default_table_transformation(df_sel::AbstractDataFrame, fun)

This is the default implementation called when AsTable(...) => fun is requested. The df_sel argument is a data frame storing the columns selected by the AsTable(...) selector.

table_transformation(df_sel::AbstractDataFrame, fun)

This is the function called when AsTable(...) => fun is requested. The df_sel argument is a data frame storing columns selected by the AsTable(...) selector.

By default it calls default_table_transformation. However, special methods may be added for specific types of fun, as long as the result matches what default_table_transformation would produce, except that such methods may perform eltype conversion of the resulting vectors or value type promotions that are consistent with promote_type.

It is guaranteed that df_sel has at least one column.

The main use of special table_transformation methods is to provide implementations of the requested fun transformation that are more efficient than the default.

This function might become a part of the public API of DataFrames.jl in the future; currently it should be considered experimental.

Fast paths are implemented within DataFrames.jl for the following functions fun:

  • sum, ByRow(sum), ByRow(sum∘skipmissing)
  • length, ByRow(length), ByRow(length∘skipmissing)
  • mean, ByRow(mean), ByRow(mean∘skipmissing)
  • ByRow(var), ByRow(var∘skipmissing)
  • ByRow(std), ByRow(std∘skipmissing)
  • ByRow(median), ByRow(median∘skipmissing)
  • minimum, ByRow(minimum), ByRow(minimum∘skipmissing)
  • maximum, ByRow(maximum), ByRow(maximum∘skipmissing)
  • fun∘collect and ByRow(fun∘collect) where fun is any function

Note that, to improve performance, ByRow(sum), ByRow(sum∘skipmissing), ByRow(mean), and ByRow(mean∘skipmissing) perform all operations in the target element type. In some very rare cases (such as mixing very large Int64 values with Float64 values) this can lead to a result different from the one that would be obtained by calling the function outside of DataFrames.jl. To avoid this discrepancy, use an anonymous function, e.g. ByRow(x -> sum(x)) instead of ByRow(sum). In general, however, even standard aggregation functions should not be considered to provide reliable output in such scenarios, and users are advised to switch to higher precision calculations. An example of a case where the standard sum is affected by the situation discussed is:

julia> sum(Any[typemax(Int), typemax(Int), 1.0])
-1.0

julia> sum(Any[1.0, typemax(Int), typemax(Int)])
1.8446744073709552e19
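The difference between the two approaches can be reproduced without DataFrames.jl. The sketch below assumes that "performing all operations in the target element type" amounts to converting every value to promote_type of the input types before adding; that reading is an interpretation of the note above, not a description of the actual fast-path code.

```julia
vals = Any[typemax(Int), typemax(Int), 1.0]

# Generic sum: the two Int64 values are added first and wrap around to -2,
# then 1.0 is added in floating point.
generic = sum(vals)

# Assumed fast-path behavior: convert each value to the target element type
# (Float64 here) before any addition takes place, so nothing overflows.
T = promote_type(Int, Float64)
in_target = sum(convert.(T, vals))
```

Here generic wraps around because of integer overflow, while in_target stays close to the mathematically expected value but loses the contribution of 1.0 to rounding.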

Trait returning a Bool indicating whether the function fun only reads the passed argument. Such a function guarantees that it neither modifies the passed argument nor returns it in any form. By default false is returned.

This function might become a part of the public API of DataFrames.jl in the future; currently it should be considered experimental. Adding a method to isreadonly for a specific function fun will improve the performance of the AsTable(...) => ByRow(fun∘collect) operation.
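The trait pattern can be illustrated with stand-in definitions. Everything below is hypothetical: sketch_isreadonly, rowrange, and sketch_rowwise are not DataFrames.jl names, and the assumption that the performance gain comes from reusing a single row buffer is a guess at the mechanism rather than a description of the library's code.

```julia
# Hypothetical stand-in for the trait: by default a function is not
# assumed to be read-only.
sketch_isreadonly(::Any) = false

# A function that only reads its argument and returns a fresh value.
rowrange(v) = maximum(v) - minimum(v)

# Opt in for this specific function by dispatching on its type.
sketch_isreadonly(::typeof(rowrange)) = true

# Hypothetical row-wise driver: collect each row into a vector and apply fun.
# One buffer is reused only when fun promises not to keep or mutate it.
function sketch_rowwise(fun, cols::Vector{Vector{Float64}})
    nrows = length(first(cols))
    buffer = Vector{Float64}(undef, length(cols))
    return map(1:nrows) do i
        row = sketch_isreadonly(fun) ? buffer : similar(buffer)
        for (j, col) in enumerate(cols)
            row[j] = col[i]
        end
        fun(row)
    end
end

cols = [[1.0, 2.0], [4.0, 0.0]]
ranges = sketch_rowwise(rowrange, cols)   # buffer reused for every row
rows = sketch_rowwise(identity, cols)     # fresh vector per row
```

The identity call shows why the guarantee matters: a function that returns its argument would hand back the same shared buffer for every row if it were wrongly marked as read-only.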