Internals
The functions, methods and types listed on this page are internal to DataFrames and are not considered to be part of the public API.
DataFrames.compacttype
— Function
compacttype(T::Type, maxwidth::Int=8, initial::Bool=true)
Return a compact string representation of type T.
When displaying a data frame, we do not want the string representation of a type to be longer than maxwidth. This function implements the rules for how type names are cropped if they are longer than maxwidth.
DataFrames.gennames
— Function
gennames(n::Integer)
Generate standardized names for columns of a DataFrame. The first name will be :x1, the second :x2, etc.
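The naming scheme can be reproduced in plain Julia. The sketch below is a stand-in, not the DataFrames implementation: sketch_gennames is a hypothetical name, and only the :x1, :x2, ... pattern is taken from the description above.

```julia
# Hypothetical stand-in that mirrors the documented naming pattern;
# it is not the internal DataFrames.gennames function itself.
sketch_gennames(n::Integer) = [Symbol(:x, i) for i in 1:n]

# Names for a three-column table
names3 = sketch_gennames(3)
println(names3)
```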
DataFrames.getmaxwidths
— Function
DataFrames.getmaxwidths(df::AbstractDataFrame,
                        io::IO,
                        rowindices1::AbstractVector{Int},
                        rowindices2::AbstractVector{Int},
                        rowlabel::Symbol,
                        rowid::Union{Integer, Nothing},
                        show_eltype::Bool,
                        buffer::IOBuffer)
Calculate, for each column of an AbstractDataFrame, the maximum string width used to render the name of that column, its type, and the longest entry in that column, among the rows of the data frame that will be rendered to IO. The widths for all columns are returned as a vector.
Return a Vector{Int} giving the maximum string widths required to render each column, including that column's name and type.
NOTE: The last entry of the result vector is the string width of the implicit row ID column contained in every AbstractDataFrame.
Arguments
- df::AbstractDataFrame: The data frame whose columns will be printed.
- io::IO: The IO to which df is to be printed.
- rowindices1::AbstractVector{Int}: A set of indices of the first chunk of the AbstractDataFrame that would be rendered to IO.
- rowindices2::AbstractVector{Int}: A set of indices of the second chunk of the AbstractDataFrame that would be rendered to IO. Can be empty if the AbstractDataFrame would be printed without any ellipses.
- rowlabel::Symbol: The label that will be used when rendering the numeric IDs of each row. Typically, this will be set to :Row.
- rowid: Used to handle showing DataFrameRow.
- show_eltype: Whether to print the column type under the column name in the heading.
- buffer: A buffer passed around to avoid reallocations in ourstrwidth.
DataFrames.ourshow
— Function
DataFrames.ourshow(io::IO, x::Any, truncstring::Int)
Render a value to an IO object compactly using print. truncstring indicates the approximate width, in text characters, at which the output is truncated (if it is a non-positive value, no truncation is applied).
DataFrames.ourstrwidth
— Function
DataFrames.ourstrwidth(io::IO, x::Any, buffer::IOBuffer, truncstring::Int)
Determine the number of characters that would be used to print a value.
DataFrames.@spawn_for_chunks
— Macro
@spawn_for_chunks basesize for i in range ... end
Parallelize a for loop by spawning separate tasks, each iterating over a chunk of at least basesize elements in range.
More tasks than Threads.nthreads() may be spawned, since that can allow for more efficient load balancing in case some threads are busy (nested parallelism).
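The chunking idea can be sketched in plain Julia with Threads.@spawn. This is an illustration under stated assumptions rather than the macro's actual expansion: chunk_ranges and fill_squares! are hypothetical helpers, and the even splitting rule is one way to obtain chunks of at least basesize elements.

```julia
# Hypothetical helper: split 1:len into contiguous chunks that each hold
# at least `basesize` elements (a single shorter chunk if len < basesize).
function chunk_ranges(len::Int, basesize::Int)
    nchunks = max(1, div(len, basesize))
    # Chunk boundaries spread as evenly as possible over 0:len
    bounds = [div(len * k, nchunks) for k in 0:nchunks]
    return [(bounds[k] + 1):bounds[k + 1] for k in 1:nchunks]
end

# Hypothetical loop body: each spawned task fills its own chunk of `out`.
function fill_squares!(out::Vector{Int}, basesize::Int)
    tasks = map(chunk_ranges(length(out), basesize)) do r
        Threads.@spawn for i in r
            out[i] = i^2
        end
    end
    foreach(wait, tasks)  # block until every chunk is finished
    return out
end

squares = fill_squares!(zeros(Int, 10), 3)
```

Because the number of tasks depends only on the length of the range and on basesize, it can exceed Threads.nthreads(); the Julia scheduler then distributes the tasks over whichever threads are free.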
DataFrames.default_table_transformation
— Function
default_table_transformation(df_sel::AbstractDataFrame, fun)
This is the default implementation called when AsTable(...) => fun is requested. The df_sel argument is a data frame storing the columns selected by the AsTable(...) selector.
DataFrames.table_transformation
— Function
table_transformation(df_sel::AbstractDataFrame, fun)
This is the function called when AsTable(...) => fun is requested. The df_sel argument is a data frame storing the columns selected by the AsTable(...) selector.
By default it calls default_table_transformation. However, special methods may be added for specific types of fun, as long as the result matches what would be produced by default_table_transformation, except that such methods may perform eltype conversion of the resulting vectors or value type promotions that are consistent with promote_type.
It is guaranteed that df_sel has at least one column.
The main use of special table_transformation methods is to provide implementations of the requested fun transformation that are more efficient than the default.
This function might become a part of the public API of DataFrames.jl in the future; currently it should be considered experimental.
Fast paths are implemented within DataFrames.jl for the following functions fun:
- sum, ByRow(sum), ByRow(sum∘skipmissing)
- length, ByRow(length), ByRow(length∘skipmissing)
- mean, ByRow(mean), ByRow(mean∘skipmissing)
- ByRow(var), ByRow(var∘skipmissing)
- ByRow(std), ByRow(std∘skipmissing)
- ByRow(median), ByRow(median∘skipmissing)
- minimum, ByRow(minimum), ByRow(minimum∘skipmissing)
- maximum, ByRow(maximum), ByRow(maximum∘skipmissing)
- fun∘collect and ByRow(fun∘collect) where fun is any function
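The dispatch pattern behind such fast paths can be sketched without DataFrames. Everything named here is hypothetical: RowWise stands in for a row-wise wrapper such as ByRow, a NamedTuple of vectors stands in for df_sel, and default_transform and transform_table only mimic the roles of the two functions described above.

```julia
# Hypothetical stand-in for a row-wise wrapper such as ByRow
struct RowWise{F}
    fun::F
end

# Default path: build one NamedTuple per row and apply the function to it
function default_transform(cols::NamedTuple, rw::RowWise)
    nrows = length(first(cols))
    return [rw.fun(map(c -> c[i], cols)) for i in 1:nrows]
end

# Generic entry point: falls back to the default path
transform_table(cols::NamedTuple, fun) = default_transform(cols, fun)

# Special method for row-wise sum: add whole columns instead of
# materializing a NamedTuple for every row
transform_table(cols::NamedTuple, ::RowWise{typeof(sum)}) = reduce(+, values(cols))

cols = (a = [1, 2, 3], b = [10, 20, 30])
slow = default_transform(cols, RowWise(sum))  # per-row path
fast = transform_table(cols, RowWise(sum))    # column-wise fast path
```

The special method is acceptable under the rule stated earlier only because it produces the same values as the default path.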
Note that, in order to improve performance, ByRow(sum), ByRow(sum∘skipmissing), ByRow(mean), and ByRow(mean∘skipmissing) perform all operations in the target element type. In some very rare cases (like mixing very large Int64 values and Float64 values) this can lead to a result different from the one that would be obtained by calling the function outside of DataFrames.jl. To get the same result as outside of DataFrames.jl, use an anonymous function, e.g. ByRow(x -> sum(x)) instead of ByRow(sum). However, in such scenarios even the standard aggregation functions should not be considered to provide reliable output, and users are advised to switch to higher precision calculations. An example of a case in which the standard sum is affected by this situation is:
julia> sum(Any[typemax(Int), typemax(Int), 1.0])
-1.0
julia> sum(Any[1.0, typemax(Int), typemax(Int)])
1.8446744073709552e19
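The difference between generic accumulation and accumulation in a single target element type can be reproduced in plain Julia. The sketch assumes that converting every value to Float64 before summing approximates what "performing all operations in the target element type" means; it does not call the DataFrames fast path itself. The BigInt/BigFloat variant shows the higher precision alternative.

```julia
vals = Any[typemax(Int), typemax(Int), 1.0]

# Generic path: the two Int64 values are added first and overflow,
# and only then is the result promoted to Float64
generic = sum(vals)

# Target-type path (assumed analogue of the fast path): convert every
# value to Float64 first, then accumulate
target = sum(Float64.(vals))

# Higher precision: arbitrary-precision arithmetic avoids both the
# overflow and the rounding
exact = sum(big.(vals))

println((generic, target, exact))
```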
DataFrames.isreadonly
— Function
isreadonly(fun)
Trait returning a Bool indicating whether function fun only reads the passed argument. Such a function guarantees that it neither modifies the passed argument nor returns it in any form. By default false is returned.
This function might become a part of the public API of DataFrames.jl in the future; currently it should be considered experimental. Adding a method to isreadonly for a specific function fun will improve the performance of the AsTable(...) => ByRow(fun∘collect) operation.
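A trait of this kind is typically a function with a conservative default and opt-in methods for individual functions. The sketch below is a plain Julia illustration with hypothetical names (is_read_only, row_total, apply_rows). Reusing one row buffer for read-only functions is an assumption about why such a guarantee can help performance, not a description of the DataFrames internals.

```julia
# Hypothetical trait: by default assume the function may modify or keep its input
is_read_only(f) = false

# A function that only reads its argument, opted in to the trait
row_total(v) = sum(v)
is_read_only(::typeof(row_total)) = true

# Hypothetical row-wise driver: one shared buffer is reused across rows
# when the trait says this is safe; otherwise every row gets a fresh copy.
function apply_rows(f, cols)
    nrows = length(first(cols))
    buffer = Vector{Int}(undef, length(cols))
    return map(1:nrows) do i
        for (j, c) in enumerate(cols)
            buffer[j] = c[i]
        end
        f(is_read_only(f) ? buffer : copy(buffer))
    end
end

cols = ([1, 2, 3], [10, 20, 30])
totals = apply_rows(row_total, cols)  # shared buffer, no per-row copy
rows = apply_rows(identity, cols)     # identity returns its input, so rows are copied
```

If identity were wrongly marked as read-only here, every element of rows would alias the same buffer and end up holding the last row, which is why the default is false.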