Resample
Suppose you have a time series with arbitrary timestamps and you need to bring its values onto a new, aligned grid with a fixed period. For example:
    value:    1.0   2.0         3.0               4.0   5.0   6.0   7.0
    grid:  - - X - - X - - - - - X - - - - - - - - X - - X - - X - - X - -
    time:      1     2           4                 7     8     9     10
To do this you will need a resampling method, which we will now look at in more detail. The resampling method can be broken down into four steps:
- selecting the origin of the time grid.
- dividing the new grid into sub-intervals.
- defining the parameters of these sub-intervals, such as the closed side and the label side.
- aggregating the values that fall into each new sub-interval.
Origin of the new time grid
First you need to select the origin of the new grid:
- ORIGIN_OF_WINDOW (default): Origin of coordinates for the corresponding time type (beginning of the year for dates, zero for numbers).
- START_OF_WINDOW: The first timestamp in the current time series.
- END_OF_WINDOW: The last timestamp in the current time series.
    value:      1.0   2.0         3.0               4.0   5.0   6.0   7.0
    grid:  X - - X - - X - - - - - X - - - - - - - - X - - X - - X - - X - -
    time:  0     1     2           4                 7     8     9     10
           ^     ^                                                     ^
           |     |                                                     |
           |     START_OF_WINDOW                                       END_OF_WINDOW
           ORIGIN_OF_WINDOW (default)
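The choice of origin can be sketched in plain Julia for numeric timestamps. The helper `pick_origin` and its symbol arguments are hypothetical names used only for illustration; they are not part of TimeArrays.jl, which uses the ORIGIN_OF_WINDOW, START_OF_WINDOW, and END_OF_WINDOW values instead.

```julia
# Hypothetical helper (not the TimeArrays.jl implementation): returns the grid
# origin for numeric timestamps under each of the three options.
function pick_origin(timestamps::AbstractVector{<:Real}, mode::Symbol)
    if mode === :origin_of_window
        return zero(eltype(timestamps))   # zero for numeric time types
    elseif mode === :start_of_window
        return minimum(timestamps)        # first timestamp of the series
    elseif mode === :end_of_window
        return maximum(timestamps)        # last timestamp of the series
    else
        throw(ArgumentError("unknown origin mode: $mode"))
    end
end

timestamps = [1, 2, 4, 7, 8, 9, 10]
modes = (:origin_of_window, :start_of_window, :end_of_window)
origins = [pick_origin(timestamps, m) for m in modes]
println(origins)  # [0, 1, 10]
```

The three results correspond to the three arrows in the diagram above.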
Period of the new time grid
Then, relative to the selected origin, a grid with a step of the specified period will be created:
    value:           1.0   2.0         3.0               4.0   5.0   6.0   7.0
    grid:       X - - X - - X - - - - - X - - - - - - - - X - - X - - X - - X - -
    time:       0     1     2           4                 7     8     9     10

    period = 2: | - - - - - | - - - - - | - - - - - | - - - - - | - - - - - | - -
                0           2           4           6           8           10
    period = 3: | - - - - - - - - | - - - - - - - - | - - - - - - - - | - - - - -
                0                 3                 6                 9
                ^
                |
                ORIGIN_OF_WINDOW (default)

    period = 2: - - - | - - - - - | - - - - - | - - - - - | - - - - - | - - - - -
                      1           3           5           7           9
    period = 3: - - - | - - - - - - - - | - - - - - - - - | - - - - - - - - | - -
                      1                 4                 7                 10
                      ^
                      |
                      START_OF_WINDOW

    period = 2: | - - - - - | - - - - - | - - - - - | - - - - - | - - - - - | - -
                0           2           4           6           8           10
    period = 3: - - - | - - - - - - - - | - - - - - - - - | - - - - - - - - | - -
                      1                 4                 7                 10
                                                                            ^
                                                                            |
                                                                            END_OF_WINDOW
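The grid construction can be sketched with floored division relative to the origin. The function `grid_edges` below is a hypothetical helper for integer timestamps, not the library's implementation; it returns the left edges of the sub-intervals needed to cover the series, assuming each timestamp belongs to the interval that starts at or before it.

```julia
# Hypothetical helper (not the TimeArrays.jl implementation): left edges of the
# sub-intervals that cover `timestamps` on a grid anchored at `origin`.
function grid_edges(timestamps::AbstractVector{<:Integer}, origin::Integer, period::Integer)
    # fld rounds toward -Inf, so timestamps before the origin are handled too.
    lo = origin + fld(minimum(timestamps) - origin, period) * period
    hi = origin + fld(maximum(timestamps) - origin, period) * period
    return collect(lo:period:hi)
end

timestamps = [1, 2, 4, 7, 8, 9, 10]
println(grid_edges(timestamps, 0, 3))   # origin 0 (ORIGIN_OF_WINDOW):  [0, 3, 6, 9]
println(grid_edges(timestamps, 1, 2))   # origin 1 (START_OF_WINDOW):   [1, 3, 5, 7, 9]
println(grid_edges(timestamps, 10, 3))  # origin 10 (END_OF_WINDOW):    [1, 4, 7, 10]
```

With the origin at the last timestamp, the grid is extended backwards from the origin, which is why floored division is used rather than truncation.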
Parameters of sub-intervals
Next, decide which side of each sub-interval is closed and which side provides the label, that is, the timestamp assigned to the aggregated value of the sub-interval. The following example shows how the parameters CLOSED_LEFT and CLOSED_RIGHT, as well as LABEL_LEFT and LABEL_RIGHT, determine the behavior of the sub-intervals.
An example of the sub-intervals obtained with the parameters origin = ORIGIN_OF_WINDOW and period = 2:
    value:    1.0       2.0       3.0                       1.0       2.0       3.0
    grid:  - - X - - - - X - - - - X - -   ➤   - - | - - - - X - - - - | - - - - X - - - - | - -
    time:      1         2         3               0                   2                   4
Possible options for decomposing values into new sub-intervals for subsequent aggregation:
| | CLOSED_LEFT (default) | CLOSED_RIGHT |
|---|---|---|
| LABEL_LEFT (default) | [0, 2) holds 1.0, labeled 0; [2, 4) holds 2.0 and 3.0, labeled 2 | (0, 2] holds 1.0 and 2.0, labeled 0; (2, 4] holds 3.0, labeled 2 |
| LABEL_RIGHT | [0, 2) holds 1.0, labeled 2; [2, 4) holds 2.0 and 3.0, labeled 4 | (0, 2] holds 1.0 and 2.0, labeled 2; (2, 4] holds 3.0, labeled 4 |
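The four combinations can be reproduced with a small sketch for integer timestamps. The function `interval_label` and its `:left`/`:right` symbols are hypothetical stand-ins for the CLOSED_* and LABEL_* values; the code illustrates the rule, it is not how TimeArrays.jl implements it.

```julia
# Hypothetical helper (not the TimeArrays.jl implementation): label of the
# sub-interval that contains timestamp `t`.
function interval_label(t::Integer, origin::Integer, period::Integer;
                        closed::Symbol = :left, label::Symbol = :left)
    offset = t - origin
    # closed = :left  -> intervals [k*p, (k+1)*p), so k = floor(offset / p)
    # closed = :right -> intervals (k*p, (k+1)*p], so k = ceil(offset / p) - 1
    k = closed === :left ? fld(offset, period) : cld(offset, period) - 1
    left_edge = origin + k * period
    return label === :left ? left_edge : left_edge + period
end

# Timestamps 1, 2, 3 on a grid with origin 0 and period 2:
println([interval_label(t, 0, 2) for t in 1:3])                                   # [0, 2, 2]
println([interval_label(t, 0, 2; closed = :right) for t in 1:3])                  # [0, 0, 2]
println([interval_label(t, 0, 2; label = :right) for t in 1:3])                   # [2, 4, 4]
println([interval_label(t, 0, 2; closed = :right, label = :right) for t in 1:3])  # [2, 2, 4]
```

Only the timestamp that sits exactly on a grid boundary (time 2) changes its interval when the closed side changes; the label side shifts the reported timestamp without moving any value between intervals.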
Values aggregation
Finally, the old values that fall into each new sub-interval are aggregated by applying a function such as sum, maximum, minimum, mean, or median.
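The aggregation step can be sketched as grouping values by the label of their sub-interval and reducing each group. The function `aggregate_groups` is a hypothetical helper, and the labels passed to it are assumed to have been computed already (here for the closed-left, label-left case with period 2).

```julia
# Hypothetical helper (not the TimeArrays.jl implementation): groups values by
# their sub-interval label and applies the reducer `f` to every group.
function aggregate_groups(f, labels::AbstractVector{<:Integer}, vals::AbstractVector{<:Real})
    groups = Dict{Int,Vector{Float64}}()
    for (label, v) in zip(labels, vals)
        push!(get!(() -> Float64[], groups, label), v)
    end
    # Return the results ordered by label.
    return [label => f(groups[label]) for label in sort!(collect(keys(groups)))]
end

labels = [0, 2, 2]        # labels of the values at times 1, 2, 3
vals = [1.0, 2.0, 3.0]

println(aggregate_groups(sum, labels, vals))                      # [0 => 1.0, 2 => 5.0]
println(aggregate_groups(maximum, labels, vals))                  # [0 => 1.0, 2 => 3.0]
println(aggregate_groups(v -> sum(v) / length(v), labels, vals))  # [0 => 1.0, 2 => 2.5]
```

This sketch only emits sub-intervals that received at least one value; empty sub-intervals are not represented here.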
API
TimeArrays.ta_resample — Function
ta_resample(f::Function, t_array::TimeArray{T,V}, period::PeriodLike; kw...) -> TimeArray
Brings the values of t_array to a new time grid with the given period, applying the function f to the values of the old grid that fall into each sub-interval.
The function f must accept a vector with elements of type V as input and return a new value, which is assigned to the timestamp of the corresponding window. If a new value cannot be computed from the received vector (for example, because the vector is empty), f must return NaN or an equivalent value (for custom types).
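One way to satisfy this contract is to wrap an ordinary reducer so that empty windows yield NaN. The wrapper `nan_if_empty` below is a hypothetical convenience written for this illustration, not something provided by TimeArrays.jl.

```julia
# Hypothetical wrapper: turns a reducer into one that returns NaN for an empty
# window instead of failing.
nan_if_empty(f) = x -> isempty(x) ? NaN : f(x)

safe_maximum = nan_if_empty(maximum)

println(safe_maximum(Float64[]))    # NaN
println(safe_maximum([1.0, 3.0]))   # 3.0
```

In Julia, `maximum` throws an error on an empty vector, so it needs such a guard, whereas `sum` of an empty numeric vector returns zero.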
Keyword arguments
- origin::ORIGIN_TYPE = ORIGIN_OF_WINDOW: Start of the new grid (possible values: ORIGIN_OF_WINDOW, START_OF_WINDOW, END_OF_WINDOW).
- closed::CLOSED_SIDE = CLOSED_LEFT: Closed side of the half-open sub-intervals of the new grid (possible values: CLOSED_LEFT, CLOSED_RIGHT).
- label::LABEL_SIDE = LABEL_LEFT: Label side of the sub-intervals of the new grid (possible values: LABEL_LEFT, LABEL_RIGHT).
For more information, see the Resample section.
Examples
julia> t_array = TimeArray{Int64,Int64}([(i, i) for i in 3:13])
11-element TimeArray{Int64, Int64}:
TimeTick(3, 3)
TimeTick(4, 4)
TimeTick(5, 5)
TimeTick(6, 6)
TimeTick(7, 7)
TimeTick(8, 8)
TimeTick(9, 9)
TimeTick(10, 10)
TimeTick(11, 11)
TimeTick(12, 12)
TimeTick(13, 13)
julia> ta_resample(sum, t_array, 4, closed = CLOSED_LEFT, label = LABEL_LEFT)
4-element TimeArray{Int64, Int64}:
TimeTick(0, 3)
TimeTick(4, 22)
TimeTick(8, 38)
TimeTick(12, 25)
julia> ta_resample(sum, t_array, 4, closed = CLOSED_LEFT, label = LABEL_RIGHT)
4-element TimeArray{Int64, Int64}:
TimeTick(4, 3)
TimeTick(8, 22)
TimeTick(12, 38)
TimeTick(16, 25)
julia> ta_resample(sum, t_array, 4, closed = CLOSED_RIGHT, label = LABEL_LEFT)
4-element TimeArray{Int64, Int64}:
TimeTick(0, 7)
TimeTick(4, 26)
TimeTick(8, 42)
TimeTick(12, 13)
julia> ta_resample(sum, t_array, 4, closed = CLOSED_RIGHT, label = LABEL_RIGHT)
4-element TimeArray{Int64, Int64}:
TimeTick(4, 7)
TimeTick(8, 26)
TimeTick(12, 42)
TimeTick(16, 13)
julia> using Dates
julia> t_array = TimeArray{DateTime,Float64}([
           TimeTick(DateTime("2024-01-01"), 1.0),
           TimeTick(DateTime("2024-01-02"), 2.0),
           TimeTick(DateTime("2024-01-03"), 3.0),
           TimeTick(DateTime("2024-01-09"), 4.0),
           TimeTick(DateTime("2024-01-12"), 5.0),
           TimeTick(DateTime("2024-01-13"), 6.0),
           TimeTick(DateTime("2024-01-20"), 7.0),
       ]);
julia> ta_resample(x -> isempty(x) ? NaN : maximum(x), t_array, Day(3))
7-element TimeArray{DateTime, Float64}:
TimeTick(2024-01-01T00:00:00, 3.0)
TimeTick(2024-01-04T00:00:00, NaN)
TimeTick(2024-01-07T00:00:00, 4.0)
TimeTick(2024-01-10T00:00:00, 5.0)
TimeTick(2024-01-13T00:00:00, 6.0)
TimeTick(2024-01-16T00:00:00, NaN)
TimeTick(2024-01-19T00:00:00, 7.0)
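The DateTime result above, including the NaN entries for empty windows, can be cross-checked with a self-contained sketch that uses only the Dates standard library. The function `resample_dense` is hypothetical and is not the TimeArrays.jl implementation; it assumes closed-left, label-left windows, a fixed-length period such as Day or Hour, and an origin at the beginning of the year of the earliest timestamp.

```julia
using Dates

# Hypothetical sketch (not the TimeArrays.jl implementation): emits every
# window between the first and the last occupied one, so empty windows are
# passed to `f` as empty vectors.
function resample_dense(f, times::Vector{DateTime}, vals::Vector{Float64}, period::Period)
    origin = DateTime(year(minimum(times)))        # assumed origin: start of the year
    step_ms = Dates.value(Millisecond(period))     # requires a fixed-length period
    # Zero-based window index of every timestamp (closed-left windows).
    idx = [fld(Dates.value(t - origin), step_ms) for t in times]
    result = Pair{DateTime,Float64}[]
    for k in minimum(idx):maximum(idx)
        window = vals[idx .== k]                   # may be empty
        push!(result, origin + Millisecond(k * step_ms) => f(window))
    end
    return result
end

times = DateTime.(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-09",
                   "2024-01-12", "2024-01-13", "2024-01-20"])
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

resampled = resample_dense(x -> isempty(x) ? NaN : maximum(x), times, vals, Day(3))
foreach(println, resampled)
```

Under these assumptions the sketch yields seven windows labeled 2024-01-01 through 2024-01-19 in steps of three days, with NaN for the windows starting on 2024-01-04 and 2024-01-16.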