Resample

Suppose you have a time series with arbitrary timestamps and need to bring its values onto a new aligned grid with a fixed time period, for example:


 value:         1.0   2.0         3.0               4.0   5.0   6.0   7.0
 grid:      - - X - - X - - - - - X - - - - - - - - X - - X - - X - - X - -
 time:          1     2           4                 7     8     9     10

To do this, you need a resampling method, which we now look at in more detail. Resampling can be broken down into four steps:

  • Selecting the origin of the new time grid.
  • Dividing the new grid into sub-intervals of a fixed period.
  • Defining the parameters of these sub-intervals, such as the closed side and the label side.
  • Aggregating the values that fall into each new sub-interval.

Origin of the new time grid

First, select the origin of the new grid:

  • ORIGIN_OF_WINDOW (default): the coordinate origin of the corresponding time type (the beginning of the year for dates, zero for numbers).
  • START_OF_WINDOW: the first timestamp of the time series.
  • END_OF_WINDOW: the last timestamp of the time series.

 value:        1.0   2.0         3.0               4.0   5.0   6.0   7.0
 grid:    X - - X - - X - - - - - X - - - - - - - - X - - X - - X - - X - -
 time:    0     1     2           4                 7     8     9     10
          ^     ^                                                     ^
          |     |                                                     |
          |     START_OF_WINDOW                           END_OF_WINDOW
          ORIGIN_OF_WINDOW (default)
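As an illustration only, the origin choice can be sketched in plain Julia for numeric, sorted timestamps. The function grid_origin and the symbols :origin, :start and :end are hypothetical stand-ins for the ORIGIN_OF_WINDOW, START_OF_WINDOW and END_OF_WINDOW options; this is not how TimeArrays.jl implements it.

```julia
# Illustrative sketch, not the TimeArrays.jl implementation.
# :origin, :start and :end are hypothetical stand-ins for
# ORIGIN_OF_WINDOW, START_OF_WINDOW and END_OF_WINDOW.
# Assumes numeric timestamps sorted in ascending order.
function grid_origin(times::AbstractVector{<:Real}, mode::Symbol = :origin)
    mode === :origin && return zero(eltype(times))  # coordinate origin: zero for numbers
    mode === :start  && return first(times)         # first timestamp of the series
    mode === :end    && return last(times)          # last timestamp of the series
    throw(ArgumentError("unknown origin mode: $mode"))
end

times = [1, 2, 4, 7, 8, 9, 10]
grid_origin(times)          # 0
grid_origin(times, :start)  # 1
grid_origin(times, :end)    # 10
```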

Period of the new time grid

Next, a grid whose step equals the specified period is built relative to the selected origin. The grids obtained for period = 2 and period = 3 with each of the three origins look like this:


 value:            1.0   2.0         3.0               4.0   5.0   6.0   7.0
 grid:        X - - X - - X - - - - - X - - - - - - - - X - - X - - X - - X - -
 time:        0     1     2           4                 7     8     9     10


 period = 2:  | - - - - - | - - - - - | - - - - - | - - - - - | - - - - - | - -
              0           2           4           6           8           10

 period = 3:  | - - - - - - - - | - - - - - - - - | - - - - - - - - | - - - - -
              0                 3                 6                 9
              ^
              |
              ORIGIN_OF_WINDOW (default)


 period = 2:  - - - | - - - - - | - - - - - | - - - - - | - - - - - | - - - - - 
                    1           3           5           7           9           

 period = 3:  - - - | - - - - - - - - | - - - - - - - - | - - - - - - - - | - - 
                    1                 4                 7                 10
                    ^
                    |
                    START_OF_WINDOW


 period = 2:  | - - - - - | - - - - - | - - - - - | - - - - - | - - - - - | - -
              0           2           4           6           8           10

 period = 3:  - - - | - - - - - - - - | - - - - - - - - | - - - - - - - - | - - 
                    1                 4                 7                 10
                                                                          ^
                                                                          |
                                                              END_OF_WINDOW
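A minimal sketch of the grid construction, assuming integer timestamps and a positive integer period: every grid point has the form origin + k * period, and rounding with fld toward negative infinity lets the grid extend backwards from an origin that lies after the first timestamp, as in the END_OF_WINDOW case. The helper grid_points is hypothetical; its calls reproduce the six rows above.

```julia
# Hypothetical helper: grid points origin + k * period covering the span of `times`.
# fld rounds toward -Inf, so the grid also extends backwards when the origin
# lies after the first timestamp (the END_OF_WINDOW case).
function grid_points(times::AbstractVector{<:Integer}, origin::Integer, period::Integer)
    period > 0 || throw(ArgumentError("period must be positive"))
    first_pt = origin + fld(minimum(times) - origin, period) * period
    last_pt  = origin + fld(maximum(times) - origin, period) * period
    return collect(first_pt:period:last_pt)
end

times = [1, 2, 4, 7, 8, 9, 10]
grid_points(times, 0, 2)   # [0, 2, 4, 6, 8, 10]  origin 0  (ORIGIN_OF_WINDOW)
grid_points(times, 0, 3)   # [0, 3, 6, 9]
grid_points(times, 1, 2)   # [1, 3, 5, 7, 9]      origin 1  (START_OF_WINDOW)
grid_points(times, 1, 3)   # [1, 4, 7, 10]
grid_points(times, 10, 2)  # [0, 2, 4, 6, 8, 10]  origin 10 (END_OF_WINDOW)
grid_points(times, 10, 3)  # [1, 4, 7, 10]
```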

Parameters of sub-intervals

It then remains to decide which side of each sub-interval is closed and to which boundary of the sub-interval the aggregated value is assigned (its label). The following example shows how the parameters CLOSED_LEFT and CLOSED_RIGHT, as well as LABEL_LEFT and LABEL_RIGHT, determine the behavior of the sub-intervals.

Sub-intervals obtained with origin = ORIGIN_OF_WINDOW and period = 2:


 value:      1.0       2.0       3.0                       1.0       2.0       3.0
 grid:    - - X - - - - X - - - - X - -   ➤   - - | - - - - X - - - - | - - - - X - - - - | - -
 time:        1         2         3               0                   2                   4

The possible ways of assigning these values to the new sub-intervals for subsequent aggregation:

CLOSED_LEFT (default), LABEL_LEFT (default):

 value:       1.0       2.0   3.0
 grid:   [ - - - - - )[ - - - - - )
 time:    ⤷ 0          ⤷ 2

CLOSED_RIGHT, LABEL_LEFT (default):

 value:    1.0   2.0       3.0
 grid:   ( - - - - - ]( - - - - - ]
 time:    ⤷ 0          ⤷ 2

CLOSED_LEFT (default), LABEL_RIGHT:

 value:       1.0       2.0   3.0
 grid:   [ - - - - - )[ - - - - - )
 time:            2 ⤶          4 ⤶

CLOSED_RIGHT, LABEL_RIGHT:

 value:    1.0   2.0       3.0
 grid:   ( - - - - - ]( - - - - - ]
 time:            2 ⤶          4 ⤶
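The four combinations can be sketched with two small hypothetical helpers, assuming integer timestamps: bin_index finds the sub-interval containing a timestamp (fld for left-closed, cld minus one for right-closed intervals), and bin_label picks its left or right edge. The symbols :left and :right stand in for the CLOSED_* and LABEL_* options; with origin 0 and period 2 the results match the diagrams above.

```julia
# Hypothetical helper: index k of the sub-interval of the grid origin + k * period
# that contains t. :left means [a, b) (CLOSED_LEFT), :right means (a, b] (CLOSED_RIGHT).
function bin_index(t::Integer, origin::Integer, period::Integer, closed::Symbol)
    closed === :left  && return fld(t - origin, period)      # a <= t < b
    closed === :right && return cld(t - origin, period) - 1  # a < t <= b
    throw(ArgumentError("closed must be :left or :right"))
end

# Hypothetical helper: timestamp labelling bin k, i.e. its left edge
# (LABEL_LEFT) or its right edge (LABEL_RIGHT).
function bin_label(k::Integer, origin::Integer, period::Integer, label::Symbol)
    label === :left  && return origin + k * period
    label === :right && return origin + (k + 1) * period
    throw(ArgumentError("label must be :left or :right"))
end

# Labels for the diagrams above: origin = 0 (ORIGIN_OF_WINDOW), period = 2.
labels(times, closed, label) =
    [bin_label(bin_index(t, 0, 2, closed), 0, 2, label) for t in times]

labels([1, 2, 3], :left,  :left)   # [0, 2, 2]
labels([1, 2, 3], :right, :left)   # [0, 0, 2]
labels([1, 2, 3], :left,  :right)  # [2, 4, 4]
labels([1, 2, 3], :right, :right)  # [2, 2, 4]
```

The only difference between the two closed sides is the value exactly on a boundary (2 here): fld puts it into the interval that starts there, cld minus one into the interval that ends there.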

Value aggregation

Finally, the old values that fall into each new sub-interval are aggregated by applying a function such as sum, maximum, minimum, mean or median.
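Putting the steps together, a minimal end-to-end sketch for integer timestamps could look like the following. It is an illustration under assumptions (integer times, a non-empty series, symbols instead of the CLOSED_* and LABEL_* options), not the TimeArrays.jl implementation; resample_sketch is a hypothetical name. The usage lines reproduce results of the integer example in the API section.

```julia
# Hypothetical end-to-end sketch for integer timestamps (not TimeArrays.jl code).
# Assigns every value to a sub-interval, then applies `f` to each sub-interval
# between the first and last occupied one, including empty sub-intervals.
function resample_sketch(f, times::AbstractVector{<:Integer}, vals::AbstractVector,
                         period::Integer; origin::Integer = 0,
                         closed::Symbol = :left, label::Symbol = :left)
    closed in (:left, :right) || throw(ArgumentError("closed must be :left or :right"))
    label in (:left, :right) || throw(ArgumentError("label must be :left or :right"))
    # Bin index: [a, b) uses fld, (a, b] uses cld - 1.
    ks = closed === :left ? fld.(times .- origin, period) :
                            cld.(times .- origin, period) .- 1
    out = Pair{Int,Any}[]
    for k in minimum(ks):maximum(ks)
        stamp = origin + (label === :left ? k : k + 1) * period
        push!(out, stamp => f(vals[ks .== k]))  # an empty bin passes an empty vector to f
    end
    return out
end

t = collect(3:13)
resample_sketch(sum, t, t, 4)                                   # [0 => 3, 4 => 22, 8 => 38, 12 => 25]
resample_sketch(sum, t, t, 4; closed = :right, label = :right)  # [4 => 7, 8 => 26, 12 => 42, 16 => 13]
```

Because f is also called on empty sub-intervals, it has to cope with empty input; this is the reason for the requirement on f stated in the API description.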

API

TimeArrays.ta_resample (Function)
ta_resample(f::Function, t_array::TimeArray{T,V}, period::PeriodLike; kw...) -> TimeArray

Brings the values of t_array onto a new time grid with the given period, applying the function f to the old values that fall into each sub-interval of the new grid.

The function f must accept a vector with elements of type V and return a new value, which is assigned to the timestamp that labels the corresponding window. If a new value cannot be computed from the received vector (for example, because the vector is empty), f must return NaN or an equivalent value for custom types.
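One way to satisfy this requirement for floating-point values is to wrap an ordinary aggregation so that an empty window yields NaN of the element type instead of throwing (maximum of an empty vector raises an error in Julia). nan_if_empty is a hypothetical helper, not part of TimeArrays.jl, and the sketch assumes V <: AbstractFloat.

```julia
# Hypothetical helper (not part of TimeArrays.jl): wraps an aggregation `g`
# so that an empty window returns NaN of the element type instead of throwing.
# Assumes floating-point values (V <: AbstractFloat).
function nan_if_empty(g)
    return function (x::AbstractVector{<:AbstractFloat})
        isempty(x) ? eltype(x)(NaN) : g(x)
    end
end

safe_max  = nan_if_empty(maximum)
safe_mean = nan_if_empty(x -> sum(x) / length(x))

safe_max([1.0, 3.0, 2.0])   # 3.0
safe_max(Float64[])         # NaN
safe_mean([1.0, 2.0, 6.0])  # 3.0
safe_mean(Float32[])        # NaN32
```

Under that assumption, the anonymous function in the last example below could presumably be replaced by nan_if_empty(maximum).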

Keyword arguments

  • origin::ORIGIN_TYPE = ORIGIN_OF_WINDOW: Origin of the new grid (possible values: ORIGIN_OF_WINDOW, START_OF_WINDOW, END_OF_WINDOW).
  • closed::CLOSED_SIDE = CLOSED_LEFT: Closed side of the half-open sub-intervals of the new grid (possible values: CLOSED_LEFT, CLOSED_RIGHT).
  • label::LABEL_SIDE = LABEL_LEFT: Side of each sub-interval whose boundary labels the aggregated value (possible values: LABEL_LEFT, LABEL_RIGHT).

For more information, see the Resample section.

Examples

julia> t_array = TimeArray{Int64,Int64}([(i, i) for i in 3:13])
11-element TimeArray{Int64, Int64}:
 TimeTick(3, 3)
 TimeTick(4, 4)
 TimeTick(5, 5)
 TimeTick(6, 6)
 TimeTick(7, 7)
 TimeTick(8, 8)
 TimeTick(9, 9)
 TimeTick(10, 10)
 TimeTick(11, 11)
 TimeTick(12, 12)
 TimeTick(13, 13)

julia> ta_resample(sum, t_array, 4, closed = CLOSED_LEFT, label = LABEL_LEFT)
4-element TimeArray{Int64, Int64}:
 TimeTick(0, 3)
 TimeTick(4, 22)
 TimeTick(8, 38)
 TimeTick(12, 25)

julia> ta_resample(sum, t_array, 4, closed = CLOSED_LEFT, label = LABEL_RIGHT)
4-element TimeArray{Int64, Int64}:
 TimeTick(4, 3)
 TimeTick(8, 22)
 TimeTick(12, 38)
 TimeTick(16, 25)

julia> ta_resample(sum, t_array, 4, closed = CLOSED_RIGHT, label = LABEL_LEFT)
4-element TimeArray{Int64, Int64}:
 TimeTick(0, 7)
 TimeTick(4, 26)
 TimeTick(8, 42)
 TimeTick(12, 13)

julia> ta_resample(sum, t_array, 4, closed = CLOSED_RIGHT, label = LABEL_RIGHT)
4-element TimeArray{Int64, Int64}:
 TimeTick(4, 7)
 TimeTick(8, 26)
 TimeTick(12, 42)
 TimeTick(16, 13)

julia> using Dates

julia> t_array = TimeArray{DateTime,Float64}([
           TimeTick(DateTime("2024-01-01"), 1.0),
           TimeTick(DateTime("2024-01-02"), 2.0),
           TimeTick(DateTime("2024-01-03"), 3.0),
           TimeTick(DateTime("2024-01-09"), 4.0),
           TimeTick(DateTime("2024-01-12"), 5.0),
           TimeTick(DateTime("2024-01-13"), 6.0),
           TimeTick(DateTime("2024-01-20"), 7.0),
       ]);

julia> ta_resample(x -> isempty(x) ? NaN : maximum(x), t_array, Day(3))
7-element TimeArray{DateTime, Float64}:
 TimeTick(2024-01-01T00:00:00, 3.0)
 TimeTick(2024-01-04T00:00:00, NaN)
 TimeTick(2024-01-07T00:00:00, 4.0)
 TimeTick(2024-01-10T00:00:00, 5.0)
 TimeTick(2024-01-13T00:00:00, 6.0)
 TimeTick(2024-01-16T00:00:00, NaN)
 TimeTick(2024-01-19T00:00:00, 7.0)
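The windows of the last example can be cross-checked with plain Dates arithmetic. The sketch assumes, as the output suggests, that the default origin resolves to 2024-01-01 and that the defaults CLOSED_LEFT and LABEL_LEFT apply; it does not call TimeArrays.jl.

```julia
using Dates

# Cross-check of the Day(3) example with plain Dates arithmetic.
# Assumptions: origin 2024-01-01, left-closed windows, left labels (the defaults).
origin = DateTime(2024, 1, 1)
period = Day(3)
stamps = DateTime.(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-09",
                    "2024-01-12", "2024-01-13", "2024-01-20"])
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

# Window index of each timestamp: floor of the elapsed time divided by the period,
# both expressed in milliseconds so that the division stays in integers.
period_ms = Dates.value(Millisecond(period))
ks = [fld(Dates.value(t - origin), period_ms) for t in stamps]

# Left label of each window => maximum of its values, or NaN for an empty window.
windows = [origin + k * period => (any(ks .== k) ? maximum(vals[ks .== k]) : NaN)
           for k in 0:maximum(ks)]
```

The windows starting on 2024-01-04 and 2024-01-16 contain no timestamps, which is why the example returns NaN for them.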