---
title: "Introduction to disaggR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to disaggR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.show="hold",
  fig.width = 7,
  fig.height = 3,
  out.width = "98%",
  dpi = 150
)
```

```{r,echo=FALSE}
library(disaggR)
```

The  French quarterly  accounts  are  constructed  using  an  indirect  method,  based  on  two  major  data  series:  the annual national accounts, and the short-term outlook data compiled from various sources. Hence, a temporal disaggregation methodology is needed.

**disaggR** isn't the only package on the CRAN providing such methods. [tempdisagg](https://cran.r-project.org/package=tempdisagg) relies on a formula interface, supports most time series classes and provides compatibility with irregular disaggregation setting (e.g. months to days).

disaggR has different goals:

* implementing the french national accounts methodology on time-series of the class `"ts"`
* providing a simple syntax that relies on standard evaluation and functions
* being fast enough to deal with the production of multiple series without linking to Rcpp (for readability purposes)

For further details about the french national accounts methodology, see [Methodology of quarterly national accounts](https://www.insee.fr/en/information/2579410).

*Within that vignette, for each matrix time series, like in R, rows stand for the time and columns for the different variables. As for the operations,* $*$ *stands for the standard product,* $\odot$ *for the element-wise multiplication, and* $\oslash$ *for the element-wise division.*

# Two-Steps Benchmarks

```{r, eval=FALSE}
twoStepsBenchmark(turnover,construction)
```

The two-steps benchmarks, provided by `twoStepsBenchmark()`, rely on this high-frequency formula:

$$
C =  I * a + u
$$

Where:

* $C$, an univariate time series, is the high-frequency account.
* $I$, a matrix time series, combines the columns of the indicators, the outliers and the constant. If `include.differenciation` is `TRUE`, the constant is actually a trend.
* $a$, a numeric vector, stands for the coefficients to be applied.
* $u$, an univariate time series, is the smoothed part of the benchmark.

The **coefficients are estimated at low-frequency**, within the coeff.calc window, eventually after a differentiation if `include.differenciation` is `TRUE`:

$$
C_{aggregated}' =  I_{aggregated}' * a + u'
$$

If `include.rho` is `TRUE`, $u'$ is an AR1 process with an autocorrelation parameter, and $a$ is estimated through a Prais-Winsten process. Otherwise, an ordinary least squares process is used.

These **coefficients are applied at high-frequency**, to obtain the fitted values of the benchmark:

$$
\text{fitted.values} = I * a
$$

Note that, especially when `include.differenciation` is `TRUE`, the level of the fitted values is arbitrary: a constant is *chosen* to zero during the implicit reintegration. The choice of this constant, however, doesn't impact the benchmarked series.

**u is smoothed**:

$$
u = smooth(extrapolation(C_{aggregated} - \text{fitted.values}_{aggregated}))
$$

As the low-frequency values of $C$ have been set from the beginning, and the fitted values have just been computed, the aggregated values of $u$ are known.
If these values aren't defined all across the domain window, these are extrapolated as follow:

* $u(n+1)-u(n) = rho*(u(n)-u(n-1))$ if `include.differenciation` is `TRUE`
* $u(n+1) = rho * u(n)$ if `include.differenciation` is `FALSE`

The Boot, Feibes and Lisman process, through the `bflSmooth()` function, is then used to get the high-frequency values of u across the domain window.

# Proportional Denton Benchmarks

```{r, eval=FALSE}
threeRuleSmooth(turnover,construction)
```

The Proportional Denton Benchmarks, provided by `threeRuleSmooth()`, rely on this high-frequency formula:

$$
C =  I \odot a
$$

Where:

* $C$, an univariate time series, is the high-frequency account
* $I$, an *univariate* time series, the indicator
* $a$, an *univariate time series*, the coefficient to be applied, which is a rate to be smoothed.

Proportional Denton benchmarks share some similarities with univariate two-steps benchmarks without constants. There are some differences:

* The coefficient **$a$ is not a constant**.
* There is **no smoothed part**, as $a$ is already smoothed for $I \odot a$ to be equal to $C$ after aggregation.

In order to smooth the rate, a few steps are required.

An **alternate version of I is computed**, that is only used for the smoothing:

$$
I' = replication(crop(I))
$$

Only the full cycles of I are kept, then the first and last full cycles are replicated respectively backwards and forwards to fill the domain window.

The low-frequency rate is already known where $C \oslash I$ is defined. This rate is then extrapolated to fill the domain window:

$$
a_{aggregated}' = extrapolation(C_{aggregated} \oslash I_{aggregated})
$$

Such an extrapolation is a bit more problematic than the *natural* extrapolations provided by `twoStepsBenchmark()`. Indeed, the proportional Denton benchmarks don't involve any hypothesis on $a$, other than continuity. As continuity isn't enough, and that proportional Denton benchmarks are mainly used for rates that have a trend, those rates are extrapolated using an arithmetic sequence. By default, the common difference of this sequence is given by the mean of the rate differences within the delta rate window.

The high-frequency rate can then be computed with the help of a weighted Boot, Feibes and Lisman process:

$$
a = smooth(a_{aggregated}',weights=I')
$$

# Plots

disaggR provides some tools for plotting its results. Functions `in_sample()`, `in_disaggr()`, `in_scatter()`, `in_revisions()` generate objects of class `"tscomparison"`. Each object of class `"tscomparison"`, `"twoStepsBenchmark"` or `"threeRuleSmooth"` can be plotted with either the base `plot()` or the ggplot2 `autoplot()` method. These methods all share similar arguments:

* **x** (or **object** for the autoplot method) a tscomparison, a twoStepsBenchmark or a threeRuleSmooth
* **xlab**, the title for the x axis
* **ylab**, the title for the y axis
* **start**, a numeric of length 1 or 2. The start of the plot
* **end**, a numeric of length 1 or 2. The end of the plot
* **col**, the color scale applied on the plot. Could be a vector of colors, or a function from n to a color vector of size n.
* **lty**, the linetype scales applied on the plot. Could be a vector of linetypes, or a function from n to a linetypes vector of size n.
* **show.legend**, `TRUE` or `FALSE`. Should an automatic legend be added to the plot.
* **main**, a character of length 1, the title of the plot
* **mar**, a numeric of length 4, the margins of the plot specified in the form c(bottom, left, top, right).
* **theme**, a ggplot theme object to replace the default one (only for autoplot methods)
* **...**, other arguments passed either to ggplot or plot

```{r}
benchmark <- twoStepsBenchmark(hfserie = turnover,
                               lfserie = construction,
                               include.differenciation = TRUE)
plot(in_sample(benchmark,type="levels"),
     start=c(2010,1),end=c(2017,1))
```
```{r}
library(ggplot2)
smooth <- threeRuleSmooth(hfserie = turnover,
                          lfserie = construction)
autoplot(in_disaggr(smooth),
         start=c(2009,1),end=c(2013,12),
         show.legend = FALSE)
```

# Other methods

Various methods can be applied on objects of class `"twoStepsBenchmark"` and/or `"threeRuleSmooth"`.

```{r, eval=FALSE}
benchmark <- twoStepsBenchmark(turnover,construction)
smooth <- threeRuleSmooth(turnover,construction)

reView(benchmark)
rePort(benchmark)

as.ts(benchmark);as.ts(smooth)
as.list(benchmark);as.list(smooth)
coef(benchmark)
residuals(benchmark)
vcov(benchmark)
fitted(benchmark)
model.list(benchmark);model.list(smooth)
se(benchmark)
rho(benchmark)
outliers(benchmark)
smoothed.rate(smooth)
summary(benchmark)
```

Additionally, most of the methods for time series, from the package stats, automatically coerce these objects into time-series using `as.ts()`.