\section*{\refname}%
          \@mkboth{\MakeUppercase\refname}{\MakeUppercase\refname}%
          \list{\@biblabel{\@arabic\c@enumiv}}%
               {\settowidth\labelwidth{\@biblabel{#1}}%
                \leftmargin\labelwidth
                \advance\leftmargin\labelsep
                \@openbib@code
                \usecounter{enumiv}%
                \let\p@enumiv\@empty
                \itemsep=0pt
                \parsep=0pt
                \leftmargin=\parindent
                \itemindent=-\parindent
                \renewcommand\theenumiv{\@arabic\c@enumiv}}%
          \sloppy
          \clubpenalty4000
          \@clubpenalty \clubpenalty
          \widowpenalty4000%
          \sfcode`\.\@m}
         {\def\@noitemerr
           {\@latex@warning{Empty `thebibliography' environment}}%
          \endlist}
\makeatother

\begin{document}
\graphicspath{{Figs/}}
\maketitle

\begin{abstract}
\noindent
We propose a flexible means of  estimating  vector autoregressions  with time-varying  parameters (TVP-VARs) by  introducing a latent threshold process  that  is driven by the absolute size of parameter changes. This enables us to dynamically detect whether a given regression coefficient is constant or time-varying.  When applied to a medium-scale macroeconomic US dataset our model yields precise density and turning point predictions, especially during economic downturns, and provides new insights on the changing effects of increases in short-term interest rates over time.

\end{abstract}

\textbf{\small Keywords:} 
{\small Change point model, Threshold mixture innovations, Structural breaks, Shrinkage, Bayesian statistics, Monetary policy.}\\[-1em]

\textbf{\small JEL Codes}: C11, C32, C52, E42.\\[-1em]




\newpage

\section{Introduction}
\label{sec:intro}
In the last few years, economists in policy institutions and central banks have been criticized for their failure to foresee the recent financial crisis that engulfed the world economy and led to a sharp drop in economic activity. Critics argued that economists failed to predict the crisis because the models commonly used at policy institutions at the time were too simplistic. For instance, the majority of forecasting models adopted were (and possibly still are) linear and low dimensional. The former implies that the underlying structural mechanisms and the volatility of economic shocks are assumed to remain constant over time -- a rather restrictive assumption. The latter implies that only little information is exploited, which may be detrimental to obtaining reliable predictions.


In light of this criticism,  practitioners started to develop more complex models that are capable of capturing salient features of time series commonly observed in macroeconomics and finance. Recent research \citep{stock1996evidence,cogley2002evolving,cogley2005drifts,primiceri2005time,sims2006were}  suggests that, at least for US data, there is considerable evidence that the influence of certain variables appears to be time-varying. This raises additional issues related to model specification and estimation. For instance, do all regression parameters vary over time? Or is time variation just limited to a specific subset of the parameter space? Moreover, as is the case with virtually any  modeling problem, the question  whether a given variable should be included in the model in the first place  naturally arises. Apart from deciding whether parameters are changing over time, the nature of the process that drives the dynamics of the coefficients also proves to be an important modeling decision.

In a recent contribution, \cite{fruhwirth2010stochastic} focus on model specification issues within the general framework of state space models. Exploiting a non-centered parameterization of the model allows them to rewrite the model in terms of a constant parameter specification, effectively capturing the steady state of the process along with deviations thereof. The non-centered parameterization is subsequently used to search for appropriate model specifications, imposing shrinkage on the steady state part and the corresponding deviations. Recent research aims to decide both whether a given variable should be included in or excluded from the model and whether the associated regression coefficient is constant or time-varying \citep{belmonte2014hierarchical, eisenstat2016stochastic,koop2012forecasting,koop2013large, kalli2014time}.  Another strand of the literature asks whether coefficients are constant or time-varying by assuming that the innovation variance in the state equation
is characterized by a change point process
\citep{mcculloch1993bayesian,gerlach2000efficient, koop2009evolution, giordani2012efficient}. However, the main drawback of this modeling approach is the severe computational burden originating from the need to simulate additional latent states for each parameter. This renders estimation of high-dimensional models like vector autoregressions (VARs) infeasible. To circumvent such problems, \cite{koop2009evolution} estimate a single Bernoulli random variable to discriminate between time constancy and parameter variation for the autoregressive coefficients, the covariances, and the log-volatilities, respectively. This assumption, however, implies that either all autoregressive parameters change over a given time frame, or none of them. Along these lines, \cite{mah-son:eff} allow for independent breaks in regression coefficients and the volatility parameters. However, they show that their multivariate approach is inferior to univariate change point models when out-of-sample forecasts are considered and conclude that allowing for independent breaks in each series is important.

In the present paper, we introduce a method that can be applied to a highly parameterized  VAR model by combining ideas  from the literature of  latent threshold models \citep{nee-dun:bay, nakajima2013bayesian,nakajima2013dynamic,zhou2014bayesian,kimura2016identifying} and mixture innovation models. Specifically, we introduce a set of latent thresholds that controls the degree of time-variation separately for each parameter and for each point in time. This is achieved by estimating variable-specific thresholds that allow for movements in the autoregressive parameters if the proposed change of the parameter is large enough. We show that this can be achieved by assuming that the innovations of the state equation follow a threshold model that discriminates between a situation where the innovation variance is large and a case with an innovation variance set (very close) to zero. The proposed framework nests a wide variety of competing models, most notably the standard time-varying parameter model, a change-point model with an unknown number of regimes, mixtures between different models, and finally the simple constant parameter model. To assess systematically, in a data-driven fashion, which predictors should be included in the model, we impose a set of Normal-Gamma priors \citep{griffin2010inference} in the spirit of \cite{bitto2015achieving} on the initial state of the system.


We illustrate the empirical merits of our approach  by carrying out an extensive forecasting exercise based on a medium-scale US dataset. Our proposed framework is benchmarked against two constant parameter  Bayesian VAR models with stochastic volatility  and hierarchical shrinkage priors.  The findings indicate that  the threshold time-varying parameter  VAR  excels in crisis periods while being only slightly  inferior during ``normal'' periods in terms of one-step-ahead log predictive likelihoods. Considering  turning point predictions for GDP growth, our model outperforms the constant parameter benchmarks when upward turning points are considered while yielding similar forecasts for downward turning points. 

In the second part of the application, we provide evidence on the degree of time-variation of the underlying causal mechanisms for the USA. Considering the determinant of the time-varying variance-covariance matrix of the state innovations as a global measure for the strength of parameter movements, we find that those movements reach a maximum at the beginning of the 1980s while displaying only relatively modest movements before and after that period. Consequently, we investigate the effects of a monetary policy shock for the pre- and post-1980 periods separately. This exercise reveals a considerable price puzzle in the 1960s which starts disappearing in the early 1980s.
Moreover, considering the most recent part of our sample period, we find evidence for increased effectiveness of monetary policy. This is especially pronounced during the aftermath of the global financial crisis in 2008/09.


The paper is structured as follows.
Section~\ref{sec:framework} introduces the  modeling approach, the prior setup  and the corresponding MCMC algorithm for posterior simulation. Section~\ref{sec:illustration} illustrates the behavior of the model by showcasing scenarios with few, moderately many, and many jumps in the state equation.
In Section~\ref{sec:application}, we apply the model to a medium-scale US macroeconomic dataset and assess its predictive capabilities against a range of competing models in terms of density and turning point predictions. Moreover, we investigate during which periods VAR coefficients display the largest amount of time-variation and consider the associated implications on dynamic responses with respect to a monetary policy shock. Finally, Section~\ref{sec:conclusion} concludes.
\section{Econometric framework}
\label{sec:framework}
We begin by specifying a flexible model that is capable of discriminating between constant and time-varying parameters at each point in time.
\subsection{A threshold mixture innovation model}
\label{sec:model}
Consider   the following dynamic regression model,
\begin{equation}
y_t = \boldsymbol{x}_t' \boldsymbol{\beta}_t + u_t, ~u_t \sim \mathcal{N}(0, \sigma_t^2)\label{eq:obs},
\end{equation}
where $\boldsymbol{x}_t$ is a $K$-dimensional vector of explanatory variables and $\boldsymbol{\beta}_t =(\beta_{1t},\dots,\beta_{Kt})'$ a vector of regression coefficients. The error term $u_t$ is assumed to be independently normally distributed with (potentially) time-varying variance.
This model assumes that the relationship between elements of $\boldsymbol{x}_t$ and $y_t$ is not necessarily constant over time, but changes subject to some law of motion for $\boldsymbol{\beta}_t$. Typically, researchers assume that the $j$th element of $\boldsymbol{\beta}_t$ ($j = 1,\dots,K)$ follows a random walk process,
\begin{equation}
\beta_{jt} = \beta_{j,t-1}+e_{jt},~e_{jt} \sim \mathcal{N}(0,\vartheta_j), \label{eq:states1}
\end{equation}
with $\vartheta_j$ denoting the innovation variance of the latent states. \autoref{eq:states1} implies that parameters evolve gradually over time, ruling out abrupt changes. While being conceptually flexible, in the presence of only a few breaks in the parameters, this model generates spurious movements in the coefficients that could be detrimental for the empirical performance of the model \citep{d2013macroeconomic}.

Thus, we deviate from \autoref{eq:states1} by specifying the innovations of the state equation $e_{jt}$ to be a mixture distribution. More concretely, let
\begin{align}
e_{jt} &\sim \mathcal{N}(0, \theta_{jt}),\\
\theta_{jt} &=s_{jt} \vartheta_{j1}+(1-s_{jt}) \vartheta_{j0}, \label{eq:threshold_1}
\end{align}
where $s_{jt}$ is an indicator variable with an unconditional Bernoulli distribution. This mechanism is closely related to an absolutely continuous spike-and-slab prior where the slab has variance $\vartheta_{j1}$ and the spike has variance $\vartheta_{j0}$ with  $\vartheta_{j1} \gg \vartheta_{j0}$ \citep[see e.g.,][for an excellent survey on this class of priors]{mal-wag:com}.

In the present framework we assume that the conditional distribution  $p(s_{jt}|\Delta \beta_{jt})$ follows a threshold process,
\begin{equation}
s_{jt} = \begin{cases} 1 ~\text{ if }~ |\Delta \beta_{jt}|>d_j, \\
0 ~\text{ if } ~ |\Delta \beta_{jt}|\le d_j, \end{cases} \label{eq:threshold_2}
\end{equation}
where $d_j$ is a coefficient-specific threshold to be estimated and $\Delta \beta_{jt} := \beta_{jt}-\beta_{j,t-1}$.  Equations (\ref{eq:threshold_1}) and (\ref{eq:threshold_2}) state that  if the absolute period-on-period change of $\beta_{jt}$ exceeds a threshold $d_j$, we assume that the change in $\beta_{jt}$ is normally distributed with  zero mean and variance $\vartheta_{j1}$.  On the contrary, if the change in the parameter is too small, the innovation variance is set close to zero, effectively implying that $\beta_{jt} \approx \beta_{j,t-1}$, i.e.,~almost no change from period $(t-1)$ to $t$. 



This modeling approach provides a great deal of flexibility, nesting a plethora of simpler model specifications. The interesting cases are characterized by situations where  $s_{jt}$ equals unity only for some $t$. For instance, it could be the case that parameters tend to exhibit strong movements at given points in time but stay constant for the majority of the time. An unrestricted time-varying parameter model would imply that the parameters are gradually changing over time, depending on the innovation variance in \autoref{eq:states1}. Another prominent case would be a structural break model with an unknown number of  breaks \citep[for a recent Bayesian exposition, see][]{koop2007estimation}.
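
The nesting can be made explicit by writing the innovation variance as a function of the threshold. The display below merely restates Equations (\ref{eq:threshold_1}) and (\ref{eq:threshold_2}); the indicator notation $\mathbf{1}(\cdot)$ is introduced here for convenience and is not used elsewhere in the paper.
\begin{equation*}
\theta_{jt} = \vartheta_{j0} + (\vartheta_{j1}-\vartheta_{j0})\, \mathbf{1}\left(|\Delta \beta_{jt}| > d_j\right).
\end{equation*}
Three cases then follow directly from the definitions:
\begin{align*}
d_j \to 0: &\quad s_{jt} = 1 \text{ for all } t \text{ (almost surely)}, ~\theta_{jt} = \vartheta_{j1}, \text{ i.e., the random walk in \autoref{eq:states1} with } \vartheta_j = \vartheta_{j1}, \\
d_j \to \infty: &\quad s_{jt} = 0 \text{ for all } t, ~\theta_{jt} = \vartheta_{j0} \approx 0, \text{ i.e., an (approximately) constant coefficient}, \\
0 < d_j < \infty: &\quad \textstyle\sum_{t} s_{jt} \text{ large moves separated by (approximately) constant stretches}.
\end{align*}
The intermediate case is the one that resembles a change point model, with the number of large moves determined by the data through $d_j$.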

The mixture innovation component in \autoref{eq:threshold_1} implies that we discriminate between two regimes. The first regime assumes that changes in the parameters tend to be large and important to predict $y_t$ whereas in the second regime, these changes can be safely regarded as zero, thus effectively leading to a constant parameter model over a given period of time. Compared to a standard mixture innovation model that postulates $s_{jt}$ as a sequence of independent Bernoulli variables, our approach assumes that regime shifts are governed by a (conditionally) deterministic law of motion. The main advantage of our approach relative to mixture innovation models is that instead of having to estimate a full sequence of $s_{jt}$ for all $j$, the threshold mixture innovation model only relies on a single additional parameter per coefficient. This renders estimation of high dimensional models such as vector autoregressions (VARs) feasible. The additional computational burden turns out to be negligible relative to an unrestricted TVP-VAR, see Section~\ref{sec:mcmc} for more information.

Our model is also closely related to the latent thresholding approach put forward in  \cite{nakajima2013bayesian} within the time series context. While in their model latent thresholding discriminates between the inclusion or exclusion of a given covariate at time $t$, our model detects whether the associated regression coefficient is constant or time-varying. 

\subsection{The threshold mixture innovation TTVP-VAR model}
The model proposed in the previous subsection can be straightforwardly generalized to the VAR case with stochastic volatility (SV)  by assuming that $\boldsymbol{y}_t$ is an  $m$-dimensional response vector. In this case, \autoref{eq:obs} becomes,
\begin{equation}
\boldsymbol{y}_t = \boldsymbol{x}_t' \boldsymbol{\beta}_t+\boldsymbol{u}_t,
\end{equation}
with $\boldsymbol{x}'_t = \{\boldsymbol{I}_m \otimes \boldsymbol{z}'_t\}$, where $\boldsymbol{z}_t = (\boldsymbol{y}'_{t-1}, \dots, \boldsymbol{y}'_{t-P})'$ includes the $P$ lags of the  endogenous variables.\footnote{In the empirical application, we also include an intercept term which we omit here for simplicity.} The vector $\boldsymbol{\beta}_t$ now contains the dynamic autoregressive coefficients with dimension $K=m^2P$ where each element follows  the  state equation  given by Eqs. (\ref{eq:states1}) to (\ref{eq:threshold_2}). The vector of white noise shocks $\boldsymbol{u}_t$ is distributed as
\begin{equation}
\boldsymbol{u}_t \sim \mathcal{N}(\boldsymbol{0}_m, \boldsymbol{\Sigma}_t).
\end{equation}
Hereby, $\boldsymbol{0}_m$ denotes an $m$-variate zero vector and $\boldsymbol{\Sigma}_t = \boldsymbol{V}_t \boldsymbol{H}_t \boldsymbol{V}_t'$ is a time-varying variance-covariance matrix. The matrix $\boldsymbol{V}_t$ is a lower triangular matrix with unit diagonal and $\boldsymbol{H}_t = \text{diag}(e^{h_{1t}},\dots,e^{h_{mt}})$. We assume that the logarithm of the variances evolves according to
\begin{equation}
h_{it} = \mu_i + \rho_i (h_{i,t-1}-\mu_i) + \nu_{it},~\text{ for }~i=1,\dots,m,
\end{equation}
with $\mu_i$ and $\rho_i$ being equation-specific mean and persistence parameters and $\nu_{it} \sim \mathcal{N}(0,\zeta_i)$ being an equation-specific white noise error with variance $\zeta_i$. For the covariances in $\boldsymbol{V}_t$ we impose the random walk state equation with error variances given by \autoref{eq:threshold_1}.
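
Two stationary moments of the log-volatility process may help in interpreting $\mu_i$, $\rho_i$ and $\zeta_i$. The following sketch assumes $|\rho_i|<1$ (which the Beta prior on $(\rho_i+1)/2$ in Section~\ref{sec:priors} guarantees) and reads the process in deviation-from-mean form:
\begin{align*}
h_{it}-\mu_i &= \rho_i (h_{i,t-1}-\mu_i) + \nu_{it}, \\
\mathrm{E}(h_{it}) &= \mu_i, \qquad \mathrm{Var}(h_{it}) = \frac{\zeta_i}{1-\rho_i^2}, \qquad \mathrm{Cov}(h_{it}, h_{i,t-k}) = \rho_i^{k}\, \frac{\zeta_i}{1-\rho_i^2} ~\text{ for }~ k \ge 0.
\end{align*}
Hence $\mu_i$ is the long-run level of the log-variance, and values of $\rho_i$ close to one imply highly persistent volatility with a large unconditional variance for a given $\zeta_i$.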

Conditional on the ordering of the variables it is straightforward to estimate the TTVP model on an equation-by-equation basis, augmenting the $i$th equation with the contemporaneous values of the preceding $(i-1)$ equations (for $i>1$), leading to a Cholesky-type decomposition of the variance-covariance matrix. Thus, the  $i$th equation (for $i=2,\dots,m$) is given by
\begin{equation}
y_{it} =  \tilde{\boldsymbol{z}}'_{it} \tilde{\boldsymbol{\beta}}_{it}+  u_{it}. \label{eq: vareqspecific}
\end{equation}
We let  $\tilde{\boldsymbol{z}}_{it}= (\boldsymbol{z}'_t, y_{1t},\dots, y_{i-1, t})'$, and  $\tilde{\boldsymbol{\beta}}_{it}=(\boldsymbol{\beta}_{it}', \tilde{v}_{i1, t}, \dots, \tilde{v}_{i,i-1, t})'$ is a vector of latent states with dimension $K_i= mP+i-1$.  Here, $\tilde{v}_{ij, t}$ denotes the  dynamic regression coefficient  on the $j$th (for $j<i$) contemporaneous value showing up in  the $i$th equation.  Note that for the first equation we have $\tilde{\boldsymbol{z}}_{1t}= \boldsymbol{z}_t$ and  $\tilde{\boldsymbol{\beta}}_{1t}=\boldsymbol{\beta}_{1t}$. 
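
To see how the augmented equations map into $\boldsymbol{\Sigma}_t$, consider the bivariate case $m=2$. This is a sketch rather than part of the formal model statement; $\eta_{it} \sim \mathcal{N}(0, e^{h_{it}})$ is notation introduced here for the mutually uncorrelated errors of the augmented equations.
\begin{align*}
y_{1t} &= \boldsymbol{z}_t' \boldsymbol{\beta}_{1t} + \eta_{1t}, \\
y_{2t} &= \boldsymbol{z}_t' \boldsymbol{\beta}_{2t} + \tilde{v}_{21,t}\, y_{1t} + \eta_{2t}
        = \boldsymbol{z}_t' \left(\boldsymbol{\beta}_{2t} + \tilde{v}_{21,t}\, \boldsymbol{\beta}_{1t}\right) + \tilde{v}_{21,t}\, \eta_{1t} + \eta_{2t}.
\end{align*}
The implied reduced-form errors are $(\eta_{1t},\, \tilde{v}_{21,t}\eta_{1t}+\eta_{2t})' = \boldsymbol{V}_t (\eta_{1t}, \eta_{2t})'$ with
\begin{equation*}
\boldsymbol{V}_t = \begin{pmatrix} 1 & 0 \\ \tilde{v}_{21,t} & 1 \end{pmatrix}, \qquad
\boldsymbol{\Sigma}_t = \boldsymbol{V}_t \boldsymbol{H}_t \boldsymbol{V}_t' =
\begin{pmatrix} e^{h_{1t}} & \tilde{v}_{21,t}\, e^{h_{1t}} \\ \tilde{v}_{21,t}\, e^{h_{1t}} & \tilde{v}_{21,t}^2\, e^{h_{1t}} + e^{h_{2t}} \end{pmatrix}.
\end{equation*}
Under this reading, the free element of $\boldsymbol{V}_t$ is the coefficient on the contemporaneous value of $y_{1t}$, and the coefficients on $\boldsymbol{z}_t$ in the second augmented equation differ from the reduced-form VAR coefficients by $\tilde{v}_{21,t}\, \boldsymbol{\beta}_{1t}$.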

The law of motion of the $j$th element of $\tilde{\boldsymbol{\beta}}_{it}$ reads
\begin{equation}
\tilde{\beta}_{ij,t} = \tilde{\beta}_{ij,t-1}+\sqrt{\theta_{ij,t}}\, r_{ij,t}, ~ r_{ij,t} \sim \mathcal{N}(0,1).
\end{equation}
Hereby, $\theta_{ij,t}$ is defined similarly to \autoref{eq:threshold_1}.  In what follows, it proves to be convenient to stack the states of all equations in a $K$-dimensional vector $\tilde{\boldsymbol{\beta}}_t = (\tilde{\boldsymbol{\beta}}'_{1t},\dots, \tilde{\boldsymbol{\beta}}'_{mt})'$ and let $\tilde{\beta}_{jt}$ denote the $j$th element of $\tilde{\boldsymbol{\beta}}_t$.

While being clearly not order-invariant, this specific way of  stating the model  yields two significant computational gains.  First,  the matrix operations involved in estimating the latent state vector become computationally less cumbersome. Second, we can exploit parallel computing and estimate each equation simultaneously on a grid.







\subsection{Prior specification}
\label{sec:priors}
Since our approach to estimation and inference is Bayesian, we have to specify suitable prior distributions for all parameters of the model.

We impose a Normal-Gamma prior \citep{griffin2010inference} on each element of $\tilde{\boldsymbol{\beta}}_{i0}$, the initial state of the $i$th equation,
\begin{equation}
\tilde{\beta}_{ij,0}|\tau^2_{ij}, \lambda_i^2 \sim \mathcal{N}\left(0, \frac{2}{\lambda_i^2}  \tau^2_{ij}\right),~\tau^2_{ij} \sim \mathcal{G}(a_{i},a_{i}),
\end{equation}
for $i=1,\dots,m; j=1,\dots, K_i$. Hereby, $\lambda_i^2$ and $a_{i}$ are hyperparameters and $\tau^2_{ij}$ denotes an idiosyncratic scaling parameter that applies an individual degree of shrinkage on each element of $\tilde{\boldsymbol{\beta}}_{i0}$. The hyperparameter $\lambda_i^2$ serves as an equation-specific shrinkage parameter that shrinks all elements of $\tilde{\boldsymbol{\beta}}_{i0}$ that belong to the $i$th equation towards zero while the local shrinkage parameters $\tau^2_{ij}$ provide enough flexibility to also allow for non-zero values of $\tilde{\beta}_{ij,0}$ in the presence of a tight global prior specification. 
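
The hierarchy above can be rewritten in an equivalent scaled form, which may help when reading the full conditionals in Section~\ref{sec:mcmc}. The following is a sketch under the assumption that $\mathcal{G}(a,b)$ denotes a Gamma distribution with shape $a$ and rate $b$; the symbol $\psi_{ij}$ is introduced here for illustration only. Define $\psi_{ij} := 2\tau^2_{ij}/\lambda_i^2$. Since multiplying a Gamma variate by a constant $c>0$ divides its rate by $c$,
\begin{equation*}
\tau^2_{ij} \sim \mathcal{G}(a_i, a_i) ~\Longrightarrow~ \psi_{ij}|\lambda_i^2 \sim \mathcal{G}\left(a_i, \frac{a_i \lambda_i^2}{2}\right), \qquad \tilde{\beta}_{ij,0}|\psi_{ij} \sim \mathcal{N}(0, \psi_{ij}),
\end{equation*}
with $\mathrm{E}(\psi_{ij}|\lambda_i^2) = 2/\lambda_i^2$ and $\mathrm{Var}(\psi_{ij}|\lambda_i^2) = 4/(a_i \lambda_i^4)$. Larger values of $\lambda_i^2$ thus pull all prior variances in equation $i$ towards zero, while smaller values of $a_i$ increase the dispersion of the local scalings around that common level. Combining the two densities gives
\begin{equation*}
p(\psi_{ij}|\tilde{\beta}_{ij,0}, \lambda_i^2) \propto \psi_{ij}^{(a_i - \frac{1}{2}) - 1} \exp\left\lbrace -\frac{1}{2}\left(\frac{\tilde{\beta}_{ij,0}^2}{\psi_{ij}} + a_i \lambda_i^2\, \psi_{ij}\right)\right\rbrace,
\end{equation*}
which matches the generalized inverse Gaussian kernel reported in Section~\ref{sec:mcmc} if the local scaling there is read as the scaled parameter $\psi_{ij}$.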

For the equation-specific scaling parameter $\lambda_i^2$ we impose a Gamma prior,
$
\lambda_i^2 \sim \mathcal{G}(b_0,b_1),
$
with $b_0$ and $b_1$ being hyperparameters chosen by the researcher. In typical applications we specify $b_0$ and $b_1$ to render this prior effectively non-influential. 

If the  innovation variances  of the observation equation are assumed to be constant over time, we impose a Gamma prior on  $\sigma_i^{-2}$ with hyperparameters $c_0$ and $c_1$, i.e.,~$\sigma_i^{-2} \sim \mathcal{G}(c_0, c_1)$.  By contrast,  if  stochastic volatility is introduced we follow \cite{kastner2014ancillarity} and impose a normally distributed prior on $\mu_i$ with mean zero and variance $100$, a Beta prior on $\rho_i$  with $(\rho_i+1)/2\sim \mathcal{B}(a_\rho,b_\rho)$, and a Gamma distributed prior on $\zeta_i \sim \mathcal{G}(1/2, 1/(2B_\zeta))$.

In principle, the spike variance $\vartheta_{ij,0}$ could be estimated from the data and a suitable shrinkage prior could be employed to push $\vartheta_{ij,0}$ towards zero. However, we follow a simpler approach and estimate the slab variance $\vartheta_{ij, 1}$ only while setting $\vartheta_{ij,0} = \xi \times \hat{\vartheta}_{ij}$. Here, $\hat{\vartheta}_{ij}$ denotes the variance of the OLS estimate for automatic scaling which we treat as a constant specified a priori. The multiplier $\xi$ is set to a fixed constant close to zero, effectively turning off any time-variation in the parameters. As long as $\vartheta_{ij,0}$ is  not chosen too large, the specific value of the spike variance proves to be rather non-influential in the empirical applications that follow. 

We use a Gamma distributed prior on the inverse of the innovation variances in the state specification in \autoref{eq:states1}, i.e.,~$
\vartheta_{ij,1}^{-1} \sim \mathcal{G}(r_{ij,0}, r_{ij, 1})$ for $i=1,\dots,m; j=1,\dots,K_i$.\footnote{Of course, it would also be possible to use a (restricted) Gamma prior on $\vartheta_{ij,1}$ in the spirit of \cite{fruhwirth2010stochastic}. However, we have encountered some issues with such a prior if the number of observations in the regime associated with $s_{ij,t}=1$ is small. This stems from the fact that the corresponding conditional posterior distribution is generalized inverse Gaussian, a distribution that is heavy tailed and under certain conditions leads to excessively large draws of $\vartheta_{ij,1}$.} 
Again, $r_{ij, 0}$ and $r_{ij, 1}$ denote scalar hyperparameters. This choice implies that we artificially bound $\vartheta_{ij,1}$ away from zero, implying that in the upper regime we do not exert strong shrinkage. This is in contrast to a standard time-varying parameter model, where this prior is usually set rather tight to control the degree of time variation in the parameters \citep[see, e.g.,][]{primiceri2005time}. Note that in our model the degree of time variation is governed by the thresholding mechanism instead.

Finally, the prior specification of the baseline model is completed by imposing a uniformly distributed prior on the thresholds,
\begin{equation}
d_{ij} \sim \mathcal{U}(\pi_{ij,0}, \pi_{ij,1}) \text{ for } j=1,\dots,K_i. \label{eq:priorthresholds}
\end{equation}
Here, $\pi_{ij,0}$ and $\pi_{ij,1}$ denote the boundaries of the prior that have to be specified carefully. In our examples, we use $\pi_{ij,0} = 0.1 \times \sqrt{\vartheta_{ij,1}}$ and $\pi_{ij,1} =  1.5 \times \sqrt{\vartheta_{ij,1}}$. This prior bounds the thresholds away from zero, implying that a certain amount of shrinkage is always imposed on the autoregressive coefficients. Setting $\pi_{ij,0}=0$ for all $i,j$ would also be a feasible option but we found in simulations that being slightly informative on the presence of a threshold improves the empirical performance of the proposed model markedly. It is worth noting that even under the assumption that $\pi_{ij,0}>0$, our framework performs well in simulations where the data are obtained from a non-thresholded version of our model. This stems from the fact that in a situation where parameters are expected to evolve smoothly over time, the average period-on-period change of $\tilde{\beta}_{ij,t}$ is small, implying that $0.1 \times \sqrt{\vartheta_{ij,1}}$ is close to zero and the model effectively shrinks small parameter movements to zero. 
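
A small numerical illustration of this prior may be useful. With the bounds stated above, the prior density of the threshold is
\begin{equation*}
p(d_{ij}|\vartheta_{ij,1}) = \frac{1}{1.4 \sqrt{\vartheta_{ij,1}}} ~\text{ for }~ 0.1\sqrt{\vartheta_{ij,1}} \le d_{ij} \le 1.5 \sqrt{\vartheta_{ij,1}},
\end{equation*}
and zero otherwise. As a purely hypothetical example (the value is chosen for arithmetic convenience and is not taken from the application), let $\vartheta_{ij,1} = 0.04$, so that $\sqrt{\vartheta_{ij,1}} = 0.2$. The admissible thresholds then lie in $[0.02, 0.30]$ with constant density $1/0.28 \approx 3.57$. A period-on-period change of $0.01$ in absolute value falls below every admissible threshold and is assigned to the spike regime, whereas a change of $0.35$ exceeds every admissible threshold and is assigned to the slab regime. Only changes between $0.02$ and $0.30$ are classified according to the value of $d_{ij}$.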



\subsection{Posterior simulation}
\label{sec:mcmc}
We sample from the joint posterior distribution of the model parameters by utilizing a Markov chain Monte Carlo (MCMC) algorithm. Conditional on the thresholds $d_{ij}$, the remaining parameters can be simulated in a straightforward fashion. After initializing the parameters using suitable starting values we iterate between the following six steps.

\begin{enumerate}
\item We start with equation-by-equation simulation of the full history $\{\tilde{\boldsymbol{\beta}}_{it}\}_{t=0,1,\dots,T}$ 
by means of a standard forward filtering backward sampling (FFBS) algorithm \citep{carter1994gibbs, fruhwirth1994data} while conditioning on the remaining parameters of the model.

\item The inverse of the innovation variances of \autoref{eq:states1}, $\vartheta^{-1}_{ij},~i=1,\dots,m; j=1,\dots,K_i$ have conditional density
\[
p(\vartheta^{-1}_{ij}|\bullet)=p(\vartheta^{-1}_{ij}|d_{ij},\boldsymbol{\beta}) \propto p(\boldsymbol{\beta}|\vartheta^{-1}_{ij},d_{ij})p(d_{ij}|\vartheta^{-1}_{ij})p(\vartheta^{-1}_{ij}),
\]
which turns out to be a Gamma distribution, i.e.,
\begin{equation}
\vartheta^{-1}_{ij}|\bullet \sim \mathcal{G}\left(r_{ij,0} + \frac{T_{ij,1}}{2} + \frac{1}{2},r_{ij,1}+\frac{\sum_{t=1}^{T}s_{ij,t}(\tilde{\beta}_{ij,t}-\tilde{\beta}_{ij,t-1})^2}{2}\right),
\end{equation}
with $T_{ij,1}=\sum_{t=1}^T s_{ij, t}$ denoting the number of time periods that feature time variation in the $j$th parameter and the $i$th equation.

\item Combining the Gamma prior on $\tau_{ij}^2$ with the Gaussian likelihood yields a generalized inverse Gaussian (GIG) distribution
\begin{equation}
\tau_{ij}^2|\bullet \sim \mathcal{GIG}\left(a_{i}-\frac{1}{2}, \tilde{\beta}_{ij, 0}^2, a_{i} \lambda_i^2\right),
\end{equation}
where the density of the GIG$(\kappa, \chi,\psi)$ distribution is proportional to 
\begin{equation}
z^{\kappa-1} \exp\left\lbrace - \frac{1}{2}\left( \frac{\chi}{z}+\psi z\right)\right\rbrace.
\end{equation}
To sample from this distribution, we use the R package GIGrvg \citep{GIGrvg} implementing the efficient rejection sampler proposed by \cite{hoermann2013generating}.

\item The global shrinkage parameter $\lambda_i^2$ is sampled from a Gamma distribution given by
\begin{equation}
\lambda_i^2| \bullet \sim \mathcal{G}\left(b_0+a_i K_i, b_1+\frac{a_i}{2}\sum_{j=1}^{K_i} \tau^2_{ij}\right).
\end{equation}

\item We update the thresholds by applying $K_i$ Griddy Gibbs steps \citep{ritter1992facilitating} per equation. Due to the structure of the model, the conditional distribution of $\tilde{\boldsymbol{\beta}}_{ij,1:T} = (\tilde{\beta}_{ij,1},\dots,\tilde{\beta}_{ij,T})'$ is
\begin{equation}
p\left(\tilde{\boldsymbol{\beta}}_{ij,1:T} | d_{ij}, \vartheta_{ij}\right) \propto \prod_{t=1}^T \frac{1}{\sqrt{2 \pi \theta_{ij, t} }} \exp \left\lbrace -\frac{(\tilde{\beta}_{ij,t}-\tilde{\beta}_{ij,t-1})^2}{2 \theta_{ij, t}}\right\rbrace.
\end{equation}
This expression can be straightforwardly combined with the prior in \autoref{eq:priorthresholds} to evaluate the conditional posterior of $d_{ij}$ at a given candidate point. The procedure is repeated over a fine grid of values that is determined by the prior and an approximation to the inverse cumulative distribution function  of the posterior is constructed. Finally, this approximation is used to perform inverse transform sampling.
\item The coefficients of the log-volatility equation and the corresponding history of the log-volatilities are sampled by means of the algorithm brought forward by \cite{kastner2014ancillarity} which is efficiently implemented in the R package \texttt{stochvol} \citep{kastner2016dealing}. Under homoscedasticity, $\sigma_i^{-2}$ is simulated from $\sigma_i^{-2}|\bullet \sim \mathcal{G}\left(c_0+T/2, c_1+{\sum_{t=1}^T (y_{it}-\tilde{\boldsymbol{z}}_{it}' \tilde{\boldsymbol{\beta}}_{it})^2}/{2}\right).$
\end{enumerate}
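
The Gamma form in Step 2 and the grid approximation in Step 5 can be sketched as follows. Both displays are illustrative reconstructions from the definitions above rather than a description of the actual implementation; the symbols $x$, $\Delta_t$, $G$, $d^{(g)}$, $w_g$, $F_g$ and $g^\ast$ are notation introduced here.

For Step 2, write $x := \vartheta_{ij,1}^{-1}$ and $\Delta_t := \tilde{\beta}_{ij,t}-\tilde{\beta}_{ij,t-1}$. Only the periods with $s_{ij,t}=1$ involve $x$, and the uniform prior on $d_{ij}$ has density $1/(1.4\sqrt{\vartheta_{ij,1}}) \propto x^{1/2}$, so that
\begin{equation*}
p(x|\bullet) \propto \underbrace{x^{r_{ij,0}-1} e^{-r_{ij,1} x}}_{\text{prior}} \times \underbrace{\prod_{t:\, s_{ij,t}=1} x^{1/2} \exp\left\lbrace -\frac{x \Delta_t^2}{2}\right\rbrace}_{\text{state innovations}} \times \underbrace{x^{1/2}}_{\text{threshold prior}}
= x^{r_{ij,0}+\frac{T_{ij,1}}{2}+\frac{1}{2}-1} \exp\left\lbrace -x \left( r_{ij,1} + \frac{1}{2}\sum_{t=1}^T s_{ij,t} \Delta_t^2 \right)\right\rbrace.
\end{equation*}
This is the kernel of the Gamma distribution stated in Step 2. The sketch does not track the restriction that $d_{ij}$ must lie inside the support of its prior.

For Step 5, let $d^{(1)} < \dots < d^{(G)}$ be a grid on $[\pi_{ij,0}, \pi_{ij,1}]$ and let $\theta_{ij,t}(d) = \vartheta_{ij,0} + (\vartheta_{ij,1}-\vartheta_{ij,0})\,\mathbf{1}(|\Delta_t| > d)$. Because the prior is flat on the grid, the unnormalized log posterior weights and the resulting discrete distribution function are
\begin{align*}
\log w_g &= -\frac{1}{2} \sum_{t=1}^T \left\lbrace \log \theta_{ij,t}(d^{(g)}) + \frac{\Delta_t^2}{\theta_{ij,t}(d^{(g)})} \right\rbrace, \\
F_g &= \frac{\sum_{l \le g} \exp(\log w_l - \max_k \log w_k)}{\sum_{l \le G} \exp(\log w_l - \max_k \log w_k)}, \qquad g = 1,\dots,G.
\end{align*}
A draw is then obtained by generating $u \sim \mathcal{U}(0,1)$ and setting $d_{ij} = d^{(g^\ast)}$ with $g^\ast = \min\{g: F_g \ge u\}$. Subtracting the maximum before exponentiating is a standard numerical safeguard, and interpolating $F_g$ between grid points would be one possible refinement.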
After obtaining an appropriate number of draws, we discard the first $N$ as burn-in and base our inference on the remaining draws from the joint posterior. 

In comparison with standard TVP-VARs, Step (5) is the only additional MCMC step needed to estimate the proposed TTVP model. Moreover, note that this update is computationally cheap, increasing the amount of time needed to carry out the analysis conducted in Section~\ref{sec:application} by around five percent. For larger models (i.e.,\ with $m$ being around $15$) this step becomes slightly more intensive, but its cost remains small relative to the computational burden of the FFBS algorithm in Step (1) and hence relative to the overall computation time.
We found that mixing and convergence properties of our proposed algorithm are similar to standard Bayesian TVP-VAR estimators. In other words, the sampling of the thresholds does not seem to substantially increase the autocorrelation of the MCMC draws.
The TTVP algorithm is bundled into the R package \texttt{threshtvp} which is available from the authors upon request.

\section{An illustrative example}
\label{sec:illustration}

In this section we illustrate our approach by means of a rather stylized example that emphasizes how well the mixture innovation component for the state innovations performs when used to approximate different data generating processes (DGPs).

For demonstration purposes it proves to be convenient to start with the following simple DGP with $K=1$:
\begin{align*}
y_t &= x_{1t}' \beta_{1t} + u_t, ~u_t \sim \mathcal{N}(0, 0.01^2), \\
\beta_{1t} &= \beta_{1,t-1} + e_{1t}, ~e_{1t} \sim \mathcal{N}(0,s_{1t} \times 0.15^2),
\end{align*}
where $s_{1t} \in \{0,1\}$ is chosen at random to yield paths which are characterized by many, moderately many, and few breaks.
Finally, independently for all $t$, we generate $x_{1t} \sim \mathcal{U}(-1,1)$ and set $\beta_{1,0}=0$.
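
The scale of the simulated coefficient path follows directly from this DGP. As a short worked calculation (with $N_t$ introduced here to denote the number of breaks up to time $t$), since $\beta_{1,0}=0$ and the innovations are independent across time,
\begin{equation*}
\beta_{1t} = \sum_{\tau=1}^{t} e_{1\tau}, \qquad \mathrm{Var}(\beta_{1t}|s_{11},\dots,s_{1t}) = 0.15^2 \sum_{\tau=1}^{t} s_{1\tau} = 0.0225 \times N_t.
\end{equation*}
For instance, after $N_t = 4$ breaks the conditional standard deviation of $\beta_{1t}$ is $0.15 \times 2 = 0.30$, which is large relative to the observation error standard deviation of $0.01$. In periods with $s_{1t}=0$ the innovation variance is exactly zero, so the coefficient is exactly constant between breaks.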
 
\autoref{fig:examples} shows three possible realizations of $\beta_{1t}$ and the corresponding estimates obtained from a standard TVP model and our TTVP model. To ease comparison between the models we impose a similar prior setup for both models. Specifically, for $\sigma^{-2}$ we set $c_0=0.01$ and $c_1=0.01$, implying a rather vague prior. For the shrinkage part on $\beta_{1,0}$ we set $\lambda^2 \sim \mathcal{G}(0.01,0.01)$ and $a_1 = 0.1$, effectively applying heavy shrinkage on the initial state of the system. The prior on $\vartheta_1$ is specified as in \cite{nakajima2013bayesian}, i.e., $\vartheta^{-1}_1 \sim \mathcal{G}(3,0.03)$. To complete the prior setup for the TTVP model we set $\pi_{1,0}=0.1\times \sqrt{\vartheta_1}$ and $\pi_{1,1}=1.5\times \sqrt{\vartheta_1}$.
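
To give a sense of scale for this prior setup, the following back-of-the-envelope calculation assumes that $\mathcal{G}(a,b)$ denotes a Gamma distribution with shape $a$ and rate $b$, so that $\vartheta_1$ follows an inverse Gamma distribution:
\begin{equation*}
\vartheta_1^{-1} \sim \mathcal{G}(3, 0.03) ~\Longrightarrow~ \mathrm{E}(\vartheta_1^{-1}) = \frac{3}{0.03} = 100, \qquad \mathrm{E}(\vartheta_1) = \frac{0.03}{3-1} = 0.015, \qquad \mathrm{mode}(\vartheta_1) = \frac{0.03}{3+1} = 0.0075.
\end{equation*}
Evaluated at the prior mean $\vartheta_1 = 0.015$, we have $\sqrt{\vartheta_1} \approx 0.122$, and the threshold bounds are approximately $\pi_{1,0} \approx 0.012$ and $\pi_{1,1} \approx 0.184$. Evaluated at the innovation variance of the DGP, $\vartheta_1 = 0.15^2 = 0.0225$, the bounds are $0.015$ and $0.225$. These figures only indicate orders of magnitude, because $\vartheta_1$ is updated in every MCMC sweep.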

\begin{figure}[p]
\includegraphics[width=.98\textwidth, trim=30 35 25 50, clip=true]{beta_2.pdf}\\
\includegraphics[width=.98\textwidth, trim=30 35 25 50, clip=true]{beta_2_5.pdf}\\
\includegraphics[width=.98\textwidth, trim=30 35 25 50, clip=true]{beta_3.pdf}
\caption{Left: Evolution of the actual state vector (dotted black) along with the posterior medians of the TVP model (dashed  blue) and the TTVP model (solid red). The TTVP posterior moving probability is indicated by areas shaded in gray. Right: Demeaned posterior distribution of the TVP model (90\% credible intervals in shaded blue) and the TTVP model (90\% credible intervals in red).}
\label{fig:examples}
\end{figure}

The left panel of \autoref{fig:examples} displays the evolution of the posterior median of a standard TVP model  (in dashed blue) and of the TTVP model (in solid red) along with the actual evolution of the state vector (in dotted black).  In addition, the areas shaded in gray depict the probability that a given coefficient moves over a certain time frame (henceforth labeled as posterior moving probability, PMP). The right panel shows demeaned 90\% credible intervals of the coefficients from the TVP model (blue shaded area) and the TTVP model (solid red lines).

At least two interesting findings emerge. First, in all three cases our approach detects parameter movements rather well, with the PMP reaching unity at virtually all time points that feature a structural break in the corresponding parameter. The TVP model also tracks the actual movement of the states well but does so with much more high-frequency variation. This is a direct consequence of the inverted Gamma prior on the state innovation variances, which bounds $\vartheta_1$ artificially away from zero, irrespective of the information contained in the likelihood \citep[see][for a general discussion of this issue]{fruhwirth2010stochastic}.

Second, looking at the uncertainty surrounding the median estimate (right panel of \autoref{fig:examples}) reveals that our approach succeeds in shrinking the posterior variance. This is due to the fact that in periods where the true value of $\beta_{1t}$ is constant, our model successfully assumes that the estimate of the coefficient at time $t$ is also constant, whereas the TVP model imposes a certain amount of time variation. This generates additional uncertainty  that inflates the posterior variance, possibly leading to imprecise inference.

Thus, the TTVP model detects change points in the parameters regardless of whether the actual number of breaks is small, moderate, or large. In situations where the DGP suggests that the actual threshold equals zero, our approach still captures most of the medium- to low-frequency noise but shrinks small movements that might, in any case, be less relevant for econometric inference.




















\section{Empirical application: Macroeconomic forecasting and structural change}
\label{sec:application}
\subsection{Model specification and data}
We use an extended version of the US macroeconomic data set employed in \cite{smets2007shocks}, \cite{geweke2012prediction} and \cite{amisano2017prediction}. Data are on a quarterly basis, span the period from 1947Q2 to 2014Q4, and comprise the log differences of  consumption, investment, real GDP, hours worked,  consumer prices and real wages. Last, and as a policy variable, we include the Federal Funds Rate (FFR) in levels. In the next subsections we investigate structural breaks in  macroeconomic relationships by means of forecasting and impulse response analysis.

Following \cite{primiceri2005time}, we include $p=2$ lags of the endogenous variables. The prior setup is similar to the one adopted in the previous sections, except that now all hyperparameters are equation-specific and feature an additional index $i=1,\dots,m$. More specifically, for all applicable $i$ and $j$, we use the following values for the hyperparameters. For the shrinkage part on the initial state of the system, we again set $\lambda_i^2 \sim \mathcal{G}(0.01,0.01)$ and $a_i = 0.1$, and the prior on $\vartheta_{ij}$ is specified to be informative with $\vartheta^{-1}_{ij} \sim \mathcal{G}(3,0.03)$.  For the parameters of the log-volatility equation we use $\mu_i \sim \mathcal{N}(0, 10^2), \frac{\rho_i+1}{2} \sim \mathcal{B}(25,5)$, and $\zeta_i \sim \mathcal{G}(1/2, 1/2)$. The last ingredient missing is the prior on the thresholds where we set $\pi_{ij, 0}=0.1 \times \sqrt{\vartheta_{ij,1}}$ and $\pi_{ij, 1}=1.5 \times \sqrt{\vartheta_{ij,1}}$.

For the seven-variable VAR we draw $500\,000$ samples from the joint posterior and discard the first $400\,000$ draws as burn-in. Finally, we use thinning such that inference is based on $5000$ draws out of $100\,000$ retained draws. 
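
As a rough check of the arithmetic, and assuming that thinning keeps equally spaced draws (the exact scheme is not stated here), the sample sizes relate as
\[
500\,000 - 400\,000 = 100\,000, \qquad \frac{100\,000}{5000} = 20,
\]
so that, under this assumption, every 20th post-burn-in draw would enter the posterior summaries.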

\subsection{Forecasting evidence}

We start with a simple forecasting exercise of one-step-ahead predictions. For that purpose we use an expanding window and a hold-out sample of 100 quarters. Forecasts are evaluated using log-predictive Bayes factors, which are defined as the difference of log predictive scores (LPS) of a specification of interest and a  benchmark model. The log-predictive score is a widely used metric to measure density forecast accuracy \citep[see e.g.,][]{geweke2010comparing}. 
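
To make the evaluation metric concrete, one standard way of writing it is sketched below. The symbols $t_0$ (the last quarter before the hold-out sample), $T$ (the final quarter), $\boldsymbol{y}_{1:t}$ (the data up to time $t$) and the model labels $M$ and $M_0$ are notation introduced here purely for illustration:
\[
\text{LPS}_{M} = \sum_{t=t_0}^{T-1} \log p\left(\boldsymbol{y}_{t+1} \mid \boldsymbol{y}_{1:t}, M\right), \qquad
\text{LBF}_{M, M_0} = \text{LPS}_{M} - \text{LPS}_{M_0},
\]
where $p(\boldsymbol{y}_{t+1} \mid \boldsymbol{y}_{1:t}, M)$ denotes the one-step-ahead predictive density of model $M$ evaluated at the realized outcome. A positive value of $\text{LBF}_{M, M_0}$ indicates that $M$ delivers more accurate density forecasts than the benchmark $M_0$ over the hold-out sample.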

As the benchmark model, we use a TVP-VAR with relatively little shrinkage. This amounts to setting the thresholds equal to zero and specifying the prior as $\vartheta_{ij}^{-1}\sim\mathcal{G}(3,0.03)$. Moreover, we include two constant parameter competitors, namely a Minne\-sota-type VAR \citep{Doan1984} and a Normal-Gamma (NG) VAR \citep{Huber2017}. All models feature stochastic volatility. In order to assess the impact of different prior hyperparameters on $\vartheta_{ij}$ and the impact of $\xi$, we estimate the TTVP model over a grid of meaningful values.

\autoref{tab:LPS_1} reports the LPS differences between a given model and the benchmark model. First, we see that all models outperform the no-shrinkage time-varying parameter VAR, as indicated by positive values of the log-predictive Bayes factors. Second, constant parameter VARs with shrinkage turn out to be hard to beat. In particular, the hierarchical Minnesota prior does a good job with respect to one-quarter-ahead forecasts. For the TTVP model we see that forecast performance also varies with the prior specification. More specifically, the results show that increasing $\xi$, which implies more time variation in the lower regime a priori, deteriorates the forecasting performance. This is especially true if a large value for $\xi$ is coupled with small values of $r_{ij,0}$ and $r_{ij,1}$ -- the latter referring to the a priori belief of large swings of coefficients in the upper regime of the model.


\begin{table}[t]
\centering
\begin{tabular}{lcrr}
   								&$ \begin{aligned}
         r_{ij,0}&=3 \\
          r_{ij,1}&=0.03
        \end{aligned}$  & $ \begin{aligned}
         r_{ij,0}&=1.5 \\
          r_{ij,1}&=1
        \end{aligned}$ & $ \begin{aligned}
         r_{ij,0}&=0.001 \\
          r_{ij,1}&=0.001
        \end{aligned}$ \\\midrule
  $\xi=\xi_1 = 10^{-6}$ & 169.06 & 168.75 & 169.16 \\         
   $\xi  = \xi_2= 10^{-5}$& 170.80 & 170.60 & 173.87 \\         
  $\xi =\xi_3  = 10^{-4}$ & 170.95 & 172.31 & 158.44 \\         
  $\xi = \xi_4 = 10^{-3}$ & 130.45 & 163.78 & 137.53 \\         
  \midrule           	
   & NG & Minnesota &  \\ 
  BVAR & 173.77 & 177.20 &  \\        
   \bottomrule
   \end{tabular}
   \caption{Log predictive Bayes factors relative to a time-varying parameter VAR without shrinkage for different key parameters of the model. The final row refers to the log predictive Bayes factors of a BVAR equipped with a Normal-Gamma (NG) shrinkage prior and of a BVAR equipped with a hierarchical Minnesota prior. All models are estimated with stochastic volatility. Numbers greater than zero indicate that a given model outperforms the benchmark.}
   \label{tab:LPS_1}
\end{table}

To investigate the predictive performance of the different model specifications further, \autoref{fig:lps_a} shows the log predictive Bayes factors relative to the benchmark model over time. The plot shows that the specifications with $\xi_1$ and $\xi_2$ excel during most of the sample period, irrespective of the prior on $\vartheta_{ij}$. The constant parameter models, by contrast, dominate only during two very distinct periods of our sample, namely at the beginning and at the end of the time span covered. In both periods, no severe up- or downswings in economic activity occur and the constant parameter models with SV display excellent predictive capabilities. By contrast, during volatile periods -- such as the global financial crisis -- our modeling approach seems to pay off in terms of predictive accuracy. To investigate this in more detail, we focus on the forecasting performance of the different model specifications during the period from 2006Q1 to 2010Q1 in \autoref{fig:lps_b}. Here we see that TTVP specifications with $\xi_j$ for $j<4$ outperform all remaining competitors. This additional, more detailed look at the forecasting performance during turbulent times thus reveals that the TTVP model is a valuable alternative to simpler models. Put differently, we observe that during more volatile periods the TTVP model can substantially outperform constant parameter models, while in tranquil times its forecasts are never far off.

\begin{figure}[p]
\centering
\begin{subfigure}{\textwidth}
\includegraphics[width=\textwidth, clip, trim = 30 45 20 40]{LPS_benchmarks.pdf}
\caption{Full evaluation period (1995Q1 to 2014Q4).}\label{fig:lps_a}
\end{subfigure}\\
\begin{subfigure}{\textwidth}
\includegraphics[width=\textwidth, clip, trim = 30 45 25 40]{LPS_benchmark_crisis.pdf}
\caption{Crisis period only (2006Q1 to 2010Q1).}\label{fig:lps_b}
\end{subfigure}
\caption{Log predictive Bayes factor relative to a TVP-VAR-SV model.}\label{fig:lps}
\end{figure}





Next, we examine turning point forecasts, since the detection of structural breaks might be a further useful application of the TTVP framework. We focus on turning points in real GDP growth and follow \cite{canova2004forecasting} in labeling time point $(t+1)$ a \emph{downward turning point} -- conditional on the information up to time $t$ -- if the growth rate of real GDP, denoted by $S_t$ for time $t$, satisfies $S_{t-2} < S_t$, $S_{t-1} < S_{t}$, and $S_t > S_{t+1}$. Analogously, time point $(t+1)$ is labeled an \emph{upward turning point} if $S_{t-2} > S_t$, $S_{t-1}>S_{t}$, and $S_t < S_{t+1}$. Equipped with these definitions, we split the total number of turning points into upturns and downturns and compute quadratic probability scores (QPS) as an accuracy measure of upturn and downturn probability forecasts. The results are provided in \autoref{tab:QPS}.
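
For concreteness, the evaluation of downturn forecasts can be sketched as follows, assuming one common definition of the QPS (the exact variant used is not spelled out here). The symbols $D_{t+1}$, $P_{t+1}$ and $N$ are illustrative notation that does not appear elsewhere in the paper:
\[
D_{t+1} =
\begin{cases}
1 & \text{if } S_{t-2} < S_t,\ S_{t-1} < S_t \text{ and } S_t > S_{t+1},\\
0 & \text{otherwise},
\end{cases}
\qquad
\text{QPS} = \frac{1}{N} \sum_{t} 2 \left(P_{t+1} - D_{t+1}\right)^2,
\]
where $P_{t+1}$ is the predictive probability of a downturn at time $(t+1)$ given the information up to time $t$, and $N$ is the number of forecasts in the hold-out sample. Under this definition the QPS lies between zero and two, with smaller values indicating more accurate probability forecasts. The score for upturns is obtained analogously by reversing the inequalities.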

\begin{table}[t]
\centering
\resizebox{\columnwidth}{!}{%
\begin{tabular}{lccccccc}
&\multicolumn{3}{c}{Downturns}  & &  \multicolumn{3}{c}{Upturns}\\
&$ \begin{aligned}
         r_{ij,0}&=3 \\
          r_{ij,1}&=0.03
        \end{aligned}$  & $ \begin{aligned}
         r_{ij,0}&=1.5 \\
          r_{ij,1}&=1
        \end{aligned}$ & $ \begin{aligned}
         r_{ij,0}&=0.001 \\
          r_{ij,1}&=0.001
        \end{aligned}$  &  & $ \begin{aligned}
         r_{ij,0}&=3 \\
          r_{ij,1}&=0.03
        \end{aligned}$  & $ \begin{aligned}
         r_{ij,0}&=1.5 \\
          r_{ij,1}&=1
        \end{aligned}$ & $ \begin{aligned}
         r_{ij,0}&=0.001 \\
          r_{ij,1}&=0.001
        \end{aligned}$  \\ \midrule
  $\xi=\xi_1 = 10^{-6}$  & 0.66 & 0.67 & 0.67 &  & 0.84 & 0.83 & 0.83 \\                  
  $\xi=\xi_2 = 10^{-5}$  & 0.66 & 0.66 & 0.67 &  & 0.83 & 0.83 & 0.81 \\                  
   $\xi=\xi_3 = 10^{-4}$ & 0.64 & 0.65 & 0.68 &  & 0.83 & 0.84 & 0.80 \\                   
   $\xi=\xi_4 = 10^{-3}$ & 0.87 & 0.67 & 0.78 &  & 0.81 & 0.83 & 0.78 \\                   \midrule
     & NG & Minnesota &  &  &     NG & Minnesota& \\ 
  BVAR &0.62&0.62  &  &  &  0.84 & 0.93    &                \\ \bottomrule
 \end{tabular}}
   \caption{QPS relative to a time-varying parameter VAR without shrinkage for different key parameters of the model. The final row refers to the QPS of a BVAR equipped with a Normal-Gamma (NG) shrinkage prior and of a BVAR equipped with a hierarchical Minnesota prior. All models are estimated with stochastic volatility. Numbers below unity indicate that a given model outperforms the benchmark.}
\label{tab:QPS}
\end{table}


The picture that arises is similar to that of the density forecasting exercise: all variants of the TTVP model beat the no-shrinkage time-varying parameter VAR model. Turning point forecasts deteriorate for larger values of $\xi$, especially if they are coupled with small choices for $r_{ij}$, yielding a relatively uninformative prior on $\vartheta_{ij}$ and consequently little shrinkage. Forecast gains relative to the benchmark model are more sizable for downward than for upward turning points. In comparison to the two constant parameter competitors, the TTVP model excels in predicting upward turning points (for which there are more observations in the sample), while forecast performance is slightly inferior for downward turning points. Also note that for downward predictions, penalizing time variation seems to be essential and consequently the strongest performance among TTVP specifications is achieved for small values of $\xi$. The opposite is the case for upward turning points, where reasonable predictions can also be achieved with a rather loose prior.





\subsection{Detecting structural breaks in US data}

In this section we take a closer and more systematic look at changes in the joint dynamics of our seven-variable TTVP-VAR model for the US economy. To that end, we examine the posterior mean of the determinant of the time-varying variance-covariance matrix of the innovations in the state equation \citep{cogley2005drifts}. For each draw of $\boldsymbol{\Omega}_{it} = \text{diag}(\theta_{i1,t},\dots,\theta_{iK_i,t})$ we compute its log-determinant and subtract the mean across time. Large values of this measure point towards a pronounced degree of time variation in the autoregressive coefficients of the corresponding equations. The results are provided in \autoref{fig:concovtrace} for each equation and the full system.
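
Because $\boldsymbol{\Omega}_{it}$ is diagonal, this measure reduces to a sum of log variances. As a sketch, with $\theta_{ij,t}$ denoting the $j$th diagonal element, and with $d_{it}$ and $T$ (the number of time periods) introduced here only for illustration,
\[
\log \det \boldsymbol{\Omega}_{it} = \sum_{j=1}^{K_i} \log \theta_{ij,t}, \qquad
d_{it} = \log \det \boldsymbol{\Omega}_{it} - \frac{1}{T} \sum_{s=1}^{T} \log \det \boldsymbol{\Omega}_{is}.
\]
Exponentiating yields $\exp(d_{it})$, which equals the determinant at time $t$ divided by its geometric mean over time. Values above one therefore mark periods in which the state innovation variances are larger than is typical for the sample.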

\begin{figure}[p]
\begin{subfigure}{.327\textwidth}
\includegraphics[width=\textwidth, clip, trim = 30 40 30 40]{concovtrace.pdf}
\caption{Consumption}\label{fig:2a}
\end{subfigure}
\begin{subfigure}{.327\textwidth}
\includegraphics[width=\textwidth, clip, trim = 30 40 30 40]{invcovtrace.pdf}
\caption{Investment}\label{fig:2b}
\end{subfigure}
\begin{subfigure}{.327\textwidth}
\includegraphics[width=\textwidth, clip, trim = 30 40 30 40]{outcovtrace.pdf}
\caption{Output}\label{fig:2c}
\end{subfigure}\\[.7em]
\begin{subfigure}{.327\textwidth}
\includegraphics[width=\textwidth, clip, trim = 30 40 30 40]{houcovtrace.pdf}
\caption{Hours worked}\label{fig:2d}
\end{subfigure}
\begin{subfigure}{.327\textwidth}
\includegraphics[width=\textwidth, clip, trim = 30 40 30 40]{infcovtrace.pdf}
\caption{Inflation}\label{fig:2e}
\end{subfigure}
\begin{subfigure}{.327\textwidth}
\includegraphics[width=\textwidth, clip, trim = 30 40 30 40]{reacovtrace.pdf}
\caption{Real wages}\label{fig:2f}
\end{subfigure}\\[.7em]
\centering
\begin{subfigure}{.327\textwidth}
\includegraphics[width=\textwidth, clip, trim = 30 40 30 40]{intcovtrace.pdf}
\caption{FFR}\label{fig:2g}
\end{subfigure}
\begin{subfigure}{.327\textwidth}
\includegraphics[width=\textwidth, clip, trim = 30 40 30 40]{det_overall.pdf}
\caption{Overall}\label{fig:2h}
\end{subfigure}
\caption{Posterior mean of the determinant of time-varying variance-covariance matrix of the innovations to the state equation from 1947Q2 to 2014Q4. Values are obtained by taking the exponential of the demeaned log-determinant across equations. Gray shaded areas refer to US recessions dated by the NBER business cycle dating committee.}
\label{fig:concovtrace}
\end{figure}


For all variables we see at least one prominent spike during the sample period, indicating a structural break. Most spikes in the determinant occur around 1980, when then-Fed chairman Paul Volcker sharply increased short-term interest rates to fight inflation. Other breaks relate to the dot-com bubble in the early 2000s (consumption), the oil price crisis and stock market crash in the early 1970s (hours worked) and another oil price related crisis in the early 1990s. Also, the transition from positive interest rates to the zero lower bound in the midst of the global financial crisis is indicated by a spike in the determinant. That we can relate spikes to historical episodes of financial and economic distress lends further credibility to the modeling approach. Among these periods, the early 1980s seem to have constituted by far the most severe rupture for the US economy.


\subsection{Impulse responses to a monetary policy shock}
In this section we examine the dynamic responses of a set of macroeconomic variables to a contractionary monetary policy shock. The monetary policy shock is calibrated as a 100 basis point (bp) increase in the FFR and identified using a Cholesky ordering with the variables appearing in exactly the same order as mentioned above. This ordering is in the spirit of \cite{christiano2005} and has been subsequently used in the literature \citep[see][for an excellent survey]{Coibion2012}. Drawing on the results of the previous section, we focus on two sub-sets of the sample, namely the pre-Volcker period from 1947Q4 to 1979Q1 and the rest of the sample.\footnote{The split into two sub-sets is conducted for interpretation purposes only. For estimation, the entire sample has been used.} The time-varying impulse responses -- as functions of horizons -- are displayed in \autoref{fig:irf_volcker}. Additionally, we also include impulse responses for different horizons -- as functions of time -- over the full sample period in \autoref{fig:irf1} and \autoref{fig:irf2}.
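
The recursive identification can be sketched as follows. The notation is illustrative and not taken from the model section: $\boldsymbol{\Sigma}_t$ stands for the time-$t$ covariance matrix of the reduced-form errors, $\boldsymbol{A}_t$ for its lower triangular Cholesky factor with last diagonal element $a_{mm,t}$, and $\boldsymbol{e}_m$ for the $m$th unit vector, where the FFR is ordered last ($m=7$):
\[
\boldsymbol{\Sigma}_t = \boldsymbol{A}_t \boldsymbol{A}_t', \qquad
\boldsymbol{\delta}_t = \frac{\boldsymbol{A}_t \boldsymbol{e}_m}{a_{mm,t}} = \boldsymbol{e}_m .
\]
Under this sketch, the impact response $\boldsymbol{\delta}_t$ is normalized so that the FFR rises by one unit (100 bp if the FFR is measured in percentage points), while the variables ordered before the FFR do not react within the same quarter. Responses at longer horizons then follow from propagating $\boldsymbol{\delta}_t$ through the autoregressive coefficients.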

\begin{figure}[p]
\begin{minipage}{1\linewidth}~\\
\centering \textbf{1947Q4 to 1979Q1}
\end{minipage}\\[.5em]
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{pre_volcker_consumption.pdf}
Consumption
\end{minipage}%
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{pre_volcker_investment.pdf}
Investment
\end{minipage}
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{pre_volcker_output.pdf}
Output
\end{minipage}
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{pre_volcker_hours.pdf}
Hours worked
\end{minipage}
\vspace{-.5em}

\centering
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{pre_volcker_inflation.pdf}
Inflation
\end{minipage}
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{pre_volcker_real_wage.pdf}
Real wages
\end{minipage}
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{pre_volcker_interest_rate.pdf}
FFR
\end{minipage}\\

\vspace{2em}

\begin{minipage}{1\linewidth}~\\
\centering \textbf{1979Q2 to 2014Q4}
\end{minipage}\\[.5em]

\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{post_volcker_consumption.pdf}
Consumption
\end{minipage}%
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{post_volcker_investment.pdf}
Investment
\end{minipage}
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{post_volcker_output.pdf}
Output
\end{minipage}
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{post_volcker_hours.pdf}
Hours worked
\end{minipage}

\vspace{.5em}

\centering
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{post_volcker_inflation.pdf}
Inflation
\end{minipage}
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{post_volcker_real_wage.pdf}
Real wages
\end{minipage}
\begin{minipage}[b]{0.246\linewidth}
\centering \includegraphics[clip, trim=20 45 20 50, width=\linewidth]{post_volcker_interest_rate.pdf}
FFR
\end{minipage}

\caption{Posterior median impulse response functions over two sample splits, namely the pre-Volcker period (1947Q4 to 1979Q1) and the rest of the sample period (1979Q2 to 2014Q4). The coloring of the impulse responses refers to their timing: light yellow stands for the beginning of the sample split, dark red stands for the end of the sample split. For reference, 68\% credible intervals over the average of the sample period are provided (dotted black lines).}\label{fig:irf_volcker}
\end{figure}

\begin{sidewaysfigure}[p]
\includegraphics[width=\textwidth]{RA.pdf}
\caption{Posterior median responses to a $+100$ bp monetary policy shock, after 4 (top panels), 8 (middle panels) and 12 (bottom panels) quarters. Shaded areas correspond to 90\% (dark red) and 68\% (light red) credible sets.}\label{fig:irf1}
\end{sidewaysfigure}


\begin{sidewaysfigure}[p]
\includegraphics[width=\textwidth]{supply.pdf}
\caption{Posterior median responses to a $+100$ bp monetary policy shock, after 4 (top panels), 8 (middle panels) and 12 (bottom panels) quarters. Shaded areas correspond to 90\% (dark red) and 68\% (light red) credible sets.}\label{fig:irf2}
\end{sidewaysfigure}

In \autoref{fig:irf_volcker} we investigate whether the size and the shape of the responses vary between and within the two sub-samples. For that purpose we show median responses over the first sample split in the upper half and over the second part of the sample in the lower half of \autoref{fig:irf_volcker}. Impulse responses that belong to the beginning of a sample split are depicted in light yellow, those that belong to the end of a sample split in dark red. To fix ideas, if the size of a response increases continuously over time, we should see a smooth darkening of the corresponding impulse response from light yellow to dark red. Considering, for instance, hours worked, this phenomenon can clearly be seen in the second sample period from 1979Q2 to 2014Q4, where the median response changes gradually from slightly negative to substantially negative. On the other hand, abrupt changes are also clearly visible, see e.g.,\ the drastic change of the inflation response from 1979Q1 (the last quarter in the first sample) to 1979Q2 (the first quarter in the second sample), dropping from substantially positive to just above zero within one quarter (see also \autoref{fig:irf1}).

Considering the dynamic responses across different angles, we find three regularities which are worth emphasizing. The first concerns the overall effects of the monetary policy shock. Note that an unexpected rate increase deters investment growth, hours worked and consequently overall output growth for both sample splits. These results are reasonable from an economic perspective. Also, estimated effects on output growth and inflation are comparable to those of \citet{Baumeister2013} who use a TVP-VAR framework and US data. Responses of consumption growth tend to be accompanied by wide credible sets. The same applies to inflation and real wages. 

Second, we examine changes in responses over time for the first sub-period. One of the variables that shows a great deal of variation in magnitudes is the response of inflation. Here, effects become increasingly positive the further one moves from 1947Q4 to 1979Q1 and the shades of the responses turn continuously darker. These results imply a severe ``price puzzle''. While overall credible sets for the sub-sample are wide, positive responses for inflation and thus the price puzzle are estimated over the period from the mid-1960s to the beginning of the 1980s (see also \autoref{fig:irf1}). A similar picture arises when looking at consumption growth. During the first sample split, effects become increasingly negative, but responses are only precisely estimated for the period from the mid-1960s to the beginning of the 1980s. This might be explained by the fact that the monetary-policy-driven increase in inflation spurs consumption since saving becomes less attractive.


Third, we focus on the results over the more recent second sample split from 1979Q2 to 2014Q4. Paul Volcker's fight against inflation had some bearing on overall macroeconomic dynamics in the USA. With the onset of the 1980s, the aforementioned price puzzle starts to disappear (in the sense that effects are surrounded by wide credible sets and median responses become increasingly negative). There is also a great deal of time variation evident in other responses, mostly becoming increasingly negative. Put differently, the effectiveness of monetary policy seems to be higher in the more recent sample period than before. This can be seen in the effects on hours worked, investment growth and output growth. That the effects of a hypothetical monetary policy shock on output growth are particularly strong after the crisis corroborates findings of \citet{Baumeister2013} and \citet{Feldkircher2016}. The latter argue that this is related to the zero lower bound period: after a prolonged period of unaltered interest rates, a deviation from the (long-run) interest rate mean can exert considerable effects on the macroeconomy.
 
 

\section{Closing remarks}
\label{sec:conclusion}
This paper puts forth a novel approach to estimate time-varying parameter models in a Bayesian framework. We assume that the state innovations follow a threshold model where the threshold variable is the absolute period-on-period change of the corresponding states. This implies that if the (proposed) change is sufficiently large, the corresponding variance is set to a value greater than zero. Otherwise, it is set close to zero, which implies that the states remain virtually constant from $(t-1)$ to $t$. Our framework is capable of discriminating between a plethora of competing specifications, most notably models that feature many, moderately many, and few structural breaks in the regression parameters.

We also propose a generalization of our model to the VAR framework with stochastic volatility. In an application to the US macroeconomy, we examine the usefulness of the TTVP-VAR in terms of forecasting, turning point prediction, and structural impulse response analysis. Our results show that the model yields precise forecasts, especially during more volatile times such as those witnessed in 2008. For that period, the forecast gain over simpler constant parameter models is particularly high. We then proceed by investigating turning point predictions, and observe excellent performance of the TTVP model, in particular for upturn predictions. Finally, we examine impulse responses to a $+100$ basis points contractionary monetary policy shock, focusing on two sub-periods of our sample span, the pre-Volcker period and the rest of the sample. Our results reveal significant evidence for a severe price puzzle during episodes of the pre-Volcker period. The positive effect of the rate increase on inflation disappears in the second half of our sample. Modeling changes in responses over the two sub-periods only, such as in a regime switching model, however, would be too simplistic, as we also find a lot of time variation within each sub-period. For example, we find increasing effectiveness of monetary policy in terms of output, investment growth, and hours worked in the more recent sub-period. This is especially true for the period after the global financial crisis in which the Federal Funds Rate has been tied to zero. For that period, a hypothetical deviation from the zero lower bound would create pronounced effects on the wider macroeconomy.

\section{Acknowledgments}
We sincerely thank the participants of the WU Brown Bag Seminar of the Institute of Statistics and Mathematics, the 3rd Vienna Workshop on High-Dimensional Time Series in Macroeconomics and Finance 2017, the  NBP Workshop on Forecasting 2017, and in particular Sylvia Fr\"uhwirth-Schnatter, for many helpful comments and suggestions that improved the paper significantly.

\singlespacing
\bibliographystyle{./bibtex/econometrica}
