**Abstract:**Hawkes processes are point process models that have been used to capture self-excitatory behaviour in social interactions, neural activity, earthquakes and viral epidemics. They can model both the times and the locations at which events occur. Here we develop a new class of spatiotemporal Hawkes processes that can capture both triggering and clustering behaviour, and we provide an efficient method for performing inference. We use a log-Gaussian Cox process (LGCP) as a prior for the background rate of the Hawkes process, which gives arbitrary flexibility to capture a wide range of underlying background effects (for infectious diseases these are called endemic effects). Both components are computationally expensive: the Hawkes process has a likelihood with quadratic complexity in the number of observations, and the LGCP involves inverting the precision matrix, which is cubic in the number of observations. We propose a novel approach to MCMC sampling for our Hawkes process with LGCP background, using pre-trained Gaussian process generators that provide direct and cheap access to samples during inference. We show the efficacy and flexibility of our approach in experiments on simulated data and use our methods to uncover the trends in a dataset of reported crimes in the US.
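
To make the quadratic cost mentioned in the abstract concrete, here is a minimal sketch of a Hawkes log-likelihood. It is an illustration only, not the submission's model: it assumes a purely temporal process, a constant background rate `mu` (the paper uses an LGCP background instead), and an exponential triggering kernel with hypothetical parameters `alpha` and `beta`. The function name `hawkes_loglik` is also an assumption made for this example.

```python
import math


def hawkes_loglik(times, T, mu, alpha, beta):
    """Log-likelihood of a temporal Hawkes process on [0, T].

    Assumed (illustrative) intensity:
        lambda(t) = mu + sum_{t_j < t} alpha * beta * exp(-beta * (t - t_j))
    `times` must be sorted in increasing order.
    """
    loglik = 0.0
    for i, t_i in enumerate(times):
        # Each event sums over all earlier events: this nested loop is
        # the source of the O(n^2) cost in the number of observations.
        excitation = 0.0
        for t_j in times[:i]:
            excitation += alpha * beta * math.exp(-beta * (t_i - t_j))
        loglik += math.log(mu + excitation)

    # Compensator: integral of the intensity over [0, T].
    compensator = mu * T + sum(
        alpha * (1.0 - math.exp(-beta * (T - t))) for t in times
    )
    return loglik - compensator


# Example call with made-up event times and parameters.
print(hawkes_loglik([0.4, 1.1, 1.3, 2.7], T=3.0, mu=0.5, alpha=0.6, beta=1.5))
```

Setting `alpha=0` removes the self-excitation and reduces the expression to a homogeneous Poisson log-likelihood, which is a convenient sanity check.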

**Submission Length:**Long submission (more than 12 pages of main content)

**Changes Since Last Submission:**Please find the changes in the new version below:

- Two more paragraphs on other models and their connection to ours at the end of section 1: ``Other recent approaches$\dots$nature of the process).'' and ``Beyond neural network approaches$\dots$continuous time.''
- One more paragraph in section 2: ``Few neural network models$\dots$background intensity''
- section 3.1: typos were corrected in the first paragraph and in Equation 1, minor text changes, $\lambda\geq 0$
- section 3.2: the whole section is restructured, with major differences from the previous version
- section 3.2: added more information on theoretical results on Hawkes processes: ``$g$ can be parametric or it can be estimated$\dots$intensity cannot be constant''
- section 3.2: restructuring of the paragraph ``The process with intensity$\dots$semi-positive definite''
- section 3.3: fixed notation to be consistent in the use of $\mathbf{s}$
- section 4.1, Algorithm 1: we added more details and fixed typos
- section 4.1: we added Algorithm 2 in the appendix
- section 4.1: we cite Hawkes and Oakes, 1974 as requested
- section 4.2: in Equations 9 and 10, added the prior on $a_0$ to agree with Algorithm 1
- section 5.1: added the paragraph ``To further validate our method$\dots$process to the data'' on goodness of fit
- section 5.2: new paragraph ``The metrics we use to test$\dots$(here the number of test points)'' to explain the two metrics used, namely RMSE and NNL, and their definitions
- section 5.2, Figure 4: we added an extra row of plots to the figure to report the negative normalised log-likelihood
- section 5.2: text to explain the new plots: ``The NNL results suggest the same conclusion$\dots$Hawkes model''
- section 5.2: four paragraphs to explain what the estimated parameters are under an LGCP-Hawkes model, demonstrating how this model recovers Hawkes, Poisson and LGCP: ``Furthermore, what is of interest$\dots$and self-exciting behaviour''
- section 5.3: added the line ``We also tried different priors such as Gamma and Exponential but the convergence of the chains was better when using Truncated Normal distributions.''
- section 5.3: slight change in ``We report an estimate and $90\%$$\dots$'' to replace longitude and latitude with y distance and x distance
- section 5.3: revised Table 1, with two new rows to report the normalised negative log-likelihood on training and test data for all models, and one new column to introduce the results of the new comparison with DeepSTPP
- section 5.3: last line ``The training inference times$\dots$DeepSTPP'' to report the runtime for the gunshot experiment for all five models
- throughout the paper: we fixed notation to be consistent, e.g. $\mathbf{s}$ is used to denote space only and it is always bold
- we fixed all the typos that were raised by all three reviewers
- appendix: introduced a new figure (Figure 9) in Appendix A.2
- appendix: introduced the new algorithm (Algorithm 2) in Appendix A.1
- we fixed all the references to follow the correct style

**Assigned Action Editor:**~Sinead_Williamson1

**License:**Creative Commons Attribution 4.0 International (CC BY 4.0)

**Submission Number:**617
