
Large Deviations of Hawkes Processes on Structured Sparse Random Graphs

D. Avitabile (Vrije Universiteit, d.avitabile@vu.nl) and J. MacLaurin (New Jersey Institute of Technology, james.maclaurin@njit.edu)

Abstract

We prove a Large Deviation Principle for Hawkes Processes on sparse large disordered networks with a graphon structure. We apply our results to a stochastic $SIS$ epidemiological model on a disordered network, and determine Euler-Lagrange equations that dictate the most likely transition path between different states of the network.

1 Introduction

We study the dynamics of high-dimensional Hawkes processes on networks with a particular average structure. Hawkes Processes are continuous-time jump-Markov processes whose intensity function is itself stochastic [24, 25, 39]. Applications include high-dimensional spiking neuron models [46, 31, 22], epidemics on structured populations [49, 50, 5], sociological models [54], mathematical finance [35], machine learning [52] and various other population dynamics models. In all of the above applications, a central aim is to understand how the combination of stochasticity and network structure shapes the resulting dynamics [37].

There has been much recent effort directed towards determining deterministic ‘neural field’ equations to describe the large $N$ limiting behavior of spatially-extended Hawkes Processes [27, 22, 2, 23, Baars2024]. In particular, a recent emphasis has been to understand how the graphon determines pattern formation [33, 13, 14] and other coherent structures. This paper builds on these works by determining a Large Deviation Principle. The Large Deviation Principle determines an asymptotic estimate for the probability of a deviation from the large $N$ limiting behavior. It is useful for estimating large-scale transitions of the system induced by rare finite-size fluctuations of the system [17, 55, 34, 45]. Early work on the Large Deviations of neural fields has been done by Kuehn and Riedler [38]. This paper parallels a recent preprint [6] by the authors of this paper, that determines neural field equations, and a Large Deviation Principle, for rate neurons on a compact space.

There exists a well-developed literature for the Large Deviations of stochastic processes driven by Poisson Random Measures [21]. (Note that Hawkes Processes can be represented as a double-time integral with respect to a standard Poisson Random Measure on $(\mathbb{R}^{+})^{2}$ [40].) Most treatments of the Large Deviations of spatially-distributed systems driven by Poisson Random Measures concern PDEs perturbed by a spatially-distributed Poisson Random Measure [20, 19]. Our system differs from these papers because the noise can only occur at the locations of the nodes of the graph. Since the nodes are approximately uniformly distributed throughout the spatial domain, in the large size limit the Large Deviations rate function becomes an integral over $\mathcal{E}\times\mathbb{R}^{+}$ of a Lagrangian function.

On a related note, there has been much recent interest in the Large Deviations of Chemical Reaction Networks [48, 8]. These are high-dimensional jump-Markov processes, just as in this paper, but without the spatial extension. Recent works include those of Dupuis, Ramanan and Wu [29], Pardoux and Samegni-Kepgnou [47], Agazzi, Eckmann and Dembo [4], and Patterson and Renger [48]. Patterson and Renger [48] and Agazzi, Patterson, Renger and [3] prove the Large Deviations Principle in a general setting by also studying the convergence of the reaction fluxes (as in this paper).

To this end, we consider the large $N$ dynamics of jump-Markov processes on an inhomogeneous network (i.e. a graph, with edges and nodes). The ‘agents’ correspond to the nodes of the graph. The state of the $j^{th}$ agent is written as $\sigma^{j}(t)$ , and it takes on values in a finite state-space $\Gamma$ . For epidemiological applications [18], one typically takes $\Gamma=\{S,I,R\}$ (susceptible, infected, recovered) or just $\Gamma=\{S,I\}$ . For neuroscience applications [32], one might take $\Gamma=\{0,1\}$ (i.e. spiking or non-spiking).

The population is assumed to exist on a disordered static network. We make minimal assumptions on the network; in particular, we do not require that the edges are sampled from a probability distribution. Our main requirement is that the typical number of edges connected to each vertex tends to $\infty$ as $N\to\infty$ , and that the edge connectivity resembles that of a ‘graphon’. It is already known that these conditions ensure that the large $N$ limiting dynamics resembles the all-to-all connectivity case [44, 2]. One way to generate the graphon structure is to sample the graph randomly from a probability distribution known as a W-random graph [12, 11]. Essentially this means that the connections are sampled independently, where the probability of a connection is a function of the locations of the afferent vertices. The formalism is flexible enough to accommodate a wide range of models, including a power-law model (often considered a paradigm for populations with a clustered social structure), and populations with a geometric spatially-distributed structure (i.e. the probability of a connection correlates with the geometric distance between the people).

Important cases covered by the $W$ -random-graph formalism include the following:

• Sparse Power Law Graphs: these were originally defined in the seminal paper by Barabasi and Albert [7], and further developed by Bollobas et al [10]. In the original paper, Barabasi and Albert [7] constructed this graph iteratively, by successively adding vertices, and then connecting them, with the probability of a connection being proportional to the existing degree of the node. It was shown by [11] that one obtains an asymptotically excellent approximation to a power law graph in the $W$ -random graph formalism, as long as one chooses $\mathcal{E}=(0,1]$ , $\{x^{j}_{N}\}_{j\in I_{N}}$ uniformly distributed over $\mathcal{E}$ , and $\mathbb{P}\big(J^{jk}=1\big)=(1-\beta)^{2}(x^{j}_{N}x^{k}_{N})^{-\beta}$ for some parameters such that $0<\beta<\gamma<1$ .

• Inhomogeneous Erdos-Renyi Random Graphs: there has recently been considerable interest in neuroscience and ecology for random graphs with distance-dependent connectivity [44]. In this example, one takes $\mathcal{E}=\mathbb{S}^{1}$ (motivation for this lies in the ring-structure of the visual cortex [16]) and $\mathcal{J}:\mathbb{S}^{1}\times\mathbb{S}^{1}\to\mathbb{R}^{+}$ any smooth function.

• Small World Graphs: these were first defined in the seminal paper of Watts and Strogatz [51]. These graphs are constructed by taking a ring of nearest-neighbor-connected vertices, and then randomly reassigning some edges.

1.1 Notation

The $N$ particles are indexed by $I_{N}=\{1,2,\ldots,N-1,N\}$ . If $\mathcal{E}$ is a Polish Space, let $\mathcal{M}(\mathcal{E})$ denote the space of all Borel measures on $\mathcal{E}$ . Let $\mathcal{P}(\mathcal{E})\subseteq\mathcal{M}(\mathcal{E})$ denote the space of all measures with total mass of one. We endow both of these spaces with the topology of weak convergence: i.e. the topology generated by open sets of the form, for a continuous bounded function $g\in\mathcal{C}(\mathcal{E})$ , $k\in\mathbb{R}$ and $\epsilon>0$ ,

\big\{\mu\in\mathcal{M}(\mathcal{E}):\big|\mathbb{E}^{\mu}[g]-k\big|<\epsilon\big\}.

The Wasserstein distance on $\mathcal{P}(\Gamma\times\mathcal{E})$ is defined to be

(1)

d_{W}(\mu,\nu)=\inf\big\{\mathbb{E}^{\zeta}\big[\chi\{\sigma\neq\widetilde{\sigma}\}+d_{\mathcal{E}}(x,\widetilde{x})\big]\big\},

and the infimum is over all couplings $\zeta$ of $\mu$ and $\nu$ . Write $\mathfrak{B}(\mathcal{E})$ to denote the Borel sigma algebra. Let $D\big{(}[0,T],\mathbb{R}\big{)}$ denote the Skorohod space of all cadlag functions.

Define the following metric $d_{T}$ on $\mathcal{M}\big{(}\mathcal{E}\times[0,T]\big{)}^{\Gamma\times\Gamma}$ :

(2)

d_{T}(\mu,\nu)=\sum_{\alpha,\beta\in\Gamma}\sup_{h}\big|\mathbb{E}^{\mu_{\alpha\mapsto\beta}}[h]-\mathbb{E}^{\nu_{\alpha\mapsto\beta}}[h]\big|,

where the supremum is taken over all functions $h$ that are (i) Lipschitz with Lipschitz constant less than or equal to $1$ and (ii) such that $|h|\leq 1$ .

Write

\pi_{T}:\mathcal{M}\big(\mathcal{E}\times[0,\infty)\big)^{\Gamma\times\Gamma}\to\mathcal{M}\big(\mathcal{E}\times[0,T]\big)^{\Gamma\times\Gamma}

to be the restriction of a measure to times up to $T$ . We also naturally consider $d_{T}$ to be a semi-metric on $\mathcal{M}\big(\mathcal{E}\times\mathbb{R}^{+}\big)^{\Gamma\times\Gamma}$ , as follows

(3)

\displaystyle d_{T}(\mu,\nu):=d_{T}\big{(}\pi_{T}\mu,\pi_{T}\nu\big{)}.

Finally, define

(4)

\displaystyle d(\mu,\nu)=\sum_{j=1}^{\infty}2^{-j}d_{j}(\mu,\nu).

We endow $\mathcal{M}\big{(}\mathcal{E}\times[0,\infty)\big{)}^{\Gamma\times\Gamma}$ with the cylinder topology generated by open sets of the form, for $\mathcal{O}$ open in $\mathcal{M}\big{(}\mathcal{E}\times[0,T]\big{)}^{\Gamma\times\Gamma}$ ,

(5)

\big\{\mu\in\mathcal{M}\big(\mathcal{E}\times[0,\infty)\big)^{\Gamma\times\Gamma}:\pi_{T}(\mu)\in\mathcal{O}\big\}.

2 Model Outline and Main Result

2.1 Geometry of the Connectivity

We make general assumptions on the geometry of the connectivity. These assumptions will hold in a variety of circumstances. The nodes of the graph are assigned positions on a compact smooth Riemannian Manifold $\mathcal{E}$ . We let $\mu_{Rie}\in\mathcal{P}(\mathcal{E})$ denote the (normalized) volume measure on $\mathcal{E}$ . The position of particle $j\in I_{N}$ is denoted $x^{j}_{N}$ (a non-random constant that depends on $N$ too), and we write $x_{N}=(x^{j}_{N})_{j\in I_{N}}$ . Write the empirical measure of the positions to be

(6)

\hat{\mu}^{N}(x_{N})=N^{-1}\sum_{j\in I_{N}}\delta_{x^{j}_{N}}\in\mathcal{P}(\mathcal{E}).

Hypothesis.

1. It is assumed that $\hat{\mu}^{N}(x_{N})$ converges weakly to some $\kappa\in\mathcal{P}(\mathcal{E})$ as $N\to\infty$ .

2. It is assumed that $\kappa$ has a smooth density $\rho$ with respect to $\mu_{Rie}$ .

3. It is assumed that $\mathcal{E}$ is metrizable, with a metric $d_{\mathcal{E}}$ .

4. It is assumed that $\mathcal{E}$ is compact.

The strength of connection from particle $j$ to particle $k$ is denoted $J^{jk}\in\mathbb{R}$ . The connectivity can be both excitatory and inhibitory, and either symmetric or asymmetric. We assume that the connectivity converges to a ‘graphon’-type structure as $N\to\infty$ [43], although it must be emphasized that we do not require that it is sampled from a probability distribution. These assumptions are broadly similar to those made by Lucon [44] in treating the large $N$ limiting dynamics.

Hypothesis.

(i) We assume that there is a sequence $(\phi_{N})_{N\geq 1}$ that increases to $\infty$ , and a constant $C_{\mathcal{J}}>0$ such that for all $N\geq 1$ ,

(7)

\sup_{j,k\in I_{N}}\big\|J^{jk}\big\|\leq C_{\mathcal{J}}

(8)

\sup_{j\in I_{N}}\sum_{k\in I_{N}}\chi\{J^{jk}\neq 0\}\leq C_{\mathcal{J}}\phi_{N}.

(ii) It is assumed that there exists a function $\mathcal{J}\in\mathcal{C}(\mathcal{E}\times\mathcal{E})$ such that

(9)

\lim_{N\to\infty}N^{-1}\sum_{j\in I_{N}}\eta^{j}_{N}=0,

where

(10)

\eta^{j}_{N}=\sup_{\alpha\in\{-1,0,1\}^{N}}\big|\sum_{k\in I_{N}}\big(\phi_{N}^{-1}J^{jk}-\mathcal{J}(x^{j}_{N},x^{k}_{N})\big)\alpha^{k}\big|.

(iii) Finally, we assume that $\mathcal{J}$ is uniformly Lipschitz in both arguments, i.e. for all $x,y,z\in\mathcal{E}$ ,

(11)

\big\|\mathcal{J}(x,y)-\mathcal{J}(x,z)\big\|\leq C_{\mathcal{J}}d_{\mathcal{E}}(y,z)

(12)

\big\|\mathcal{J}(y,x)-\mathcal{J}(z,x)\big\|\leq C_{\mathcal{J}}d_{\mathcal{E}}(y,z).

We next note that these assumptions are guaranteed to be satisfied if the edges are sampled independently from a distribution whose probability varies continuously over $\mathcal{E}$ .

Lemma 2.1.

Suppose that $\{J^{jk}\}_{j,k\in I_{N}}$ are $\{-1,0,1\}$ -valued random variables, and that they are either (i) mutually independent or (ii) independent, except that $J^{jk}=J^{kj}$ . Suppose also that there are continuous functions $p_{+},p_{-},\mathcal{J}:\mathcal{E}\times\mathcal{E}\to\mathbb{R}$ such that

(13)

\mathbb{P}\big(J^{jk}=1\big)=\phi_{N}p_{+}(x^{j}_{N},x^{k}_{N})

(14)

\mathbb{P}\big(J^{jk}=-1\big)=\phi_{N}p_{-}(x^{j}_{N},x^{k}_{N})

(15)

\mathcal{J}(\theta,\alpha)=p_{+}(\theta,\alpha)-p_{-}(\theta,\alpha).

Suppose that the scaling is such that for any positive constant $c>0$ ,

(16)

\displaystyle\lim_{N\to\infty}N\exp\big{(}-cN\phi_{N}\big{)}<\infty.

Then Hypothesis (9) is satisfied.

A proof is provided in [6].
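The sampling scheme of Lemma 2.1 can be illustrated numerically. The following is a minimal sketch, not the construction used in [6]: the ring geometry, the functions `p_plus` and `p_minus`, the scaling `phi = N ** -0.5` and the helper name `sample_connectivity` are all assumptions chosen for illustration. It draws $J^{jk}\in\{-1,0,1\}$ with the probabilities (13)-(14), and compares the row sums, normalized by $N\phi_{N}$, with the average of $p_{+}-p_{-}$ over the node positions.

```python
import numpy as np


def sample_connectivity(x, p_plus, p_minus, phi, rng, symmetric=False):
    """Sample J in {-1, 0, 1}^(N x N) with
    P(J[j, k] = +1) = phi * p_plus(x[j], x[k]) and
    P(J[j, k] = -1) = phi * p_minus(x[j], x[k])."""
    n = len(x)
    xj, xk = np.meshgrid(x, x, indexing="ij")
    prob_plus = phi * p_plus(xj, xk)
    prob_minus = phi * p_minus(xj, xk)
    assert np.all(prob_plus + prob_minus <= 1.0), "probabilities must sum to at most one"
    u = rng.random((n, n))
    if symmetric:
        # Reuse the upper-triangular uniforms below the diagonal, so that
        # J[j, k] = J[k, j] whenever p_plus and p_minus are symmetric.
        u = np.triu(u) + np.triu(u, 1).T
    J = np.zeros((n, n), dtype=int)
    J[u < prob_plus] = 1
    J[(u >= prob_plus) & (u < prob_plus + prob_minus)] = -1
    return J


# Illustrative choices (assumptions, not taken from the text).
N = 400
phi = N ** -0.5                                   # sparsity scaling
x = 2.0 * np.pi * np.arange(N) / N                # equispaced nodes on the circle
p_plus = lambda a, b: 0.5 * (1.0 + np.cos(a - b))
p_minus = lambda a, b: 0.2 * np.ones_like(a - b)

rng = np.random.default_rng(0)
J = sample_connectivity(x, p_plus, p_minus, phi, rng)
J_sym = sample_connectivity(x, p_plus, p_minus, phi, rng, symmetric=True)

# Compare normalized row sums with the averaged kernel p_plus - p_minus.
xj, xk = np.meshgrid(x, x, indexing="ij")
kernel = p_plus(xj, xk) - p_minus(xj, xk)
row_error = np.abs(J.sum(axis=1) / (N * phi) - kernel.mean(axis=1))
print("largest number of nonzero connections per node:", int((J != 0).sum(axis=1).max()))
print("mean row error:", row_error.mean())
```

With this choice of `phi`, the quantity $N\exp(-cN\phi_{N})$ in (16) tends to zero for every $c>0$, so the scaling condition of the lemma is met in this example.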

2.2 Stochastic Transitions

The transitions of the states are taken to be Poissonian and to assume the following mean-field form. For each $\alpha\in\Gamma$ , there must exist a function

(17)

f_{(\alpha)}:\mathcal{E}\times\Gamma\times\mathbb{R}^{\Gamma}\to\mathbb{R}^{+}

such that for $\alpha\neq\sigma^{j}(t)$ and $h\ll 1$ ,

(18)

\mathbb{P}\big(\sigma^{j}(t+h)=\alpha\;|\;\mathcal{F}_{t}\big)=hf_{(\alpha)}\big(x^{j}_{N},\sigma^{j}(t),w^{j}(t)\big)+O(h^{2}).

Here $w^{j}(t)=\big(w^{j}_{\beta}(t)\big)_{\beta\in\Gamma}$ and $w^{j}_{\beta}(t)\in\mathbb{R}$ is such that

(19)

w^{j}_{\beta}(t)=N^{-1}\phi_{N}^{-1}\sum_{k=1}^{N}J^{jk}\chi\{\sigma^{k}(t)=\beta\}.

For all $\alpha\in\Gamma$ , we set

(20)

\displaystyle f_{(\alpha)}\big{(}\cdot,\alpha,\cdot\big{)}=0.

Define the empirical occupation measure at time $t$ , $\hat{\nu}^{N}_{t}\in\mathcal{P}(\Gamma\times\mathcal{E})$ to be such that

(21)

\hat{\nu}^{N}_{t}\big(\alpha\times A\big)=N^{-1}\sum_{j\in I_{N}}\chi\{x^{j}_{N}\in A,\sigma^{j}(t)=\alpha\}.

The only assumption on the initial conditions $\{\sigma^{j}(0)\}_{j\in I_{N}}$ is that $\hat{\nu}^{N}_{0}$ converges weakly to a limit $\nu_{0}$ , as noted in the following hypothesis.

Hypothesis.

We assume that there is a measure $\nu_{0}\in\mathcal{P}(\Gamma\times\mathcal{E})$ such that

(22)

\displaystyle\lim_{N\to\infty}d_{W}\big{(}\hat{\nu}^{N}_{0},\nu_{0}\big{)}=0.

It must be emphasized that the initial conditions can be dependent on the specific choice of the connectivity. It is well-known that a Large Deviation Principle for Poisson Random Measures may not be possible if the intensity function hits zero [48, 3]. Hence we make the following assumptions.

Hypothesis.

It is assumed that $f_{(\alpha)}$ has strictly positive upper and lower bounds, i.e. there are constants $c_{f},C_{f}>0$ such that for all $\alpha,\beta\in\Gamma$ with $\alpha\neq\beta$ ,

(23)

0<c_{f}\leq f_{(\alpha)}(\cdot,\beta,\cdot)\leq C_{f}.

It is also assumed that $f_{(\alpha)}$ is globally Lipschitz, i.e. for all $\alpha,\beta\in\Gamma$ , all $\theta,\widetilde{\theta}\in\mathcal{E}$ and all $w,\widetilde{w}\in\mathbb{R}^{\Gamma}$ ,

(24)

\big|f_{(\alpha)}(\theta,\beta,w)-f_{(\alpha)}(\widetilde{\theta},\beta,\widetilde{w})\big|\leq C_{f}\big\{d_{\mathcal{E}}(\theta,\widetilde{\theta})+\|w-\widetilde{w}\|\big\}.

A key object used to study the large $N$ behavior of the system is the empirical reaction flux [48]. The empirical reaction flux $\hat{\mu}^{N}_{\alpha\mapsto\beta}\in\mathcal{M}(\mathcal{E}\times\mathbb{R}^{+})$ counts the total number of $\alpha\mapsto\beta$ transitions over a specified time interval (scaled by $N^{-1}$ ), i.e. in the case that $\alpha\neq\beta$ , for a measurable subset $A\subset\mathcal{E}$ and $[a,b]\subset\mathbb{R}^{+}$ ,

(25)

\hat{\mu}^{N}_{\alpha\mapsto\beta}\big(A\times[a,b]\big)=N^{-1}\sum_{j\in I_{N}}\sum_{s\in[a,b]}\chi\big\{x^{j}_{N}\in A,\sigma^{j}_{s^{-}}=\alpha,\sigma^{j}_{s}=\beta\big\}.

We stipulate that $\hat{\mu}^{N}_{\alpha\mapsto\alpha}\big(A\times[a,b]\big)=0$ always. Finally we denote the empirical process

(26)

\hat{\nu}^{N}:=\big(\hat{\nu}^{N}_{t}\big)_{t\geq 0}\in\mathcal{D}\big(\mathbb{R}^{+},\mathcal{P}(\Gamma\times\mathcal{E})\big)

(27)

\hat{\nu}^{N}_{t}(\alpha\times A):=N^{-1}\sum_{j\in I_{N}}\chi\big\{\sigma^{j}(t)=\alpha,x^{j}_{N}\in A\big\}.

The joint state space for the empirical reaction flux, and the empirical measure is denoted by

(28)

\mathcal{X}=\mathcal{M}(\mathcal{E}\times\mathbb{R}^{+})^{\Gamma\times\Gamma}\times\mathcal{D}\big([0,\infty),\mathcal{P}(\Gamma\times\mathcal{E})\big).

2.3 Main Results

We first prove that the empirical reaction flux and the empirical occupation measure both concentrate in the large $N$ limit. (This is basically already known [2, 23]).

Theorem 2.2.

There exists a unique $(\mu,\nu)\in\mathcal{X}$ to which the system converges as $N\to\infty$ . Write the density of $\nu_{t}$ to be $\nu_{t}(\alpha,\theta)$ , i.e. for any $A\in\mathfrak{B}(\mathcal{E})$ ,

(29)

\nu_{t}(\alpha\times A):=\int_{A}\nu_{t}(\alpha,\theta)d\mu_{Rie}(\theta).

The density evolves as

(30)

\frac{d\nu_{t}(\alpha,\theta)}{dt}=\sum_{\beta\neq\alpha}\big(f_{(\alpha)}(\theta,\beta,w_{t}(\theta))\nu_{t}(\beta,\theta)-f_{(\beta)}(\theta,\alpha,w_{t}(\theta))\nu_{t}(\alpha,\theta)\big),

where

(31)

w_{t}(\theta):=\big(w_{t,\alpha}(\theta)\big)_{\alpha\in\Gamma}

(32)

w_{t,\alpha}(\theta)=\int_{\mathcal{E}}\mathcal{J}(\theta,\zeta)\nu_{t}(\alpha,\zeta)d\mu_{Rie}(\zeta).

Furthermore the limits of the empirical reaction fluxes are given by

(33)

\frac{d\mu_{\alpha\mapsto\beta}}{dt}(A\times[s,t])=\int_{A}f_{(\beta)}(\theta,\alpha,w_{t}(\theta))\nu_{t}(\alpha,\theta)\mu_{Rie}(d\theta)\quad\quad t\geq s

(34)

\mu_{\alpha\mapsto\beta}(A\times\{s\})=0.

We next state the Large Deviation Principle for the empirical reaction fluxes. We first define the rate function:

(35)

\mathcal{G}:\mathcal{M}\big(\mathcal{E}\times\mathbb{R}^{+}\big)^{\Gamma\times\Gamma}\to[0,\infty].

For $\mu\in\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}^{\Gamma\times\Gamma}$ , we stipulate that

(36)

\displaystyle\mathcal{G}(\mu)=\infty

in the case that for some $\alpha,\beta\in\Gamma$ with $\alpha\neq\beta$ , $\mu_{\alpha\mapsto\beta}$ does not have a density (with respect to $\mu_{Rie}\otimes\mu_{Leb}$ ). If $\mu_{\alpha\mapsto\alpha}$ has nonzero measure for some $\alpha\in\Gamma$ , then we also stipulate that $\mathcal{G}(\mu)=\infty$ . Otherwise, writing $p_{\alpha\mapsto\beta}(x,t)$ to be the density, i.e. the function such that

(37)

\frac{d\mu_{\alpha\mapsto\beta}}{d\mu_{Rie}\otimes d\mu_{Leb}}(x,t)=p_{\alpha\mapsto\beta}(x,t),

we define

(38)

\mathcal{G}(\mu)=\sum_{\alpha\neq\beta}\int_{\mathcal{E}}\int_{0}^{\infty}\rho(x)\ell\big(p_{\alpha\mapsto\beta}(x,t)/\lambda_{(\alpha,\beta)}(x,t)\big)\lambda_{(\alpha,\beta)}(x,t)dtd\mu_{Rie}(x).

Here,

(39)

\ell(a)=a\log a-a+1

(40)

\lambda_{(\alpha,\beta)}(x,t)=f_{(\beta)}(x,\alpha,w(x,t))\nu_{t}(\alpha,x)

(41)

w(x,t)=\big(w_{\zeta}(x,t)\big)_{\zeta\in\Gamma}

(42)

w_{\zeta}(x,t)=\mathbb{E}^{(\alpha,\iota)\sim\nu_{0}}\big[\chi\{\zeta=\alpha\}\mathcal{J}(x,\iota)\big]+\sum_{\alpha\in\Gamma}\int_{\mathcal{E}}\int_{0}^{t}\mathcal{J}(x,y)\big\{p_{\alpha\mapsto\zeta}(y,s)-p_{\zeta\mapsto\alpha}(y,s)\big\}ds\mu_{Rie}(dy)

(43)

\nu_{t}(\alpha,x)=\nu_{0}(\alpha,x)+\sum_{\beta\neq\alpha}\int_{0}^{t}\big(p_{\beta\mapsto\alpha}(x,s)-p_{\alpha\mapsto\beta}(x,s)\big)ds.

The main theoretical result of this paper is the following theorem.

Theorem 2.3.

Let $\mathcal{A},\mathcal{O}\subseteq\mathcal{M}\big(\mathcal{E}\times\mathbb{R}^{+}\big)^{\Gamma\times\Gamma}$ be (respectively) closed and open. Then

(44)

\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big(\hat{\mu}^{N}\in\mathcal{A}\big)\leq-\inf_{\mu\in\mathcal{A}}\mathcal{G}(\mu)

(45)

\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big(\hat{\mu}^{N}\in\mathcal{O}\big)\geq-\inf_{\mu\in\mathcal{O}}\mathcal{G}(\mu).

Furthermore, $\mathcal{G}$ is lower semicontinuous and has compact level sets.

We next note that the LDP holds for arbitrary stopping times.

Corollary 2.4.

Suppose that the dynamics is identical to the previous section, except that now the initial conditions are arbitrary constants $\sigma(0):=\widetilde{\sigma}\in\Gamma^{N}$ (and we don’t require Hypothesis 2.2 either). Let $\tau_{N}$ be any stopping time such that there is a positive sequence $(\epsilon_{N})_{N\geq 1}$ such that $\lim_{N\to\infty}\epsilon_{N}=0$ and a measure $\kappa\in\mathcal{P}(\Gamma\times\mathcal{E})$ such that with unit probability

(46)

d_{W}\big(\hat{\nu}^{N}_{\tau_{N}},\kappa\big)\leq\epsilon_{N}.

Then Theorem 2.2 and Theorem 2.3 hold true if we take as initial conditions $\sigma^{j}_{0}=\widetilde{\sigma}^{j}_{\tau_{N}}$ . (We emphasize in particular that $\tau_{N}$ could diverge to $\infty$ as $N\to\infty$ ).

2.4 Contracted Rate Function

In this section, we determine the structure of the Large Deviation rate function for the empirical occupation measure $\hat{\nu}^{N}$ only. One easily checks that the empirical occupation measures can be obtained by applying a continuous transformation to the empirical reaction fluxes. This is noted in the following Lemma.

Lemma 2.5.

For $t>0$ and $\nu_{0}\in\mathcal{P}(\Gamma\times\mathcal{E})$ , define $\Psi_{\nu_{0},t}:\mathcal{M}\big(\mathcal{E}\times\mathbb{R}^{+}\big)^{\Gamma\times\Gamma}\to\mathcal{P}(\Gamma\times\mathcal{E})$ to be such that, writing $\Psi_{\nu_{0},t}(\mu)=\nu_{t}$ , for any $A\in\mathfrak{B}(\mathcal{E})$ , and any $\alpha\in\Gamma$ ,

(47)

\nu_{t}(\alpha\times A)=\nu_{0}(\alpha\times A)+\sum_{\beta\neq\alpha}\bigg(\mu_{\beta\mapsto\alpha}(A\times[0,t])-\mu_{\alpha\mapsto\beta}(A\times[0,t])\bigg).

For each $t>0$ and $\nu_{0}\in\mathcal{P}(\Gamma\times\mathcal{E})$ , $\Psi_{\nu_{0},t}$ is continuous. Also $\nu_{0}\mapsto\Psi_{\nu_{0},t}(\mu)$ is continuous for any $\mu\in\mathcal{M}\big(\mathcal{E}\times\mathbb{R}^{+}\big)^{\Gamma\times\Gamma}$ . Write $\Psi_{\nu_{0}}(\cdot):=(\Psi_{\nu_{0},t}(\cdot))_{t\geq 0}\in D\big([0,\infty),\mathcal{P}(\Gamma\times\mathcal{E})\big)$ . Furthermore with unit probability, for all $t>0$ ,

(48)

\Psi_{\hat{\nu}^{N}_{0},t}\big(\hat{\mu}^{N}\big)=\hat{\nu}_{t}^{N}.

The proof of Lemma 2.5 follows almost immediately from the definitions and is omitted.
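The bookkeeping behind (47) can be made concrete on a discretized example. The sketch below is illustrative only: the array layout (states indexed by integers, space split into `M` cells, time split into `T` bins) and the function name `occupation_from_fluxes` are assumptions made for this example. It accumulates the flux masses in time and adds inflow minus outflow to the initial occupation.

```python
import numpy as np


def occupation_from_fluxes(nu0, flux):
    """Discrete analogue of (47).

    nu0  : array (G, M), initial mass of state a in spatial cell m.
    flux : array (G, G, M, T), flux[a, b, m, i] is the mass of a -> b
           transitions in cell m during the i-th time bin.
    Returns an array (T + 1, G, M) with the occupation after each time bin.
    """
    cumulative = np.cumsum(flux, axis=-1)      # mass of a -> b transitions up to each time
    inflow = cumulative.sum(axis=0)            # summed over the source state: (G, M, T)
    outflow = cumulative.sum(axis=1)           # summed over the target state: (G, M, T)
    net = np.moveaxis(inflow - outflow, -1, 0)  # (T, G, M); a -> a terms cancel
    return np.concatenate([nu0[None], nu0[None] + net], axis=0)


# Two states (0 = S, 1 = I), one spatial cell, two time bins.
nu0 = np.array([[1.0], [0.0]])
flux = np.zeros((2, 2, 1, 2))
flux[0, 1, 0, :] = [0.3, 0.1]   # S -> I
flux[1, 0, 0, :] = [0.0, 0.2]   # I -> S
nu = occupation_from_fluxes(nu0, flux)
print(nu[:, :, 0])
```

Because every transition removes mass from one state and adds it to another, the total mass in each spatial cell is conserved along the reconstructed path.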

We can now define the contracted rate function:

(49)

\mathcal{H}:\mathcal{D}\big([0,\infty),\mathcal{P}(\Gamma\times\mathcal{E})\big)\to[0,\infty]

(50)

\mathcal{H}(\nu)=\inf\big\{\mathcal{G}(\mu):\Psi_{\nu_{0}}(\mu)=\nu\text{ and }\mu\in\mathcal{M}(\mathcal{E}\times\mathbb{R}^{+})^{\Gamma\times\Gamma}\big\},

where we recall that $\nu_{0}$ is the limit of the empirical occupation measure at time $0$ .

Corollary 2.6.

Let $\mathcal{A},\mathcal{O}\subseteq\mathcal{D}\big([0,\infty),\mathcal{P}(\Gamma\times\mathcal{E})\big)$ be (respectively) closed and open. Then

(51)

\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big(\hat{\nu}^{N}\in\mathcal{A}\big)\leq-\inf_{\nu\in\mathcal{A}}\mathcal{H}(\nu)

(52)

\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big(\hat{\nu}^{N}\in\mathcal{O}\big)\geq-\inf_{\nu\in\mathcal{O}}\mathcal{H}(\nu).

Furthermore, $\mathcal{H}$ is lower semicontinuous and has compact level sets.

Proof 2.7.

Since $\Psi$ is continuous, this follows from an application of the Contraction Principle [28] to Theorem 2.3.

We desire a more workable definition of $\mathcal{H}$ . To this end, write

(53)

\mathcal{U}\subseteq\mathcal{D}\big([0,\infty),\mathcal{P}(\Gamma\times\mathcal{E})\big)

to consist of all $(\nu_{t})_{t\geq 0}$ such that (i) $t\mapsto\nu_{t}$ is continuous and (ii) there exist functions $\{r_{t,\eta}\}_{\eta\in\Gamma}\subset L^{1}(\mathcal{E}\times[0,\infty))$ (the set of all functions that are integrable with respect to $\mu_{Rie}\otimes\mu_{Leb}$ up to finite times) such that for all $A\in\mathfrak{B}(\mathcal{E})$ ,

(54)

\nu_{t}(\eta\times A)=\int_{0}^{t}\int_{A}r_{s,\eta}(\theta)d\mu_{Rie}(\theta)d\mu_{Leb}(s)\text{ and }

(55)

\sum_{\eta\in\Gamma}\nu_{t}(\eta\times A)=\kappa(A).

For any $r\in L^{1}\big{(}\mathcal{E}\times[0,\infty)\big{)}^{\Gamma}$ , define the set

(56)

\mathcal{Q}_{t}(r)=\bigg\{\big(q_{\alpha\mapsto\beta}\big)_{\alpha,\beta\in\Gamma}\in L^{1}(\mathcal{E})^{\Gamma\times\Gamma}\;:\;r_{t,\zeta}=\sum_{\alpha\in\Gamma,\alpha\neq\zeta}\big(q_{\alpha\mapsto\zeta}-q_{\zeta\mapsto\alpha}\big)\bigg\}

and define the function $L_{t}:L^{1}(\mathcal{E})^{\Gamma}\times L^{1}(\mathcal{E})^{\Gamma}\to\mathbb{R}$ to be

(57)

L_{t}(r,w)=\inf\bigg\{\sum_{\alpha\neq\beta}\int_{\mathcal{E}}\lambda_{\beta}(x,\alpha,w(x))\ell\big(q_{\alpha\mapsto\beta}(x)/\lambda_{\beta}(x,\alpha,w(x))\big)d\mu_{Rie}(x)\;:\;q\in\mathcal{Q}_{t}(r)\bigg\}.

Lemma 2.8.

If $\nu\notin\mathcal{U}$ , then

(58)

\displaystyle\mathcal{H}(\nu)=\infty.

Otherwise

(59)

\mathcal{H}(\nu)=\int_{0}^{\infty}L_{t}(r_{t},w_{t})dt\text{ where }

(60)

w_{t}(x)=\big(w_{t,\zeta}(x)\big)_{\zeta\in\Gamma}

(61)

w_{t,\zeta}(x)=\mathbb{E}^{(\alpha,\iota)\sim\nu_{0}}\big[\chi\{\zeta=\alpha\}\mathcal{J}(x,\iota)\big]+\int_{\mathcal{E}}\mathcal{J}(x,y)r_{t,\zeta}(y)\mu_{Rie}(dy).

Furthermore, $\mathcal{H}$ is strictly convex in its first argument.

3 An Application: Transition Paths For Hawkes Models in Epidemiology and Neuroscience

We consider a simple stochastic SIS model for a structured population. There is certainly much work that has determined the large $N$ limiting dynamics for these models [18]. However, to the authors' knowledge, there does not exist a spatially-extended Large Deviation Principle in the manner of this paper. The computation of optimal transition paths for spatially extended systems has become of increasing interest in recent years [9].

We first outline a model of $N\gg 1$ people on a structured network. The nodes of the network reside in a domain $\mathcal{E}=\mathbb{S}^{1}$ . The position of the $j^{th}$ person is $x^{j}_{N}=2\pi j/N$ , and their state is $\sigma^{j}(t)\in\{S,I\}$ (i.e. susceptible or infected). The probability of a positive connection from $\alpha\mapsto\theta$ is $\mathcal{J}(\theta,\alpha)$ : i.e. $\mathbb{P}\big(J^{jk}=1\big)=\phi_{N}\mathcal{J}(x^{j}_{N},x^{k}_{N})$ , and $\mathbb{P}\big(J^{jk}=0\big)=1-\phi_{N}\mathcal{J}(x^{j}_{N},x^{k}_{N})$ . We assume symmetric connections, so that $\mathcal{J}(\theta,\alpha)=\mathcal{J}(\alpha,\theta)$ . We also assume that $\mathcal{J}$ is piecewise continuous.

The probability that a susceptible person transitions to being infected over the time interval $[t,t+h]$ (for $h\ll 1$ ) is, for a positive parameter $\beta$ ,

(62)

h\beta W^{j}(t)+O(h^{2})\text{ where }W^{j}(t)=N^{-1}\phi_{N}^{-1}\sum_{k\in I_{N}}J^{jk}\chi\{\sigma^{k}(t)=I\}.

The probability that an infected person transitions back to being susceptible over a time interval $[t,t+h]$ is constant, i.e. for some $\alpha>0$ it is

(63)

\displaystyle\alpha h+O(h^{2}).
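The transition rates above define a continuous-time jump process that can be simulated directly. The following is a minimal sketch of a standard Gillespie-type simulation, offered as an illustration rather than as the authors' numerical method. The kernel `0.5 * (1 + cos)`, the parameter values, the function name `simulate_sis`, and the normalization of the field by $N\phi_{N}$ (as in (19)) are assumptions made for this example.

```python
import numpy as np


def simulate_sis(J, sigma0, beta, alpha, phi, t_max, rng):
    """Simulate the SIS jump process; sigma[j] = 1 means infected, 0 susceptible."""
    n = len(sigma0)
    sigma = np.array(sigma0, dtype=int)
    field = J @ sigma / (n * phi)              # W^j for every node j
    t = 0.0
    times, infected = [0.0], [sigma.mean()]
    while True:
        # Infected nodes recover at rate alpha, susceptible nodes are infected
        # at rate beta * W^j (clipped at zero to absorb rounding errors).
        rates = np.where(sigma == 1, alpha, beta * np.maximum(field, 0.0))
        total = rates.sum()
        if total <= 0.0:
            break
        t += rng.exponential(1.0 / total)      # waiting time to the next event
        if t > t_max:
            break
        j = rng.choice(n, p=rates / total)     # node that changes state
        delta = 1 - 2 * sigma[j]               # +1 for infection, -1 for recovery
        sigma[j] += delta
        field += delta * J[:, j] / (n * phi)   # update W^i for all i connected to j
        times.append(t)
        infected.append(sigma.mean())
    return np.array(times), np.array(infected), sigma


# Illustrative parameters (assumptions, not taken from the text).
N, phi, beta, alpha, t_max = 200, 0.2, 4.0, 1.0, 5.0
rng = np.random.default_rng(1)
x = 2.0 * np.pi * np.arange(N) / N
kernel = 0.5 * (1.0 + np.cos(x[:, None] - x[None, :]))
u = rng.random((N, N))
u = np.triu(u) + np.triu(u, 1).T               # symmetric connections
J = (u < phi * kernel).astype(int)
sigma0 = (rng.random(N) < 0.1).astype(int)     # roughly 10% initially infected

times, infected, sigma_final = simulate_sis(J, sigma0, beta, alpha, phi, t_max, rng)
print("number of events:", len(times) - 1)
print("final infected fraction:", infected[-1])
```

The field is updated incrementally after each event, so the cost per event is linear in $N$ instead of quadratic.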

Write $s(\theta,t)\in[0,1]$ to represent the proportion of susceptible people at position $\theta\in\mathcal{E}$ at time $t$ in the large $N$ limit, and let $i(\theta,t)\in[0,1]$ be the proportion of infected people. Since these are the only two possibilities, it must be that

(64)

\displaystyle s(\theta,t)+i(\theta,t)=1.

Let us first write out the large $N$ limiting dynamics. This is deterministic, and such that

(65)

\frac{ds(\theta,t)}{dt}=-\beta s(\theta,t)\int_{\mathcal{E}}\mathcal{J}(\theta,\widetilde{\theta})(1-s(\widetilde{\theta},t))d\widetilde{\theta}+\alpha(1-s(\theta,t)).
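The limiting equation (65) can be integrated numerically on a grid over $\mathbb{S}^{1}$. The sketch below is illustrative: it assumes that the integral is taken against the normalized measure on the circle (approximated by an average over grid points), and the kernel, the parameter values and the function names `mean_field_rhs` and `integrate_rk4` are choices made for this example. Time stepping uses the classical fourth-order Runge-Kutta scheme.

```python
import numpy as np


def mean_field_rhs(s, kernel, beta, alpha):
    """Right-hand side of (65) on a uniform grid, with normalized measure on the circle."""
    force = kernel @ (1.0 - s) / len(s)   # average of J(theta, theta') * (1 - s(theta'))
    return -beta * s * force + alpha * (1.0 - s)


def integrate_rk4(s0, kernel, beta, alpha, dt, n_steps):
    """Classical fourth-order Runge-Kutta time stepping."""
    s = np.array(s0, dtype=float)
    for _ in range(n_steps):
        k1 = mean_field_rhs(s, kernel, beta, alpha)
        k2 = mean_field_rhs(s + 0.5 * dt * k1, kernel, beta, alpha)
        k3 = mean_field_rhs(s + 0.5 * dt * k2, kernel, beta, alpha)
        k4 = mean_field_rhs(s + dt * k3, kernel, beta, alpha)
        s = s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return s


# Illustrative parameters (assumptions, not taken from the text).
M = 64
theta = 2.0 * np.pi * np.arange(M) / M
kernel = 0.5 * (1.0 + np.cos(theta[:, None] - theta[None, :]))
beta, alpha = 4.0, 1.0
s0 = 0.9 + 0.05 * np.cos(theta)            # mostly susceptible, spatially modulated
s_final = integrate_rk4(s0, kernel, beta, alpha, dt=0.01, n_steps=3000)
print("susceptible fraction at the final time:", s_final.min(), s_final.max())
```

For this kernel the average of $\mathcal{J}(\theta,\cdot)$ is $1/2$, so the spatially uniform endemic state solves $(1-s)(\alpha-\beta s/2)=0$ with $s<1$, i.e. $s=2\alpha/\beta=0.5$ for the parameters above, and the computed solution relaxes to that value.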

The Large Deviations rate function governing the proportion of susceptible people is $\mathcal{H}_{T}:H^{1}\big(\mathcal{E}\times[0,T],\mathbb{R}^{+}\big)\to\mathbb{R}$ . Writing the time derivative of $s$ as $\dot{s}(\theta,t)$ , it assumes the form

(66)

\mathcal{H}_{T}(s)=\int_{0}^{T}\int_{\mathcal{E}}L_{\theta}\big(\dot{s}(\theta,t),s(\cdot,t)\big)d\theta dt

where for any $\theta\in\mathcal{E}$ , the function

L_{\theta}:\mathbb{R}\times\mathcal{C}(\mathcal{E})\to\mathbb{R}

is defined to be such that

(67)

L_{\theta}(\dot{s},s)=\inf\big\{\ell\big(a/\lambda_{\theta}(s)\big)\lambda_{\theta}(s)+\alpha\ell\big(b/\{\alpha(1-s(\theta))\}\big)(1-s(\theta))\text{ where }a,b\geq 0\text{ and }\dot{s}=b-a\big\}

(68)

\lambda_{\theta}:\mathcal{C}(\mathcal{E})\to\mathbb{R}\text{ is such that }

(69)

\lambda_{\theta}(s)=\beta s(\theta)\int_{\mathcal{E}}\mathcal{J}(\theta,\widetilde{\theta})\big(1-s(\widetilde{\theta})\big)d\widetilde{\theta}\text{ and }

(70)

\ell(x)=x\log x-x+1.

We note that $L_{\theta}(\dot{s},s)$ is uniquely minimized for the large $N$ limiting dynamics, i.e. $L_{\theta}(\dot{s},s)=0$ if and only if

\dot{s}=-\beta s(\theta,t)\int_{\mathcal{E}}\mathcal{J}(\theta,\widetilde{\theta})(1-s(\widetilde{\theta},t))d\widetilde{\theta}+\alpha(1-s(\theta,t)).

In computing the Large Deviations rate function for trajectories that differ from the above, we are trying to understand the relative likelihood of rare noise-induced events that differ from the above dynamics.

It turns out that the infimum in (67) is uniquely realized. We note this in the following lemma.

Lemma 3.1.

For $s\in\mathcal{C}(\mathcal{E})$ and $\theta\in\mathcal{E}$ , and any $\dot{s}\in\mathbb{R}$ ,

(71)

L_{\theta}(\dot{s},s)=\alpha(1-s(\theta))\ell\bigg(\frac{\lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)}\bigg)+\lambda_{\theta}(s)\ell\bigg(\frac{A_{\theta}(\dot{s},s)}{\lambda_{\theta}(s)}\bigg),

where $A_{\theta}:\mathbb{R}\times\mathcal{C}(\mathcal{E})\to\mathbb{R}$ is such that

(72)

A_{\theta}(\dot{s},s)=\frac{1}{2}\bigg(-\dot{s}+\big(\dot{s}^{2}+4\alpha\lambda_{\theta}(s)(1-s(\theta))\big)^{1/2}\bigg).

Proof 3.2.

Fixing $\dot{s},s$ , this is effectively a 1d optimization problem (substituting $b=\dot{s}+a$ ) of the function

\widetilde{L}(a):=\ell\big(a/\lambda_{\theta}(s)\big)\lambda_{\theta}(s)+\alpha\ell\big((\dot{s}+a)/\{\alpha(1-s(\theta))\}\big)(1-s(\theta)),

with domain $a\geq\max\big\{0,-\dot{s}\big\}$ . Since $\ell$ is convex, it must be that $a\mapsto\widetilde{L}(a)$ is convex, and the infimum must occur at a point such that $\partial_{a}\widetilde{L}(a)=0$ . Since $\ell^{\prime}(x)=\log x$ , differentiating shows that the optimal $a$ must be such that

(73)

\log\bigg(\frac{a}{\lambda_{\theta}(s)}\bigg)+\log\bigg(\frac{\dot{s}+a}{\alpha(1-s(\theta))}\bigg)=0.

This means that

(74)

\displaystyle a(\dot{s}(\theta)+a)=\alpha\lambda_{\theta}(s)(1-s(\theta))

and therefore

(75)

\displaystyle a^{2}+a\dot{s}(\theta)-\alpha\lambda_{\theta}(s)(1-s(\theta))=0.

Since $a\geq 0$ , the only valid root is

(76)

\displaystyle a=\frac{1}{2}\bigg{(}-\dot{s}(\theta)+\big{(}\dot{s}(\theta)^{2}% +4\alpha\lambda_{\theta}(s)(1-s(\theta))\big{)}^{1/2}\bigg{)}
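As a sanity check on Lemma 3.1, the closed-form minimizer (72) can be compared against a direct numerical minimization of $\widetilde{L}$. The sketch below treats $\lambda_{\theta}(s)$ and $s(\theta)$ as scalars with illustrative values (assumptions, not taken from the text) and uses a plain golden-section search, which is adequate here because $\widetilde{L}$ is convex.

```python
import math

def ell(x):
    """ell(x) = x log x - x + 1, extended by continuity to ell(0) = 1."""
    return 1.0 if x == 0.0 else x * math.log(x) - x + 1.0

def L_tilde(a, s_dot, lam, alpha, s):
    """Objective from the proof of Lemma 3.1 (lam = lambda_theta(s), s = s(theta))."""
    return lam * ell(a / lam) + alpha * (1.0 - s) * ell((s_dot + a) / (alpha * (1.0 - s)))

def A_closed(s_dot, lam, alpha, s):
    """Closed-form minimizer, equations (72) and (76)."""
    return 0.5 * (-s_dot + math.sqrt(s_dot ** 2 + 4.0 * alpha * lam * (1.0 - s)))

def L_closed(s_dot, lam, alpha, s):
    """Right-hand side of (71)."""
    A = A_closed(s_dot, lam, alpha, s)
    return alpha * (1.0 - s) * ell(lam / A) + lam * ell(A / lam)

def golden_section_argmin(f, lo, hi, iters=200):
    """Golden-section search for the minimizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)

# Illustrative scalar values (assumptions): alpha = 1, s(theta) = 0.6, lambda_theta(s) = 0.3.
alpha, s, lam = 1.0, 0.6, 0.3
results = []
for s_dot in (-0.5, 0.0, 0.1, 0.7):
    lo = max(0.0, -s_dot)  # domain constraint a >= max{0, -s_dot}
    a_num = golden_section_argmin(lambda a: L_tilde(a, s_dot, lam, alpha, s), lo, lo + 50.0)
    results.append((s_dot, a_num, A_closed(s_dot, lam, alpha, s),
                    L_tilde(a_num, s_dot, lam, alpha, s), L_closed(s_dot, lam, alpha, s)))
```

With these values the limiting velocity is $-\lambda_{\theta}(s)+\alpha(1-s(\theta))=0.1$, for which $A_{\theta}=\lambda_{\theta}(s)$ and the Lagrangian vanishes.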

Lemma 3.3.

For every $\theta\in\mathcal{E}$ and $s\in\mathcal{C}(\mathcal{E})$ , the function $\dot{s}\mapsto L_{\theta}(\dot{s},s)$ is strictly convex.

Proof 3.4.

First, it is proved in Lemma 3.1 that the infimum in (67) is always realized at a unique $a\geq 0$. Consider $\dot{s}_{1},\dot{s}_{2}\in\mathbb{R}$ with $\dot{s}_{1}\neq\dot{s}_{2}$, and suppose that for some $\zeta\in(0,1)$, $\dot{s}=\zeta\dot{s}_{1}+(1-\zeta)\dot{s}_{2}$. Let $a_{1},a_{2}\geq 0$ be the (respective) values of $a$ that realize the infimum, i.e. they are such that

(77)	$\displaystyle L_{\theta}(\dot{s}_{1},s)=$	$\displaystyle\ell\big{(}a_{1}/\lambda_{\theta}(s)\big{)}\lambda_{\theta}(s)+% \alpha\ell\big{(}(\dot{s}_{1}+a_{1})/\{\alpha(1-s(\theta))\}\big{)}(1-s(\theta))$
(78)	$\displaystyle L_{\theta}(\dot{s}_{2},s)=$	$\displaystyle\ell\big{(}a_{2}/\lambda_{\theta}(s)\big{)}\lambda_{\theta}(s)+% \alpha\ell\big{(}(\dot{s}_{2}+a_{2})/\{\alpha(1-s(\theta))\}\big{)}(1-s(\theta))$
(79)	$\displaystyle\dot{s}_{1}+a_{1}\geq$	$\displaystyle 0$
(80)	$\displaystyle\dot{s}_{2}+a_{2}\geq$	$\displaystyle 0.$

Write $a=\zeta a_{1}+(1-\zeta)a_{2}$ , and notice that $\dot{s}+a\geq 0$ . If we substitute $a$ into the RHS of (67), then since the function $\ell$ is strictly convex,

	$\displaystyle L_{\theta}(\dot{s},s)\leq$	$\displaystyle\ell\big{(}a/\lambda_{\theta}(s)\big{)}\lambda_{\theta}(s)+\alpha% \ell\big{(}(\dot{s}+a)/\{\alpha(1-s(\theta))\}\big{)}(1-s(\theta))$
	$\displaystyle<$	$\displaystyle\zeta\ell\big{(}a_{1}/\lambda_{\theta}(s)\big{)}\lambda_{\theta}(% s)+(1-\zeta)\ell\big{(}a_{2}/\lambda_{\theta}(s)\big{)}\lambda_{\theta}(s)$
		$\displaystyle+\zeta\alpha\ell\big{(}(\dot{s}_{1}+a_{1})/\{\alpha(1-s(\theta))\}\big{)}(1-s(\theta))$
		$\displaystyle+(1-\zeta)\alpha\ell\big{(}(\dot{s}_{2}+a_{2})/\{\alpha(1-s(% \theta))\}\big{)}(1-s(\theta))$
(81)		$\displaystyle=$	$\displaystyle\zeta L_{\theta}(\dot{s}_{1},s)+(1-\zeta)L_{\theta}(\dot{s}_{2},s).$
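The strict convexity in Lemma 3.3 can be probed numerically through midpoint gaps of the closed-form Lagrangian (71). This is only an illustrative check under assumed scalar values for $\alpha$, $s(\theta)$ and $\lambda_{\theta}(s)$; it does not replace the proof.

```python
import math

def ell(x):
    """ell(x) = x log x - x + 1 for x > 0."""
    return x * math.log(x) - x + 1.0

def L(s_dot, lam, alpha, s):
    """Closed-form Lagrangian (71), with A_theta given by (72)."""
    A = 0.5 * (-s_dot + math.sqrt(s_dot ** 2 + 4.0 * alpha * lam * (1.0 - s)))
    return alpha * (1.0 - s) * ell(lam / A) + lam * ell(A / lam)

def midpoint_gap(s1, s2, lam, alpha, s):
    """0.5*L(s1) + 0.5*L(s2) - L((s1+s2)/2); strictly positive under strict convexity."""
    return (0.5 * L(s1, lam, alpha, s) + 0.5 * L(s2, lam, alpha, s)
            - L(0.5 * (s1 + s2), lam, alpha, s))

# Illustrative scalar values (assumptions): alpha = 1, s(theta) = 0.6, lambda_theta(s) = 0.3.
alpha, s, lam = 1.0, 0.6, 0.3
grid = [-2.0 + 0.25 * k for k in range(17)]  # velocities in [-2, 2]
gaps = [midpoint_gap(u, v, lam, alpha, s) for u in grid for v in grid if u < v]
values = [L(v, lam, alpha, s) for v in grid]
```

Every midpoint gap on this grid is positive, and the Lagrangian is positive at every grid velocity, none of which equals the limiting velocity $0.1$.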

3.1 Euler-Lagrange Equations for the Optimal Trajectory

Fix an initial distribution of population $\bar{s}_{0}\in\mathcal{C}(\mathcal{E})$ and a final population distribution $\bar{s}_{T}\in\mathcal{C}(\mathcal{E})$ . Assume that

(82)		$\displaystyle\inf_{\theta\in\mathcal{E}}\{\bar{s}_{0}(\theta),\bar{s}_{T}(% \theta)\}$	$\displaystyle>0$
(83)		$\displaystyle\sup_{\theta\in\mathcal{E}}\{\bar{s}_{0}(\theta),\bar{s}_{T}(% \theta)\}$	$\displaystyle<1.$

Our main result in this section is that any optimal trajectory must satisfy the following Euler-Lagrange equations. Unfortunately, in general there will not be a unique solution to these equations. See for instance [36, 55] for more details on how to compute the optimal path numerically.

Theorem 3.5.

Suppose that $s\in\mathcal{C}([0,T],\mathcal{C}(\mathcal{E}))$ is such that

(84)

\displaystyle\mathcal{H}_{T}(s)=\inf\big{\{}\mathcal{H}_{T}(u)\;:\;u_{0}=\bar{% s}_{0}\text{ and }u_{T}=\bar{s}_{T}\big{\}}

and for each $\theta\in\mathcal{E}$, $t\mapsto s_{t}(\theta)$ is twice continuously differentiable, with first and second derivatives written (respectively) as $\dot{s}_{t}(\theta)$ and $\ddot{s}_{t}(\theta)$. Then $s$ must satisfy the following second-order integro-differential equation, for all $\theta\in\mathcal{E}$ and $t\in[0,T]$,

(85)

\displaystyle\ddot{s}_{t}(\theta)\frac{\partial^{2}L_{\theta}}{\partial\dot{s}^{2}}(\dot{s}_{t},s_{t})+\mathcal{O}_{\theta}(\dot{s}_{t},s_{t})=\mathcal{G}_{\theta}(\dot{s}_{t},s_{t}),

where $\mathcal{G}_{\theta},\mathcal{O}_{\theta}:\mathcal{C}(\mathcal{E})\times\mathcal{C}(\mathcal{E})\to\mathbb{R}$ are bounded, smooth, nonlocal operators defined in the course of the proof. Furthermore (since $L_{\theta}$ is strictly convex in its first argument)

(86)

\displaystyle\frac{\partial^{2}L_{\theta}}{\partial\dot{s}^{2}}(\dot{s},s)>0.

Lemma 3.6.

There is a unique $s$ satisfying (84). The optimal trajectory is such that at each $\theta\in\mathcal{E}$ ,

(87)

\displaystyle\frac{d}{dt}\frac{\partial L_{\theta}}{\partial\dot{s}}(\dot{s}(t% ,\theta),s)=\mathcal{G}_{\theta}(\dot{s},s)

where

(88)

\displaystyle\mathcal{G}_{\theta}:

\displaystyle\mathcal{C}(\mathcal{E})\times\mathcal{C}(\mathcal{E})\to\mathbb{R}

is defined to be such that for any $x\in L^{2}(\mathcal{E})$ ,

(89)

\displaystyle\lim_{\epsilon\to 0^{+}}\epsilon^{-1}\int_{\mathcal{E}}\big{(}L_{% \theta}(\dot{s}(\theta),s+\epsilon x)-L_{\theta}(\dot{s}(\theta),s)\big{)}d% \theta=\int_{\mathcal{E}}x(\theta)\mathcal{G}_{\theta}(\dot{s},s)d\theta.

Proof 3.7.

The fact that the infimum is realized follows from the fact that $\mathcal{H}_{T}$ is lower semi-continuous. The identity in (87) is a standard result from Calculus of Variations.

We now compute an expression for $\mathcal{G}_{\theta}$. To this end, let $D\lambda_{\theta}(s)\cdot x$ be the Fréchet derivative of $\lambda_{\theta}$ in the direction $x\in L^{2}(\mathcal{E})$, i.e.

(90)

\displaystyle D\lambda_{\theta}(s)\cdot x=x(\theta)\beta\int_{\mathcal{E}}\mathcal{J}(\theta,\widetilde{\theta})(1-s(\widetilde{\theta}))d\widetilde{\theta}-\beta s(\theta)\int_{\mathcal{E}}\mathcal{J}(\theta,\widetilde{\theta})x(\widetilde{\theta})d\widetilde{\theta},

and let $DA_{\theta}(\dot{s},s)\cdot x$ be the Fréchet derivative of $A_{\theta}$ with respect to $s$ in the direction $x\in L^{2}(\mathcal{E})$, i.e.

(91)

\displaystyle DA_{\theta}(\dot{s},s)\cdot x=\alpha\big{(}\dot{s}^{2}+4\alpha% \lambda_{\theta}(s)(1-s(\theta))\big{)}^{-1/2}\big{(}-\lambda_{\theta}(s)x(% \theta)+(1-s(\theta))D\lambda_{\theta}(s)\cdot x\big{)}

We next compute the partial derivatives with respect to $\dot{s}$ .

Lemma 3.8.

(92)		$\displaystyle\frac{\partial L_{\theta}}{\partial\dot{s}}(\dot{s},s)=$	$\displaystyle-\frac{\partial A_{\theta}}{\partial\dot{s}}(\dot{s},s)\log\bigg{% (}\frac{\lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)}\bigg{)}\bigg{\{}1+\frac{% \alpha(1-s(\theta))\lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)^{2}}\bigg{\}}$
	$\displaystyle\frac{\partial^{2}L_{\theta}}{\partial\dot{s}^{2}}(\dot{s},s)=$	$\displaystyle-\frac{\partial^{2}A_{\theta}}{\partial\dot{s}^{2}}(\dot{s},s)% \log\bigg{(}\frac{\lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)}\bigg{)}\bigg{\{}% 1+\frac{\alpha(1-s(\theta))\lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)^{2}}% \bigg{\}}$
(93)			$\displaystyle+\bigg{(}\frac{\partial A_{\theta}}{\partial\dot{s}}(\dot{s},s)% \bigg{)}^{2}\bigg{\{}A_{\theta}(\dot{s},s)^{-1}+\frac{2\alpha(1-s(\theta))% \lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)^{3}}\log\bigg{(}\frac{\lambda_{% \theta}(s)}{A_{\theta}(\dot{s},s)}\bigg{)}+\frac{\alpha(1-s(\theta))\lambda_{% \theta}(s)}{A_{\theta}(\dot{s},s)^{3}}\bigg{\}}$
(94)		$\displaystyle\frac{\partial A_{\theta}}{\partial\dot{s}}(\dot{s},s)=$	$\displaystyle-\frac{1}{2}+\frac{\dot{s}}{2}\big{(}\dot{s}^{2}+4\alpha\lambda_{% \theta}(s)(1-s(\theta))\big{)}^{-1/2}$
(95)		$\displaystyle\frac{\partial^{2}A_{\theta}}{\partial\dot{s}^{2}}(\dot{s},s)=$	$\displaystyle\frac{1}{2}\big{(}\dot{s}^{2}+4\alpha\lambda_{\theta}(s)(1-s(% \theta))\big{)}^{-1/2}-\frac{\dot{s}^{2}}{2}\big{(}\dot{s}^{2}+4\alpha\lambda_% {\theta}(s)(1-s(\theta))\big{)}^{-3/2}$
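The formulas (92)-(95) can be checked against central finite differences of (71) and (72). The sketch below again treats $\lambda_{\theta}(s)$ and $s(\theta)$ as fixed scalars with assumed illustrative values. It also tests an observation that is not stated in the text but follows from (74): since $\alpha\lambda_{\theta}(s)(1-s(\theta))=A_{\theta}(\dot{s}+A_{\theta})$, expression (92) reduces to $\log\big(\lambda_{\theta}(s)/A_{\theta}\big)$ and (93) reduces to $\big(\dot{s}^{2}+4\alpha\lambda_{\theta}(s)(1-s(\theta))\big)^{-1/2}$, which is consistent with (86).

```python
import math

def disc(s_dot, lam, alpha, s):
    """Discriminant s_dot^2 + 4*alpha*lambda*(1 - s) appearing in (72)."""
    return s_dot ** 2 + 4.0 * alpha * lam * (1.0 - s)

def A(s_dot, lam, alpha, s):
    """A_theta from (72)."""
    return 0.5 * (-s_dot + math.sqrt(disc(s_dot, lam, alpha, s)))

def ell(x):
    return x * math.log(x) - x + 1.0

def L(s_dot, lam, alpha, s):
    """L_theta from (71)."""
    a = A(s_dot, lam, alpha, s)
    return alpha * (1.0 - s) * ell(lam / a) + lam * ell(a / lam)

def dA(s_dot, lam, alpha, s):
    """First derivative of A in s_dot, equation (94)."""
    return -0.5 + 0.5 * s_dot / math.sqrt(disc(s_dot, lam, alpha, s))

def d2A(s_dot, lam, alpha, s):
    """Second derivative of A in s_dot, equation (95)."""
    D = disc(s_dot, lam, alpha, s)
    return 0.5 * D ** -0.5 - 0.5 * s_dot ** 2 * D ** -1.5

def dL(s_dot, lam, alpha, s):
    """First derivative of L in s_dot, equation (92)."""
    a = A(s_dot, lam, alpha, s)
    c = alpha * (1.0 - s) * lam
    return -dA(s_dot, lam, alpha, s) * math.log(lam / a) * (1.0 + c / a ** 2)

def d2L(s_dot, lam, alpha, s):
    """Second derivative of L in s_dot, equation (93)."""
    a = A(s_dot, lam, alpha, s)
    c = alpha * (1.0 - s) * lam
    log_term = math.log(lam / a)
    first = -d2A(s_dot, lam, alpha, s) * log_term * (1.0 + c / a ** 2)
    second = dA(s_dot, lam, alpha, s) ** 2 * (
        1.0 / a + 2.0 * c / a ** 3 * log_term + c / a ** 3)
    return first + second

def central_diff(f, x, h=1e-5):
    """Second-order central finite difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Illustrative scalar values (assumptions): alpha = 1, s(theta) = 0.6, lambda_theta(s) = 0.3.
alpha, s, lam = 1.0, 0.6, 0.3
test_velocities = (-1.0, 0.0, 0.1, 1.0)
```

Note that only the dependence on $\dot{s}$ is tested here; the nonlocal dependence on $s$ entering $\mathcal{G}_{\theta}$ and $\mathcal{O}_{\theta}$ is not covered by this check.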

Lemma 3.9.

(96)

\mathcal{G}_{\theta}(\dot{s},s)=\mathcal{N}_{\theta}(\dot{s},s)+\beta\mathcal{% M}_{\theta}(s)\int_{\mathcal{E}}\mathcal{J}(\theta,\widetilde{\theta})(1-s(% \widetilde{\theta}))d\widetilde{\theta}-\beta\int_{\mathcal{E}}\mathcal{M}_{% \widetilde{\theta}}(s)\mathcal{J}(\widetilde{\theta},\theta)s(\widetilde{% \theta})d\widetilde{\theta}

where $\mathcal{M}_{\theta},\mathcal{N}_{\theta}:\mathcal{C}(\mathcal{E})\times% \mathcal{C}(\mathcal{E})\to\mathbb{R}$ are such that

(97)

\mathcal{M}_{\theta}(\dot{s},s)=\ell\bigg{(}\frac{A_{\theta}(\dot{s}(\theta),s% )}{\lambda_{\theta}(s)}\bigg{)}+\bigg{(}\frac{\alpha(1-s(\theta))}{A_{\theta}(% \dot{s},s)}+\frac{A_{\theta}(\dot{s},s)}{\lambda_{\theta}(s)}\bigg{)}\log\bigg% {(}\frac{\lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)}\bigg{)}\\ +\alpha\big{(}1-s(\theta)\big{)}\big{(}\dot{s}(\theta)^{2}+4\alpha\lambda_{% \theta}(s)(1-s(\theta))\big{)}^{-1/2}\log\bigg{(}\frac{\lambda_{\theta}(s)}{A_% {\theta}(\dot{s},s)}\bigg{)}\times\bigg{\{}-\alpha(1-s(\theta))\frac{\lambda_{% \theta}(s)}{A_{\theta}(\dot{s}(\theta),s)^{2}}-1\bigg{\}}

and

(98)

\mathcal{N}_{\theta}(\dot{s},s)=-\alpha\ell\bigg{(}\frac{\lambda_{\theta}(s)}{% A_{\theta}(\dot{s},s)}\bigg{)}\\ +\alpha\lambda_{\theta}(s)\big{(}\dot{s}^{2}+4\alpha\lambda_{\theta}(s)(1-s(% \theta))\big{)}^{-1/2}\log\bigg{(}\frac{\lambda_{\theta}(s)}{A_{\theta}(\dot{s% },s)}\bigg{)}\bigg{\{}\alpha(1-s(\theta))\frac{\lambda_{\theta}(s)}{A_{\theta}% (\dot{s}(\theta),s)^{2}}+1\bigg{\}}

Proof 3.10.

We compute that for any particular $\theta\in\mathcal{E}$ ,

(99)

\lim_{\epsilon\to 0^{+}}\epsilon^{-1}\big{(}L_{\theta}(\dot{s},s+\epsilon x)-L% _{\theta}(\dot{s},s)\big{)}=-\alpha x(\theta)\ell\bigg{(}\frac{\lambda_{\theta% }(s)}{A_{\theta}(\dot{s},s)}\bigg{)}+D\lambda_{\theta}(s)\cdot x\ell\bigg{(}% \frac{A_{\theta}(\dot{s},s)}{\lambda_{\theta}(s)}\bigg{)}\\ +\alpha(1-s(\theta))\log\bigg{(}\frac{\lambda_{\theta}(s)}{A_{\theta}(\dot{s},% s)}\bigg{)}\bigg{(}\frac{A_{\theta}(\dot{s},s)D\lambda_{\theta}(s)\cdot x-% \lambda_{\theta}(s)DA_{\theta}(\dot{s},s)\cdot x}{A_{\theta}(\dot{s},s)^{2}}% \bigg{)}\\ +\lambda_{\theta}(s)\log\bigg{(}\frac{A_{\theta}(\dot{s},s)}{\lambda_{\theta}(% s)}\bigg{)}\bigg{(}\frac{\lambda_{\theta}(s)DA_{\theta}(\dot{s},s)\cdot x-A_{% \theta}(\dot{s},s)D\lambda_{\theta}(s)\cdot x}{\lambda_{\theta}(s)^{2}}\bigg{)% }\\ :=\mathcal{M}_{\theta}(\dot{s},s)D\lambda_{\theta}(s)\cdot x+x(\theta)\mathcal% {N}_{\theta}(\dot{s},s),

where $\mathcal{M}_{\theta}(\dot{s},s)$ is defined in (97) and $\mathcal{N}_{\theta}(\dot{s},s)$ is defined in (98).

Differentiating, we find that

(100)	$\displaystyle\frac{d}{dt}\frac{\partial L_{\theta}}{\partial\dot{s}}(\dot{s}(t% ,\theta),s)=$	$\displaystyle\frac{\partial^{2}L_{\theta}}{\partial\dot{s}^{2}}(\dot{s}(t,% \theta),s)\ddot{s}(t,\theta)+\mathcal{O}_{\theta}(\dot{s},s)\text{ where }$
(101)	$\displaystyle\mathcal{O}_{\theta}:$	$\displaystyle\mathcal{C}(\mathcal{E})\times\mathcal{C}(\mathcal{E})\mapsto% \mathbb{R}\text{ is such that }$
(102)	$\displaystyle\mathcal{O}_{\theta}(\dot{s},s):=$	$\displaystyle\lim_{\epsilon\to 0^{+}}\epsilon^{-1}\bigg{(}\frac{\partial L_{% \theta}}{\partial\dot{s}}(\dot{s}(t,\theta),s+\epsilon\dot{s})-\frac{\partial L% _{\theta}}{\partial\dot{s}}(\dot{s}(t,\theta),s)\bigg{)}.$

It remains to find a convenient expression for $\mathcal{O}_{\theta}(\dot{s},s)$ .

Lemma 3.11.

\mathcal{O}_{\theta}(\dot{s},s)=\frac{\partial A_{\theta}}{\partial\dot{s}}(\dot{s},s)\bigg{\{}1+\frac{\alpha(1-s(\theta))\lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)^{2}}\bigg{\}}\bigg{\{}\frac{\Delta A_{\theta}(\dot{s},s)}{A_{\theta}(\dot{s},s)}-\frac{\Delta\lambda_{\theta}(\dot{s},s)}{\lambda_{\theta}(s)}\bigg{\}}\\ +\alpha\dot{s}\log\bigg{(}\frac{\lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)}\bigg{)}\bigg{\{}1+\frac{\alpha(1-s(\theta))\lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)^{2}}\bigg{\}}\bigg{(}\dot{s}(\theta)^{2}+4\alpha\lambda_{\theta}(s)(1-s(\theta))\bigg{)}^{-3/2}\times\\ \bigg{(}(1-s(\theta))\Delta\lambda_{\theta}(\dot{s},s)-\lambda_{\theta}(s)\dot{s}(\theta)\bigg{)}\\ +\alpha\frac{\partial A_{\theta}}{\partial\dot{s}}(\dot{s},s)\log\bigg{(}\frac{\lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)}\bigg{)}\bigg{(}\frac{\dot{s}(\theta)\lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)^{2}}+\frac{2(1-s(\theta))\lambda_{\theta}(s)}{A_{\theta}(\dot{s},s)^{3}}\Delta A_{\theta}(\dot{s},s)-\frac{(1-s(\theta))}{A_{\theta}(\dot{s},s)^{2}}\Delta\lambda_{\theta}(\dot{s},s)\bigg{)},

where

(103)		$\displaystyle\Delta\lambda_{\theta}(\dot{s},s)$	$\displaystyle=\dot{s}(\theta)\beta\int_{\mathcal{E}}\mathcal{J}(\theta,\widetilde{\theta})(1-s(\widetilde{\theta}))d\widetilde{\theta}-\beta s(\theta)\int_{\mathcal{E}}\mathcal{J}(\theta,\widetilde{\theta})\dot{s}(\widetilde{\theta})d\widetilde{\theta}$
(104)		$\displaystyle\Delta A_{\theta}(\dot{s},s)$	$\displaystyle=\alpha\big{(}\dot{s}^{2}+4\alpha\lambda_{\theta}(s)(1-s(\theta))\big{)}^{-1/2}\big{(}-\lambda_{\theta}(s)\dot{s}(\theta)+(1-s(\theta))\Delta\lambda_{\theta}(\dot{s},s)\big{)}.$

4 Proofs

There are two main steps to our proof of Theorem 2.3. The first step is to show that the system can be approximated very well by a system with averaged interactions. The next step is to prove the Large Deviation Principle for the system with averaged interactions (this is Theorem 4.2). The main result of this paper (Theorem 2.3) will follow from these results thanks to [28, Theorem 4.2.13].

4.1 Proof Outline

Our proof proceeds by transforming the Large Deviations of the uncoupled system to the Large Deviations of the averaged system through a time-rescaling transformation. Let us first outline the Large Deviations for the uncoupled system.

Let $\{Y^{j}_{\alpha\beta}(t)\}_{\alpha\beta\in\Gamma}$ be independent Poisson Processes of unit intensity. We define the empirical reaction flux $\grave{\mu}^{N}_{\alpha\mapsto\beta}\in\mathcal{M}\big{(}\mathcal{E}\times% \mathbb{R}^{+}\big{)}$ to be such that for any $A\in\mathfrak{B}(\mathcal{E})$ and an interval $[a,b]\subset\mathbb{R}^{+}$ ,

(105)

\displaystyle\grave{\mu}^{N}_{\alpha\mapsto\beta}\big{(}A\times[a,b]\big{)}=N^% {-1}\sum_{j\in I_{N}}\sum_{t\in[a,b]}\chi\big{\{}x^{j}_{N}\in A,Y^{j}_{\alpha% \beta}(t^{-})\neq Y^{j}_{\alpha\beta}(t)\big{\}}.

We write $\grave{\mu}^{N}=\big{(}\grave{\mu}^{N}_{\alpha\mapsto\beta}\big{)}_{\alpha,\beta\in\Gamma}\in\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}^{\Gamma\times\Gamma}$. Define the rate function $\mathcal{I}:\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}^{\Gamma\times\Gamma}\to[0,\infty]$ as follows. For $\mu\in\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}^{\Gamma\times\Gamma}$, if there exist $\alpha,\beta\in\Gamma$ such that $\mu_{\alpha\mapsto\beta}$ is not absolutely continuous with respect to Lebesgue measure, then define $\mathcal{I}(\mu)=\infty$. Otherwise, writing $p_{\alpha\mapsto\beta}$ for the density of $\mu_{\alpha\mapsto\beta}$, define

(106)		$\displaystyle\mathcal{I}(\mu)=$	$\displaystyle\sum_{\alpha,\beta\in\Gamma}\int_{\mathcal{E}}\int_{0}^{\infty}% \ell\big{(}p_{\alpha\mapsto\beta}(x,t)\big{)}dt\kappa(dx)\text{ where }$
(107)		$\displaystyle\ell(a)=$	$\displaystyle a\log a-a+1.$

Note that $\ell(a)\geq 0$. This means that the integral in (106) is well-defined (and could be $\infty$). We can now state a Large Deviation Principle for the uncoupled system.

Theorem 4.1.

Let $\mathcal{A},\mathcal{O}\subseteq\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^% {+}\big{)}^{\Gamma\times\Gamma}$ be (respectively) closed and open. Then

(108)		$\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}% \grave{\mu}^{N}\in\mathcal{A}\big{)}$	$\displaystyle\leq-\inf_{\mu\in\mathcal{A}}\mathcal{I}(\mu)$
(109)		$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\grave{\mu}^{N}\in\mathcal{O}\big{)}$	$\displaystyle\geq-\inf_{\mu\in\mathcal{O}}\mathcal{I}(\mu).$

Furthermore, $\mathcal{I}$ is lower semicontinuous and has compact level sets.
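The integrand $\ell$ in (106)-(107) is the familiar Poisson rate function: it is a standard fact (not stated in the text above) that $\ell(a)=\sup_{c\in\mathbb{R}}\{ca-(e^{c}-1)\}$, the Legendre transform of the cumulant generating function of a unit-rate Poisson variable, with the supremum attained at $c=\log a$. The following sketch checks this identity on a grid; the grid bounds and resolution are arbitrary choices.

```python
import math

def ell(a):
    """ell(a) = a log a - a + 1, as in (107), for a > 0."""
    return a * math.log(a) - a + 1.0

def legendre_poisson(a, c_min=-10.0, c_max=10.0, n=200001):
    """Grid approximation of sup_c { c*a - (exp(c) - 1) }."""
    best = -math.inf
    for i in range(n):
        c = c_min + (c_max - c_min) * i / (n - 1)
        best = max(best, c * a - (math.exp(c) - 1.0))
    return best

# Compare the grid supremum with the closed form at a few densities.
comparison = [(a, legendre_poisson(a), ell(a)) for a in (0.2, 1.0, 3.0)]
```

In particular $\ell(1)=0$, so the rate function in (106) vanishes exactly when every flux density equals the unit intensity of the driving Poisson processes.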

4.2 System with Averaged Interactions

We next define an approximate process with ‘averaged’ interactions. Let $\{\bar{\sigma}^{j}(t)\}_{j\in I_{N}}$ be a system of jump-Markov Processes such that, for $\alpha\neq\bar{\sigma}^{j}(t)$ , and $h\ll 1$ ,

(110)

\displaystyle\mathbb{P}\big{(}\bar{\sigma}^{j}(t+h)=\alpha\;|\;\mathcal{F}_{t}\big{)}=hf_{(\alpha)}\big{(}x^{j}_{N},\bar{\sigma}^{j}(t),\bar{w}^{j}(t)\big{)}+O(h^{2}),

where $\bar{w}^{j}(t)=\big{(}\bar{w}^{j}_{\beta}(t)\big{)}_{\beta\in\Gamma}$ and

(111)

\displaystyle\bar{w}_{\beta}^{j}(t)=N^{-1}\sum_{k=1}^{N}\mathcal{J}(x^{j}_{N},% x^{k}_{N})\chi\{\bar{\sigma}^{k}(t)=\beta\}.

We take the initial conditions to be the same for the two systems, i.e. $\bar{\sigma}^{j}(0)=\sigma^{j}(0)$ . Later on, in the proofs, it will be useful to represent $\bar{\sigma}(t)$ as a time-rescaled version of the uncoupled system. To this end, define $\{Z^{j}_{\alpha\beta}(t)\}_{\alpha\beta\in\Gamma,\alpha\neq\beta}$ to ‘count’ the number of $\alpha\mapsto\beta$ transitions in the coupled system, i.e. be such that

(112)

\displaystyle Z^{j}_{\alpha\beta}(t)=

\displaystyle Y^{j}_{\alpha\beta}\bigg{(}\int_{0}^{t}f_{(\alpha)}\big{(}x^{j}_% {N},\bar{\sigma}^{j}(s),\bar{w}^{j}(s)\big{)}\chi\{\bar{\sigma}^{j}_{s}=\alpha% \}ds\bigg{)}

and for any $\alpha\in\Gamma$ ,

(113)		$\displaystyle\bar{\sigma}^{j}(t)=$	$\displaystyle\alpha\text{ if and only if }$
(114)		$\displaystyle\sum_{\beta\neq\alpha}\big{(}Z^{j}_{\beta\alpha}(t)-Z^{j}_{\alpha% \beta}(t)\big{)}+\chi\{\bar{\sigma}^{j}(0)=\alpha\}=$	$\displaystyle 1.$

Since (with unit probability) $Y^{j}_{\alpha\beta}(t)$ only makes a finite number of jumps over a bounded time interval, one easily checks that there exists a unique $\{\bar{\sigma}^{j}(t)\}_{j\in I_{N}\fatsemi t\geq 0}$ satisfying (112) - (114).
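To illustrate the bookkeeping in (113)-(114), the sketch below simulates a two-state averaged system and verifies that the net transition counts determine the current state. It is a sketch under stated assumptions rather than the construction used in the proofs: it uses a Gillespie-type simulation instead of the time-changed Poisson representation (112), takes $\Gamma=\{0,1\}$ (0 susceptible, 1 infected) with hypothetical SIS rates $\beta\bar{w}^{j}_{1}$ for $0\mapsto 1$ and $\alpha$ for $1\mapsto 0$, and uses a hypothetical kernel and grid of positions $x^{j}_{N}$.

```python
import math
import random

def J(x, y):
    """Hypothetical interaction kernel on E = [0, 1) (an assumption for illustration)."""
    return 1.0 + 0.5 * math.cos(2.0 * math.pi * (x - y))

def simulate_averaged_sis(N, T, alpha, beta, seed=0):
    """Gillespie simulation of a two-state system with averaged interactions as in (111)."""
    rng = random.Random(seed)
    x = [(j + 0.5) / N for j in range(N)]                 # positions x^j_N
    sigma0 = [1 if rng.random() < 0.2 else 0 for _ in range(N)]
    sigma = list(sigma0)
    Z = [{(0, 1): 0, (1, 0): 0} for _ in range(N)]        # transition counts Z^j_{ab}
    t = 0.0
    while True:
        rates = []
        for j in range(N):
            if sigma[j] == 0:
                # averaged field: N^{-1} sum_k J(x_j, x_k) 1{sigma_k = 1}
                w = sum(J(x[j], x[k]) for k in range(N) if sigma[k] == 1) / N
                rates.append(beta * w)
            else:
                rates.append(alpha)
        total = sum(rates)
        if total == 0.0:
            break                                         # absorbing state reached
        t += rng.expovariate(total)
        if t > T:
            break
        j = rng.choices(range(N), weights=rates)[0]
        old, new = sigma[j], 1 - sigma[j]
        Z[j][(old, new)] += 1
        sigma[j] = new
    return sigma0, sigma, Z

def satisfies_count_identity(sigma0, sigma, Z):
    """Check (113)-(114): net inflow into state a plus 1{sigma(0)=a} equals 1{sigma(t)=a}."""
    for j in range(len(sigma)):
        for a in (0, 1):
            b = 1 - a
            net = Z[j][(b, a)] - Z[j][(a, b)] + (1 if sigma0[j] == a else 0)
            if net != (1 if sigma[j] == a else 0):
                return False
    return True

sigma0, sigma_T, Z = simulate_averaged_sis(N=30, T=5.0, alpha=1.0, beta=2.0, seed=0)
identity_holds = satisfies_count_identity(sigma0, sigma_T, Z)
```

The identity holds along every sample path, independently of the seed, because each jump of node $j$ moves exactly one unit of mass between its two states.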

Theorem 4.2.

Let $\mathcal{A},\mathcal{O}\subseteq\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^% {+}\big{)}^{\Gamma\times\Gamma}$ be (respectively) closed and open. Then

(115)		$\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}% \bar{\mu}^{N}\in\mathcal{A}\big{)}$	$\displaystyle\leq-\inf_{\mu\in\mathcal{A}}\mathcal{G}(\mu)$
(116)		$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\bar{\mu}^{N}\in\mathcal{O}\big{)}$	$\displaystyle\geq-\inf_{\mu\in\mathcal{O}}\mathcal{G}(\mu).$

Furthermore, $\mathcal{G}$ is lower semicontinuous and has compact level sets.

This will be proved further below, in Section 4.3.

We define the empirical reaction flux $\bar{\mu}^{N}_{\alpha\mapsto\beta}\in\mathcal{M}\big{(}\mathcal{E}\times% \mathbb{R}^{+}\big{)}$ to be such that for any $A\in\mathfrak{B}(\mathcal{E})$ and an interval $[a,b]\subset\mathbb{R}^{+}$ ,

(117)

\displaystyle\bar{\mu}^{N}_{\alpha\mapsto\beta}\big{(}A\times[a,b]\big{)}=N^{-% 1}\sum_{j\in I_{N}}\sum_{t\in[a,b]}\chi\big{\{}x^{j}_{N}\in A,Z^{j}_{\alpha% \beta}(t^{-})\neq Z^{j}_{\alpha\beta}(t)\big{\}}.

We write $\bar{\mu}^{N}=\big{(}\bar{\mu}^{N}_{\alpha\mapsto\beta}\big{)}_{\alpha,\beta\in\Gamma}\in\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}^{\Gamma\times\Gamma}$.

We next use Girsanov’s Theorem to compare the Large Deviations in our main result (Theorem 2.3) to the Large Deviation Principle in Theorem 4.2. Let $\bar{P}^{N}_{T}\in\mathcal{P}\big{(}\mathcal{D}([0,T],\Gamma)^{N}\big{)}$ be the probability law of $\big{(}\bar{\sigma}^{j}_{t}\big{)}_{j\in I_{N}\fatsemi t\leq T}$ . Let $P^{N}_{J,T}\in\mathcal{P}\big{(}\mathcal{D}([0,T],\Gamma)^{N}\big{)}$ be the probability law of the original system $\big{(}\sigma^{j}_{t}\big{)}_{j\in I_{N}\fatsemi t\leq T}$ . Thanks to Girsanov’s Theorem [15],

(118)

\frac{dP^{N}_{J,T}}{d\bar{P}^{N}_{T}}=\exp\big{(}N\Gamma_{T}(\sigma)\big{)}

where

(119)

\Gamma_{T}(\sigma)=N^{-1}\sum_{j\in I_{N}}\sum_{\beta\in\Gamma}\int_{0}^{T}\big{(}f_{\beta}(x^{j}_{N},\sigma^{j}(s),\bar{w}^{j}_{s})-f_{\beta}(x^{j}_{N},\sigma^{j}(s),w^{j}_{s})\big{)}ds\\ +N^{-1}\sum_{j\in I_{N}}\sum_{s\leq T\;:\;\sigma^{j}(s^{-})\neq\sigma^{j}(s)}\sum_{\beta\neq\sigma^{j}_{s}}\bigg{\{}\log\big{(}f_{\beta}(x^{j}_{N},\sigma^{j}(s^{-}),w^{j}_{s^{-}})\big{)}-\log\big{(}f_{\beta}(x^{j}_{N},\sigma^{j}(s^{-}),\bar{w}^{j}_{s^{-}})\big{)}\bigg{\}}\\ +N^{-1}\sum_{j\in I_{N}}\sum_{\beta\neq\sigma^{j}_{T}}\bigg{\{}\log\big{(}f_{\beta}(x^{j}_{N},\sigma^{j}(T),w^{j}_{T})\big{)}-\log\big{(}f_{\beta}(x^{j}_{N},\sigma^{j}(T),\bar{w}^{j}_{T})\big{)}\bigg{\}}.

In the following lemma we prove that, with very high probability, the Girsanov exponent $\Gamma_{T}$ is smaller in absolute value than any fixed $\epsilon>0$.

Lemma 4.3.

For any $\epsilon,T>0$ ,

(120)

\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}% \big{|}\Gamma_{T}\big{|}\geq\epsilon\big{)}=-\infty.

We can now give the proof of our main result, Theorem 2.3.

Proof 4.4.

Let

\mathcal{V}_{N,\epsilon}=\big{\{}\big{|}\Gamma_{T}(\sigma)\big{|}\leq\epsilon% \big{\}}.

Starting with the upper bound, let $\mathcal{A}\subset\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}^{% \Gamma\times\Gamma}$ be closed. Then for any $\epsilon>0$ ,

(121)

\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}\hat{\mu}^{N}% \in\mathcal{A}\big{)}\leq\\ \max\bigg{\{}\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}% \hat{\mu}^{N}\in\mathcal{A},\mathcal{V}_{N,\epsilon}\big{)},\underset{N\to% \infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}\mathcal{V}_{N,\epsilon}^{c% }\big{)}\bigg{\}}

The second term on the RHS is $-\infty$ , thanks to Lemma 4.3. It thus suffices that we demonstrate that

(122)

\displaystyle\lim_{\epsilon\to 0^{+}}\underset{N\to\infty}{\overline{\lim}}N^{% -1}\log\mathbb{P}\big{(}\hat{\mu}^{N}\in\mathcal{A},\mathcal{V}_{N,\epsilon}% \big{)}\leq-\inf_{\mu\in\mathcal{A}}\mathcal{G}(\mu).

Now thanks to the Girsanov Expression in (118)

	$\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}% \hat{\mu}^{N}\in\mathcal{A},\mathcal{V}_{N,\epsilon}\big{)}\leq$	$\displaystyle\epsilon+\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{% P}\big{(}\bar{\mu}^{N}\in\mathcal{A},\mathcal{V}_{N,\epsilon}\big{)}$
(123)		$\displaystyle\leq$	$\displaystyle\epsilon+\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{% P}\big{(}\bar{\mu}^{N}\in\mathcal{A}\big{)}$
(124)		$\displaystyle\leq$	$\displaystyle\epsilon-\inf_{\mu\in\mathcal{A}}\mathcal{G}(\mu),$

thanks to Theorem 4.2. Taking $\epsilon$ to $0$ , we obtain (122).

Turning to the lower bound, let $\mathcal{O}\subset\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}^{% \Gamma\times\Gamma}$ be open, we find that for any $\epsilon>0$ ,

	$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\hat{\mu}^{N}\in\mathcal{O}\big{)}\geq$	$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\hat{\mu}^{N}\in\mathcal{O},\mathcal{V}_{N,\epsilon}\big{)}$
(125)		$\displaystyle\geq$	$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\bar{\mu}^{N}\in\mathcal{O},\mathcal{V}_{N,\epsilon}\big{)}-\epsilon,$

thanks to (118). Now

(126)

\displaystyle\mathbb{P}\big{(}\bar{\mu}^{N}\in\mathcal{O},\mathcal{V}_{N,% \epsilon}\big{)}=\mathbb{P}\big{(}\bar{\mu}^{N}\in\mathcal{O}\big{)}-\mathbb{P% }\big{(}\bar{\mu}^{N}\in\mathcal{O},\mathcal{V}^{c}_{N,\epsilon}\big{)}

and since $\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(}\bar{\mu}^{N% }\in\mathcal{O},\mathcal{V}^{c}_{N,\epsilon}\big{)}=-\infty$ , it must hold that

	$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\bar{\mu}^{N}\in\mathcal{O},\mathcal{V}_{N,\epsilon}\big{)}\geq$	$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\bar{\mu}^{N}\in\mathcal{O}\big{)}$
(127)		$\displaystyle\geq$	$\displaystyle-\inf_{\mu\in\mathcal{O}}\mathcal{G}(\mu).$

Taking $\epsilon\to 0^{+}$ , we obtain that

(128)

\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\hat{\mu}^{N}\in\mathcal{O}\big{)}\geq-\inf_{\mu\in\mathcal{O}}\mathcal{G}(% \mu),

as required.

We next prove Lemma 4.3.

Proof 4.5.

It suffices to demonstrate the following three inequalities

(129)		$\displaystyle\underset{N\to\infty}{\overline{\lim}}$	$\displaystyle N^{-1}\log\mathbb{P}\bigg{(}N^{-1}\sum_{j\in I_{N}}\sum_{\beta% \in\Gamma}\int_{0}^{T}\big{(}f_{\beta}(x^{j}_{N},\sigma^{j}(s),\bar{w}^{j}_{s}% )-f_{\beta}(x^{j}_{N},\sigma^{j}(s),w^{j}_{s})\big{)}ds\geq\epsilon/3\bigg{)}=-\infty$
	$\displaystyle\underset{N\to\infty}{\overline{\lim}}$	$\displaystyle N^{-1}\log\mathbb{P}\bigg{(}N^{-1}\sum_{j\in I_{N}}\sum_{s\leq T% \;:\;\sigma^{j}(s^{-})\neq\sigma^{j}(s)}\sum_{\beta\neq\sigma^{j}(s)}\bigg{\{}% \log\big{(}f_{\beta}(x^{j}_{N},\sigma^{j}(s^{-}),w^{j}_{s^{-}})\big{)}$
(130)			$\displaystyle-\log\big{(}f_{\beta}(x^{j}_{N},\sigma^{j}(s^{-}),\bar{w}^{j}_{s^% {-}})\big{)}\bigg{\}}\geq\epsilon/3\bigg{)}=-\infty$
	$\displaystyle\underset{N\to\infty}{\overline{\lim}}$	$\displaystyle N^{-1}\log\mathbb{P}\bigg{(}N^{-1}\sum_{j\in I_{N}}\sum_{\beta% \neq\sigma^{j}(T)}\bigg{\{}\log\big{(}f_{\beta}(x^{j}_{N},\sigma^{j}(T),w^{j}_% {T})\big{)}$
(131)			$\displaystyle-\log\big{(}f_{\beta}(x^{j}_{N},\sigma^{j}(T),\bar{w}^{j}_{T})% \big{)}\bigg{\}}\geq\epsilon/3\bigg{)}=-\infty.$

The proof of (131) is very similar to that of (130) and is omitted.

For each $\beta\in\Gamma$ , it follows from the fact that $f_{\beta}$ is Lipschitz that there is a constant $C>0$ such that

(132)

\displaystyle N^{-1}\sum_{j\in I_{N}}\big{|}f_{\beta}(x^{j}_{N},\sigma^{j}(s),% \bar{w}^{j}_{s})-f_{\beta}(x^{j}_{N},\sigma^{j}(s),w^{j}_{s})\big{|}\leq N^{-1% }C\sum_{j\in I_{N}}\|\bar{w}^{j}_{s}-w^{j}_{s}\|.

Furthermore Assumption 2.1 implies that there must exist a non-random sequence $(\delta_{N})_{N\geq 1}$ that decreases to $0$ and such that

(133)

\displaystyle N^{-1}\sum_{j\in I_{N}}\|\bar{w}^{j}_{s}-w^{j}_{s}\|\leq\delta_{% N}.

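Summing over $\beta\in\Gamma$ and integrating over $[0,T]$, the quantity inside the probability in (129) is bounded by $|\Gamma|CT\delta_{N}$. Once $N$ is large enough that $|\Gamma|CT\delta_{N}<\epsilon/3$, this probability is zero, and (129) must hold.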

Turning to (130), since $f_{\beta}$ is (i) Lipschitz and (ii) uniformly lower-bounded by a positive constant and (iii) uniformly upperbounded, there exists a constant $C>0$ such that (for the constant $\eta^{j}_{N}$ defined in Assumption 2.1)

	$\displaystyle\bigg{\|}N^{-1}\sum_{j\in I_{N}}\sum_{s\leq T\;:\;\sigma^{j}(s^{-}% )\neq\sigma^{j}(s)}\sum_{\beta\neq\sigma^{j}(s)}$	$\displaystyle\bigg{\{}\log\big{(}f_{\beta}(x^{j}_{N},\sigma^{j}(s),w^{j}_{s})% \big{)}-\log\big{(}f_{\beta}(x^{j}_{N},\sigma^{j}(s),\bar{w}^{j}_{s})\big{)}% \bigg{\}}\bigg{\|}$
	$\displaystyle\leq$	$\displaystyle CN^{-1}\sum_{j\in I_{N}}\sum_{s\leq T\;:\;\sigma^{j}(s^{-})\neq% \sigma^{j}(s)}\eta^{j}_{N}$
	$\displaystyle\leq$	$\displaystyle CN^{-1}\sum_{j\in I_{N}}\sum_{\alpha,\beta\in\Gamma}Z^{j}_{(\alpha,\beta)}(T)\eta^{j}_{N}$
(134)		$\displaystyle\leq$	$\displaystyle CN^{-1}\sum_{j\in I_{N}}\sum_{\alpha,\beta\in\Gamma}Y^{j}_{(% \alpha,\beta)}(f_{max}T)\eta^{j}_{N}$

Thanks to Chernoff’s Inequality, for a constant $a>0$ ,

	$\displaystyle\mathbb{P}\bigg{(}CN^{-1}\sum_{j\in I_{N}}\sum_{\alpha,\beta\in\Gamma}$	$\displaystyle Y^{j}_{(\alpha,\beta)}(f_{max}T)\eta^{j}_{N}\geq\epsilon\bigg{)}% \leq\mathbb{E}\bigg{[}\exp\bigg{(}a\sum_{j\in I_{N}}\sum_{\alpha,\beta\in% \Gamma}Y^{j}_{(\alpha,\beta)}(f_{max}T)\eta^{j}_{N}-aC^{-1}N\epsilon\bigg{)}% \bigg{]}$
		$\displaystyle\leq\prod_{j\in I_{N}}\big{\{}1+f_{max}T\big{(}\exp(a\eta^{j}_{N}% )-1\big{)}\big{\}}^{\|\Gamma\|}\exp\big{(}-aC^{-1}N\epsilon\big{)}$
(135)			$\displaystyle\leq\exp\bigg{(}\|\Gamma\|f_{max}T\sum_{j\in I_{N}}\big{(}\exp(a% \eta^{j}_{N})-1\big{)}-aC^{-1}N\epsilon\bigg{)}$

We next claim that for arbitrarily large $a$

(136)

\lim_{N\to\infty}\big{\{}N^{-1}|\Gamma|f_{max}T\sum_{j\in I_{N}}\big{(}\exp(a% \eta^{j}_{N})-1\big{)}\big{\}}=0.

Now Assumption 2.1 implies that there exists a non-random constant $C_{\mathcal{J}}$ such that $\eta^{j}_{N}\leq C_{\mathcal{J}}$ with unit probability. We thus obtain that, for $\delta$ small enough that $\exp(ab)-1\leq 2ab$ for all $b\in[0,\delta]$,

(137)		$\displaystyle N^{-1}\sum_{j\in I_{N}}\big{(}\exp(a\eta^{j}_{N})-1\big{)}$	$\displaystyle\leq N^{-1}\sum_{j\in I_{N}}2a\eta^{j}_{N}\chi\big{\{}\eta^{j}_{N% }\leq\delta\big{\}}+N^{-1}\sum_{j\in I_{N}}\exp\big{(}aC_{\mathcal{J}}\big{)}% \chi\big{\{}\eta^{j}_{N}\geq\delta\big{\}}$
(138)			$\displaystyle\leq N^{-1}\sum_{j\in I_{N}}2a\eta^{j}_{N}+N^{-1}\sum_{j\in I_{N}% }\exp\big{(}aC_{\mathcal{J}}\big{)}\chi\big{\{}\eta^{j}_{N}\geq\delta\big{\}}.$

Assumption 2.1 implies that

(139)		$\displaystyle\lim_{N\to\infty}N^{-1}\sum_{j\in I_{N}}\chi\big{\{}\eta^{j}_{N}% \geq\delta\big{\}}$	$\displaystyle=0$
(140)		$\displaystyle\lim_{N\to\infty}N^{-1}\sum_{j\in I_{N}}2a\eta^{j}_{N}$	$\displaystyle=0.$

We have thus established (136).

We therefore find that for arbitrarily large $a$ ,

(141)

\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\bigg{(}CN^{-1}\sum_{j\in I_{N}}\sum_{\alpha,\beta\in\Gamma}Y^{j}_{(\alpha,\beta)}(f_{max}T)\eta^{j}_{N}\geq\epsilon\bigg{)}\leq-aC^{-1}\epsilon.

Taking $a\to\infty$ , we obtain (130). We have thus established (129) and (130).

For a reaction $\alpha\mapsto\beta$ , define the empirical flux measure $\bar{\mu}^{N}_{\alpha\mapsto\beta}\in\mathcal{M}(\mathcal{E}\times\mathbb{R}^{% +})$ for the averaged system to be such that, for measurable $A\subseteq\mathcal{E}$ and a time interval $[a,b]$ ,

(142)

\displaystyle\bar{\mu}^{N}_{\alpha\mapsto\beta}\big{(}A\times[a,b]\big{)}=N^{-% 1}\sum_{j\in I_{N}}\sum_{s\in[a,b]}\chi\big{\{}x^{j}_{N}\in A,\bar{\sigma}^{j}% _{s^{-}}=\alpha,\bar{\sigma}^{j}_{s}=\beta\big{\}}.

Writing $\delta_{m}=\big{(}\log m\big{)}^{-1/2}$, define the set

(143)

\displaystyle\mathcal{K}_{m}=\bigg{\{}\mu\in\mathcal{M}\big{(}\mathcal{E}% \times\mathbb{R}^{+}\big{)}^{\Gamma\times\Gamma}\;:\;\text{ For all }0\leq b% \leq m^{2},\;\mu_{\alpha\mapsto\beta}\big{(}\mathcal{E}\times[b/m,(b+1)/m]\big% {)}\leq\delta_{m}\bigg{\}}.

Lemma 4.6.

There exists $N_{m}$ such that for all $N\geq N_{m}$ ,

(144)

\displaystyle N^{-1}\log\mathbb{P}\big{(}\bar{\mu}^{N}\notin\mathcal{K}_{m}% \big{)}\leq-\frac{1}{2}\sqrt{\log m}

Furthermore $\mathcal{K}_{m}$ is compact.

Proof 4.7.

Using a union of events bound,

(145)

\displaystyle\mathbb{P}\big{(}\bar{\mu}^{N}\notin\mathcal{K}_{m}\big{)}\leq% \sum_{a=0}^{m^{2}}\mathbb{P}\big{(}N^{-1}\sup_{\alpha,\beta\in\Gamma}\sum_{j% \in I_{N}}\big{(}Z^{j}_{(\alpha,\beta)}\big{(}t^{(m)}_{a+1}\big{)}-Z^{j}_{(% \alpha,\beta)}\big{(}t^{(m)}_{a}\big{)}\big{)}\geq\delta_{m}\big{)}

For a nonnegative integer $a$, writing $t^{(m)}_{a}=a/m$, and for $c>0$, thanks to Chernoff’s Inequality,

	$\displaystyle\mathbb{P}\big{(}N^{-1}\sup_{\alpha,\beta\in\Gamma}\sum_{j\in I_{% N}}\big{(}$	$\displaystyle Z^{j}_{(\alpha,\beta)}\big{(}t^{(m)}_{a+1}\big{)}-Z^{j}_{(\alpha% ,\beta)}\big{(}t^{(m)}_{a}\big{)}\big{)}\geq\delta_{m}\big{)}$
		$\displaystyle\leq\sum_{\alpha,\beta\in\Gamma}\mathbb{P}\big{(}N^{-1}\sum_{j\in I% _{N}}\big{(}Z^{j}_{(\alpha,\beta)}\big{(}t^{(m)}_{a+1}\big{)}-Z^{j}_{(\alpha,% \beta)}\big{(}t^{(m)}_{a}\big{)}\big{)}\geq\delta_{m}\big{)}$
		$\displaystyle\leq\sum_{\alpha,\beta\in\Gamma}\mathbb{E}\bigg{[}\exp\bigg{(}c% \sum_{j\in I_{N}}\big{(}Z^{j}_{(\alpha,\beta)}\big{(}t^{(m)}_{a+1}\big{)}-Z^{j% }_{(\alpha,\beta)}\big{(}t^{(m)}_{a}\big{)}\big{)}-cN\delta_{m}\bigg{)}\bigg{]}$
		$\displaystyle\leq\sum_{\alpha,\beta\in\Gamma}\mathbb{E}\bigg{[}\exp\bigg{(}c% \sum_{j\in I_{N}}\big{(}Y^{j}_{(\alpha,\beta)}\big{(}t^{(m)}_{a}+f_{max}/m\big% {)}-Y^{j}_{(\alpha,\beta)}\big{(}t^{(m)}_{a}\big{)}\big{)}-cN\delta_{m}\bigg{)% }\bigg{]}$
(146)			$\displaystyle=\exp\big{(}-cN\delta_{m}\big{)}\sum_{\alpha,\beta\in\Gamma}\bigg{(}1+f_{max}m^{-1}\big{(}\exp(c)-1\big{)}\bigg{)}^{N}$

Thanks to the inequality $1+x\leq\exp(x)$ ,

(147)

\exp\big{(}-cN\delta_{m}\big{)}\sum_{\alpha,\beta\in\Gamma}\bigg{(}1+f_{max}m^{-1}\big{(}\exp(c)-1\big{)}\bigg{)}^{N}\leq\\ |\Gamma|^{2}\exp\bigg{(}-cN\delta_{m}+Nf_{max}m^{-1}\big{(}\exp(c)-1\big{)}\bigg{)}.

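We choose $\delta_{m}=(\log m)^{-1/2}$ and $c=\log m$, so that $-c\delta_{m}=-\sqrt{\log m}$ and $f_{max}m^{-1}(\exp(c)-1)=f_{max}(1-m^{-1})$. Since the union bound in (145) contributes only the factor $m^{2}+1$, which does not depend on $N$, we obtain that

(148)

\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}\bar{\mu}^{N}\notin\mathcal{K}_{m}\big{)}\leq-\sqrt{\log m}+f_{max}\big{(}1-m^{-1}\big{)}.

Once $m$ is large enough that $f_{max}\leq\frac{1}{2}\sqrt{\log m}$, the right-hand side is strictly less than $-\frac{1}{2}\sqrt{\log m}$, which yields (144) for all sufficiently large $N$. Since $\mathcal{E}$ is compact, the compactness of $\mathcal{K}_{m}$ is immediate from Prokhorov’s Theorem.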
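The size of the exponent on the right-hand side of (147) can be evaluated directly for the stated choices $c=\log m$ and $\delta_{m}=(\log m)^{-1/2}$. The sketch below computes the exponent per unit $N$, ignoring the $N$-independent prefactors; the value $f_{max}=1$ is an assumption made purely for illustration.

```python
import math

def chernoff_exponent(m, f_max, c, delta_m):
    """Exponent per unit N on the right-hand side of (147): -c*delta_m + f_max*(exp(c)-1)/m."""
    return -c * delta_m + f_max * (math.exp(c) - 1.0) / m

def exponent_with_stated_choices(m, f_max):
    """Evaluate the exponent with c = log m and delta_m = (log m)^(-1/2)."""
    c = math.log(m)
    delta_m = math.log(m) ** -0.5
    return chernoff_exponent(m, f_max, c, delta_m)

f_max = 1.0  # illustrative value (assumption)
table = [(m, exponent_with_stated_choices(m, f_max), -0.5 * math.sqrt(math.log(m)))
         for m in (10, 1000, 10 ** 6)]
```

With these choices the exponent equals $-\sqrt{\log m}+f_{max}(1-m^{-1})$, so it falls below $-\frac{1}{2}\sqrt{\log m}$ only once $m$ is large relative to $f_{max}$: for $f_{max}=1$ this fails at $m=10$ and holds at $m=1000$.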

4.3 Large Deviations of the Averaged System

We prove a Large Deviation Principle for the system with averaged interactions. Recall that $\bar{\mu}^{N}$ is the empirical reaction flux for the system with averaged interactions (142). Our first step is to prove the upper bound of Theorem 4.2. Our method is to show that the empirical reaction flux $\grave{\mu}^{N}$ of the driving Poisson Processes can be written as an almost-continuous transformation of the empirical reaction flux $\bar{\mu}^{N}$ associated to the coupled system.

We will show that there exists a measurable map $\Phi:\mathcal{P}(\Gamma\times\mathcal{E})\times\mathcal{M}(\mathcal{E}\times\mathbb{R}^{+})^{\Gamma\times\Gamma}\to\mathcal{M}(\mathcal{E}\times\mathbb{R}^{+})^{\Gamma\times\Gamma}$ , written $(\nu_{0},\mu)\mapsto\Phi_{\nu_{0}}(\mu)$ , such that, with unit probability,

(149)

\displaystyle\Phi_{\hat{\nu}^{N}_{0}}\big{(}\bar{\mu}^{N}\big{)}=\grave{\mu}^{% N}.

Furthermore $\Phi$ will have the useful property that, with very high probability, it can be approximated extremely well by a continuous function $\Phi^{(m)}:\mathcal{M}(\mathcal{E}\times\mathbb{R}^{+})^{\Gamma\times\Gamma}% \mapsto\mathcal{M}(\mathcal{E}\times\mathbb{R}^{+})^{\Gamma\times\Gamma}$ , which we now define.

For a positive integer $m$ , let $\big{\{}S^{(m)}_{i}\big{\}}_{1\leq i\leq M_{m}}\subset\mathfrak{B}\big{(}\mathcal{E}\big{)}$ be sets such that (i) $\text{diam}\big{(}S^{(m)}_{i}\big{)}\leq m^{-1}$ , (ii) the interior of $S^{(m)}_{i}$ is nonempty, and (iii) the sets partition $\mathcal{E}$ , i.e.

(150)		$\displaystyle\mathcal{E}=$	$\displaystyle\bigcup_{1\leq i\leq M_{m}}S^{(m)}_{i}$
(151)		$\displaystyle S^{(m)}_{i}\cap S^{(m)}_{j}=$	$\displaystyle\emptyset\text{ if }i\neq j.$
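As a purely illustrative special case (an assumption made only for this example, not part of the general setting), take $\mathcal{E}=[0,1]$ with the usual metric. A partition satisfying the diameter, interior and disjoint-covering requirements above, with $M_{m}=m$ , is then given by equal subintervals:

```latex
\[
S^{(m)}_{i} = \Big[\tfrac{i-1}{m},\tfrac{i}{m}\Big)\ \text{ for } 1\leq i\leq m-1,
\qquad
S^{(m)}_{m} = \Big[\tfrac{m-1}{m},1\Big],
\qquad
\text{diam}\big(S^{(m)}_{i}\big)=m^{-1}.
\]
```

In this example any point of each subinterval, for instance its midpoint, can serve as a representative point of the cell.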

Let $\theta^{(m)}_{i}$ be any point in $S^{(m)}_{i}$ . Next, define

(152)

\displaystyle\Lambda^{(m)}_{(\alpha,\beta),t}(\theta,z,\nu_{0},\mu):\mathcal{E% }\times D([0,t],\Gamma)\times\mathcal{P}(\Gamma\times\mathcal{E})\times% \mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}^{\Gamma\times\Gamma}% \to\mathbb{R}^{+}

as follows. For any $\nu_{0}\in\mathcal{P}(\Gamma\times\mathcal{E})$ and $\mu\in\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}^{\Gamma\times\Gamma}$ write

\big{(}\nu_{t}\big{)}_{t\geq 0}=\Psi_{\nu_{0}}(\mu)

and then define for any $\theta\in S^{(m)}_{i}$ and any $\alpha\neq\beta$ ,

(153)	$\displaystyle\Lambda^{(m)}_{(\alpha,\beta),t}(\theta,\nu_{0},\mu)=$	$\displaystyle\int_{0}^{t}f_{(\beta)}(\theta,\alpha,W^{(m)}_{s}(\theta))\nu_{s}% (\alpha\times S^{(m)}_{i})ds\text{ where }$
(154)	$\displaystyle W^{(m)}_{s}(\theta)=$	$\displaystyle\big{(}W^{(m)}_{s,\zeta}(\theta)\big{)}_{\zeta\in\Gamma}$
(155)	$\displaystyle W^{(m)}_{s,\zeta}(\theta)=$	$\displaystyle\mathbb{E}^{(\alpha,\widetilde{\theta})\sim\nu_{s}}\big{[}\chi\{\zeta=\alpha\}\mathcal{J}(\theta^{(m)}_{i},\widetilde{\theta})\big{]}.$

For any $\alpha,\beta\in\Gamma$ and $z\in D([0,t],\Gamma)$ , we write $\Lambda^{(m)}_{(\alpha,\beta,z,\nu_{0},\mu)}:\mathcal{E}\times\mathbb{R}^{+}\to\mathcal{E}\times\mathbb{R}^{+}$ for the map such that

(156)

\displaystyle\Lambda^{(m)}_{(\alpha,\beta,z,\nu_{0},\mu)}(\theta,t)=\big{(}% \theta,\Lambda^{(m)}_{(\alpha,\beta),t}(\theta,z,\nu_{0},\mu)\big{)}.

Write $\Lambda^{(m),-1}_{(\alpha,\beta,z,\nu_{0},\mu)}:\mathcal{E}\times\mathbb{R}^{+}\to\mathcal{E}\times\mathbb{R}^{+}$ for the inverse function of $\Lambda^{(m)}_{(\alpha,\beta,z,\nu_{0},\mu)}$ . This inverse exists (and is continuously differentiable with respect to time) because, by assumption, $f_{\beta}(\cdot,\cdot,\cdot)$ is uniformly bounded away from zero.

We now define $\Phi^{(m)}_{\nu_{0}}(\mu):=\xi=(\xi_{\alpha\mapsto\beta})_{\alpha,\beta\in\Gamma}$ as follows. For any $A\in\mathfrak{B}(\mathcal{E})$ and a time interval $B\in\mathfrak{B}(\mathbb{R}^{+})$ , we stipulate that

(157)

\displaystyle\xi_{\alpha\mapsto\beta}(\Lambda^{(m)}_{(\alpha,\beta,z,\nu_{0},% \mu)}(A\times B))=\mu_{\alpha\mapsto\beta}\big{(}A\times B\big{)}.
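To illustrate the stipulation (157), consider the hypothetical special case in which the integrand in (153) is a constant $r>0$ (both the constant $r$ and the density notation $p$ , $q$ below are assumptions introduced only for this sketch). The time change is then linear, and if $\mu_{\alpha\mapsto\beta}$ has a Lebesgue density $p$ , the measure $\xi_{\alpha\mapsto\beta}$ has the rescaled density $q$ :

```latex
\begin{align*}
\Lambda^{(m)}_{(\alpha,\beta),t}(\theta) &= rt,
\qquad
\Lambda^{(m)}_{(\alpha,\beta,z,\nu_{0},\mu)}\big(A\times[a,b]\big) = A\times[ra,rb], \\
\xi_{\alpha\mapsto\beta}\big(A\times[ra,rb]\big) &= \mu_{\alpha\mapsto\beta}\big(A\times[a,b]\big)
\quad\Longrightarrow\quad
q(\theta,u) = r^{-1}\,p\big(\theta,u/r\big).
\end{align*}
```

The second line follows from the substitution $u=rt$ in $\int_{A}\int_{ra}^{rb}q(\theta,u)\,du\,d\theta$ .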

We obtain the following property.

Lemma 4.8.

$\Phi^{(m)}$ is well-defined for any $m\geq 1$ . Furthermore, $\Phi^{(m)}$ is continuous in both of its arguments, as long as $\mathcal{M}(\mathcal{E}\times[0,\infty))^{\Gamma\times\Gamma}$ is endowed with the topology $\mathcal{T}$ defined in the Appendix.

The proof follows almost immediately from the definitions.

Lemma 4.9.

For any $\epsilon,L>0$ , there exists $m_{\epsilon,L}$ such that for all $m,n\geq m_{\epsilon,L}$ ,

(158)

\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\bigg{(% }d\big{(}\Phi^{(m)}_{\hat{\nu}^{N}_{0}}(\bar{\mu}^{N}),\Phi^{(n)}_{\hat{\nu}^{% N}_{0}}(\bar{\mu}^{N})\big{)}\geq\epsilon\bigg{)}\leq-L.

Proof 4.10.

Let $T>0$ and $\delta\ll 1$ be arbitrary. Write $\mathcal{U}_{T,\delta,\epsilon}\subset\mathcal{P}(\Gamma\times\mathcal{E})\times\mathcal{M}(\mathcal{E}\times[0,T])^{\Gamma\times\Gamma}$ to consist of all $(\nu,\mu)$ such that for all $\alpha,\beta\in\Gamma$ , any set $A\in\mathfrak{B}(\mathcal{E})$ whose diameter is less than $\delta$ , and any sub-interval $[a,b]$ with $b-a<\delta$ and $b\leq T$ ,

(159)		$\displaystyle\mu_{\alpha\mapsto\beta}(A\times[a,b])$	$\displaystyle\leq\epsilon$
(160)		$\displaystyle\nu(\alpha\times A)$	$\displaystyle\leq\epsilon.$

For $\mu\in\mathcal{M}(\mathcal{E}\times[0,T])^{\Gamma\times\Gamma}$ , $\nu\in\mathcal{P}(\Gamma\times\mathcal{E})$ , $t\leq T$ , $\theta\in S^{(m)}_{i}$ , define

(161)

\displaystyle W^{(m)}_{t,\zeta,\mu,\nu}(\theta)=\mathbb{E}^{(\beta,\widetilde{% \theta})\sim\nu}\big{[}\chi\{\beta=\zeta\}\mathcal{J}(\theta^{(m)}_{i},% \widetilde{\theta})\big{]}+\sum_{\alpha\neq\zeta}\bigg{(}\mathbb{E}^{\mu_{% \alpha\mapsto\zeta}}\big{[}\mathcal{J}(\theta^{(m)}_{i},\widetilde{\theta})% \big{]}-\mathbb{E}^{\mu_{\zeta\mapsto\alpha}}\big{[}\mathcal{J}(\theta^{(m)}_{% i},\widetilde{\theta})\big{]}\bigg{)}.

We next claim that for any $\epsilon>0$ , there exists $m_{\epsilon}\in\mathbb{Z}^{+}$ such that as long as $m,n\geq m_{\epsilon}$ , and writing $\eta=\frac{1}{2m_{\epsilon}}$ , it must be that

(162)

\displaystyle\sup_{(\mu,\nu)\in\mathcal{U}_{T,\eta,\epsilon}}\sup_{\theta\in\mathcal{E}}\sup_{t\leq T}\sup_{\zeta\in\Gamma}\big{|}W^{(m)}_{t,\zeta,\mu,\nu}(\theta)-W^{(n)}_{t,\zeta,\mu,\nu}(\theta)\big{|}\leq 2\epsilon.

Indeed (162) will hold as long as $m_{\epsilon}$ is big enough that

(163)

\sup\bigg{\{}\big{|}\mathcal{J}(\theta,\widetilde{\theta})-\mathcal{J}(\theta,% \bar{\theta})\big{|}+\big{|}\mathcal{J}(\widetilde{\theta},\theta)-\mathcal{J}% (\bar{\theta},\theta)\big{|}:\theta,\widetilde{\theta},\bar{\theta}\in\mathcal% {E},\;d_{\mathcal{E}}(\widetilde{\theta},\bar{\theta})\leq\frac{1}{4m_{% \epsilon}}\bigg{\}}\leq\epsilon/(2|\Gamma|),

which is possible because $\mathcal{J}$ is uniformly continuous. Indeed (163) implies that for any $m\geq m_{\epsilon}$ and any $\theta,\widetilde{\theta},\bar{\theta}\in\mathcal{E}$ with $d_{\mathcal{E}}(\widetilde{\theta},\bar{\theta})\leq\frac{1}{4m}$ , necessarily

(164)

\big{|}\mathcal{J}(\theta,\widetilde{\theta})-\mathcal{J}(\theta,\bar{\theta})\big{|}+\big{|}\mathcal{J}(\widetilde{\theta},\theta)-\mathcal{J}(\bar{\theta},\theta)\big{|}\leq\epsilon/(2|\Gamma|),

and therefore (162) holds. For any $\iota\ll 1$ , through taking $\epsilon$ to be sufficiently small, it must therefore hold that as long as $(\mu,\nu)\in\mathcal{U}_{T,\eta,\epsilon}$ , for all $m,n\geq m_{\epsilon}$ ,

(165)

\displaystyle\sup_{\theta\in\mathcal{E}}\sup_{\alpha,\beta\in\Gamma,\alpha\neq% \beta}\sup_{t\leq T}\big{|}\Lambda^{(m)}_{(\alpha,\beta),t}(\theta,\nu,\mu)-% \Lambda^{(n)}_{(\alpha,\beta),t}(\theta,\nu,\mu)\big{|}\leq\iota.

We next define

(166)

\Phi_{\nu}(\mu)=\lim_{p\to\infty}\Phi_{\nu}^{(m_{p})}\big{(}\mu\big{)}

as long as the limit exists, where $(m_{p})_{p\geq 1}$ is an increasing sequence such that for all $n\geq m_{p}$ ,

(167)

\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\bigg{(% }d\big{(}\Phi^{(m_{p})}_{\hat{\nu}^{N}_{0}}(\bar{\mu}^{N}),\Phi^{(n)}_{\hat{% \nu}^{N}_{0}}(\bar{\mu}^{N})\big{)}\geq 2^{-p}\bigg{)}\leq-p.

(It has just been proved in Lemma 4.9 that the sequence $(m_{p})_{p\geq 1}$ exists).

Lemma 4.11.

$\grave{\mu}^{N}$ has the same probability law as $\Phi_{\hat{\nu}^{N}_{0}}\big{(}\bar{\mu}^{N}\big{)}$ .

Proof 4.12.

This follows from the time-rescaled representation of the averaged system in (112): with this representation, $\Phi_{\hat{\nu}^{N}_{0}}\big{(}\bar{\mu}^{N}\big{)}=\grave{\mu}^{N}$ .
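For orientation, the mechanism behind this identity can be sketched as follows, under the assumption that (112), which lies outside this section, has the standard random-time-change form. The symbol $\lambda^{j}_{s}$ for the intensity of channel $j$ and the symbol $\Lambda^{j}_{t}$ for its integral are hypothetical notation used only here.

```latex
\[
Z^{j}_{\alpha\mapsto\beta}(t) = Y^{j}_{\alpha\beta}\big(\Lambda^{j}_{t}\big),
\qquad
\Lambda^{j}_{t} = \int_{0}^{t}\lambda^{j}_{s}\,ds,
\qquad
Z^{j}_{\alpha\mapsto\beta} \text{ jumps at } \tau
\iff Y^{j}_{\alpha\beta} \text{ jumps at } \Lambda^{j}_{\tau}.
\]
```

The equivalence uses that $t\mapsto\Lambda^{j}_{t}$ is strictly increasing, which holds when the intensity is bounded away from zero. Pushing each jump time $\tau$ forward to $\Lambda^{j}_{\tau}$ is the operation that $\Phi$ performs on the empirical reaction flux.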

Lemma 4.13.

For any $\epsilon>0$ ,

(168)

\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}% d\big{(}\Phi_{\hat{\nu}^{N}_{0}}(\bar{\mu}^{N}),\Phi_{\nu_{0}}(\bar{\mu}^{N})% \big{)}\geq\epsilon\big{)}=-\infty.

Proof 4.14.

Thanks to the Lipschitz estimate (220) of Lemma 4.24, for any $T>0$ ,

(169)

\displaystyle\sup_{\alpha,\beta\in\Gamma}\sup_{t\in[0,T]}\sup_{\theta\in\mathcal{E}}\big{|}\Lambda_{(\alpha,\beta),t}(\theta,\nu_{0},\bar{\mu}^{N})-\Lambda_{(\alpha,\beta),t}(\theta,\hat{\nu}^{N}_{0},\bar{\mu}^{N})\big{|}\to 0

uniformly as $N\to\infty$ .

We can now prove the upper bound in Theorem 4.2.

Lemma 4.15.

Let $\mathcal{A}\subseteq\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}^{% \Gamma\times\Gamma}$ be closed. Then

(170)

\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}\bar{\mu}^{N}\in\mathcal{A}\big{)}\leq-\inf_{\mu\in\mathcal{A}}\mathcal{G}(\mu).

Furthermore, $\mathcal{G}$ is lower semicontinuous and has compact level sets.

Proof 4.16.

Thanks to Lemma 4.11,

(171)

\displaystyle\mathbb{P}\big{(}\bar{\mu}^{N}\in\mathcal{A}\big{)}=\mathbb{P}% \big{(}\grave{\mu}^{N}\in\Phi_{\hat{\nu}^{N}_{0}}(\mathcal{A})\big{)}.

Define the event

(172)

\displaystyle\mathcal{Q}_{N}^{(m)}=\big{\{}d\big{(}\grave{\mu}^{N},\Phi^{(m)}_{\hat{\nu}^{N}_{0}}(\bar{\mu}^{N})\big{)}\leq\delta\big{\}}.

Furthermore for the integer $m_{p}$ ,

	$\displaystyle\mathbb{P}\big{(}\grave{\mu}^{N}\in\Phi_{\hat{\nu}^{N}_{0}}(% \mathcal{A})\big{)}$	$\displaystyle\leq\mathbb{P}\big{(}\grave{\mu}^{N}\in\Phi_{\hat{\nu}^{N}_{0}}(% \mathcal{A}),\mathcal{Q}_{N}^{(m_{p})}\big{)}+\mathbb{P}\big{(}(\mathcal{Q}_{N% }^{(m_{p})})^{c}\big{)}$
(173)			$\displaystyle\leq\mathbb{P}\big{(}\grave{\mu}^{N}\in\bar{\mathcal{A}}^{(m)}_{% \delta},\mathcal{Q}_{N}^{(m_{p})}\big{)}+\mathbb{P}\big{(}(\mathcal{Q}_{N}^{(m% _{p})})^{c}\big{)}$

where $\bar{\mathcal{A}}^{(m)}_{\delta}$ is the closed $\delta$ -blowup of $\Phi^{(m)}_{\nu_{0}}(\mathcal{A})$ . Thanks to Lemma 4.13,

(174)

\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}% \grave{\mu}^{N}\in\bar{\mathcal{A}}^{(m)}_{\delta},\mathcal{Q}_{N}^{(m_{p})}% \big{)}\leq\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}% \grave{\mu}^{N}\in\bar{\mathcal{A}}^{(m)}_{\delta}\big{)}.

Since $\bar{\mathcal{A}}^{(m)}_{\delta}$ is closed, the Large Deviations of the uncoupled system in Theorem 4.1 implies that

(175)

\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}\grave{\mu}^{N}\in\bar{\mathcal{A}}^{(m)}_{\delta}\big{)}\leq-\inf_{\mu\in\bar{\mathcal{A}}^{(m)}_{\delta}}\mathcal{I}(\mu).

Taking $\delta\to 0^{+}$ , and exploiting the lower semicontinuity of $\mathcal{I}$ ,

\lim_{m\to\infty}\inf_{\mu\in\bar{\mathcal{A}}^{(m)}_{\delta}}\mathcal{I}(\mu)% =\inf_{\mu\in\bar{\mathcal{A}}}\mathcal{I}(\mu).

Finally, since $\mathcal{G}\big{(}\Phi_{\nu_{0}}(\mu)\big{)}=\mathcal{I}(\mu)$ , it holds that

\inf_{\mu\in\bar{\mathcal{A}}_{0}}\mathcal{I}(\mu)=\inf_{\mu\in\mathcal{A}}% \mathcal{G}(\mu)

In preparation for the lower bound, we introduce a class of regular measures. For $a>0$ , define

(176)

\mathcal{U}_{a}\subset\mathcal{M}(\mathcal{E}\times\mathbb{R}^{+})^{\Gamma% \times\Gamma}

to consist of all $\big{(}\mu_{\alpha\mapsto\beta}\big{)}_{\alpha,\beta\in\Gamma}$ such that $\mu_{\alpha\mapsto\beta}$ has a density $p_{\alpha\mapsto\beta}:\mathcal{E}\times\mathbb{R}^{+}\to\mathbb{R}$ such that there exist constants $0=b_{0}<b_{1}<\ldots<b_{a^{2}}=a$ such that for all $s,t\in[b_{i},b_{i+1})$ and $x,y\in\mathcal{E}$ ,

(177)

\big{|}p_{\alpha\mapsto\beta}(x,s)-p_{\alpha\mapsto\beta}(y,t)\big{|}\leq a|t-% s|+a\|x-y\|

and $b_{i+1}-b_{i}\geq a^{-1}$ .

Lemma 4.17.

For any $\mu\in\bigcup_{a>0}\mathcal{U}_{a}$ and $\nu_{0}\in\mathcal{P}(\Gamma\times\mathcal{E})$ , there exists a unique $\gamma\in\mathcal{M}(\mathcal{E}\times\mathbb{R}^{+})^{\Gamma\times\Gamma}$ such that $\Phi_{\nu_{0}}(\gamma)=\mu$ .

Proof 4.18.

Let $\mu_{\alpha\mapsto\beta}$ have density $p_{\alpha\mapsto\beta}:\mathcal{E}\times\mathbb{R}^{+}\to\mathbb{R}^{+}$ . We define $\gamma_{\alpha\mapsto\beta}$ to have density $\widetilde{p}_{\alpha\mapsto\beta}$ . By inspection, it must be that for all $t\in[b_{0},b_{1})$ , $x\in\mathcal{E}$ and $\alpha,\beta\in\Gamma$ ,

(178)

\displaystyle\widetilde{p}_{\alpha\mapsto\beta}(x,t)=p_{\alpha\mapsto\beta}% \big{(}x,\Lambda_{(\alpha,\beta),t}(x,\widetilde{p})\big{)}f_{(\beta)}(x,% \alpha,W_{t}(\widetilde{p},x))

where $W_{t}(\widetilde{p},x)=\big{(}W_{t,\zeta}(\widetilde{p},x)\big{)}_{\zeta\in\Gamma}$ , and

(179)

\displaystyle W_{t,\zeta}(\widetilde{p},x)=\mathbb{E}^{(\xi,y)\sim\nu_{0}}\big{[}\chi\{\zeta=\xi\}\mathcal{J}(x,y)\big{]}+\sum_{\alpha\in\Gamma}\int_{0}^{t}\int_{\mathcal{E}}\mathcal{J}(x,y)\big{(}\widetilde{p}_{\alpha\mapsto\zeta}(y,s)-\widetilde{p}_{\zeta\mapsto\alpha}(y,s)\big{)}dyds

and

(180)

\displaystyle\Lambda_{(\alpha,\beta),t}(x,\widetilde{p})=\int_{0}^{t}f_{(\beta% )}(x,\alpha,W_{s}(\widetilde{p},x))ds.

We are going to show that (i) there exists a mapping $\Gamma_{T}$ such that

(181)

\displaystyle\widetilde{p}=\Gamma_{T}(\widetilde{p})

and that (ii) $\Gamma_{T}$ is contractive with respect to the supremum norm.

Observe that there is a constant $C$ such that

(182)

\displaystyle\sup_{\zeta\in\Gamma}\sup_{x\in\mathcal{E}}\sup_{t\leq T}\big{|}W% _{t,\zeta}(\widetilde{p},x)-W_{t,\zeta}(\hat{p},x)\big{|}\leq CT\sup_{\alpha,% \beta\in\Gamma}\sup_{x\in\mathcal{E}}\sup_{t\leq T}\big{|}\widetilde{p}_{% \alpha\mapsto\beta}(x,t)-\hat{p}_{\alpha\mapsto\beta}(x,t)\big{|}

Thus for small enough $T$ , a fixed point argument implies that there is a unique $\big{(}\widetilde{p}_{\alpha\mapsto\beta}(x,t)\big{)}_{t\leq T,\,x\in\mathcal{E},\,\alpha,\beta\in\Gamma}$ satisfying (178). This argument can then be iterated for increasing $T$ .
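The fixed point argument invoked here is the standard contraction mapping estimate. As a sketch, suppose (this is an assumption combining (182) with the Lipschitz bounds on $p$ and $f_{(\beta)}$ , with a hypothetical constant $C'$ ) that $\|\Gamma_{T}(\widetilde{p})-\Gamma_{T}(\hat{p})\|_{\infty}\leq C'T\|\widetilde{p}-\hat{p}\|_{\infty}$ . For $T<1/C'$ , iterating from an arbitrary starting density $q_{0}$ gives:

```latex
\begin{align*}
q_{n+1} &= \Gamma_{T}(q_{n}), \qquad \kappa := C'T < 1, \\
\|q_{n+1}-q_{n}\|_{\infty}
  &\leq \kappa^{n}\,\|q_{1}-q_{0}\|_{\infty}, \\
\|q_{n}-\widetilde{p}\|_{\infty}
  &\leq \frac{\kappa^{n}}{1-\kappa}\,\|q_{1}-q_{0}\|_{\infty}.
\end{align*}
```

Uniqueness follows in the same way: two fixed points $\widetilde{p},\hat{p}$ would satisfy $\|\widetilde{p}-\hat{p}\|_{\infty}\leq\kappa\|\widetilde{p}-\hat{p}\|_{\infty}$ , which forces $\widetilde{p}=\hat{p}$ .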

We now turn to proving the lower bound (116).

Lemma 4.19.

Suppose that $\mathcal{O}\subseteq\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}^{% \Gamma\times\Gamma}$ is open. Then

(183)

\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\bar{\mu}^{N}\in\mathcal{O}\big{)}\geq-\inf_{\mu\in\mathcal{O}}\mathcal{G}(% \mu).

Proof 4.20.

If $\inf_{\mu\in\mathcal{O}}\mathcal{G}(\mu)=\infty$ , then the Lemma is immediate. Otherwise let $\mu$ be any member of $\mathcal{O}$ such that $\mathcal{G}(\mu)<\infty$ . We must show that

(184)

\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(}\bar{\mu}^{N% }\in\mathcal{O}\big{)}\geq-\mathcal{G}(\mu).

Since $\mathcal{G}(\mu)<\infty$ , it must be that for all $\alpha,\beta\in\Gamma$ , $\mu_{\alpha\mapsto\beta}$ has a density $p_{\alpha\mapsto\beta}$ , and that

(185)

\displaystyle\int_{\mathcal{E}}\int_{0}^{\infty}\ell\big{(}p_{\alpha\mapsto% \beta}(x,t)/\lambda_{(\alpha,\beta)}(x,t)\big{)}\lambda_{(\alpha,\beta)}(x,t)% dtd\mu_{leb}(x)<\infty.

Since $\ell\geq 0$ and $\lambda_{(\alpha,\beta)}(x,t)>0$ , the integrand is nonnegative, and so by the Monotone Convergence Theorem,

(186)

\int_{\mathcal{E}}\int_{0}^{\infty}\ell\big{(}p_{\alpha\mapsto\beta}(x,t)/% \lambda_{(\alpha,\beta)}(x,t)\big{)}\lambda_{(\alpha,\beta)}(x,t)dtd\mu_{leb}(% x)\\ =\lim_{g\to\infty}\int_{\mathcal{E}}\int_{0}^{g}\chi\big{\{}p_{\alpha\mapsto% \beta}(x,t)\leq g\big{\}}\ell\big{(}p_{\alpha\mapsto\beta}(x,t)/\lambda_{(% \alpha,\beta)}(x,t)\big{)}\lambda_{(\alpha,\beta)}(x,t)dtd\mu_{leb}(x).

For $g>0$ , let $\mathcal{U}_{c,g}$ consist of all $q\in\mathcal{U}_{c}$ such that

(187)

\displaystyle\sup_{x,t}q(x,t)\leq g.

For some integer $c>0$ , and $g>0$ , let $p^{(c,g)}\in\mathcal{U}_{c,g}$ be such that

(188)

\sum_{\alpha,\beta\in\Gamma}\int_{\mathcal{E}}\int_{0}^{g}\big{|}p_{\alpha% \mapsto\beta}(x,t)-p^{(c,g)}_{\alpha\mapsto\beta}(x,t)\big{|}dtdx\\ =\inf\bigg{\{}\sum_{\alpha,\beta\in\Gamma}\int_{\mathcal{E}}\int_{0}^{g}\big{|% }q_{\alpha\mapsto\beta}(x,t)-p_{\alpha\mapsto\beta}(x,t)\big{|}dtdx\;,\;q\in% \mathcal{U}_{c,g}\bigg{\}}.

For $t\geq g$ , we stipulate that $p^{(c,g)}_{\alpha\mapsto\beta}(x,t)=f_{\beta}(x,\alpha,W^{(c,g)}_{t}(x))$ where $W^{(c,g)}_{t}(x)=\big{(}W^{(c,g)}_{t,\zeta}(x)\big{)}_{\zeta\in\Gamma}$ , and

(189)

\displaystyle W^{(c,g)}_{t,\zeta}(x)=\mathbb{E}^{(\xi,y)\sim\nu_{0}}\big{[}\chi\{\zeta=\xi\}\mathcal{J}(x,y)\big{]}+\sum_{\alpha\in\Gamma}\int_{0}^{t}\int_{\mathcal{E}}\mathcal{J}(x,y)\big{(}p^{(c,g)}_{\alpha\mapsto\zeta}(y,s)-p^{(c,g)}_{\zeta\mapsto\alpha}(y,s)\big{)}dyds

and

(190)

\displaystyle\Lambda^{(c,g)}_{(\alpha,\beta),t}(x)=\int_{0}^{t}f_{(\beta)}(x,% \alpha,W^{(c,g)}_{s}(x))ds.

We note that $p^{(c,g)}_{\alpha\mapsto\beta}(x,t)$ is well defined for $t\geq g$ : it is the density that would result from the large $N$ limiting dynamics in Theorem 2.2. Since

\int_{\mathcal{E}}\int_{0}^{g}p_{\alpha\mapsto\beta}(x,t)dtdx<\infty,

it must be that

(191)

\lim_{c\to\infty}\sum_{\alpha,\beta\in\Gamma}\int_{\mathcal{E}}\int_{0}^{g}% \chi\big{\{}p_{\alpha\mapsto\beta}(x,t)\leq g\big{\}}\big{|}p_{\alpha\mapsto% \beta}(x,t)-p^{(c,g)}_{\alpha\mapsto\beta}(x,t)\big{|}dtdx=0.

(By definition, the Lebesgue integral is the limit of piecewise-constant approximations). Write $\mu_{\alpha\mapsto\beta}^{(c,g)}\in\mathcal{M}(\mathcal{E}\times\mathbb{R}^{+})$ to be the measure with density $p^{(c,g)}_{\alpha\mapsto\beta}$ . We therefore find that

(192)

\displaystyle\lim_{g\to\infty}\lim_{c\to\infty}\mathcal{G}\big{(}\mu^{(c,g)}% \big{)}=\mathcal{G}(\mu).

Furthermore

(193)

\displaystyle\lim_{g\to\infty}\lim_{c\to\infty}d\big{(}\mu^{(c,g)},\mu\big{)}=0.

Thus for large enough values of $c,g$ , we may assume that $\mu^{(c,g)}\in\mathcal{O}$ . Write $\gamma^{(c,g)}\in\bigcup_{a>0}\mathcal{U}_{a}$ to be such that

\Phi_{\nu_{0}}(\gamma^{(c,g)})=\mu^{(c,g)}.

Thanks to Lemma 4.21, there exists $\delta>0$ such that

(194)

\displaystyle\bigg{\{}\kappa\in\mathcal{M}(\mathcal{E}\times\mathbb{R}^{+})^{% \Gamma\times\Gamma}\;:\;\kappa=\Phi_{\nu_{0}}(\xi)\text{ for some }\xi\in B_{% \delta}(\gamma^{(c,g)})\bigg{\}}\subseteq\mathcal{O}.

It therefore follows from the Large Deviations Lower Bound in Theorem 4.1 that

	$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(}\bar{\mu}^{N}\in\mathcal{O}\big{)}\geq$	$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(}\grave{\mu}^{N}\in B_{\delta}(\gamma^{(c,g)})\big{)}$
(195)		$\displaystyle\geq$	$\displaystyle-\inf_{\xi\in B_{\delta}(\gamma^{(c,g)})}\mathcal{I}(\xi)$
(196)		$\displaystyle\geq$	$\displaystyle-\mathcal{I}(\gamma^{(c,g)}).$

Furthermore, upon performing a change of variable,

(197)

\mathcal{I}(\gamma^{(c,g)})=\sum_{\alpha,\beta\in\Gamma}\int_{\mathcal{E}}\int% _{0}^{g}\ell\big{(}p^{(c,g)}_{\alpha\mapsto\beta}(x,t)/\lambda^{(c,g)}_{(% \alpha,\beta)}(x,t)\big{)}\lambda^{(c,g)}_{(\alpha,\beta)}(x,t)dtd\mu_{leb}(x)% \\ \to\mathcal{G}(\mu),

as $c,g\to\infty$ , thanks to (192). This implies the Lemma.

Lemma 4.21.

For any $\mu\in\bigcup_{a>0}\mathcal{U}_{a}$ , and any $\epsilon,L>0$ , there exists $\delta>0$ such that if

(198)

\displaystyle d\big{(}\grave{\mu}^{N},\mu\big{)}\leq\delta

then

(199)

\displaystyle d\big{(}\bar{\mu}^{N},\Phi^{-1}_{\nu_{0}}(\mu)\big{)}\leq\epsilon.

Proof 4.22.

Let $a>0$ be such that $\mu\in\mathcal{U}_{a}$ . Recall that, by definition, the density of $\mu$ is piecewise Lipschitz over intervals of the form $[b_{i},b_{i+1})$ , with Lipschitz constant less than or equal to $a$ . We start by proving that for arbitrary $\grave{\epsilon}_{1}>0$ ,

(200)

\displaystyle d_{b_{1}}\big{(}\bar{\mu}^{N},\Phi^{-1}_{\hat{\nu}^{N}_{0}}(\mu)% \big{)}\leq\grave{\epsilon}_{1}.

Let $p_{\alpha\mapsto\beta}$ be the density of $\mu_{\alpha\mapsto\beta}$ . Write $\gamma^{N}=\Phi^{-1}_{\hat{\nu}^{N}_{0}}(\mu)$ and define the density of $\gamma^{N}_{\alpha\mapsto\beta}$ to be $\widetilde{p}^{N}_{\alpha\mapsto\beta}$ , i.e.

(201)

\displaystyle\widetilde{p}_{\alpha\mapsto\beta}(x,t)=p_{\alpha\mapsto\beta}% \big{(}x,\widetilde{\Lambda}^{N}_{(\alpha,\beta),t}(x,\widetilde{p})\big{)}f_{% \beta}(x,\alpha,\widetilde{W}^{N}_{t}(\widetilde{p},x))

where $\widetilde{W}^{N}_{t}(\widetilde{p},x)=\big{(}\widetilde{W}^{N}_{t,\zeta}(% \widetilde{p},x)\big{)}_{\zeta\in\Gamma}$ , and

	$\displaystyle\widetilde{W}^{N}_{t,\zeta}(\widetilde{p},x)=$	$\displaystyle\mathbb{E}^{\nu_{0}}\big{[}\chi\{\sigma_{0}=\zeta\}\mathcal{J}(x,y)\big{]}+\sum_{\alpha\in\Gamma}\int_{0}^{t}\int_{\mathcal{E}}\mathcal{J}(x,y)\big{(}\widetilde{p}_{\alpha\mapsto\zeta}(y,s)-\widetilde{p}_{\zeta\mapsto\alpha}(y,s)\big{)}dyds$
(202)		$\displaystyle\widetilde{\Lambda}^{N}_{(\alpha,\beta),t}(x,\widetilde{p})=$	$\displaystyle\int_{0}^{t}f_{(\beta)}(x,\alpha,\widetilde{W}^{N}_{s}(\widetilde% {p},x))ds.$

Now define

(203)		$\displaystyle\Lambda^{N}_{(\alpha,\beta),t}(x)=$	$\displaystyle\int_{0}^{t}f_{\beta}(x,\alpha,W^{N}_{s}(x))ds\text{ where }$
	$\displaystyle W^{N}_{s,\zeta}(x)=$	$\displaystyle N^{-1}\sum_{j\in I_{N}}\chi\{\sigma^{j}(0)=\zeta\}\mathcal{J}(x,% x^{j}_{N})+N^{-1}\sum_{\alpha\in\Gamma}\sum_{j\in I_{N}}\mathcal{J}(x,x^{j}_{N% })\big{(}Z^{j}_{\alpha\mapsto\zeta}(s)-Z^{j}_{\zeta\mapsto\alpha}(s)\big{)}$

and let $\hat{\gamma}^{N}_{\alpha\mapsto\beta}$ have density $\hat{p}^{N}_{\alpha\mapsto\beta}$ , which is such that for all $t\geq 0$ , $x\in\mathcal{E}$

(204)

\displaystyle\hat{p}^{N}_{\alpha\mapsto\beta}(x,t)=p_{\alpha\mapsto\beta}\big{% (}x,\Lambda^{N}_{(\alpha,\beta),t}(x)\big{)}f_{\beta}\big{(}x,\alpha,W^{N}_{t}% (x)\big{)}.

We first claim that for arbitrary $\widetilde{\epsilon},t>0$ , for all small enough $\delta$ it must be that

(205)

\displaystyle d_{t}\big{(}\hat{\gamma}^{N},\bar{\mu}^{N}\big{)}\leq\widetilde{\epsilon}

Indeed writing $\Upsilon^{N}_{(\alpha,\beta)}:\mathcal{E}\times\mathbb{R}^{+}\mapsto\mathcal{E% }\times\mathbb{R}^{+}$ to be the function-inverse of the function $(x,t)\to\Lambda^{N}_{(\alpha,\beta),t}(x)$ , it must be that for any bounded continuous function $h:\mathcal{E}\times\mathbb{R}^{+}\mapsto\mathbb{R}$ ,

(206)		$\displaystyle\mathbb{E}^{\hat{\gamma}^{N}_{\alpha\mapsto\beta}}[h]$	$\displaystyle=\mathbb{E}^{\mu_{\alpha\mapsto\beta}}\big{[}h\big{(}\Upsilon^{N}% _{(\alpha,\beta)}\big{)}\big{]}$
(207)		$\displaystyle\mathbb{E}^{\bar{\mu}^{N}_{\alpha\mapsto\beta}}[h]$	$\displaystyle=\mathbb{E}^{\grave{\mu}^{N}_{\alpha\mapsto\beta}}\big{[}h\big{(}% \Upsilon^{N}_{(\alpha,\beta)}\big{)}\big{]}.$

One easily checks that $\Upsilon^{N}_{(\alpha,\beta)}$ is differentiable-in-time, with derivative lower-bounded by $c:=c_{f}^{-1}$ . It therefore follows from the definition of the bounded-Lipschitz metric that

(208)

\displaystyle d_{t}\big{(}\hat{\gamma}^{N},\bar{\mu}^{N}\big{)}\leq d_{ct}\big% {(}\grave{\mu}^{N},\mu\big{)}.

We have therefore established (205). We write

(209)

\displaystyle\phi^{N}_{t}=\sup_{g\in\mathcal{C}(\mathcal{E}\times[0,t])}\big{|% }\mathbb{E}^{\hat{\gamma}^{N}}[g]-\mathbb{E}^{\bar{\mu}^{N}}[g]\big{|}

and we note that (thanks to (205)), for any $\hat{\epsilon}>0$ there must exist $\widetilde{\epsilon}>0$ such that as long as (205) is satisfied,

(210)

\displaystyle\phi^{N}_{t}\leq\hat{\epsilon}.

Now

(211)

\big{|}\widetilde{p}_{\alpha\mapsto\beta}(x,t)-\hat{p}^{N}_{\alpha\mapsto\beta}(x,t)\big{|}\leq\\ \big{|}p_{\alpha\mapsto\beta}\big{(}x,\widetilde{\Lambda}^{N}_{(\alpha,\beta),t}(x,\widetilde{p})\big{)}f_{\beta}(x,\alpha,\widetilde{W}^{N}_{t}(\widetilde{p},x))-p_{\alpha\mapsto\beta}\big{(}x,\Lambda^{N}_{(\alpha,\beta),t}(x)\big{)}f_{\beta}(x,\alpha,\widetilde{W}^{N}_{t}(\widetilde{p},x))\big{|}\\ +\big{|}p_{\alpha\mapsto\beta}\big{(}x,\Lambda^{N}_{(\alpha,\beta),t}(x)\big{)}\big{(}f_{\beta}\big{(}x,\alpha,W^{N}_{t}(x)\big{)}-f_{\beta}\big{(}x,\alpha,\widetilde{W}^{N}_{t}(\widetilde{p},x)\big{)}\big{)}\big{|}.

Using the facts that (i) $|f_{\beta}(\cdot,\cdot,\cdot)|$ is bounded above by $C_{f}$ , (ii) the time-derivative of $p$ is bounded above by $a$ , and (iii) $\big{|}\Lambda^{N}_{(\alpha,\beta),t}(x)\big{|}\leq tC_{f}$ , we obtain that there is a constant $C>0$ such that

	$\displaystyle\big{\|}\widetilde{p}_{\alpha\mapsto\beta}(x,t)-\hat{p}^{N}_{% \alpha\mapsto\beta}(x,t)\big{\|}\leq$	$\displaystyle C\big{\|}\Lambda^{N}_{(\alpha,\beta),t}(x)-\widetilde{\Lambda}^{N% }_{(\alpha,\beta),t}(x,\widetilde{p})\big{\|}+Ct\sup_{x\in\mathcal{E}}\big{\|}% \widetilde{W}^{N}_{t}(\widetilde{p},x)-W^{N}_{t}(x)\big{\|}$
(212)		$\displaystyle\leq$	$\displaystyle 2Ct\sup_{x\in\mathcal{E}}\big{\|}\widetilde{W}^{N}_{t}(\widetilde% {p},x)-W^{N}_{t}(x)\big{\|}$

Write

(213)

\displaystyle y^{N}_{t}=\sup_{x\in\mathcal{E}}\sup_{\alpha,\beta\in\Gamma}\big% {|}\hat{p}^{N}_{\alpha\mapsto\beta}(x,t)-\widetilde{p}^{N}_{\alpha\mapsto\beta% }(x,t)\big{|}

We next claim that there exists a constant $c>0$ such that for all $t\leq b_{1}$ ,

(214)

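\displaystyle\sup_{\zeta\in\Gamma}\sup_{x\in\mathcal{E}}\big{|}\widetilde{W}^{N}_{t,\zeta}(\widetilde{p},x)-W^{N}_{t,\zeta}(x)\big{|}\leq c\int_{0}^{t}y^{N}_{s}ds+c\phi^{N}_{t}+\sup_{\zeta\in\Gamma}\sup_{x\in\mathcal{E}}\big{|}\mathbb{E}^{(\sigma_{0},y)\sim\nu_{0}}\big{[}\chi\{\sigma_{0}=\zeta\}\mathcal{J}(x,y)\big{]}-\mathbb{E}^{(\sigma_{0},y)\sim\hat{\nu}_{0}}\big{[}\chi\{\sigma_{0}=\zeta\}\mathcal{J}(x,y)\big{]}\big{|}.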

Indeed we find that

(215)

\big{|}\widetilde{W}^{N}_{t,\zeta}(\widetilde{p},x)-W^{N}_{t,\zeta}(x)\big{|}\leq\big{|}\mathbb{E}^{\nu_{0}}\big{[}\chi\{\sigma_{0}=\zeta\}\mathcal{J}(x,y)\big{]}-N^{-1}\sum_{j\in I_{N}}\chi\{\sigma^{j}(0)=\zeta\}\mathcal{J}(x,x^{j}_{N})\big{|}\\ +\sum_{\alpha\in\Gamma}\int_{0}^{t}\int_{\mathcal{E}}\big{|}\mathcal{J}(x,y)\big{|}\,\big{|}\widetilde{p}_{\alpha\mapsto\zeta}(y,s)-\widetilde{p}_{\zeta\mapsto\alpha}(y,s)-\hat{p}_{\alpha\mapsto\zeta}(y,s)+\hat{p}_{\zeta\mapsto\alpha}(y,s)\big{|}dyds\\ +\bigg{|}N^{-1}\sum_{\alpha\in\Gamma}\sum_{j\in I_{N}}\mathcal{J}(x,x^{j}_{N})\big{(}Z^{j}_{\alpha\mapsto\zeta}(t)-Z^{j}_{\zeta\mapsto\alpha}(t)\big{)}-\sum_{\alpha\in\Gamma}\int_{0}^{t}\int_{\mathcal{E}}\mathcal{J}(x,y)\big{(}\hat{p}_{\alpha\mapsto\zeta}(y,s)-\hat{p}_{\zeta\mapsto\alpha}(y,s)\big{)}dyds\bigg{|}\\ :=H_{1,\zeta}(x)+H_{2,\zeta}(t,x)+H_{3,\zeta}(t,x).

By definition

H_{1,\zeta}(x)=\big{|}\mathbb{E}^{(\sigma_{0},y)\sim\nu_{0}}\big{[}\chi\{% \sigma_{0}=\zeta\}\mathcal{J}(x,y)\big{]}-\mathbb{E}^{(\sigma_{0},y)\sim\hat{% \nu}_{0}}\big{[}\chi\{\sigma_{0}=\zeta\}\mathcal{J}(x,y)\big{]}\big{|}.

Since $\mathcal{J}(x,y)$ is bounded, we immediately see that there is a constant $c>0$ such that for all $x\in\mathcal{E}$ and all $\zeta\in\Gamma$ ,

H_{2,\zeta}(t,x)\leq c\int_{0}^{t}y^{N}_{s}ds.

Furthermore, by definition,

(216)

\displaystyle H_{3,\zeta}(t,x)\leq\phi^{N}_{t}.

We have thus established (214).

It now follows from (212) and (214) that for all $t\leq b_{1}$ ,

(217)

\displaystyle y^{N}_{t}\leq\mathrm{Const}\int_{0}^{t}\big{\{}y^{N}_{s}+\phi^{N}_{s}\big{\}}ds.

Thanks to Gronwall’s Inequality,

\displaystyle y^{N}_{b_{1}}\leq\mathrm{Const}\exp\big{(}b_{1}\mathrm{Const}\big{)}\big{(}\phi^{N}_{b_{1}}+\sup_{\zeta\in\Gamma}\sup_{x\in\mathcal{E}}\big{|}\mathbb{E}^{(\sigma_{0},y)\sim\nu_{0}}\big{[}\chi\{\sigma_{0}=\zeta\}\mathcal{J}(x,y)\big{]}-\mathbb{E}^{(\sigma_{0},y)\sim\hat{\nu}_{0}}\big{[}\chi\{\sigma_{0}=\zeta\}\mathcal{J}(x,y)\big{]}\big{|}\big{)}.

We have thus established (200), since by assumption

\big{|}\mathbb{E}^{(\sigma_{0},y)\sim\nu_{0}}\big{[}\chi\{\sigma_{0}=\zeta\}% \mathcal{J}(x,y)\big{]}-\mathbb{E}^{(\sigma_{0},y)\sim\hat{\nu}_{0}}\big{[}% \chi\{\sigma_{0}=\zeta\}\mathcal{J}(x,y)\big{]}\big{|}\to 0

uniformly as $N\to\infty$ .
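The Gronwall step used in this argument can be spelled out in the following standard form, stated as a sketch for a bounded measurable function $y\geq 0$ , a nondecreasing function $A\geq 0$ and a constant $K>0$ (the symbols $A$ and $K$ are introduced only for this illustration):

```latex
\[
y_{t}\leq A(t)+K\int_{0}^{t}y_{s}\,ds \ \text{ for all } t\in[0,b_{1}]
\quad\Longrightarrow\quad
y_{t}\leq A(t)\exp(Kt) \ \text{ for all } t\in[0,b_{1}].
\]
```

Taking $A(t)$ to collect the contributions of $\phi^{N}$ and of the mismatch between $\nu_{0}$ and the empirical initial condition yields a bound on $y^{N}_{b_{1}}$ of the exponential form displayed above.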

One can then repeat this argument and find that

(218)

\displaystyle\lim_{N\to\infty}y^{N}_{b_{2}}\leq\epsilon_{b_{2}},

for arbitrarily small $\epsilon_{b_{2}}$ .

There are a finite number of intervals over which $\mu$ is Lipschitz. We can thus continue in this manner to obtain the Lemma.

Lemma 4.23.

For any $\epsilon>0$ ,

(219)

\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\bigg{(}d\big{(}\Phi^{-1}_{\nu_{0}}(\grave{\mu}^{N}),\Phi^{-1}_{\hat{\nu}^{N}_{0}}(\grave{\mu}^{N})\big{)}>\epsilon\bigg{)}=-\infty.

Lemma 4.24.

There exists a constant $c>0$ such that for all $T\geq 0$ , all $t\in[0,T]$ , all $\nu_{0},\widetilde{\nu}_{0}\in\mathcal{P}(\Gamma\times\mathcal{E})$ and all $\mu,\widetilde{\mu}\in\mathcal{M}(\mathcal{E}\times\mathbb{R}^{+})^{\Gamma\times\Gamma}$ ,

(220)		$\displaystyle\sup_{\theta\in\mathcal{E}}\sup_{\alpha,\beta\in\Gamma}\big{\|}% \Lambda_{(\alpha,\beta),t}(\theta,\nu_{0},\mu)-\Lambda_{(\alpha,\beta),t}(% \theta,\widetilde{\nu}_{0},\widetilde{\mu})\big{\|}$	$\displaystyle\leq ctd_{t}(\mu,\widetilde{\mu})+ctd(\nu_{0},\widetilde{\nu}_{0})$
(221)		$\displaystyle\sup_{\theta\in\mathcal{E}}\sup_{\alpha,\beta\in\Gamma}\sup_{t% \leq TC_{f}}\big{\|}\Lambda^{-1}_{(\alpha,\beta,\nu_{0},\mu)}(\theta,t)-\Lambda% ^{-1}_{(\alpha,\beta,\widetilde{\nu}_{0},\widetilde{\mu})}(\theta,t)\big{\|}$	$\displaystyle\leq cT\sup_{t\in[0,T]}\big{\{}d_{t}(\mu,\widetilde{\mu})+d(\nu_{% 0},\widetilde{\nu}_{0})\big{\}}.$

Proof 4.25.

Write $(\nu_{t})_{t\geq 0}=\Psi(\nu_{0},\mu)$ and $(\widetilde{\nu}_{t})_{t\geq 0}=\Psi(\widetilde{\nu}_{0},\widetilde{\mu})$ , and

(222)		$\displaystyle W_{s,\zeta}(\theta)$	$\displaystyle=\mathbb{E}^{(\alpha,\widetilde{\theta})\sim\nu_{s}}\big{[}\chi\{\alpha=\zeta\}\mathcal{J}(\theta,\widetilde{\theta})\big{]}$
(223)		$\displaystyle\widetilde{W}_{s,\zeta}(\theta)$	$\displaystyle=\mathbb{E}^{(\alpha,\widetilde{\theta})\sim\widetilde{\nu}_{s}}\big{[}\chi\{\alpha=\zeta\}\mathcal{J}(\theta,\widetilde{\theta})\big{]}.$

Noting the definition in (155),

(224)

\displaystyle\sup_{\theta\in\mathcal{E}}\sup_{\zeta\in\Gamma}\sup_{s\in[0,t]}% \big{|}W_{s,\zeta}(\theta)-\widetilde{W}_{s,\zeta}(\theta)\big{|}\leq C_{% \mathcal{J}}\sup_{s\in[0,t]}d(\nu_{s},\widetilde{\nu}_{s}),

where we recall that $C_{\mathcal{J}}$ is such that

(225)		$\displaystyle\sup_{\theta\in\mathcal{E}}\big{\|}\mathcal{J}(\theta,\eta)-% \mathcal{J}(\theta,\beta)\big{\|}\leq$	$\displaystyle C_{\mathcal{J}}d(\eta,\beta)$
(226)		$\displaystyle\sup_{\theta,\widetilde{\theta}\in\mathcal{E}}\big{\|}\mathcal{J}(% \theta,\widetilde{\theta})\big{\|}\leq$	$\displaystyle C_{\mathcal{J}}.$

Since (by assumption) $f_{(\beta)}$ is uniformly bounded below and Lipschitz,

(227)

\displaystyle\sup_{\alpha,\beta\in\Gamma}\sup_{\theta\in\mathcal{E}}\big{|}f_{% (\beta)}(\theta,\alpha,W_{s}(\theta))-f_{(\beta)}(\theta,\alpha,\widetilde{W}_% {s}(\theta))\big{|}\leq C_{f}\sup_{\theta\in\mathcal{E}}\sup_{\zeta\in\Gamma}% \big{|}W_{s,\zeta}(\theta)-\widetilde{W}_{s,\zeta}(\theta)\big{|}.

Finally, the definition of $\Psi$ implies that

(228)

\displaystyle d(\nu_{s},\widetilde{\nu}_{s})\leq d(\nu_{0},\widetilde{\nu}_{0}% )+\sum_{\alpha,\beta\in\Gamma}d_{s}\big{(}\mu_{\alpha\mapsto\beta},\widetilde{% \mu}_{\alpha\mapsto\beta}\big{)}

The Lemma now follows from (153), (227) and (228).
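The way these three estimates combine to give (220) can be sketched as the following chain of inequalities. The sketch assumes that $\Lambda_{(\alpha,\beta),t}$ is the time integral of $f_{(\beta)}$ as in (180), without the occupation factor appearing in (153), and the primed indices $\alpha',\beta'$ are used only to avoid a clash with the fixed pair $(\alpha,\beta)$ .

```latex
\begin{align*}
\big|\Lambda_{(\alpha,\beta),t}(\theta,\nu_{0},\mu)-\Lambda_{(\alpha,\beta),t}(\theta,\widetilde{\nu}_{0},\widetilde{\mu})\big|
 &\leq \int_{0}^{t}\big|f_{(\beta)}(\theta,\alpha,W_{s}(\theta))-f_{(\beta)}(\theta,\alpha,\widetilde{W}_{s}(\theta))\big|\,ds \\
 &\leq t\,C_{f}\,C_{\mathcal{J}}\sup_{s\in[0,t]}d(\nu_{s},\widetilde{\nu}_{s}) \\
 &\leq t\,C_{f}\,C_{\mathcal{J}}\Big(d(\nu_{0},\widetilde{\nu}_{0})
   +\sup_{s\in[0,t]}\sum_{\alpha',\beta'\in\Gamma}d_{s}\big(\mu_{\alpha'\mapsto\beta'},\widetilde{\mu}_{\alpha'\mapsto\beta'}\big)\Big).
\end{align*}
```

The second line uses (227) and (224), and the third uses (228). The bound (220) then follows with $c$ proportional to $C_{f}C_{\mathcal{J}}$ , provided the summed component distances are controlled by a constant multiple of $d_{t}(\mu,\widetilde{\mu})$ .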

Appendix A Large Deviations of the Uncoupled System

The Large Deviations of Poisson Random Fields have already been studied by numerous authors [53, 30, 41, 20]. Our system is similar, but not identical, to the systems studied in these papers. The chief difference is that for a spatially-distributed Poisson Random Field over $\mathcal{E}$ , spikes can occur at any spatial location, whereas in our system spikes can only occur at the spatial locations of the channels. The large $N$ limiting equations are nevertheless identical, since the channels are uniformly distributed over $\mathcal{E}$ . An additional novelty of our proof (beyond the proofs in [53, 30, 41, 20]) is that we obtain the Large Deviations for a slightly stronger topology.

As previously, let the $j^{th}$ channel be located at $x^{j}_{N}\in\mathcal{E}$ . We write $\{Y^{j}_{\alpha\beta}(t)\}_{\alpha,\beta\in\Gamma}$ to be independent Poisson Processes of unit intensity. We define the empirical reaction flux $\grave{\mu}^{N}_{\alpha\mapsto\beta}\in\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}$ to be such that for any $A\in\mathfrak{B}(\mathcal{E})$ and any interval $[a,b]\subset\mathbb{R}^{+}$ ,

(229)

\displaystyle\grave{\mu}^{N}_{\alpha\mapsto\beta}\big{(}A\times[a,b]\big{)}=N^% {-1}\sum_{j\in I_{N}}\sum_{t\in[a,b]}\chi\big{\{}x^{j}_{N}\in A,Y^{j}_{\alpha% \beta}(t^{-})\neq Y^{j}_{\alpha\beta}(t)\big{\}}.

We write $\grave{\mu}^{N}=\big{(}\grave{\mu}^{N}_{\alpha\mapsto\beta}\big{)}_{\alpha,% \beta\in\Gamma}\in\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^{+}\big{)}^{% \Gamma\times\Gamma}$ .

Define the rate function $\mathcal{I}:\mathcal{M}\big(\mathcal{E}\times\mathbb{R}^{+}\big)^{\Gamma\times\Gamma}\to\mathbb{R}^{+}\cup\{\infty\}$ as follows. For $\mu\in\mathcal{M}\big(\mathcal{E}\times\mathbb{R}^{+}\big)^{\Gamma\times\Gamma}$, we stipulate that

(230)

\displaystyle\mathcal{I}(\mu)=\infty

if $\mu_{\alpha\mapsto\beta}$ is not absolutely continuous with respect to Lebesgue measure for some $\alpha,\beta\in\Gamma$ . Otherwise, we let $p_{\alpha\mapsto\beta}$ be the density of $\mu_{\alpha\mapsto\beta}$ , and define

(231)		$\displaystyle\mathcal{I}(\mu)=$	$\displaystyle\sum_{\alpha,\beta\in\Gamma}\int_{\mathcal{E}}\int_{0}^{\infty}% \ell\big{(}p_{\alpha\mapsto\beta}(x,t)\big{)}\rho(x)dtdx\text{ where }$
(232)		$\displaystyle\ell(a)=$	$\displaystyle a\log a-a+1.$

In the above expression, we recall that $\rho:\mathcal{E}\to\mathbb{R}^{+}$ is the density of the measure $\kappa\in\mathcal{P}(\mathcal{E}\times\mathbb{R})$ that $N^{-1}\sum_{j\in I_{N}}\delta_{x^{j}_{N}}$ converges to as $N\to\infty$ . Note also that $\ell(a)\geq 0$ . This means that the integral in (231) is well-defined (and could be $\infty$ ). We can now state a Large Deviation Principle for the uncoupled system.

Theorem A.1.

Let $\mathcal{A},\mathcal{O}\subseteq\mathcal{M}\big{(}\mathcal{E}\times\mathbb{R}^% {+}\big{)}^{\Gamma\times\Gamma}$ be (respectively) closed and open. Then

(233)		$\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}% \grave{\mu}^{N}\in\mathcal{A}\big{)}$	$\displaystyle\leq-\inf_{\mu\in\mathcal{A}}\mathcal{I}(\mu)$
(234)		$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\grave{\mu}^{N}\in\mathcal{O}\big{)}$	$\displaystyle\geq-\inf_{\mu\in\mathcal{O}}\mathcal{I}(\mu).$

Furthermore, $\mathcal{I}$ is lower semicontinuous and has compact level sets.
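As a numerical illustration of the rate function in (231)-(232), the following minimal Python sketch evaluates $\ell$ and the value of $\mathcal{I}$ for a spatially and temporally constant density. The function names `ell` and `rate_constant_density` are hypothetical, and the sketch assumes that $\rho$ integrates to one over $\mathcal{E}$ and that each density equals $1$ after a finite horizon (so that no further cost accrues).

```python
import math


def ell(a: float) -> float:
    """Integrand (232): a*log(a) - a + 1, with the convention 0*log(0) = 0."""
    if a < 0.0:
        raise ValueError("a density value must be non-negative")
    if a == 0.0:
        return 1.0
    return a * math.log(a) - a + 1.0


def rate_constant_density(p: float, horizon: float, n_pairs: int = 1) -> float:
    """Value of (231) when every density equals p on [0, horizon] and 1 afterwards.

    Assumption (not from the paper): rho integrates to one over E, so the
    spatial integral contributes a factor 1 and the time integral a factor
    `horizon`. `n_pairs` stands for the number of pairs (alpha, beta).
    """
    return n_pairs * horizon * ell(p)


if __name__ == "__main__":
    # ell vanishes only at a = 1, the typical (law of large numbers) flux.
    print(ell(1.0), ell(0.5), ell(2.0))
    print(rate_constant_density(2.0, horizon=3.0))
```

The cost is zero exactly when the flux density equals the unit intensity of the driving Poisson Processes, which is consistent with $\ell(a)\geq 0$ and $\ell(1)=0$.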

A.1 Proof of Theorem A.1

Fix $T>0$ and write $\mathcal{X}_{T}=\mathcal{M}\big{(}\mathcal{E}\times[0,T]\big{)}$ . Our main result in this subsection is the following.

Lemma A.2.

Let $\mathcal{A},\mathcal{O}\subseteq\mathcal{M}\big(\mathcal{E}\times[0,T]\big)$ be (respectively) closed and open (with respect to the topology of weak convergence). Then for any $\alpha,\beta\in\Gamma$,

(235)		$\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}% \grave{\mu}^{N}_{\alpha\mapsto\beta}\in\mathcal{A}\big{)}$	$\displaystyle\leq-\inf_{\mu\in\mathcal{A}}\mathcal{I}_{T}(\mu)$
(236)		$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\grave{\mu}^{N}_{\alpha\mapsto\beta}\in\mathcal{O}\big{)}$	$\displaystyle\geq-\inf_{\mu\in\mathcal{O}}\mathcal{I}_{T}(\mu).$

Furthermore, $\mathcal{I}_{T}$ is lower semicontinuous and has compact level sets.

We first notice that Theorem A.1 is a corollary of Lemma A.2.

Proof A.3.

Write $\pi_{t}:\mathcal{M}\big(\mathcal{E}\times\mathbb{R}^{+}\big)^{\Gamma\times\Gamma}\to\mathcal{M}\big(\mathcal{E}\times[0,t]\big)^{\Gamma\times\Gamma}$ for the projection of a measure onto its marginal up to time $t$. Evidently $\pi_{t}$ is continuous.

By definition, the topology on $\mathcal{M}\big(\mathcal{E}\times\mathbb{R}^{+}\big)^{\Gamma\times\Gamma}$ is generated by open sets of the following form: for some $t>0$, some $\alpha,\beta\in\Gamma$, a continuous bounded function $h:\mathcal{E}\times[0,t]\to\mathbb{R}$, and $u,\delta\in\mathbb{R}$,

(237)

\displaystyle\big\{\mu\in\mathcal{M}\big(\mathcal{E}\times\mathbb{R}^{+}\big)^{\Gamma\times\Gamma}\;:\;\big|\mathbb{E}^{\mu_{\alpha\mapsto\beta}}[h]-u\big|<\delta\big\}.

Since the projection $\pi_{t}$ is continuous, the Dawson-Gärtner Projective Limits Theorem [26, 28] ensures that the Large Deviation Principle in Lemma A.2 (which holds for arbitrary $T>0$) implies the Large Deviation Principle in Theorem A.1, with rate function

(238)		$\displaystyle\mathcal{I}(\mu)=$	$\displaystyle\sum_{\alpha,\beta\in\Gamma}\sup_{T>0}\mathcal{I}_{T}(\mu_{\alpha% \mapsto\beta})$
(239)		$\displaystyle=$	$\displaystyle\sum_{\alpha,\beta\in\Gamma}\lim_{T\to\infty}\mathcal{I}_{T}(\mu_% {\alpha\mapsto\beta}),$

since $T\mapsto\mathcal{I}_{T}$ is nondecreasing. One should also note that $\grave{\mu}^{N}_{\alpha\mapsto\beta}$ is independent of $\grave{\mu}^{N}_{\zeta\mapsto\xi}$ whenever $\alpha\neq\zeta$ or $\beta\neq\xi$. This is why the rate functions of the individual fluxes can be summed.

Write $\Pi$ to be the set of all partitions of $\mathcal{E}\times[0,T]$ into a finite number of disjoint measurable sets, satisfying the following property. Any partition $\pi$ in $\Pi$ is assumed to be of the form

(240)	$\displaystyle\pi=$	$\displaystyle\big\{B_{i}\times\iota_{i}\big\}_{1\leq i\leq|\pi|}\text{ where }$
(241)	$\displaystyle\bigcup_{1\leq i\leq|\pi|}B_{i}\times\iota_{i}=$	$\displaystyle\mathcal{E}\times[0,T]$
(242)	$\displaystyle(B_{i}\times\iota_{i})\cap(B_{j}\times\iota_{j})=$	$\displaystyle\emptyset\text{ if }i\neq j$

where each $\iota_{i}$ is an interval, and each $B_{i}\subseteq\mathcal{E}$ has nonzero measure with respect to $\mu_{Leb}$. For $\pi,\widetilde{\pi}\in\Pi$, we write $\pi\leq\widetilde{\pi}$ whenever $\widetilde{\pi}$ is a subpartition of $\pi$, i.e. for any $(B,\iota)\in\widetilde{\pi}$ there must exist $(C,\upsilon)\in\pi$ such that $(B,\iota)\subseteq(C,\upsilon)$.

Let $\mathcal{T}_{T}$ be the topology on $\mathcal{X}_{T}$ , generated by the set of all open sets $\mathcal{O}_{\pi}$ of the following form: for a partition $\pi\in\Pi$ , and open sets $\{\mathcal{O}_{i}\}_{1\leq i\leq|\pi|}\subset\mathbb{R}^{+}$ ,

(243)

\displaystyle\mathcal{O}_{\pi}=\bigg\{\mu\in\mathcal{X}_{T}\;:\;\mu(B_{i}\times\iota_{i})\in\mathcal{O}_{i}\text{ for each }1\leq i\leq|\pi|\bigg\}.

$(\mathcal{X}_{T},\mathcal{T}_{T})$ can be understood as a projective limit system in the sense of Section 4.6 of [28]. To see this, for any $\pi\in\Pi$ and $\mu\in\mathcal{X}_{T}$ , let $\mu_{\pi}\in\mathbb{R}^{|\pi|}$ denote the measures of all the sets in $\pi$ - i.e. $\mu_{\pi}=\big{(}\mu(B_{i}\times\iota_{i})\big{)}_{(B_{i}\times\iota_{i})\in\pi}$ . For $\pi,\widetilde{\pi}\in\Pi$ , with $\pi\leq\widetilde{\pi}$ , let $\mathfrak{P}_{\pi,\widetilde{\pi}}:\mathbb{R}^{|\widetilde{\pi}|}\to\mathbb{R}% ^{|\pi|}$ be the natural projection, i.e. for any $(C,\upsilon)\in\pi$ ,

(244)

\displaystyle\big{(}\mathfrak{P}_{\pi,\widetilde{\pi}}\mu\big{)}(C,\upsilon)=% \sum_{(B,\iota)\in\widetilde{\pi}:(B,\iota)\subseteq(C,\upsilon)}\mu(B,\iota).

It is easy to check that $\mathfrak{P}_{\pi,\widetilde{\pi}}:\mathbb{R}^{|\widetilde{\pi}|}\to\mathbb{R}^{|\pi|}$ is continuous. Let $\widetilde{\mathcal{X}}_{T}\subset\prod_{\pi\in\Pi}\mathbb{R}_{+}^{|\pi|\times|\Gamma|}$ be the subset of the product space whose elements are consistent under the projections (244). Standard measure theory dictates that $\widetilde{\mathcal{X}}_{T}$ can be identified with $\mathcal{X}_{T}$ (since a measure is uniquely determined by its values on the sets generating the $\sigma$-algebra).

We metrize convergence in $\mathcal{X}_{T}$ as follows. Let $\{\pi^{(m)}\}_{m\geq 1}\subset\Pi$ be a sequence of partitions such that $\pi^{(m)}\leq\pi^{(m+1)}$, and such that every set $B^{(m)}_{i}\times\iota^{(m)}_{i}$ in $\pi^{(m)}$ (for $1\leq i\leq|\pi^{(m)}|$) satisfies the following: the diameter of $B^{(m)}_{i}$ is at most $m^{-1}$, and the Lebesgue measure of $\iota^{(m)}_{i}$ is at most $m^{-1}$. The metric is defined to be

(245)

\displaystyle\widetilde{d}_{T}(\mu,\nu)=\sum_{m=1}^{\infty}2^{-m}\sup_{B^{(m)}% _{i}\times\iota^{(m)}_{i}\in\pi^{(m)}}\big{|}\mu\big{(}B^{(m)}_{i}\times\iota^% {(m)}_{i}\big{)}-\nu\big{(}B^{(m)}_{i}\times\iota^{(m)}_{i}\big{)}\big{|}.

Write

(246)

\displaystyle\mathcal{E}_{t}=\mathcal{E}\times[0,t].

For any $\pi\in\Pi$, define $\grave{\mu}^{(\pi),N}_{\alpha\mapsto\beta,\mathbf{Y}}\in\mathbb{R}_{+}^{|\pi|}$ to be such that

(247)

\displaystyle\grave{\mu}^{(\pi),N}_{\alpha\mapsto\beta,\mathbf{Y}}=\big{(}% \grave{\mu}^{(\pi),N}_{\alpha\mapsto\beta,\mathbf{Y}}(B\times\iota)\big{)}_{(B% \times\iota)\in\pi}

Let $\kappa_{T}\in\mathcal{M}(\mathcal{E}\times[0,T])$ be such that for any measurable subset $A\subset\mathcal{E}$ and any interval $[a,b]\subseteq[0,T]$,

(248)

\displaystyle\kappa_{T}(A\times[a,b])=(b-a)\int_{A}\rho(x)dx.

Define the rate function, for any $\pi\in\Pi$ ,

(249)	$\displaystyle\mathcal{I}_{T,\pi}$	$\displaystyle:\mathcal{X}_{T}\to\mathbb{R}_{+}\cup\{\infty\}$
(250)	$\displaystyle\mathcal{I}_{T,\pi}(\mu)$	$\displaystyle=\sum_{B\in\pi}\sup_{a\in\mathbb{R}}\big\{a\mu(B)-\kappa_{T}(B)\exp(a)+\kappa_{T}(B)\big\}$
(251)		$\displaystyle=\sum_{B\in\pi}\kappa_{T}(B)\bigg\{-\frac{\mu(B)}{\kappa_{T}(B)}+1+\frac{\mu(B)}{\kappa_{T}(B)}\log\left(\frac{\mu(B)}{\kappa_{T}(B)}\right)\bigg\},$

and one obtains the second expression (251) from (250) by elementary calculus: whenever $\mu(B)>0$, the supremum is attained at $a=\log\big(\mu(B)/\kappa_{T}(B)\big)$. Note that in (251) (and throughout this paper) we interpret $0/0=1$ and $0\log 0=0$.
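The passage from (250) to (251) can be checked numerically for a single cell $B$. The following Python sketch compares a brute-force grid maximisation of $a\mapsto a\mu(B)-\kappa_{T}(B)e^{a}+\kappa_{T}(B)$ with the closed form $\kappa_{T}(B)\,\ell\big(\mu(B)/\kappa_{T}(B)\big)$. The function names, the grid range $[-10,10]$ and the grid size are illustrative assumptions; `mu` and `kappa` stand for $\mu(B)$ and $\kappa_{T}(B)$.

```python
import math


def cell_cost_closed_form(mu: float, kappa: float) -> float:
    """Summand of (251) for one cell: kappa * ell(mu / kappa), with 0 log 0 = 0."""
    if mu == 0.0:
        return kappa
    r = mu / kappa
    return kappa * (r * math.log(r) - r + 1.0)


def cell_cost_grid(mu: float, kappa: float,
                   lo: float = -10.0, hi: float = 10.0, n: int = 200001) -> float:
    """Brute-force approximation of the supremum over a in (250), on a uniform grid."""
    best = -math.inf
    for k in range(n):
        a = lo + (hi - lo) * k / (n - 1)
        best = max(best, a * mu - kappa * math.exp(a) + kappa)
    return best


if __name__ == "__main__":
    for mu, kappa in [(0.5, 1.0), (2.0, 0.7), (1.0, 1.0)]:
        print(mu, kappa, cell_cost_grid(mu, kappa), cell_cost_closed_form(mu, kappa))
```

The grid search is only a sanity check: the objective is strictly concave in $a$, so the stationary point $a=\log(\mu/\kappa)$ is the unique maximiser when $\mu>0$.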

Lemma A.4.

Let $\mathcal{A},\mathcal{O}\subseteq\mathbb{R}_{+}^{|\pi|}$ be (respectively) closed and open sets, with respect to the Euclidean topology. Then for any $\alpha,\beta\in\Gamma$ ,

(252)		$\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}% \grave{\mu}^{(\pi),N}_{\alpha\mapsto\beta,\mathbf{Y}}\in\mathcal{A}\big{)}\leq% -\inf_{\vec{a}\in\mathcal{A}}\mathcal{I}_{T,\pi}(\vec{a})$
(253)		$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\grave{\mu}^{(\pi),N}_{\alpha\mapsto\beta,\mathbf{Y}}\in\mathcal{O}\big{)}% \geq-\inf_{\vec{a}\in\mathcal{O}}\mathcal{I}_{T,\pi}(\vec{a})$

Furthermore, $\mathcal{I}_{T,\pi}$ is lower semicontinuous and convex.

Proof A.5.

Observe that $\big{(}\grave{\mu}^{(\pi),N}_{\alpha\mapsto\beta,\mathbf{Y}}(B)\big{)}_{B\in\pi}$ constitute $|\pi|$ independent homogeneous Poisson random variables. Write $\iota_{N}(B)$ to be the intensity of $\grave{\mu}^{(\pi),N}_{\alpha\mapsto\beta,\mathbf{Y}}(B)$ . The definitions imply that

(254)

\displaystyle\lim_{N\to\infty}N^{-1}\iota_{N}(B)=\kappa_{T}(B)

We therefore find that the logarithmic moment generating function $\Lambda:\mathbb{R}^{|\pi|}\to\mathbb{R}$ takes the form, for constants $\vec{c}\in\mathbb{R}^{|\pi|}$ (written $\vec{c}=\big(c(B)\big)_{B\in\pi}$), and recalling from (229) that $N\grave{\mu}^{(\pi),N}_{\alpha\mapsto\beta,\mathbf{Y}}(B)$ is the number of jumps recorded in $B$,

(255)		$\displaystyle\Lambda(\vec{c})=$	$\displaystyle\lim_{N\to\infty}N^{-1}\log\mathbb{E}\bigg[\exp\bigg(N\sum_{B\in\pi}c(B)\grave{\mu}^{(\pi),N}_{\alpha\mapsto\beta,\mathbf{Y}}(B)\bigg)\bigg]$
(256)		$\displaystyle=$	$\displaystyle\sum_{B\in\pi}\kappa_{T}(B)\big(\exp(c(B))-1\big).$

Observe that $\vec{c}\mapsto\Lambda(\vec{c})$ is (i) finite for all $\vec{c}$, and (ii) smooth. The Large Deviation Principle is thus a consequence of Cramér's Theorem [28].
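The closed form (256) rests on the logarithmic moment generating function of a Poisson random variable, $\log\mathbb{E}[e^{cX}]=\lambda(e^{c}-1)$ for $X$ of intensity $\lambda$. The Python sketch below checks this identity by summing the defining series directly. The function names and the truncation at 400 terms are illustrative assumptions, and the final check idealises the count in a cell $B$ as being exactly Poisson with intensity $N\kappa_{T}(B)$.

```python
import math


def poisson_log_mgf_series(lam: float, c: float, terms: int = 400) -> float:
    """log E[exp(c X)] for X ~ Poisson(lam), by truncating the defining series."""
    # term_k = exp(-lam) * (lam * exp(c))**k / k!, built up iteratively
    term = math.exp(-lam)
    total = term
    for k in range(1, terms):
        term *= lam * math.exp(c) / k
        total += term
    return math.log(total)


def poisson_log_mgf_closed(lam: float, c: float) -> float:
    """Closed form lam * (exp(c) - 1)."""
    return lam * (math.exp(c) - 1.0)


if __name__ == "__main__":
    # Single-cell version of (256): N^{-1} log E[exp(c * count)] with count ~ Poisson(N * kappa)
    n_channels, kappa, c = 50, 0.3, 0.5
    print(poisson_log_mgf_series(n_channels * kappa, c) / n_channels,
          kappa * (math.exp(c) - 1.0))
```

Under this idealisation the scaled quantity equals $\kappa_{T}(B)(e^{c}-1)$ for every $N$, not only in the limit; in the setting of the Lemma the limit in (254) is what supplies the intensity.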

Corollary A.6.

If $\widetilde{\pi}\leq\pi$ , then for any $\mu\in\mathcal{X}_{T}$ ,

(257)

\displaystyle\mathcal{I}_{T,\widetilde{\pi}}\big{(}\{\mu(B)\}_{B\in\widetilde{% \pi}}\big{)}\leq\mathcal{I}_{T,\pi}\big{(}\{\mu(B)\}_{B\in\pi}\big{)}.

Proof A.7.

Since the projection $\mathfrak{P}_{\widetilde{\pi},\pi}$ (defined in (244)) is continuous, an application of the Contraction Principle [28] to Lemma A.4 implies that

(258)

\displaystyle\mathcal{I}_{T,\widetilde{\pi}}\big{(}\{\mu(B)\}_{B\in\widetilde{% \pi}}\big{)}=\inf_{\pi\in\Pi:\widetilde{\pi}\leq\pi}\mathcal{I}_{T,\pi}\big{(}% \{\mu(B)\}_{B\in\pi}\big{)}.

This proves the corollary.
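The monotonicity in Corollary A.6 can be illustrated on a toy example: merging cells of a partition never increases the partition rate function. The Python sketch below uses hypothetical cell masses (four cells of equal $\kappa_{T}$-mass, merged pairwise); all names and numbers are assumptions chosen for illustration.

```python
import math


def cell_cost(mu: float, kappa: float) -> float:
    """kappa * ell(mu / kappa), the summand of (251), with 0 log 0 = 0."""
    if kappa <= 0.0:
        raise ValueError("kappa must be positive")
    if mu == 0.0:
        return kappa
    r = mu / kappa
    return kappa * (r * math.log(r) - r + 1.0)


def partition_rate(mu_cells, kappa_cells) -> float:
    """Partition rate function (251) given the cell masses of mu and kappa_T."""
    return sum(cell_cost(m, k) for m, k in zip(mu_cells, kappa_cells))


def coarsen(values, groups):
    """Sum cell masses over groups of indices, mimicking the projection (244)."""
    return [sum(values[i] for i in group) for group in groups]


if __name__ == "__main__":
    fine_mu = [0.1, 0.4, 0.3, 0.2]          # hypothetical masses mu(B)
    fine_kappa = [0.25, 0.25, 0.25, 0.25]   # hypothetical masses kappa_T(B)
    groups = [[0, 1], [2, 3]]               # merge cells pairwise
    fine = partition_rate(fine_mu, fine_kappa)
    coarse = partition_rate(coarsen(fine_mu, groups), coarsen(fine_kappa, groups))
    print(coarse, fine)  # the coarse value should not exceed the fine value
```

In this example the coarse partition cannot distinguish $\mu$ from $\kappa_{T}$ at all, while the fine partition can, which is why the supremum over partitions in the definition of $\mathcal{I}_{T}$ is the relevant quantity.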

Now define the rate function $\mathcal{I}_{T}:\mathcal{X}_{T}\to\mathbb{R}^{+}$ ,

(259)

\displaystyle\mathcal{I}_{T}(\mu)=\sup_{\pi\in\Pi}\mathcal{I}_{T,\pi}\big(\{\mu(B)\}_{B\in\pi}\big).

We can now prove the general Large Deviation Principle.

Lemma A.8.

Let $\mathcal{A},\mathcal{O}\subseteq\mathcal{X}_{T}$ be (respectively) closed and open sets, with respect to the topology induced by the metric $\widetilde{d}_{T}$ . Then for any $\alpha,\beta\in\Gamma$ ,

(260)		$\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\big{(}% \grave{\mu}^{N}_{\alpha\mapsto\beta}\in\mathcal{A}\big{)}\leq-\inf_{\mu\in% \mathcal{A}}\mathcal{I}_{T}(\mu)$
(261)		$\displaystyle\underset{N\to\infty}{\underline{\lim}}N^{-1}\log\mathbb{P}\big{(% }\grave{\mu}^{N}_{\alpha\mapsto\beta}\in\mathcal{O}\big{)}\geq-\inf_{\mu\in% \mathcal{O}}\mathcal{I}_{T}(\mu)$

Proof A.9.

This follows from an application of the Dawson-Gärtner Projective Limits Theorem [26] to Lemma A.4. See also the exposition in the textbook [28].

We next prove Lemma A.2.

Proof A.10.

One can check that the topology $\mathcal{T}_{T}$ is a refinement of the topology of weak convergence on $\mathcal{M}(\mathcal{E}\times[0,T])$. Indeed, if $\widetilde{d}_{T}(\mu_{n},\nu)\to 0$, then necessarily for any bounded continuous function $h$ on $\mathcal{E}\times[0,T]$ (which must also be uniformly continuous), it holds that

\mathbb{E}^{\mu_{n}}[h]\to\mathbb{E}^{\nu}[h].

The Lemma is therefore an immediate consequence of Lemma A.8.

To finish, we wish to obtain a more tractable form for the rate function $\mathcal{I}_{T}$ .

Lemma A.11.

If $\mathcal{I}_{T}(\mu)<\infty$ then $\mu$ must be absolutely continuous with respect to $\kappa_{T}$. That is, there must exist measurable $p:\mathcal{E}\times[0,T]\to\mathbb{R}_{+}$ such that for any measurable $B\subset\mathcal{E}\times[0,T]$,

(262)

\mu(B)=\int_{B}\frac{p(\theta,t)}{\rho(\theta)}\kappa_{T}(d\theta,dt)=\int_{B}p(\theta,t)\,d\theta\,dt.

Furthermore,

(263)

\displaystyle\mathcal{I}_{T}(\mu)=\int_{\mathcal{E}}\int_{0}^{T}\bigg{\{}1-% \frac{p(\theta,t)}{\rho(\theta)}+\frac{p(\theta,t)}{\rho(\theta)}\log\frac{p(% \theta,t)}{\rho(\theta)}\bigg{\}}\rho(\theta)dtd\theta.

Proof A.12.

Suppose that $\mathcal{I}_{T}(\mu)\leq L<\infty$. It follows from Lemma A.13 that for any $\epsilon>0$, as long as $\delta$ is sufficiently small, $\mu\in\mathcal{V}_{\epsilon,\delta}$, where $\mathcal{V}_{\epsilon,\delta}\subset\mathcal{X}_{T}$ is such that

(264)

\mathcal{V}_{\epsilon,\delta}=\bigg{\{}\mu\in\mathcal{X}_{T}\;:\;\text{For all% }B\in\widetilde{\mathcal{B}}\text{ such that }\kappa_{T}(B)\leq\delta,\;\text% { it holds that }\mu(B)\leq\epsilon\bigg{\}}.

It is then a standard result from real analysis [42, Section 7.3] that $\mu$ is absolutely continuous with respect to $\kappa_{T}$ . Let its density be $p:\mathcal{E}\times[0,T]\to\mathbb{R}$ .

Let $(\pi_{i})_{i\geq 1}\subset\Pi$ be any sequence of partitions such that

(265)

\displaystyle\lim_{i\to\infty}\mathcal{I}_{T,\pi_{i}}(\mu)=\mathcal{I}_{T}(\mu).

It is also assumed that the largest diameter of any set in $\pi_{i}$ goes to zero as $i\to\infty$. This assumption is possible thanks to Corollary A.6: if one takes a sub-partition of a partition, the associated rate function cannot decrease.

Let $f_{i}:\mathcal{E}\times[0,T]\to\mathbb{R}^{+}$ be such that for each $B\in\pi_{i}$ , for all $(\theta,t)\in B$ ,

(266)

\displaystyle f_{i}(\theta,t)=\mu(B)/\kappa_{T}(B).

Write $\hat{\mathcal{F}}_{i}$ for the $\sigma$-algebra generated by the sets in $\pi_{i}$. Observe that $f_{i}$ is the Radon-Nikodym derivative of $\mu$ with respect to $\kappa_{T}$ on the $\sigma$-algebra $\hat{\mathcal{F}}_{i}$, and we will employ Lévy's Upward Theorem to compute the limit as $i\to\infty$. To this end, define the probability measure $\hat{\kappa}_{T}(B)=\kappa_{T}(B)/\kappa_{T}(\mathcal{E}\times[0,T])$. With respect to the filtration $\big(\hat{\mathcal{F}}_{i}\big)_{i\geq 1}$, $(f_{i})_{i\geq 1}$ is a $\hat{\kappa}_{T}$-Martingale. Thanks to the Martingale Convergence Theorem, $\hat{\kappa}_{T}$ almost-surely,

(267)

\displaystyle\lim_{i\to\infty}f_{i}(\theta,t)=\frac{p(\theta,t)}{\rho(\theta)}

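Since the function $a\mapsto a\log a-a+1$ is non-negative, continuous, and bounded over finite intervals, we find that the expression in (251), evaluated along the partitions $\pi_{i}$, converges to the right-hand side of (263). By (265), this limit equals $\mathcal{I}_{T}(\mu)$.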

Lemma A.13.

For every $\epsilon,L>0$ , there exists $\delta>0$ such that

(268)

\displaystyle\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\bigg{(% }\text{For some }\alpha,\beta\in\Gamma,\;\grave{\mu}^{N}_{\alpha\mapsto\beta,% \mathbf{Y}}\notin\mathcal{V}_{\epsilon,\delta}\bigg{)}\leq-L.

Furthermore, if $\mathcal{I}_{T}(\mu)\leq L<\infty$, then for any $\epsilon>0$ there exists $\delta(\epsilon,L)$ such that $\mu\in\mathcal{V}_{\epsilon,\delta}$.

Proof A.14.

For any $\epsilon>0$, let $\Pi_{\epsilon}\subset\Pi$ consist of all partitions $\pi$ such that $\kappa_{T}(B)\leq\epsilon$ for every $B\in\pi$. For any $n\in\mathbb{Z}^{+}$, let $\pi^{(n)}$ be any particular partition in $\Pi_{n^{-1}}$.

Define the set

(269)

\mathcal{L}_{n}=\bigg\{\mu\in\mathcal{X}_{T}\;:\;\text{ For any }\{B_{i}\}_{i=1}^{m}\subseteq\pi^{(n)}\text{ such that }\sum_{i=1}^{m}\kappa_{T}(B_{i})\leq 2\delta,\text{ it holds that }\sum_{i=1}^{m}\mu(B_{i})\leq\epsilon\bigg\}.

Thanks to the Large Deviations estimate, writing $B=\cup_{i=1}^{m}B_{i}$ ,

(270)

\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\bigg(\grave{\mu}^{N}_{\alpha\mapsto\beta}\notin\mathcal{L}_{n}\bigg)\leq-\inf_{\mu\notin\mathcal{L}_{n}}\bigg\{\kappa_{T}(B)-\mu(B)+\mu(B)\log\left(\frac{\mu(B)}{\kappa_{T}(B)}\right)\bigg\},

using the fact that for any $A\in\widetilde{\mathcal{B}}$,

-\mu(A)+\kappa_{T}(A)+\mu(A)\log\left(\frac{\mu(A)}{\kappa_{T}(A)}\right)\geq 0.

We thus find that

(271)

\underset{N\to\infty}{\overline{\lim}}N^{-1}\log\mathbb{P}\bigg{(}\grave{\mu}^% {N}_{\alpha\mapsto\beta}\notin\mathcal{L}_{n}\bigg{)}\leq-\bigg{\{}\delta-% \epsilon+\epsilon\log\bigg{(}\frac{\epsilon}{\delta}\bigg{)}\bigg{\}}:=-x(% \epsilon,\delta).

Now for any $\epsilon>0$ , $x(\epsilon,\delta)\to\infty$ as $\delta\to 0^{+}$ . In particular, we take $\delta$ small enough that $x(\epsilon,\delta)\geq L$ . Finally, for large enough $n$ , it must be that

(272)

\displaystyle\mathcal{L}_{n}\subseteq\mathcal{V}_{\epsilon,\delta}.
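The choice of $\delta$ in the proof above can be made concrete. The Python sketch below evaluates $x(\epsilon,\delta)=\delta-\epsilon+\epsilon\log(\epsilon/\delta)$ from (271) and searches, by repeated halving starting from $\delta=\epsilon$, for a $\delta$ with $x(\epsilon,\delta)\geq L$. The halving search and the function names are illustrative assumptions, not part of the proof.

```python
import math


def x(eps: float, delta: float) -> float:
    """Exponent x(eps, delta) = delta - eps + eps * log(eps / delta) from (271)."""
    return delta - eps + eps * math.log(eps / delta)


def delta_for_level(eps: float, level: float, max_halvings: int = 1000) -> float:
    """Return some delta in (0, eps] with x(eps, delta) >= level, found by halving."""
    delta = eps  # x(eps, eps) = 0, and x increases as delta decreases below eps
    for _ in range(max_halvings):
        if x(eps, delta) >= level:
            return delta
        delta /= 2.0
    raise RuntimeError("no suitable delta found within the halving budget")


if __name__ == "__main__":
    eps, level = 0.1, 5.0
    delta = delta_for_level(eps, level)
    print(delta, x(eps, delta))
```

Because $x(\epsilon,\delta)$ grows only logarithmically in $1/\delta$, the required $\delta$ is exponentially small in $L/\epsilon$.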

References

[1] Report-449-1995.
[2] Z. Agathe-Nerine, Multivariate hawkes processes on inhomogeneous random graphs, Stochastic Processes and their Applications, 152 (2022), pp. 86–148.
[3] A. Agazzi, L. Andreis, R. I. Patterson, and D. M. Renger, Large deviations for markov jump processes with uniformly diminishing rates, Stochastic Processes and their Applications, (2022).
[4] A. Agazzi, A. Dembo, and J. P. Eckmann, Large deviations theory for markov jump models of chemical reaction networks, Annals of Applied Probability, 28 (2018), pp. 1821–1855.
[5] L. J. Allen, A primer on stochastic epidemic models: Formulation, numerical simulation, and analysis, Infectious Disease Modelling, 2 (2017), pp. 128–142.
[6] D. Avitabile and J. Maclaurin, Neural fields and noise-induced patterns in neurons on large disordered networks, Arxiv 2408.12540v1, (2024).
[7] A.-L. Barabasi and R. Albert, Emergence of scaling in random networks, Mat. Res. Soc. Symp. Proc, 74 (1999), p. 677.
[8] G. Barbet, J. MacLaurin, and M. Silverstein, Large deviations of piecewise-deterministic-markov-processes with application to calcium signalling, SIAM Journal of Applied Mathematics (Submitted), (2023).
[9] P. Bernuzzi and T. Grafke, Large deviation minimisers for stochastic partial differential equations with degenerate noise, (2024).
[10] B. Bollobas, C. Borgs, J. Spencer, and G. Tusnady, The degree sequence of a scale-free random graph process, Random Structures and Algorithms, 18 (2001).
[11] C. Borgs, J. T. Chayes, H. Cohn, and Y. Zhao, An lp theory of sparse graph convergence ii: Ld convergence, quotients and right convergence, The Annals of Probability, 46 (2018).
[12] , An lp theory of sparse graph convergence i: Limits, sparse random graph models, and power law distributions, Transactions of the American Mathematical Society, 372 (2019), pp. 3019–3062.
[13] J. Bramburger and M. Holzer, Pattern formation in random networks using graphons, SIAM Journal on Mathematical Analysis, 55 (2023), pp. 2150–2185.
[14] J. J. Bramburger, M. Holzer, and J. Williams, Persistence of steady-states for dynamical systems on large networks, (2024).
[15] P. Bremaud, Point Processes and Queues, Springer-Verlag, 1981.
[16] P. C. Bressloff, Spatiotemporal dynamics of continuum neural fields, Journal of Physics A: Mathematical and Theoretical, 45 (2012).
[17] P. C. Bressloff and J. M. Newby, Path integrals and large deviations in stochastic hybrid systems, Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 89 (2014), pp. 1–15.
[18] T. Britton and E. Pardoux, Stochastic Epidemic Models with Inference, Springer, 2019.
[19] Z. Brzeźniak, X. Peng, and J. Zhai, Well-posedness and large deviations for 2d stochastic navier–stokes equations with jumps, Journal of the European Mathematical Society, 25 (2023), pp. 3093–3176.
[20] A. Budhiraja, J. Chen, and P. Dupuis, Large deviations for stochastic partial differential equations driven by a poisson random measure, Stochastic Processes and their Applications, 123 (2013), pp. 523–560.
[21] A. Budhiraja and P. Dupuis, Analysis and Approximation of Rare Events, vol. 94, Springer, 2019.
[22] J. Chevallier and G. Ost, Fluctuations for spatially extended hawkes processes, Stochastic Processes and their Applications, (2020), pp. 1–33.
[23] F. Coppini, A. D. Crescenzo, and H. Pham, Nonlinear graphon mean-field systems, (2024).
[24] D. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods, Second Edition, Springer, 2003.
[25] , An introduction to the theory of Point Processes. Volume 2: General Theory and Structure. Second Edition, Springer, 2008.
[26] D. A. Dawson and J. Gartner, Large deviations from the mckean-vlasov limit for weakly interacting diffusions, Stochastics, 20 (1987), pp. 247–308.
[27] S. Delattre, N. Fournier, and M. Hoffman, Hawkes processes on large networks, The Annals of Applied Probability, 26 (2016).
[28] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications 2nd Edition, Springer, 1998.
[29] P. Dupuis, K. Ramanan, and W. Wu, Large deviation principle for finite-state mean field interacting particle systems, Arxiv Preprint, (2016).
[30] D. Florens and H. Pham, Large deviation principle in nonparametric estimation of marked point processes, Statistics and Probability Letters, 41 (1999), pp. 383–388.
[31] N. Fournier and E. Löcherbach, On a toy model of interacting neurons, Annales de l’institut Henri Poincare (B) Probability and Statistics, 52 (2016), pp. 1844–1876.
[32] W. Gerstner, W. Kistler, R. Naud, and L. Paninski, Neuronal Dynamics From Single Neurons to Networks and Models of Cognition, Cambridge University Press, 2014.
[33] M. Goebel, M. S. Mizuhara, and S. Stepanoff, Stability of twisted states on lattices of kuramoto oscillators, Chaos, 31 (2021).
[34] T. Grafke, T. Schäfer, and E. Vanden-Eijnden, Sharp asymptotic estimates for expectations, probabilities, and mean first passage times in stochastic systems with small noise, Communications on Pure and Applied Mathematics, 77 (2024), pp. 2268–2330.
[35] A. G. Hawkes, Hawkes processes and their applications to finance: a review, 2 2018.
[36] M. Heymann and E. Vanden-Eijnden, The geometric minimum action method: A least action principle on the space of curves, Communications on Pure and Applied Mathematics, 61 (2008), pp. 1052–1117.
[37] P. Ji, Y. Wang, T. Peron, C. Li, J. Nagler, and J. Du, Structure and function in artificial, zebrafish and human neural networks, 7 2023.
[38] C. Kuehn and M. G. Riedler, Large deviations for nonlocal stochastic neural fields, Journal of Mathematical Neuroscience, 4 (2014), pp. 1–33.
[39] P. J. Laub, Y. Lee, and T. Taimre, The Elements of Hawkes Processes, Springer International Publishing, 1 2021.
[40] P. Lewis and G. Shedler, Simulation of nonhomogeneous poisson processes by thinning, Naval research logistics quarterly, (1978).
[41] R. S. Liptser and A. A. Pukhalskii, Limit theorems on large deviations for semimartingales, (2005).
[42] S. Lojasiewicz, An Introduction to the Theory of Real Functions, Wiley, 1988.
[43] L. Lovasz, Large Networks and Graph Limits, 2012.
[44] E. Lucon, Quenched asymptotics for interacting diffusions on inhomogeneous random graphs, Stochastic Processes and their Applications, (2020), pp. 1–52.
[45] J. MacLaurin and J. M. Newby, Extreme first passage times for populations of identical rare events, SIAM Journal of Applied Mathematics (Accepted for Publication), (2024).
[46] A. D. Masi, A. Galves, E. Löcherbach, and E. Presutti, Hydrodynamic limit for interacting neurons, Journal of Statistical Physics, 158 (2014), pp. 866–902.
[47] E. Pardoux and B. Samegni-Kepgnou, Large deviation principle for epidemic models, Journal of Applied Probability, 54 (2017), pp. 905–920.
[48] R. I. Patterson and D. R. Renger, Large deviations of jump process fluxes, Mathematical Physics Analysis and Geometry, 22 (2019).
[49] L. Pellis, F. Ball, S. Bansal, K. Eames, T. House, V. Isham, and P. Trapman, Eight challenges for network epidemic models, Epidemics, 10 (2015), pp. 58–62.
[50] S. Riley, K. Eames, V. Isham, D. Mollison, and P. Trapman, Five challenges for spatial epidemic models, Epidemics, 10 (2015), pp. 68–71.
[51] S. Strogatz and D. Watts, Collective dynamics of ’small-world’ networks, Nature, 393 (1998).
[52] S. Tang, M. Tuerkoen, and H. Zhou, On the identifiability of nonlocal interaction kernels in first-order systems of interacting particles on riemannian manifolds, SIAM Journal on Applied Mathematics, 84 (2024), pp. 2067–2086.
[53] A. D. Wentzell, Limit theorems on large deviations for Markov stochastic processes, Kluwer Academic Publishers, 1990.
[54] Y. Xing and K. H. Johansson, Concentration in gossip opinion dynamics over random graphs, SIAM Journal on Control and Optimization, 62 (2024), pp. 1521–1545.
[55] R. Zakine and E. Vanden-Eijnden, Minimum-action method for nonequilibrium phase transitions, Physical Review X, 13 (2023).
