Large Deviations of Hawkes Processes on Structured Sparse Random Graphs
Abstract
We prove a Large Deviation Principle for Hawkes Processes on sparse large disordered networks with a graphon structure. We apply our results to a stochastic epidemiological model on a disordered network, and determine Euler-Lagrange equations that dictate the most likely transition path between different states of the network.
1 Introduction
We study the dynamics of high-dimensional Hawkes processes on networks with a particular average structure. Hawkes Processes are continuous-time jump-Markov processes whose intensity function is itself stochastic [24, 25, 39]. Applications include high-dimensional spiking neuron models [46, 31, 22], epidemics on structured populations [49, 50, 5], sociological models [54], mathematical finance [35], machine learning [52] and various other population dynamics models. In all of the above applications, a central aim is to understand how the combination of stochasticity and network structure shapes the resulting dynamics [37].
There has been much recent effort directed towards determining deterministic ‘neural field’ equations to describe the large limiting behavior of spatially-extended Hawkes Processes [27, 22, 2, 23, Baars2024]. In particular, a recent emphasis has been to understand how the graphon determines pattern formation [33, 13, 14] and other coherent structures. This paper builds on these works by determining a Large Deviation Principle. The Large Deviation Principle determines an asymptotic estimate for the probability of a deviation from the large limiting behavior. It is useful for estimating large-scale transitions of the system induced by rare finite-size fluctuations of the system [17, 55, 34, 45]. Early work on the Large Deviations of neural fields has been done by Kuehn and Riedler [38]. This paper parallels a recent preprint [6] by the authors of this paper, that determines neural field equations, and a Large Deviation Principle, for rate neurons on a compact space.
There exists a well-developed literature for the Large Deviations of stochastic processes driven by Poisson Random Measures [21]. (Note that Hawkes Processes can be represented as a double-time integral with respect to a standard Poisson Random Measure on [40].) Most treatments of the Large Deviations of spatially-distributed systems driven by Poisson Random Measures concern PDEs perturbed by a spatially-distributed Poisson Random Measure [20, 19]. Our system differs from these papers because the noise can only occur at the locations of the nodes of the graph. Since the nodes are approximately uniformly distributed throughout the spatial domain, in the large size limit the Large Deviations rate function becomes an integral over the spatial domain of a Lagrangian function.
On a related note, there has been much recent interest in the Large Deviations of Chemical Reaction Networks [48, 8]. These are high-dimensional jump Markovian Processes, just as in this paper, but without the spatial extension. Recent works include those of Dupuis, Ramanan and Wu [29], Pardoux and Samegni-Kepgnou [47], Agazzi, Eckmann and Dembo [4], and Patterson and Renger [48]. Patterson and Renger [48] and Agazzi, Patterson, Renger and co-authors [3] prove the Large Deviations Principle in a general setting by also studying the convergence of the reaction fluxes (as in this paper).
To this end, we consider the large dynamics of jump-Markov processes on an inhomogeneous network (i.e. a graph, with edges and nodes). The ‘agents’ correspond to the nodes of the graph. The state of the agent is written as , and it takes on values in a finite state-space . For epidemiological applications [18], one typically takes (susceptible, infected, recovered) or just . For neuroscience applications [32], one might take (i.e. spiking or non-spiking).
The population is assumed to exist on a disordered static network. We make minimal assumptions on the network; in particular, we do not require that the edges are sampled from a probability distribution. Our main requirement is that the typical number of edges connected to each vertex asymptotes to as , and that the edge connectivity resembles that of a ‘graphon’. It is already known that these conditions ensure that the large limiting dynamics resembles the all-to-all connectivity case [44, 2]. One way to generate the graphon structure is to sample the graph randomly from a probability distribution known as a W-random graph [12, 11]. Essentially this means that the connections are sampled independently, where the probability of a connection is a function of the locations of the afferent vertices. The formalism is flexible enough to accommodate a wide range of models, including a power-law model (often considered a paradigm for populations with a clustered social structure), and populations with a geometric, spatially-distributed structure (i.e. the probability of a connection correlates with the geometric distance between the people).
Important cases covered by the W-random-graph formalism include the following:
• Sparse Power Law Graphs: these were originally defined in the seminal paper by Barabasi and Albert [7], and further developed by Bollobas et al. [10]. In the original paper, Barabasi and Albert [7] constructed this graph iteratively, by successively adding vertices, and then connecting them, with the probability of a connection being proportional to the existing degree of the node. It was shown by [11] that one obtains an asymptotically excellent approximation to a power law graph in the W-random graph formalism, as long as one chooses , uniformly distributed over , and for some parameters such that .
• Small World Graphs were first defined in the seminal paper of Watts and Strogatz [51]. These graphs are constructed by taking a ring of nearest-neighbor-connected vertices, and then randomly reassigning some edges.
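The independent-edge sampling described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the construction used in the proofs: the nodes are placed on the unit interval (viewed as a ring), and the names `sample_w_random_graph`, `ring_kernel` and the parameter `sparsity` are hypothetical. In particular, the sparsity scaling required by the hypotheses of this paper is not reproduced here.

```python
import random


def sample_w_random_graph(n, graphon, sparsity, seed=0):
    """Sample a symmetric 0/1 adjacency matrix of a sparse W-random graph.

    Node j sits at x_j = (j + 0.5) / n in [0, 1] (an assumed, illustrative
    domain). Each unordered pair {j, k} is connected independently with
    probability min(1, sparsity * graphon(x_j, x_k)).
    """
    rng = random.Random(seed)
    xs = [(j + 0.5) / n for j in range(n)]
    adj = [[0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1, n):
            p = min(1.0, sparsity * graphon(xs[j], xs[k]))
            if rng.random() < p:
                adj[j][k] = adj[k][j] = 1  # symmetric connection, no self-loops
    return xs, adj


def ring_kernel(x, y):
    """Hypothetical kernel: connection probability decays with ring distance."""
    d = min(abs(x - y), 1.0 - abs(x - y))
    return max(0.0, 1.0 - 4.0 * d)


xs, adj = sample_w_random_graph(n=200, graphon=ring_kernel, sparsity=0.2)
mean_degree = sum(map(sum, adj)) / len(adj)
print("mean degree:", mean_degree)
```

With this kernel the expected degree is roughly `sparsity * n / 4`, so the graph becomes sparse (relative to the number of nodes) as `sparsity` decreases.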
1.1 Notation
The particles are indexed by . If is a Polish Space, let denote the space of all Borel measures on . Let denote the space of all measures with total mass of one. We endow both of these spaces with the topology of weak convergence: i.e. the topology generated by open sets of the form, for a continuous bounded function , and ,
The Wasserstein distance on is defined to be
(1) |
and the infimum is over all couplings of and . Write to denote the Borel sigma algebra. Let denote the Skorohod space of all cadlag functions.
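As a concrete instance of the infimum over couplings, consider the special case of two empirical measures on the real line with the same number of equally weighted atoms and cost |x - y|. In that case the optimal coupling is the monotone one, which pairs the sorted samples. The sketch below covers only this one-dimensional case; the function name `wasserstein_1d` is ours, and the Polish-space setting of the paper is considerably more general.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size empirical measures on R.

    Each sample carries mass 1/len(xs). For the cost |x - y| on the real line
    the optimal coupling matches the sorted samples in order.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    if not xs:
        raise ValueError("samples must be non-empty")
    paired = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in paired) / len(xs)


print(wasserstein_1d([0.0, 1.0], [1.0, 2.0]))
```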
Define the following metric on :
(2) |
where the supremum is taken over all functions that are (i) Lipschitz with Lipschitz constant less than or equal to and (ii) such that .
Write
to be the projection of a measure onto the measure up to time . We also naturally consider to be a semi-metric on , as follows
(3) |
Finally, define
(4) |
We endow with the cylinder topology generated by open sets of the form, for open in ,
(5) |
2 Model Outline and Main Result
2.1 Geometry of the Connectivity
We make general assumptions on the geometry of the connectivity. These assumptions will hold in a variety of circumstances. The nodes of the graph are assigned positions on a compact smooth Riemannian Manifold . We let denote the (normalized) volume measure on . The position of particle is denoted (a non-random constant that depends on too), and we write . Write the empirical measure of initial conditions to be
(6) |
1. It is assumed that converges weakly to some as .
2. It is assumed that has a smooth density with respect to .
3. It is assumed that is metrizable, with a metric .
4. It is assumed that is compact.
The strength of connection from particle to particle is denoted . The connectivity can be both excitatory and inhibitory; and symmetric or asymmetric. We assume that the connectivity converges to a ‘graphon’-type structure as [43], although it must be emphasized that we do not per se require that it is sampled from a probability distribution. These assumptions are broadly similar to those made by Lucon [44] in treating the large limiting dynamics.
Hypothesis.
(i) We assume that there is a sequence that increases to , and a constant such that for all ,
(7) | ||||
(8) |
(ii) It is assumed that there exists a function such that
(9) |
where
(10) |
(iii) Finally, we assume that is uniformly Lipschitz in both arguments, i.e. for all
(11) | ||||
(12) |
We next note that these assumptions are guaranteed to be satisfied if the edges are sampled independently from a distribution whose probability varies continuously over .
Lemma 2.1.
Suppose that are -valued random variables, and that they are either (i) mutually independent or (ii) independent, except that . Suppose also that there are continuous functions such that
(13) | ||||
(14) | ||||
(15) |
Suppose that the scaling is such that for any positive constant ,
(16) |
Then Hypothesis (9) is satisfied.
A proof is provided in [6].
2.2 Stochastic Transitions
The transitions of the states are taken to be Poissonian and to assume the following mean-field form. For each , there must exist a function
(17) |
such that for and ,
(18) |
Here and is such that
(19) |
For all , we write
(20) |
Define the empirical occupation measure at time , to be such that
(21) |
The only assumption on the initial conditions is that converges weakly to a limit , as noted in the following hypothesis.
Hypothesis.
We assume that there is a measure such that
(22) |
It must be emphasized that the initial conditions can be dependent on the specific choice of the connectivity. It is well-known that a Large Deviation Principle for Poisson Random Measures may not be possible if the intensity function hits zero [48, 3]. Hence we make the following assumptions.
Hypothesis.
It is assumed that has strictly positive upper and lower bounds, i.e. there are constants such that for all ,
(23) |
It is also assumed that is globally Lipschitz, i.e. for all and all ,
(24) |
A key object used to study the large behavior of the system is the empirical reaction flux [48]. The empirical reaction flux is defined to count the total number of transitions over a specified time interval (and scaled by ), i.e. in the case that for a measurable subset and ,
(25) |
We stipulate that always. Finally we denote the empirical process
(26) | ||||
(27) |
The joint state space for the empirical reaction flux, and the empirical measure is denoted by
(28) |
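To make the empirical reaction flux and the empirical occupation measure concrete, the following sketch simulates a two-state system with the standard Gillespie algorithm and records the number of transitions of each type, scaled by the number of nodes. It is an illustration under our own assumptions: the intensity `sis_rate`, its parameters, and the complete-graph connectivity in the usage lines are hypothetical, and the spatial labelling of the flux is suppressed. The intensity is chosen to be strictly positive and Lipschitz in the local field, in the spirit of the hypotheses above.

```python
import random


def simulate_fluxes(neighbors, scale, rate, init, t_end, seed=0):
    """Gillespie simulation of a two-state jump process on a graph.

    neighbors[j] lists the nodes k with a connection into j; the local field
    at j is (1 / scale) * (sum of the states of those neighbours).
    rate(state, field) is the intensity of leaving `state`; it must be
    strictly positive. Returns the final states and the flux dictionary
    {(a, b): (number of a -> b transitions up to t_end) / N}.
    """
    rng = random.Random(seed)
    n = len(init)
    state = list(init)
    counts = {(0, 1): 0, (1, 0): 0}
    t = 0.0
    while True:
        fields = [sum(state[k] for k in neighbors[j]) / scale for j in range(n)]
        rates = [rate(state[j], fields[j]) for j in range(n)]
        t += rng.expovariate(sum(rates))  # waiting time to the next transition
        if t > t_end:
            break
        j = rng.choices(range(n), weights=rates)[0]  # node that transitions
        counts[(state[j], 1 - state[j])] += 1
        state[j] = 1 - state[j]
    fluxes = {ab: c / n for ab, c in counts.items()}
    return state, fluxes


def sis_rate(state, field, beta=2.0, alpha=1.0, baseline=0.05):
    """Hypothetical SIS-type intensities, bounded away from zero."""
    return baseline + beta * field if state == 0 else alpha


n = 40
neighbors = [[k for k in range(n) if k != j] for j in range(n)]
init = [1 if j < 4 else 0 for j in range(n)]
final, fluxes = simulate_fluxes(neighbors, n, sis_rate, init, t_end=2.0)
print(sum(final) / n, fluxes)
```

In this two-state setting the proportion of nodes in state 1 at the final time equals the initial proportion plus the (0, 1) flux minus the (1, 0) flux, which is a simple instance of recovering the occupation measure from the reaction fluxes.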
2.3 Main Results
We first prove that the empirical reaction flux and the empirical occupation measure both concentrate in the large limit. (This is essentially already known [2, 23].)
Theorem 2.2.
There exists unique to which the system converges as . Write the density of to be , i.e. for any ,
(29) |
The density evolves as
(30) | ||||
(31) | ||||
(32) |
Furthermore the limits of the empirical reaction fluxes are given by
(33) | ||||
(34) |
We next state the Large Deviation Principle for the empirical reaction fluxes. We first define the rate function:
(35) |
For , we stipulate that
(36) |
in the case that for some with , does not have a density (with respect to ). If has nonzero measure for some , then we also stipulate that . Otherwise, writing to be the density, i.e. the function such that
(37) |
we define
(38) |
Here,
(39) | ||||
(40) | ||||
(41) | ||||
(42) | ||||
(43) |
The main theoretical result of this paper is the following theorem.
Theorem 2.3.
Let be (respectively) closed and open. Then
(44) | ||||
(45) |
Furthermore, is lower semicontinuous and has compact level sets.
We next note that the LDP holds for arbitrary stopping times.
Corollary 2.4.
Suppose that the dynamics is identical to the previous section, except that now the initial conditions are arbitrary constants (and we do not require Hypothesis 2.2 either). Let be any stopping time such that there is a positive sequence such that and a measure such that with unit probability
(46) |
Then Theorem 2.2 and Theorem 2.3 hold true if we take as initial conditions . (We emphasize in particular that could diverge to as ).
2.4 Contracted Rate Function
In this section, we determine the structure of the Large Deviation rate function for the empirical occupation measure only. One easily checks that the empirical occupation measures can be obtained by applying a continuous transformation to the empirical reaction fluxes. This is noted in the following Lemma.
Lemma 2.5.
For and , define to be such that, writing , for any , and any ,
(47) |
For each and , is continuous. Also is continuous for any . Write . Furthermore with unit probability, for all ,
(48) |
The proof of Lemma 2.5 follows almost immediately from the definitions and is omitted.
We can now define the contracted rate function:
(49) | ||||
(50) |
and we recall that is the limit of the empirical occupation measure at time .
Corollary 2.6.
Let be (respectively) closed and open. Then
(51) | ||||
(52) |
Furthermore, is lower semicontinuous and has compact level sets.
Proof 2.7.
We desire a more workable definition of . To this end, write
(53) |
to consist of all such that (i) is continuous and (ii) there exist functions (the set of all functions that are integrable with respect to up to finite times) such that for all ,
(54) | ||||
(55) |
For any , define the set
(56) |
and define the function to be
(57) |
Lemma 2.8.
If , then
(58) |
Otherwise
(59) | ||||
(60) | ||||
(61) |
Furthermore, is strictly convex in its first argument.
3 An Application: Transition Paths For Hawkes Models in Epidemiology and Neuroscience
We consider a simple stochastic SIS model for a structured population. There is certainly much work that has determined the large limiting dynamics for these models [18]. However, to the knowledge of the authors, there does not exist a spatially-extended Large Deviation Principle in the manner of this paper. The computation of optimal transition paths for spatially extended systems has become of increasing interest in recent years [9].
We first outline a model of people on a structured network. The nodes of the network reside in a domain . The position of the person is , and their state is (i.e. susceptible or infected). The probability of a positive connection from is : i.e. , and . We assume symmetric connections, so that . We also assume that is piecewise continuous.
The probability that a susceptible person transitions to being infected over the time interval (for ) is, for a positive parameter ,
(62) |
The probability that an infected person transitions back to being susceptible over a time interval is constant, i.e. for some it is
(63) |
Write to represent the proportion of susceptible people at position at time in the large limit, and let be the proportion of infected people. Since these are the only two possibilities, it must be that
(64) |
Let us first write out the large limiting dynamics. This is non-stochastic, and such that
(65) |
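The deterministic limiting dynamics can be explored numerically with a forward-Euler scheme. The sketch below is an illustration only: it assumes a standard SIS mean-field equation in which the infection intensity at a position is `beta` times the kernel-weighted proportion of infected people, and the recovery intensity is the constant `alpha`. The precise intensities of this section may differ (for instance by a strictly positive baseline infection rate), and the names `sis_mean_field`, `kernel`, `beta` and `alpha` are ours.

```python
def sis_mean_field(kernel, p_inf0, beta, alpha, t_end, dt=0.01):
    """Forward-Euler integration of an assumed SIS mean-field equation

        d/dt pI(x) = beta * (1 - pI(x)) * mean_y[kernel(x, y) * pI(y)]
                     - alpha * pI(x)

    on the midpoint grid x_i = (i + 0.5) / M of [0, 1], with M = len(p_inf0).
    Returns the infected proportion on the grid at time t_end.
    """
    m = len(p_inf0)
    xs = [(i + 0.5) / m for i in range(m)]
    w = [[kernel(x, y) for y in xs] for x in xs]
    p = list(p_inf0)
    for _ in range(int(round(t_end / dt))):
        # Kernel-weighted average of the infected proportion seen from x_i.
        field = [sum(w[i][k] * p[k] for k in range(m)) / m for i in range(m)]
        p = [p[i] + dt * (beta * (1.0 - p[i]) * field[i] - alpha * p[i])
             for i in range(m)]
    return p


# Usage: a constant kernel reduces the model to the classical logistic SIS
# equation, whose endemic equilibrium is 1 - alpha / beta when beta > alpha.
p_final = sis_mean_field(lambda x, y: 1.0, [0.1] * 20, beta=2.0, alpha=1.0,
                         t_end=40.0)
print(p_final[0])
```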
The time derivative of is written as . The Large Deviations rate function governing the proportion of susceptible people assumes the form
(66) |
where for any , we define
is defined to be such that
(67) | ||||
(68) | ||||
(69) | ||||
(70) |
We note that is uniquely minimized for the large limiting dynamics, i.e. if and only if
In computing the Large Deviations rate function for trajectories that differ from the above, we are trying to understand the relative likelihood of rare noise-induced events that differ from the above dynamics.
It turns out that the infimum in (67) is uniquely realized. We note this in the following lemma.
Lemma 3.1.
For and , and any ,
(71) |
where is such that
(72) |
Proof 3.2.
Fixing , this is effectively a 1d optimization problem (fixing ) of the function
with domain . Since is convex, it must be that is convex, and the infimum must occur at points such that . We differentiate and find that the optimal must be such that
(73) |
This means that
(74) |
and therefore
(75) |
Since , the only valid root is
(76) |
Lemma 3.3.
For every and , the function is strictly convex.
Proof 3.4.
First, it is proved in Lemma 3.1 that the infimum in (67) is always realized at a unique . Consider and suppose that for some , . Let be the (respective) values of that realize the infimum, i.e. they are such that
(77) | ||||
(78) | ||||
(79) | ||||
(80) |
Write , and notice that . If we substitute into the RHS of (67), then since the function is strictly convex,
(81) |
3.1 Euler-Lagrange Equations for the Optimal Trajectory
Fix an initial distribution of population and a final population distribution . Assume that
(82) | ||||
(83) |
Our main result in this section is that any optimal trajectory must satisfy the following Euler-Lagrange equations. Unfortunately, in general there will not be a unique solution to these equations. See for instance [36, 55] for more details on how to compute the optimal path numerically.
Theorem 3.5.
Suppose that is such that
(84) |
and for each , is twice continuously differentiable, with first and second derivatives written (respectively) as and . Any minimizer must satisfy the second-order integro-differential equation, for all and ,
(85) |
and are bounded nonlocal smooth operators defined in the course of the proof. Furthermore (since is convex in its first argument)
(86) |
Lemma 3.6.
There is a unique satisfying (84). The optimal trajectory is such that at each ,
(87) |
where
(88) |
is defined to be such that for any ,
(89) |
Proof 3.7.
The fact that the infimum is realized follows from the fact that is lower semi-continuous. The identity in (87) is a standard result from Calculus of Variations.
We now compute an expression for . To this end, let be the Fréchet derivative of in the direction , i.e.
(90) |
and let be the Fréchet derivative of in the direction , i.e.
(91) |
We next compute the partial derivatives with respect to .
Lemma 3.8.
(92) | ||||
(93) | ||||
(94) | ||||
(95) |
Lemma 3.9.
(96) |
where are such that
(97) |
and
(98) |
Proof 3.10.
Differentiating, we find that
(100) | ||||
(101) | ||||
(102) |
It remains to find a convenient expression for .
Lemma 3.11.
where
(103) | ||||
(104) |
4 Proofs
There are two main steps to our proof of Theorem 2.3. The first step is to show that the system can be approximated very well by a system with averaged interactions. The next step is to prove the Large Deviation Principle for the system with averaged interactions (this is Theorem 4.2). The main result of this paper (Theorem 2.3) will follow from these results thanks to [28, Theorem 4.2.13].
4.1 Proof Outline
Our proof proceeds by transforming the Large Deviations of the uncoupled system to the Large Deviations of the averaged system through a time-rescaling transformation. Let us first outline the Large Deviations for the uncoupled system.
Let be independent Poisson Processes of unit intensity. We define the empirical reaction flux to be such that for any and an interval ,
(105) |
We write . Define the rate function as follows. For , if there exists such that is not absolutely continuous with respect to Lebesgue measure then define . Otherwise, writing to be the density of , define
(106) | ||||
(107) |
Note that . This means that the integral in (106) is well-defined (and could be ). We can now state a Large Deviation Principle for the uncoupled system.
Theorem 4.1.
Let be (respectively) closed and open. Then
(108) | ||||
(109) |
Furthermore, is lower semicontinuous and has compact level sets.
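The following sketch illustrates the kind of local cost that appears in Large Deviations of unit-intensity Poisson processes. It assumes the standard form a log a - a + 1 for the cost of observing a jump rate a instead of the typical rate 1; we do not claim that this is the exact integrand of the rate function above, which also involves an integral over space against a density. The functions `ell`, `path_cost` and `poisson_upper_tail` are hypothetical names. The last lines check the classical Chernoff bound for a Poisson variable of mean N against a numerically computed tail.

```python
import math


def ell(a):
    """Local Poisson cost a*log(a) - a + 1, with the convention 0*log(0) = 0."""
    if a < 0:
        return math.inf
    if a == 0:
        return 1.0
    return a * math.log(a) - a + 1.0


def path_cost(densities, dt):
    """Riemann-sum cost of a piecewise-constant flux density in time."""
    return sum(ell(q) for q in densities) * dt


def poisson_upper_tail(mean, k_min, n_terms=400):
    """Truncated sum approximating P(Poisson(mean) >= k_min) from below."""
    return sum(math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
               for k in range(k_min, k_min + n_terms))


# Chernoff: P(Poisson(N) >= a*N) <= exp(-N * ell(a)) for a >= 1.
N, a = 50, 1.5
tail = poisson_upper_tail(float(N), int(a * N))
bound = math.exp(-N * ell(a))
print(tail, bound, tail <= bound)
```

The cost vanishes exactly at a = 1, which corresponds to the typical behaviour of a unit-intensity process, and is strictly positive elsewhere.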
4.2 System with Averaged Interactions
We next define an approximate process with ‘averaged’ interactions. Let be a system of jump-Markov Processes such that, for , and ,
(110) |
where and
(111) |
We take the initial conditions to be the same for the two systems, i.e. . Later on, in the proofs, it will be useful to represent as a time-rescaled version of the uncoupled system. To this end, define to ‘count’ the number of transitions in the coupled system, i.e. be such that
(112) |
and for any ,
(113) | ||||
(114) |
Since (with unit probability) only makes a finite number of jumps over a bounded time interval, one easily checks that there exists a unique satisfying (112) - (114).
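The time-rescaled representation can be sketched for a single two-state agent as follows. Each transition type is driven by its own unit-intensity Poisson clock, and a transition fires when the integrated intensity of that transition reaches the next tick of its clock. This is an illustration under simplifying assumptions (one agent, intensities `rate(a)` that depend only on the current state); the function name `simulate_by_time_change` and the variable names are ours.

```python
import math
import random


def simulate_by_time_change(rate, init, t_end, seed=0):
    """Simulate one two-state agent via the random time-change representation.

    Each transition a -> b has its own unit-rate Poisson clock, and the number
    of a -> b jumps by time t equals that clock evaluated at the internal time
    internal[(a, b)] = integral over [0, t] of rate(a) * 1{state_s = a} ds.
    """
    rng = random.Random(seed)
    reactions = [(0, 1), (1, 0)]
    internal = {r: 0.0 for r in reactions}            # integrated intensities
    next_tick = {r: rng.expovariate(1.0) for r in reactions}
    counts = {r: 0 for r in reactions}
    state, t = init, 0.0
    while True:
        lam = {r: (rate(r[0]) if r[0] == state else 0.0) for r in reactions}
        wait = {r: ((next_tick[r] - internal[r]) / lam[r] if lam[r] > 0
                    else math.inf) for r in reactions}
        r_star = min(wait, key=wait.get)
        remaining = t_end - t
        if wait[r_star] >= remaining:
            # No further transition before t_end: advance clocks and stop.
            for r in reactions:
                internal[r] += lam[r] * remaining
            break
        for r in reactions:
            internal[r] += lam[r] * wait[r_star]
        t += wait[r_star]
        counts[r_star] += 1
        state = r_star[1]
        next_tick[r_star] += rng.expovariate(1.0)     # next tick of that clock
    return state, counts, internal


state, counts, internal = simulate_by_time_change(
    lambda a: 2.0 if a == 0 else 1.0, init=0, t_end=5.0)
print(state, counts, internal)
```

Dividing each internal time by the corresponding intensity gives the time spent in each state, and these two durations sum to `t_end`.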
Theorem 4.2.
Let be (respectively) closed and open. Then
(115) | ||||
(116) |
Furthermore, is lower semicontinuous and has compact level sets.
This will be proved further below, in Section 4.3.
We define the empirical reaction flux to be such that for any and an interval ,
(117) |
We write .
We next use Girsanov’s Theorem to compare the Large Deviations in our main result (Theorem 2.3) to the Large Deviation Principle in Theorem 4.2. Let be the probability law of . Let be the probability law of the original system . Thanks to Girsanov’s Theorem [15],
(118) |
where
(119) |
In the following lemma we prove that, with very high probability, the Girsanov exponent is uniformly bounded above.
Lemma 4.3.
For any ,
(120) |
We can now state the proof of our main result, Theorem 2.3.
Proof 4.4.
Let
Starting with the upper bound, let be closed. Then for any ,
(121) |
The second term on the RHS is , thanks to Lemma 4.3. It thus suffices that we demonstrate that
(122) |
Now thanks to the Girsanov Expression in (118)
(123) | ||||
(124) |
Turning to the lower bound, let be open, we find that for any ,
(125) |
thanks to (118). Now
(126) |
and since , it must hold that
(127) |
Taking , we obtain that
(128) |
as required.
We next prove Lemma 4.3.
Proof 4.5.
It suffices to demonstrate the following three inequalities
(129) | ||||
(130) | ||||
(131) |
The demonstration of (131) is very similar to that of (130) and is omitted.
For each , it follows from the fact that is Lipschitz that there is a constant such that
(132) |
Furthermore Assumption 2.1 implies that there must exist a non-random sequence that decreases to and such that
(133) |
Once is large enough that , (129) must hold.
Turning to (130), since is (i) Lipschitz, (ii) uniformly lower-bounded by a positive constant and (iii) uniformly upper-bounded, there exists a constant such that (for the constant defined in Assumption 2.1)
(134) |
Thanks to Chernoff’s Inequality, for a constant ,
(135) |
We next claim that for arbitrarily large
(136) |
Now Assumption 2.1 implies that there exists a non-random constant such that with unit probability. We thus obtain that, for small enough that for all , ,
(137) | ||||
(138) |
Assumption 2.1 implies that
(139) | ||||
(140) |
We have thus established (136).
For a reaction , define the empirical flux measure for the averaged system to be such that, for measurable and a time interval ,
(142) |
Writing , define the set
(143) |
Lemma 4.6.
There exists such that for all ,
(144) |
Furthermore is compact.
Proof 4.7.
Using a union of events bound,
(145) |
For a positive integer , and , thanks to Chernoff’s Inequality,
(146) |
Thanks to the inequality ,
(147) |
We choose and , and we obtain that
(148) |
Taking , we obtain the lemma. Since is compact, the compactness of is immediate from Prokhorov’s Theorem.
4.3 Large Deviations of the Averaged System
We prove a Large Deviation Principle for the system with averaged interactions. Recall that is the empirical reaction flux for the system with averaged interactions (142). Our first step is to prove the upper bound of Theorem 4.2. Our method is to show that the empirical reaction flux of the driving Poisson Processes can be written as an almost-continuous transformation of the empirical reaction flux associated to the coupled system.
We start with the upper bound. We are going to show that there exists a measurable map such that, with unit probability,
(149) |
Furthermore will have the useful property that, with very high probability, it can be approximated extremely well by a continuous function , which we now define.
For a positive integer , let be disjoint sets such that (i) , (ii) the interior of is nonempty and
(150) | ||||
(151) |
Let be any point in . Next, define
(152) |
as follows. For any and write
and then define for any and any ,
(153) | ||||
(154) | ||||
(155) |
For any , , we write to be such that
(156) |
Write to be the inverse-function of . This exists (and is continuously-differentiable with respect to time) because (by assumption) is uniformly bounded away from zero.
We now define as follows. For any and a time interval , we stipulate that
(157) |
We obtain the following property.
Lemma 4.8.
is uniquely well-defined for any . Furthermore, is continuous in both of its arguments, as long as is endowed with the topology defined in the Appendix.
The proof follows almost immediately from the definitions.
Lemma 4.9.
For any , there exists such that for all ,
(158) |
Proof 4.10.
Let and be arbitrary. Write to consist of all such that for any set whose diameter is less than , and any sub-interval with and ,
(159) | ||||
(160) |
For , , , , define
(161) |
We next claim that for any , there exists such that as long as , and writing , it must be that
(162) |
Indeed (162) will hold as long as is big enough that
(163) |
which is possible because is uniformly continuous. Indeed (163) implies that for any , then necessarily
(164) |
and therefore (162) holds. For any , through taking to be sufficiently small, it must therefore hold that as long as , for all ,
(165) |
We next define
(166) |
as long as the limit exists, where is an increasing sequence such that for all ,
(167) |
(It has just been proved in Lemma 4.9 that the sequence exists).
Lemma 4.11.
is identically distributed (in probability law) to .
Proof 4.12.
This follows from the time-rescaled representation of the averaged system in (112): with this representation, .
Lemma 4.13.
For any ,
(168) |
Proof 4.14.
Thanks to (LABEL:eq:_to_prove_lipschitz_Lambda), for any ,
(169) |
uniformly as .
We can now prove the upper bound in Theorem 4.2.
Lemma 4.15.
Let be closed. Then
(170) |
Furthermore, is lower semicontinuous and has compact level sets.
Proof 4.16.
Thanks to Lemma 4.11,
(171) |
Define the event
(172) |
Furthermore for the integer ,
(173) |
where is the closed -blowup of . Thanks to Lemma 4.13,
(174) |
Since is closed, the Large Deviations of the uncoupled system in Theorem 4.1 implies that
(175) |
Taking , and exploiting the lower semicontinuity of ,
Finally, since , it holds that
We now turn to proving the lower bound. For , define
(176) |
to consist of all such that has a density such that there exist constants such that for all and ,
(177) |
and .
Lemma 4.17.
For any and , there exists a unique such that .
Proof 4.18.
Let have density . We define to have density . By inspection, it must be that for all , and ,
(178) |
where , and
(179) |
and
(180) |
We are going to show that (i) there exists a mapping such that
(181) |
and that (ii) is contractive with respect to the supremum norm.
Observe that there is a constant such that
(182) |
Thus for small enough , a fixed point argument implies that there is a unique satisfying (201). This argument can then be iterated for increasing .
We now turn to proving the lower bound (116).
Lemma 4.19.
Suppose that is open. Then
(183) |
Proof 4.20.
If , then the Lemma is immediate. Otherwise let be any member of such that . We must show that
(184) |
Since , it must be that for all , has a density , and that
(185) |
Since , ,
(186) |
For , let consist of all such that
(187) |
For some integer , and , let be such that
(188) |
For , we stipulate that where , and
(189) |
and
(190) |
We note that is well defined for : it is the density that would result from the large limiting dynamics in Theorem 2.2. Since
it must be that
(191) |
(By definition, the Lebesgue integral is the limit of piecewise-constant approximations). Write to be the measure with density . We therefore find that
(192) |
Furthermore
(193) |
Thus for large enough values of , we may assume that . Write to be such that
Thanks to Lemma 4.21, there exists such that
(194) |
It therefore follows from the Large Deviations Lower Bound in Theorem 4.1 that
(195) | ||||
(196) |
Furthermore, upon performing a change of variable,
(197) |
as , thanks to (192). This implies the Lemma.
Lemma 4.21.
For any , and any , there exists such that if
(198) |
then
(199) |
Proof 4.22.
Let be such that . Recall that, by definition, must be piecewise Lipschitz over intervals of the form , with Lipschitz constant less than or equal to . We start by proving that for arbitrary ,
(200) |
Let be the density of . Write and define the density of to be , i.e.
(201) |
where , and
(202) |
Now define
(203) | ||||
and let have density , which is such that for all ,
(204) |
We first claim that for arbitrary , for all small enough it must be that
(205) |
Indeed writing to be the function-inverse of the function , it must be that for any bounded continuous function ,
(206) | ||||
(207) |
One easily checks that is differentiable-in-time, with derivative lower-bounded by . It therefore follows from the definition of the bounded-Lipschitz metric that
(208) |
We have therefore established (205). We write
(209) |
and we note that (thanks to (205)), for any there must exist such that as long as (205) is satisfied,
(210) |
Now
(211) |
Using the fact that (i) is upper-bounded by , (ii) the time-derivative of is upper-bounded by and (iii) , we obtain that there is a constant such that
(212) |
Write
(213) |
We next claim that there exists a constant such that for all ,
(214) |
Indeed we find that
(215) |
By definition
Since is bounded, we immediately see that there is a constant such that for all and all ,
Furthermore, by definition,
(216) |
We have thus established (214).
It now follows from (212) and (214) that for all ,
(217) |
Thanks to Gronwall’s Inequality,
We have thus established (200), since by assumption
uniformly as .
One can then repeat this argument and find that
(218) |
for arbitrarily small .
There are a finite number of intervals over which is Lipschitz. We can thus continue in this manner to obtain the Lemma.
Lemma 4.23.
For any ,
(219) |
Lemma 4.24.
There exists a constant such that for all , all and all ,
(220) | ||||
(221) |
Appendix A Large Deviations of the Uncoupled System
The Large Deviations of Poisson Random Fields has already been studied by numerous authors [53, 30, 41, 20]. Our system is similar, but not identical to the systems studied in these papers. The chief difference is that for a spatially-distributed Poisson Random Field over , spikes can occur at any spatial location. However in our system, spikes can only occur at the spatial locations of the channels. The large limiting equations are identical however, since the channels are uniformly distributed over . An additional novelty to our proof (beyond the proofs in [53, 30, 41, 20]) is that we obtain the Large Deviations for a slightly stronger topology.
As previously, let the channel be located at . We write to be independent Poisson Processes of unit intensity. We define the empirical reaction flux to be such that for any and an interval ,
(229) |
We write .
Define the rate function as follows. For , we stipulate that
(230) |
if is not absolutely continuous with respect to Lebesgue measure for some . Otherwise, we let be the density of , and define
(231) | ||||
(232) |
In the above expression, we recall that is the density of the measure that converges to as . Note also that . This means that the integral in (231) is well-defined (and could be ). We can now state a Large Deviation Principle for the uncoupled system.
Theorem A.1.
Let be (respectively) closed and open. Then
(233) | ||||
(234) |
Furthermore, is lower semicontinuous and has compact level sets.
A.1 Proof of Theorem A.1
Fix and write . Our main result in this subsection is the following.
Lemma A.2.
Let be (respectively) closed and open (with respect to the topology of weak convergence). Then for any ,
(235) | ||||
(236) |
Furthermore, is lower semicontinuous and has compact level sets.
Proof A.3.
Write to be the projection of a measure onto its marginal up to time . Evidently is continuous.
By definition, the topology on is generated by open sets of the form, for some , any , and a continuous bounded function and ,
(237) |
Since the projection is continuous, it follows from the Dawson-Gärtner Projective Limits Theorem [26, 28] that the Large Deviation Principle in (which holds for arbitrary ) implies the Large Deviation Principle in Theorem A.1, with rate function
(238) | ||||
(239) |
since is nondecreasing. One should also note that is independent of if either and/or . This means that the Large Deviations rate functions can be summed.
Write to be the set of all partitions of into a finite number of disjoint measurable sets, satisfying the following property. Any partition in is assumed to be of the form
(240) | ||||
(241) | ||||
(242) |
where is an interval, and has nonzero measure with respect to . For , we write whenever is a subpartition of , i.e. for any there must exist such that .
Let be the topology on , generated by the set of all open sets of the following form: for a partition , and open sets ,
(243) |
can be understood as a projective limit system in the sense of Section 4.6 of [28]. To see this, for any and , let denote the measures of all the sets in - i.e. . For , with , let be the natural projection, i.e. for any ,
(244) |
It is easy to check that is continuous. Let be the subset of the product space satisfying (244). Standard measure theory dictates that can be identified with (since by definition the measure is uniquely defined by the measure of the sets generating the -algebra).
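The consistency requirement behind a projective limit system can be illustrated on a finite toy example. The sketch below is not taken from the text: it assumes that the natural projection from a partition to a coarser one sums the masses of the sub-cells, it represents partitions as lists of `frozenset`s of hypothetical atom labels, and it checks that projecting in two steps agrees with projecting in one step.

```python
def project(masses, fine, coarse):
    """Map cell masses on `fine` to cell masses on `coarse` by summing sub-cells.

    Assumption (illustrative only): partitions are lists of frozensets of atom
    labels, and every cell of `fine` lies inside exactly one cell of `coarse`.
    """
    projected = []
    for big in coarse:
        # `small <= big` is the subset test for frozensets
        projected.append(sum(m for m, small in zip(masses, fine) if small <= big))
    return projected


# Three nested partitions of the hypothetical atom set {0, 1, 2, 3}.
fine = [frozenset({0}), frozenset({1}), frozenset({2}), frozenset({3})]
middle = [frozenset({0, 1}), frozenset({2}), frozenset({3})]
coarse = [frozenset({0, 1}), frozenset({2, 3})]

masses_fine = [1, 2, 3, 4]  # integer masses keep the comparison exact

# Projecting fine -> coarse directly, and via the intermediate partition.
one_step = project(masses_fine, fine, coarse)
two_step = project(project(masses_fine, fine, middle), middle, coarse)
```

Both routes give the same coarse masses, which is the compatibility condition that lets a family of finite-dimensional vectors be identified with a single measure.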
We metrize convergence in as follows. Let be a sequence of partitions such that , and every set in of the form is such that the diameter of is less than or equal to , and the Lebesgue Measure of is less than or equal to . The metric is defined to be such that
(245) |
Write
(246) |
For any , define to be such that
(247) |
Let be such that for any measurable subset ,
(248) |
Define the rate function, for any ,
(249) | ||||
(250) | ||||
(251) |
where the second expression (251) is obtained from (250) by using elementary calculus to compute the supremum. Note that in (251) (and throughout this paper) we interpret and .
Lemma A.4.
Let be (respectively) closed and open sets, with respect to the Euclidean topology. Then for any ,
(252) | |||
(253) |
Furthermore is lower semicontinuous and convex.
Proof A.5.
Observe that constitute independent homogeneous Poisson random variables. Write to be the intensity of . The definitions imply that
(254) |
We therefore find that the logarithmic moment generating function takes the form, for constants (written )
(255) | ||||
(256) |
Observe that is (i) finite for all , and (ii) smooth. The Large Deviation Principle is thus a consequence of Cramér’s Theorem [28].
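The Cramér-type argument can be checked numerically in the simplest case of a single cell. The sketch below makes assumptions that are not in the text: one Poisson variable with hypothetical mean `lam`, logarithmic moment generating function `lam * (exp(theta) - 1)`, and candidate rate function `x * log(x / lam) - x + lam`. It compares a numerical Legendre transform with this closed form, and then compares the closed form with the decay rate of a Poisson tail probability evaluated in log space.

```python
import math


def log_mgf(theta, lam):
    """Log-moment generating function of a Poisson(lam) random variable."""
    return lam * (math.exp(theta) - 1.0)


def rate_closed_form(x, lam):
    """Candidate rate function x*log(x/lam) - x + lam, reading 0*log(0) as 0."""
    if x < 0:
        return math.inf
    if x == 0:
        return lam
    return x * math.log(x / lam) - x + lam


def rate_legendre(x, lam, lo=-30.0, hi=30.0, iters=200):
    """Numerical Legendre transform sup_theta [theta*x - log_mgf(theta, lam)].

    The objective is concave in theta, so a ternary search on [lo, hi] suffices
    (assuming the maximiser log(x/lam) lies inside that bracket).
    """
    def objective(theta):
        return theta * x - log_mgf(theta, lam)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


def tail_decay_rate(n, x, lam, terms=400):
    """-(1/n) * log P(N >= n*x) for N ~ Poisson(n*lam), with x > lam.

    The tail sum is truncated after `terms` terms (they decay geometrically)
    and accumulated in log space to avoid underflow.
    """
    mu = n * lam
    m = math.ceil(n * x)
    logs = [k * math.log(mu) - mu - math.lgamma(k + 1) for k in range(m, m + terms)]
    top = max(logs)
    log_tail = top + math.log(sum(math.exp(v - top) for v in logs))
    return -log_tail / n
```

For instance, `rate_legendre(2.0, 1.0)` and `rate_closed_form(2.0, 1.0)` should agree to numerical precision, while `tail_decay_rate(2000, 2.0, 1.0)` should be close to the same value up to a correction that vanishes as `n` grows.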
Corollary A.6.
If , then for any ,
(257) |
Proof A.7.
Now define the rate function ,
(259) |
We can now prove the general Large Deviation Principle.
Lemma A.8.
Let be (respectively) closed and open sets, with respect to the topology induced by the metric . Then for any ,
(260) | |||
(261) |
Proof A.9.
We next prove Lemma A.2.
Proof A.10.
One can check that the topology is a refinement of the topology of weak convergence on . Indeed, one checks that if , then necessarily for any bounded continuous function on (which must also be uniformly continuous), it holds that
The Lemma is therefore an immediate consequence of Lemma A.8.
To finish, we wish to obtain a more tractable form for the rate function .
Lemma A.11.
If then must be absolutely continuous with respect to . That is, there must exist measurable such that for any measurable
(262) |
Furthermore,
(263) |
Proof A.12.
Suppose that . It follows from Lemma A.13 that, as long as is sufficiently small, where is such that
(264) |
It is then a standard result from real analysis [42, Section 7.3] that is absolutely continuous with respect to . Let its density be .
Let be any sequence of partitions such that
(265) |
It is also assumed that the largest diameter of any set in goes to zero as . This assumption is permissible thanks to Corollary A.6: if one takes a sub-partition of a partition, the associated rate function cannot decrease.
Let be such that for each , for all ,
(266) |
Write to be the -algebra generated by the sets in . Observe that is a Radon-Nikodym derivative with respect to the -algebra , and we will employ Lévy’s Downward Theorem to compute the limit as . To this end, define the probability measure . With respect to the filtration , is a -Martingale. Thanks to the Martingale Convergence Theorem, almost surely,
(267) |
Since the function is bounded, non-negative and continuous over finite intervals, we find that the expression in (251) converges to .
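The passage from partition sums to an integral can be illustrated numerically. The sketch below rests on assumptions that are not in the text: the cost of a cell is taken to be the Poisson-type relative entropy `a * log(a / b) - a + b` of the cell mass `a` against the reference mass `b`, the space is the unit interval with Lebesgue reference measure, and the density is the hypothetical choice `rho(x) = 2 * x`. Under these assumptions the costs over dyadic partitions should be nondecreasing under refinement and should approach the integral of `rho * log(rho) - rho + 1`, which equals `log(2) - 1/2`.

```python
import math


def cell_cost(a, b):
    """Poisson-type relative entropy of one cell, reading 0*log(0) as 0."""
    if a == 0.0:
        return b
    return a * math.log(a / b) - a + b


def partition_cost(level):
    """Sum of cell costs over the dyadic partition of [0, 1] into 2**level cells.

    Assumption (illustrative only): density rho(x) = 2x against Lebesgue measure.
    """
    n = 2 ** level
    total = 0.0
    for i in range(n):
        left, right = i / n, (i + 1) / n
        a = right ** 2 - left ** 2  # mass of [left, right] under rho(x) = 2x
        b = right - left            # Lebesgue mass of [left, right]
        total += cell_cost(a, b)
    return total


# Costs along a refining sequence of partitions, and the limiting integral.
costs = [partition_cost(level) for level in range(11)]
limit = math.log(2.0) - 0.5
```

The list `costs` starts at zero for the trivial partition, because the total masses agree, and increases towards `limit` as the cells shrink. This is the behaviour that the martingale argument establishes in general.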
Lemma A.13.
For every , there exists such that
(268) |
Furthermore if , then for any there exists such that
Proof A.14.
For any , let consist of all partitions such that for every and every , . For any , let be any particular partition in .
Define the set
(269) |
Thanks to the Large Deviations estimate, writing ,
(270) |
using the fact that for any ,
We thus find that
(271) |
Now for any , as . In particular, we take small enough that . Finally, for large enough , it must be that
(272) |
References
- [1] Report-449-1995.
- [2] Z. Agathe-Nerine, Multivariate Hawkes processes on inhomogeneous random graphs, Stochastic Processes and their Applications, 152 (2022), pp. 86–148.
- [3] A. Agazzi, L. Andreis, R. I. Patterson, and D. M. Renger, Large deviations for Markov jump processes with uniformly diminishing rates, Stochastic Processes and their Applications, (2022).
- [4] A. Agazzi, A. Dembo, and J. P. Eckmann, Large deviations theory for Markov jump models of chemical reaction networks, Annals of Applied Probability, 28 (2018), pp. 1821–1855.
- [5] L. J. Allen, A primer on stochastic epidemic models: Formulation, numerical simulation, and analysis, Infectious Disease Modelling, 2 (2017), pp. 128–142.
- [6] D. Avitabile and J. Maclaurin, Neural fields and noise-induced patterns in neurons on large disordered networks, arXiv:2408.12540v1, (2024).
- [7] A.-L. Barabasi and R. Albert, Emergence of scaling in random networks, Mat. Res. Soc. Symp. Proc, 74 (1999), p. 677.
- [8] G. Barbet, J. MacLaurin, and M. Silverstein, Large deviations of piecewise-deterministic-markov-processes with application to calcium signalling, SIAM Journal of Applied Mathematics (Submitted), (2023).
- [9] P. Bernuzzi and T. Grafke, Large deviation minimisers for stochastic partial differential equations with degenerate noise, (2024).
- [10] B. Bollobas, C. Borgs, J. Spencer, and G. Tusnady, The degree sequence of a scale-free random graph process, Random Structures & Algorithms, 18 (2001).
- [11] C. Borgs, J. T. Chayes, H. Cohn, and Y. Zhao, An L^p theory of sparse graph convergence II: LD convergence, quotients and right convergence, The Annals of Probability, 46 (2018).
- [12] C. Borgs, J. T. Chayes, H. Cohn, and Y. Zhao, An L^p theory of sparse graph convergence I: Limits, sparse random graph models, and power law distributions, Transactions of the American Mathematical Society, 372 (2019), pp. 3019–3062.
- [13] J. Bramburger and M. Holzer, Pattern formation in random networks using graphons, SIAM Journal on Mathematical Analysis, 55 (2023), pp. 2150–2185.
- [14] J. J. Bramburger, M. Holzer, and J. Williams, Persistence of steady-states for dynamical systems on large networks, (2024).
- [15] P. Bremaud, Point Processes and Queues, Springer-Verlag, 1981.
- [16] P. C. Bressloff, Spatiotemporal dynamics of continuum neural fields, Journal of Physics A: Mathematical and Theoretical, 45 (2012).
- [17] P. C. Bressloff and J. M. Newby, Path integrals and large deviations in stochastic hybrid systems, Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 89 (2014), pp. 1–15.
- [18] T. Britton and E. Pardoux, Stochastic Epidemic Models with Inference, Springer, 2019.
- [19] Z. Brzeźniak, X. Peng, and J. Zhai, Well-posedness and large deviations for 2D stochastic Navier–Stokes equations with jumps, Journal of the European Mathematical Society, 25 (2023), pp. 3093–3176.
- [20] A. Budhiraja, J. Chen, and P. Dupuis, Large deviations for stochastic partial differential equations driven by a Poisson random measure, Stochastic Processes and their Applications, 123 (2013), pp. 523–560.
- [21] A. Budhiraja and P. Dupuis, Analysis and Approximation of Rare Events, vol. 94, Springer, 2019.
- [22] J. Chevallier and G. Ost, Fluctuations for spatially extended Hawkes processes, Stochastic Processes and their Applications, (2020), pp. 1–33.
- [23] F. Coppini, A. D. Crescenzo, and H. Pham, Nonlinear graphon mean-field systems, (2024).
- [24] D. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes. Volume I: Elementary Theory and Methods, Second Edition, Springer, 2003.
- [25] D. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes. Volume II: General Theory and Structure, Second Edition, Springer, 2008.
- [26] D. A. Dawson and J. Gärtner, Large deviations from the McKean-Vlasov limit for weakly interacting diffusions, Stochastics, 20 (1987), pp. 247–308.
- [27] S. Delattre, N. Fournier, and M. Hoffmann, Hawkes processes on large networks, The Annals of Applied Probability, 26 (2016).
- [28] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications 2nd Edition, Springer, 1998.
- [29] P. Dupuis, K. Ramanan, and W. Wu, Large deviation principle for finite-state mean field interacting particle systems, arXiv preprint, (2016).
- [30] D. Florens and H. Pham, Large deviation principle in nonparametric estimation of marked point processes, Statistics and Probability Letters, 41 (1999), pp. 383–388.
- [31] N. Fournier and E. Löcherbach, On a toy model of interacting neurons, Annales de l’institut Henri Poincare (B) Probability and Statistics, 52 (2016), pp. 1844–1876.
- [32] W. Gerstner, W. Kistler, R. Naud, and L. Paninski, Neuronal Dynamics From Single Neurons to Networks and Models of Cognition, Cambridge University Press, 2014.
- [33] M. Goebel, M. S. Mizuhara, and S. Stepanoff, Stability of twisted states on lattices of Kuramoto oscillators, Chaos, 31 (2021).
- [34] T. Grafke, T. Schäfer, and E. Vanden-Eijnden, Sharp asymptotic estimates for expectations, probabilities, and mean first passage times in stochastic systems with small noise, Communications on Pure and Applied Mathematics, 77 (2024), pp. 2268–2330.
- [35] A. G. Hawkes, Hawkes processes and their applications to finance: a review, 2018.
- [36] M. Heymann and E. Vanden-Eijnden, The geometric minimum action method: A least action principle on the space of curves, Communications on Pure and Applied Mathematics, 61 (2008), pp. 1052–1117.
- [37] P. Ji, Y. Wang, T. Peron, C. Li, J. Nagler, and J. Du, Structure and function in artificial, zebrafish and human neural networks, 2023.
- [38] C. Kuehn and M. G. Riedler, Large deviations for nonlocal stochastic neural fields, Journal of Mathematical Neuroscience, 4 (2014), pp. 1–33.
- [39] P. J. Laub, Y. Lee, and T. Taimre, The Elements of Hawkes Processes, Springer International Publishing, 2021.
- [40] P. Lewis and G. Shedler, Simulation of nonhomogeneous Poisson processes by thinning, Naval Research Logistics Quarterly, (1978).
- [41] R. S. Liptser and A. A. Pukhalskii, Limit theorems on large deviations for semimartingales, (2005).
- [42] S. Lojasiewicz, An Introduction to the Theory of Real Functions, Wiley, 1988.
- [43] L. Lovasz, Large Networks and Graph Limits, 2012.
- [44] E. Lucon, Quenched asymptotics for interacting diffusions on inhomogeneous random graphs, Stochastic Processes and their Applications, (2020), pp. 1–52.
- [45] J. MacLaurin and J. M. Newby, Extreme first passage times for populations of identical rare events, SIAM Journal of Applied Mathematics (Accepted for Publication), (2024).
- [46] A. D. Masi, A. Galves, E. Löcherbach, and E. Presutti, Hydrodynamic limit for interacting neurons, Journal of Statistical Physics, 158 (2014), pp. 866–902.
- [47] E. Pardoux and B. Samegni-Kepgnou, Large deviation principle for epidemic models, Journal of Applied Probability, 54 (2017), pp. 905–920.
- [48] R. I. Patterson and D. R. Renger, Large deviations of jump process fluxes, Mathematical Physics Analysis and Geometry, 22 (2019).
- [49] L. Pellis, F. Ball, S. Bansal, K. Eames, T. House, V. Isham, and P. Trapman, Eight challenges for network epidemic models, Epidemics, 10 (2015), pp. 58–62.
- [50] S. Riley, K. Eames, V. Isham, D. Mollison, and P. Trapman, Five challenges for spatial epidemic models, Epidemics, 10 (2015), pp. 68–71.
- [51] S. Strogatz and D. Watts, Collective dynamics of ’small-world’ networks, Nature, 393 (1998).
- [52] S. Tang, M. Tuerkoen, and H. Zhou, On the identifiability of nonlocal interaction kernels in first-order systems of interacting particles on Riemannian manifolds, SIAM Journal on Applied Mathematics, 84 (2024), pp. 2067–2086.
- [53] A. D. Wentzell, Limit theorems on large deviations for Markov stochastic processes, Kluwer Academic Publishers, 1990.
- [54] Y. Xing and K. H. Johansson, Concentration in gossip opinion dynamics over random graphs, SIAM Journal on Control and Optimization, 62 (2024), pp. 1521–1545.
- [55] R. Zakine and E. Vanden-Eijnden, Minimum-action method for nonequilibrium phase transitions, Physical Review X, 13 (2023).