May 10th 2020

Check out a few videos on how we use sulfur isotopic biosignatures to understand past environmental conditions on Earth, and how using metabolic markers can help in this endeavor. Videos are hosted at https://johnstonlab.fas.harvard.edu/ and are available in English, Spanish, and French! Contact me if you’d like to know more!

June 11th 2020

**Disclaimer** These are writing exercises for my own good as I revisit old concepts, and I figured it would be of interest to others out there looking to understand these same concepts as well. They are purely informative, with no agenda, and meant for general dissemination of scientific concepts. Thus, I will try to stick to freely accessible references for the content of these entries.

June 12th 2020

**SIR Models**

The year 2020 has so far been a challenge for everyone. The fast spread of a disease, illness, death, and recommended lockdowns induce a constant sense of uncertainty that raises stress in all of us. Model projections for how things will develop can be daunting. This entry is meant for those who would like to understand how a basic epidemiology model is built, and how it can provide vital information. What is presented next is a simple model, and the reader in search of more detailed and sophisticated ones is encouraged to use the references provided and go down this rabbit hole.

The spread of an illness is reflected in the state of the population it affects. As a pathogen spreads through a generally healthy population of size *N*, said population can be divided into three groups: susceptible, infected, and recovered individuals. A susceptible person who catches the pathogen becomes infected and eventually recovers, and so individuals transition between groups/states/compartments (Figure 1).

SIR models are built to track these three groups and understand how fast individuals transition between states. Note the wonderfully straightforward name: Susceptible, Infected, Recovered (Ref. 1, 2). The simplest version of such models is the Kermack-McKendrick model. It was originally designed to explain the sharp rise and fall of the number of infected patients during major European epidemics, such as the plague (London 1665-1666, Bombay 1906) and cholera (London 1865) (Ref. 1, 2).

Mathematically, this model centers around four interdependent equations (equations 1 through 4). For those who are not fluent in math, *dX/dt* means the *change (d)* in the number of whatever one is interested in (here, *X*) per time interval (*dt*, which can be a day, a month, or a year).

Equation 1 is particularly powerful. Given its time dependence, it tells us the sum of all groups (*S(t), I(t), and R(t)*) is equal to *N* – the total number of people in your population – *at all times*. This implies the size of your population is constant, no matter how much *S*, *I*, and *R* change.

Equations 2, 3, and 4 are the gears of the model. They have the typical form of a *system of coupled ordinary differential equations*. This means that *S*, *I*, and *R* each appear in more than one equation, so a change in one group feeds into the others (Ref. 1, 2).
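For reference, one standard way of writing equations 1 through 4 is sketched below. The exact form is an assumption: it is the version in which the infection term is scaled by *N*, chosen because it is consistent with R₀ = β/γ, and the ordering of equations 2, 3, and 4 (*S*, then *I*, then *R*) is assumed as well.

```latex
\begin{align}
S(t) + I(t) + R(t) &= N \tag{1} \\
\frac{dS}{dt} &= -\beta \frac{S I}{N} \tag{2} \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I \tag{3} \\
\frac{dR}{dt} &= \gamma I \tag{4}
\end{align}
```

In this form, every term that leaves one group shows up with the opposite sign in the next one, so the right-hand sides of equations 2, 3, and 4 add up to zero. That is the statement of equation 1 in rate form: the total never changes.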

So, according to these equations, individuals move between categories at a rate specified by β, γ, and the number of individuals affected. β corresponds to the rate of infection, and γ to the rate of recovery. The higher the fraction of susceptible individuals, the higher the rate at which they will join the infected population. The same effect can be obtained in the case of a pathogen with a high rate of infection β. The only way to decrease the number of infected individuals is via recovery (faster, for instance, with a high value for γ), which moves infected individuals into the recovered category and brings the number of infected down (Figure 2).
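To see these dynamics play out, here is a minimal Python sketch that steps the model forward in time. It assumes the common form in which new infections per unit time are β·S·I/N and recoveries are γ·I, and it uses a simple fixed-step (forward Euler) scheme rather than a proper ODE solver. The function name `simulate_sir` and the parameter values (a population of 1000, one initial case, β = 0.3 per day, γ = 0.1 per day) are illustrative choices, not values fitted to any real outbreak.

```python
def simulate_sir(n, i0, beta, gamma, days, dt=0.1):
    """Step the SIR model forward with forward Euler; return one row per day.

    Assumed form: dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I,
    dR/dt = gamma*I. Each row is (day, S, I, R).
    """
    s, i, r = float(n - i0), float(i0), 0.0
    steps_per_day = int(round(1 / dt))
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # people moving S -> I
            new_recoveries = gamma * i * dt         # people moving I -> R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


# Illustrative run: the infected group should rise sharply, peak, then fall.
history = simulate_sir(n=1000, i0=1, beta=0.3, gamma=0.1, days=160)
peak_day, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"Infections peak around day {peak_day} "
      f"with roughly {peak_infected:.0f} people infected at once")
```

Because whatever leaves one group is added to the next, *S + I + R* stays equal to *N* at every step, up to floating-point rounding. Raising `beta` makes the peak earlier and taller, while raising `gamma` flattens it.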

In general, learning the dynamics of how the disease spreads is useful to determine how it should be managed. This is why models like this one are useful. One can gain insight into how quickly an infectious disease will spread and determine what proportion of the population will be infected and recovered at any given time. This helps determine whether health care services can cope and take appropriate measures to protect the most at-risk members of a population. For instance, we can take the rate of infection, β, and the rate of recovery, γ, to compute R₀ using R₀ = β/γ. This value is the basic reproduction number, in other words, the average number of people an infected person infects before they recover, in a population where everyone else is still susceptible (Ref. 2, 4). Such numbers come in when making decisions including social distancing rules: if that R₀ value is very high, then choosing not to isolate will only help the pathogen spread.
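As a small worked example of R₀ = β/γ, the Python sketch below computes the basic reproduction number and checks whether the number of infected people is still growing. The growth check assumes the common form *dI/dt = (β·S/N − γ)·I*, which is positive exactly when R₀·S/N is greater than 1. The function names and the values of β and γ are hypothetical, picked only to keep the arithmetic easy to follow.

```python
def basic_reproduction_number(beta, gamma):
    """Return R0 = beta / gamma (beta and gamma must use the same time unit)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return beta / gamma


def infections_growing(beta, gamma, susceptible_fraction):
    """True if the infected group is growing, assuming dI/dt = (beta*S/N - gamma)*I."""
    return basic_reproduction_number(beta, gamma) * susceptible_fraction > 1


# Hypothetical pathogen: beta = 0.5 per day, gamma = 0.25 per day
# (that is, an infection lasts about 1/gamma = 4 days on average).
r0 = basic_reproduction_number(beta=0.5, gamma=0.25)
print(f"R0 = {r0}")  # 0.5 / 0.25 = 2.0

# Early on, nearly everyone is susceptible, so infections grow.
print(infections_growing(0.5, 0.25, susceptible_fraction=1.0))  # True
# Once only 40% remain susceptible, R0 * S/N = 0.8 and infections decline.
print(infections_growing(0.5, 0.25, susceptible_fraction=0.4))  # False
```

Under these assumptions, anything that lowers β (fewer contacts, for example) lowers R₀ in direct proportion, which is the logic behind distancing measures.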

Now, the version of this model that I am presenting here is a simple one. It is surprisingly useful and robust, but it needs expansion and refinement. For instance, we might want to allow the population to change size (so *N* is not a constant), as a result of vital dynamics (birth and death), or migration (a huge factor in today’s ultra-connected world). The β and γ values are here assumed to be the same for all individuals, when in fact they could vary. Depending on the age group and the existence of aggravating conditions, these values will differ. Also, what if the illness can be caught twice? What if you are interested in identifying spreaders, individuals that have a higher transmission rate than average (Ref. 3)? Finally, there could be a whole new group, such as people that become exposed to the pathogen, but don’t necessarily become infected. If you’d like to know more about more sophisticated models, ones that introduce new compartments such as the *Exposed* individuals just introduced, please visit *reference 4*. In there is a fully fleshed out explanation of a model that includes such a compartment, and the author even shares their Python code!

At this point you might ask me, why would a self-proclaimed geobiologist want to think about the models behind epidemics? First of all because numbers and math help us understand how the world works, and for a naturally anxious person stuck at home, this was a soothing exercise. Most importantly, because the core idea behind the SIR model is found in geobiology: the idea of *compartments*. Although, because we are cool, we call them boxes instead. When trying to understand how a system will respond to a perturbation (say, the global carbon cycle responding to a sudden influx of carbon dioxide into the atmosphere), we establish our boxes, how these boxes are connected, and how material flows between them using model parameters that are either measured, or for which we are trying to solve. Just like the SIR model tries to do. On a final, lighter note, if this entry was not your idea of excitement, SIR models are featured in the episode “Vector” in season 1 of the show NUMB3RS, where math meets murder.

References:

1. Weisstein, Eric W. “SIR Model.” From MathWorld–A Wolfram Web Resource. https://mathworld.wolfram.com/SIRModel.html
2. https://towardsdatascience.com/infectious-disease-modelling-part-i-understanding-sir-28d60e29fdfc
3. Shakarian P., Bhatnagar A., Aleali A., Shaabani E., Guo R. (2015) The SIR Model and Identification of Spreaders. In: Diffusion in Social Networks. SpringerBriefs in Computer Science. Springer, Cham.
4. https://towardsdatascience.com/infectious-disease-modelling-beyond-the-basic-sir-model-216369c584c4