I use network inference methods to model interspecies interactions between marine microbial species and predict how marine microbial communities will be affected by global change.
Abundance data for individual species are not enough to fully understand an ecological system. Recent global marine surveys indicate that interspecies interactions can have a larger impact on microbial community structure than environmental and geographic factors. These interactions are difficult to study, however, because many natural communities are highly complex and only a small fraction of their species can be cultivated in a laboratory. I use mathematical models and computational approaches to learn species interactions from observational data. The natural variation in species' abundances (e.g. over time) makes it possible to infer species interaction networks and derive causal relationships. An interaction may consist of one species producing a substrate for another, or of species competing for nutrients, so interactions are not necessarily physical.
The following example illustrates how the direction of an interaction can be inferred by looking at more than two features at the same time and making use of specific characteristics of the data:
Imagine an organism that requires two other species to be present in the community in order to survive. We call the ‘supporting’ species A and B and the dependent species C. When only one of the two ‘supporting’ species, A or B, is present in the community, the dependent organism C will never be found. Species C will be found only when both A and B are present. That is, when the dependent species is present, we know that the other two must also be present, while when the dependent organism is absent, we know nothing about the other two. This pattern is described in terms of conditional independence: whether A and B are independent of each other depends on whether we condition on the state of C. This independence structure can be recovered from observational data and represented in a Bayesian network as arrows from A and B converging on C. Based on such structures, further edges of the interaction graph can be directed, which makes it possible to infer the flow of information.
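The A, B, C example can be reproduced with a small toy simulation. The sketch below is illustrative only and is not the inference method used in this research: the function names, the presence probabilities, and the assumption that C establishes itself in 90% of the communities containing both A and B are all hypothetical choices made for the example. It draws A and B independently and then compares how much knowing B tells us about A, with and without conditioning on the state of C.

```python
import random


def simulate(n_samples=20000, p_a=0.5, p_b=0.5, p_c_given_both=0.9, seed=0):
    """Draw presence/absence samples for the toy community (all values assumed).

    A and B occur independently of each other. C can only occur when both
    A and B are present, and then only with probability p_c_given_both.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        a = rng.random() < p_a
        b = rng.random() < p_b
        c = a and b and (rng.random() < p_c_given_both)
        samples.append((a, b, c))
    return samples


def prob_a(samples, given=lambda a, b, c: True):
    """Estimate P(A present) among the samples that satisfy `given`."""
    selected = [a for a, b, c in samples if given(a, b, c)]
    return sum(selected) / len(selected)


samples = simulate()

# Without looking at C, knowing B tells us nothing about A.
p_a_marginal = prob_a(samples)
p_a_given_b = prob_a(samples, lambda a, b, c: b)

# Among communities where C is absent, knowing that B is present
# makes A much less likely: A and B are now dependent.
p_a_c_absent = prob_a(samples, lambda a, b, c: not c)
p_a_b_c_absent = prob_a(samples, lambda a, b, c: b and not c)

# Whenever C is present, A must be present as well.
p_a_c_present = prob_a(samples, lambda a, b, c: c)

print("P(A)               =", round(p_a_marginal, 3))
print("P(A | B)           =", round(p_a_given_b, 3))
print("P(A | C absent)    =", round(p_a_c_absent, 3))
print("P(A | B, C absent) =", round(p_a_b_c_absent, 3))
print("P(A | C present)   =", round(p_a_c_present, 3))
```

In this sketch the first two estimates are nearly equal, while the third and fourth differ clearly. A pair of variables that is independent on its own but becomes dependent once a third variable is taken into account is the signature that allows the arrows to be oriented from A and B towards C.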
My models can also be used for applied research. They can aid in the design of synthetic consortia for desired applications. For example, causal modeling can propose changes to a community that will improve the yield of useful metabolites or other natural products of interest. Another example is the design of communities that optimize the degradation of plastic particles in the ocean.
I contribute sequence data analysis to marker gene studies of marine communities, e.g. amplicon-based analyses of the rRNA gene locus. I also use the resulting abundance data to generate networks of species interactions.
I perform metagenomic sequence data analysis in collaboration with experimental marine scientists.