Recently I revisited finding associations between bacteria. We know bacteria both produce and consume metabolites and other chemicals, and some produce bacteriocins that inhibit other bacteria. “Bacteriocins are **potential alternatives to traditional antibiotics**. These peptides, which are produced by many bacteria, can have a high potency and a low toxicity” [Nature 2012]. Finding the relationships has been a challenge because of the nature of the distribution (it is not a bell curve; see these posts on the solution that I use for identifying abnormal values: Post #1, Post #2).

This is a technical note (WARNING: Geek Speak) on the **262,603** relationships with a coefficient of determination R^{2} of 0.10 or higher that are available on the site.

## Example of Classic Association

For our example, we will compare two families: Brucellaceae and Caulobacteraceae. Their ancestry is shown below:

- Proteobacteria; Alphaproteobacteria; Hyphomicrobiales; Brucellaceae
- Proteobacteria; Alphaproteobacteria; Caulobacterales; Caulobacteraceae

Because they share some ancestry, you would usually expect them to be friendly and supportive of each other. The standard analysis is shown below, charting the counts from samples that have both bacteria.

### Classic Approach

After an intro course to statistics, most people would do a linear regression. It is unlikely they would look at each chart, because the dataset that I am working with would produce 2,669,956 of them.

The regression and the chart are shown below. The logical conclusion: no relationship.
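For readers who want to try the classic approach themselves, here is a minimal sketch of an ordinary least-squares fit on raw counts. The function name `linear_fit` and the sample counts are made up for illustration; they are not the site's code or data.

```python
from statistics import mean


def linear_fit(x, y):
    """Least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        # A constant series carries no linear signal.
        return 0.0, float(my), 0.0
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical counts of two taxa in the same five samples (made-up numbers).
taxon_a = [10, 20, 30, 40, 5000]
taxon_b = [3, 9, 27, 81, 243]
slope, intercept, r2 = linear_fit(taxon_a, taxon_b)
print(f"slope={slope:.4f} intercept={intercept:.2f} R^2={r2:.3f}")
```

With heavily skewed counts, a single extreme sample can dominate a fit like this, which is one reason the raw regression can be misleading.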

### Alternative Approach

The alternative is to apply what are called monotonic increasing functions to the counts. We scale each function so that its range is 0 to 100. This preserves the ordering of the data and discards **the naïve assumption of linearity**.

With this approach, we get the following chart.

**Same data!!!**

We could, for each pair of bacteria, derive the absolute optimal monotonic functions. I find this approach problematic because you appear to be fiddling with the data *too much*. I have added the constraint that only one monotonic function is allowed per bacterium. I believe this will inhibit over-fitting the data to the model.
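To make the idea concrete, here is a minimal sketch of one possible monotonic increasing function scaled to the range 0 to 100. The post does not say which function is actually used, so the choice of `log1p` followed by min-max rescaling is purely an assumption for illustration, as are the name `make_log_scaler` and the counts.

```python
import math


def make_log_scaler(all_counts):
    """Build ONE monotonic increasing transform for a taxon.

    Assumption: log1p followed by min-max rescaling to [0, 100].
    The transform is fitted once on every observed count for the taxon
    and then reused for every pairing that taxon takes part in.
    """
    if not all_counts:
        raise ValueError("need at least one count")
    lo = math.log1p(min(all_counts))
    hi = math.log1p(max(all_counts))
    span = hi - lo

    def transform(count):
        if span == 0:
            return 0.0  # every observed count was identical
        return 100.0 * (math.log1p(count) - lo) / span

    return transform


# Hypothetical counts for one taxon across all samples (made-up numbers).
counts = [1, 10, 100, 1000]
scale = make_log_scaler(counts)
scaled = [scale(c) for c in counts]
```

Because the transform is built per taxon rather than per pair, it cannot be tuned to flatter any single relationship, which is the point of the one-function constraint.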

#### How many relationships over 0.1?

We have 1,621 bacteria with at least one such relationship, and the top ones are shown below.

| Taxonomy rank | Taxonomy name | Count |
| --- | --- | ---: |
| family | Halanaerobiaceae | 546 |
| class | Fibrobacteria | 526 |
| class | Dehalococcoidia | 506 |
| family | Fibrobacteraceae | 505 |
| order | Fibrobacterales | 501 |
| genus | Fibrobacter | 483 |
| family | Nitrosomonadaceae | 474 |
| genus | Dehalogenimonas | 474 |
| order | Acidobacteriales | 467 |
| family | Micromonosporaceae | 461 |
| genus | Nitrosomonas | 460 |
| genus | Acinetobacter | 459 |
| family | Colwelliaceae | 455 |
| family | Acidobacteriaceae | 453 |

## What benefit does this give?

The impact of one bacterium on the other may be computed as slope × R^{2}. So an R^{2} of 0.5 and a slope of 0.4 give 0.5 × 0.4 = 0.20, or 20%; thus for every 10 steps of one, the other will increase by 2.
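The arithmetic above can be written as a tiny helper. This is only a sketch of the slope times R^{2} formula from this post; the function name `impact` is made up for illustration.

```python
def impact(slope, r_squared):
    """Impact of one taxon on another, computed as slope * R^2."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie between 0 and 1")
    return slope * r_squared


# The worked example from the text: R^2 of 0.5 and a slope of 0.4.
value = impact(slope=0.4, r_squared=0.5)

# Expected change in the second taxon for a 10-step change in the first.
expected_change = value * 10
```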

We can use this when some bacteria X is high or low *and we have no information on modifying it.* We can look at the related bacteria with the highest impact and at their modifiers. We are trying to cascade: changing the associated bacteria in order to change our target bacteria! We are attempting to model the modifiers' secondary changes into our suggestions.
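As a rough sketch of that lookup, the snippet below ranks the taxa related to a target by impact. Everything here is hypothetical: the names `rank_by_impact`, `taxon_A` and so on, the numbers, and the choice to sort by absolute impact so that strong inhibitors rank alongside strong promoters.

```python
def rank_by_impact(relationships, min_r_squared=0.10):
    """Rank related taxa by |slope * R^2|, strongest first.

    relationships: iterable of (name, slope, r_squared) tuples.
    Pairs below the R^2 cut-off are dropped.
    """
    kept = [
        (name, slope * r2)
        for name, slope, r2 in relationships
        if r2 >= min_r_squared
    ]
    # Assumption: sort on absolute impact, keeping the sign in the result.
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Made-up relationships to some target taxon X.
related = [
    ("taxon_A", 0.40, 0.50),   # impact 0.20
    ("taxon_B", -0.90, 0.30),  # impact -0.27, i.e. inhibiting
    ("taxon_C", 0.80, 0.05),   # dropped: R^2 below 0.10
]
ranking = rank_by_impact(related)
```

The top entries of such a ranking are the candidates whose known modifiers would be worth looking up for the target.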

To be continued and updated…
