
Use of an Alternating Decision Tree Algorithm to Study Viruses

From “Decoding Flu Viruses Before an Outbreak”:

Machine learning algorithms have been used to study DNA and protein sequences for more than 20 years, but only in the past few years have scientists applied them to viruses. Inspired by the growing amount of viral sequence data available for analysis, Ben-Tal’s team used an approach called supervised learning. With this method, each piece of input data used to train the algorithm is tagged with a category, in this case whether the virus sequence was derived from swine or humans. The resulting algorithm defines a decision tree capable of accurately sorting the viruses into the proper group — human or swine. The nodes of the tree point to the specific amino acids, or building blocks of proteins, that reliably differentiate the groups. Ben-Tal said this specific approach, called an alternating decision tree algorithm, is standard in machine learning but had rarely been applied to biological data before his team’s study.

In the study, the researchers identified 13 amino acid changes that appeared to distinguish human viruses from viruses that remained in swine and an additional 10 amino acid changes that distinguished the pandemic strain from standard seasonal flus. One or more of these candidates, which the scientists have since analyzed in more detail, could explain the virus’s dangerous transformation. (Ben-Tal, Webby and their collaborators will soon publish a paper characterizing a mutation that they say helped H1N1 become a pandemic virus.)

One of the key benefits of the computational approach was that researchers were able to look beyond the standard targets, regions of the genome known to be involved in traits such as transmissibility. For example, some of the candidate mutations lie nearby but outside the specific site where hemagglutinin binds to the host cell.

“The residue that was important wasn’t in a part of the protein that we had ever predicted,” Webby said. “Had I gone in using the old ways of looking at changes, I wouldn’t have been looking at this particular part of the protein.”

The entire article can be read here.

The good news is that machine learning is becoming an essential tool for biologists. It is concerning, however, that applying a long-established machine learning algorithm such as the alternating decision tree is regarded as cutting-edge research in biology. What other methods could be used for the kind of investigation described in the article? Given the inherently ambiguous, messy, and non-linear nature of genetics and gene expression, it is hard to imagine approaches other than machine learning being effective. Hopefully, success stories like this one will encourage more biologists to adopt machine learning in their research.

Information about the alternating decision tree algorithm can be found here.
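
To make the supervised setup described in the quoted passage more concrete, here is a minimal Python sketch. It is not the alternating decision tree algorithm and not the method used in the study. It only illustrates the underlying idea of tagging each sequence with a host label and then asking which alignment position best separates the two groups, using the information gain of a single split (a decision stump). The sequences, the labels, and the names `TOY_SEQUENCES`, `information_gain`, and `rank_positions` are all invented for illustration.

```python
from collections import Counter
from math import log2

# Made-up, pre-aligned protein fragments with host labels.
# These are NOT real influenza sequences; they only illustrate the idea.
TOY_SEQUENCES = [
    ("MKAIL", "human"),
    ("MKAIV", "human"),
    ("MRAIL", "human"),
    ("MKTIL", "swine"),
    ("MKTIV", "swine"),
    ("MRTIL", "swine"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(samples, position):
    """Reduction in label entropy after splitting on the residue at `position`."""
    labels = [label for _, label in samples]
    groups = {}
    for seq, label in samples:
        groups.setdefault(seq[position], []).append(label)
    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_positions(samples):
    """Return (gain, position) pairs, most discriminating position first."""
    length = len(samples[0][0])
    gains = [(information_gain(samples, p), p) for p in range(length)]
    return sorted(gains, key=lambda gp: (-gp[0], gp[1]))


if __name__ == "__main__":
    for gain, pos in rank_positions(TOY_SEQUENCES):
        print(f"position {pos}: information gain = {gain:.3f}")
```

In this toy data, position 2 (zero-based) carries `A` in every "human" sequence and `T` in every "swine" sequence, so it ranks first. A full tree learner repeats this kind of split selection many times. An alternating decision tree additionally attaches real-valued scores to its nodes and classifies by summing the scores along all the paths an instance can follow.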

