Abstract
INTRODUCTION
Protein amyloidogenesis is associated with various diseases and functional roles, and its importance has driven the creation of computational predictors of amyloidogenicity. The accuracy of these predictors, particularly those based on artificial intelligence, depends heavily on the quality of the training data.
METHODS
We built Cross-Beta DB, a database containing high-quality data on known cross-β amyloids formed under natural conditions. We used it to train and benchmark several machine-learning (ML) algorithms for predicting the amyloid-forming potential of proteins.
RESULTS
We developed the Cross-Beta predictor using an Extra Trees ML algorithm. It outperforms existing amyloid predictors, achieving the highest F1 score (0.852) and accuracy (0.844).
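For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and F1 are computed from binary predictions (1 = amyloid-forming, 0 = non-amyloid). The function names and the toy labels are hypothetical and chosen only for illustration; they are not the benchmark data or the evaluation code used for Cross-Beta.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, fp, tn, fn))
print("F1:", f1_score(tp, fp, fn))
```

Accuracy counts all correct calls equally, while F1 ignores true negatives and balances precision against recall, which is why the two are often reported together when comparing predictors.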
DISCUSSION
The Cross-Beta DB database and the new ML-based Cross-Beta predictor may enable the creation of personalized risk profiles for neurodegenerative diseases and other amyloidoses, especially as genome sequencing becomes more affordable.
Highlights

Accuracy of ML-based predictors depends on the quality of training data
We built Cross-Beta DB, a database of high-quality data on naturally occurring amyloids
Using this data, we developed an amyloid predictor that outperforms other predictors
This computational tool enables the creation of risk profiles for neurodegenerative diseases



This post is Copyright: Valentin Gonay, Michael P. Dunne, Javier Caceres‐Delpiano, Andrey V. Kajava | January 8, 2025

Wiley: Alzheimer’s & Dementia: Table of Contents