NanoProNet: An “ImageNet” dataset for protein corona research!

📌 2.76M annotated nanomaterial–protein samples 📌

2.76M+

Nanomaterial–protein samples

33.7k+

Unique proteins

28

Nanomaterial, incubation & separation parameters

Features

AI at the forefront: empowering nanomaterial–protein interaction research 🤖

Largest public dataset

Over 2.76 million annotated nanomaterial–protein interaction samples covering more than 33.7k unique proteins, available for research and model training.

Generalizable predictions

Universal text and protein language models support generalized prediction on unseen samples and proteins.

Easy to use

Researchers with a basic machine learning background can easily follow the detailed guidelines and clear usage instructions provided.

High predictive performance

The model makes accurate baseline predictions, handles samples with missing feature information, and generalizes reliably to unseen data.

Adaptable with fine-tuning

Serving as a foundation model, it can be fine-tuned to specific applications, improving its ability to learn from few examples.

Active research community

An active community around the dataset and model will drive progress in protein corona research, positioning the field as a vital component of the rapidly evolving AI for Science landscape.

From the authors

We are thankful for the support and collaboration that made this work possible.

"We hope this dataset and model accelerate AI-driven protein corona research and its nanomedicine and broader applications!"

Hengjie Yu
Postdoc at Westlake University

"We are committed to application-driven and trustworthy AI research, exploring its broad applications in industry, science, and art."

Yaochu Jin
Chair Professor of AI at Westlake University

Deploying the foundation model or fine-tuning it for specific applications

As easy as 1, 2, 3!