Machine learning-based molecular generation is an active research area, and de novo molecular generation models have attracted particular attention in drug discovery. These models aim to design novel molecular structures from scratch rather than by screening existing compound databases.
Recent studies show growing interest in de novo molecular generation with Transformer models. In particular, generating molecular structures with a pure Transformer architecture has the potential to improve the expressiveness of molecular representations. Methods such as TransORGAN, which use SMILES representations while reflecting the structural characteristics of molecules more faithfully, are also gaining attention and raise expectations for more precise molecular design.
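To make concrete how a SMILES string becomes a token sequence that a Transformer can consume, the following is a minimal sketch of a regex-based tokenizer. It is an illustration only, not the tokenization used by TransORGAN or any specific model: the pattern `SMILES_TOKEN_PATTERN` and the function `tokenize_smiles` are hypothetical names introduced here, and the pattern covers only common SMILES syntax (bracket atoms, the two-letter halogens Cl and Br, ring-closure digits, bonds, and branches).

```python
import re

# Hypothetical, simplified pattern: bracket atoms and two-letter halogens are
# matched first so that "Cl" is not split into "C" and "l".
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[()=#+\-\\/.:@~*$])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of chemically meaningful tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Reject input containing characters the simplified pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Aspirin, written as a SMILES string.
    print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

Each token would then typically be mapped to an integer index and an embedding vector before being passed to the model. Tokenizing at the level of atoms and bonds, rather than single characters, keeps multi-character symbols such as `Cl` or `[NH4+]` intact.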
With the advancement of precision medicine, the use of gene expression profiles for targeted drug generation has gained attention in recent years. Traditionally, drug discovery has focused primarily on directly targeting disease-related proteins. The development of next-generation sequencing (NGS) and single-cell RNA sequencing (scRNA-seq), however, has made it possible to characterize diseases at the gene expression level and to design novel drugs based on this information.
A gene expression profile is a comprehensive set of measurements of gene expression levels in cells or tissues, obtained through RNA sequencing (RNA-Seq) or microarray technology. By analyzing disease-specific gene expression patterns, researchers can gain a deeper understanding of disease mechanisms and derive guidelines for targeted drug design.
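As a simple illustration of what a disease-specific expression pattern can look like in practice, the sketch below computes a per-gene log2 fold change between disease and control samples and extracts the up- and down-regulated genes. This is a minimal example under stated assumptions, not a full differential expression analysis: the gene names, expression values, pseudocount, and threshold are all hypothetical, and real pipelines would add normalization and statistical testing.

```python
import math


def log2_fold_change(disease, control, pseudocount=1.0):
    """Per-gene log2 ratio of mean disease expression to mean control expression.

    `disease` and `control` map gene names to lists of expression values.
    The pseudocount (an assumed value) avoids division by zero for silent genes.
    """
    result = {}
    for gene in disease:
        disease_mean = sum(disease[gene]) / len(disease[gene])
        control_mean = sum(control[gene]) / len(control[gene])
        result[gene] = math.log2(
            (disease_mean + pseudocount) / (control_mean + pseudocount)
        )
    return result


def disease_signature(lfc, threshold=1.0):
    """Return (up-regulated, down-regulated) genes whose |log2FC| >= threshold."""
    up = sorted(g for g, v in lfc.items() if v >= threshold)
    down = sorted(g for g, v in lfc.items() if v <= -threshold)
    return up, down


# Hypothetical toy data: two samples per condition, placeholder gene names.
disease_samples = {"GENE_A": [7.0, 7.0], "GENE_B": [1.0, 1.0], "GENE_C": [3.0, 3.0]}
control_samples = {"GENE_A": [1.0, 1.0], "GENE_B": [7.0, 7.0], "GENE_C": [3.0, 3.0]}

lfc = log2_fold_change(disease_samples, control_samples)
up_genes, down_genes = disease_signature(lfc)
```

A signature of this kind, or the full expression vector, is the sort of input that an expression-conditioned generative model could take as its conditioning signal.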
The rapid advancement of large language models (LLMs) has improved the efficiency of data analysis and molecular design in the drug discovery process. Traditionally, identifying target proteins and optimizing compounds required vast amounts of experimental data and computational resources. LLMs have enabled automated literature analysis, chemical structure generation, and drug-target interaction prediction, which can increase both the speed and the accuracy of new drug development.
In particular, leveraging multimodal data, which integrates diverse sources such as gene expression, molecular structures, chemical properties, and clinical data, allows for more precise drug development. This integration enables targeted drug design that accounts for disease-specific characteristics and supports precision medicine applications that were difficult to achieve with traditional single-modality approaches.