Construction and Validation of a Large Model-based Survival Prediction Model for Lung Cancer Patients with EGFR Mutations
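Funding:

National Key Research and Development Program of China (2018YFC1707410); Major Project of the Scientific and Technological Innovation Project of the China Academy of Chinese Medical Sciences (CI2021A00702)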


    Abstract:

    Objective: To investigate the distribution of traditional Chinese medicine (TCM) syndrome elements in lung cancer patients with epidermal growth factor receptor (EGFR) mutations treated with the third-generation targeted drug osimertinib, and to construct a survival prediction model using machine learning and a generative large language model (LLM). Methods: Data were collected retrospectively from the National Health Big Data Center for patients newly diagnosed with stage Ⅲ to Ⅳ non-small cell lung cancer between 2020 and 2023. Patients were randomly assigned to a training set and a test set at a 7:3 ratio. Descriptive statistics were computed for patient characteristics and TCM syndrome elements. LASSO-Cox regression was used for variable selection, followed by construction of a Cox proportional hazards model and a nomogram. Discrimination was assessed by the area under the receiver operating characteristic (ROC) curve (AUC), and predictive accuracy by the concordance index (C-index). In addition, a prediction system based on a Llama-architecture generative large language model was built and compared with the traditional machine learning approach. Results: The predominant TCM syndrome elements were qi deficiency (50.44%), blood stasis (25.01%), and phlegm (22.53%). LASSO-Cox regression identified six independent prognostic factors: age, fibrinogen, CYFRA21-1, comorbid cerebral infarction, family history, and prior first-generation TKI therapy. The multivariate Cox model achieved an AUC of 0.80 (95% CI 0.76 to 0.84) in the training set and 0.78 (95% CI 0.73 to 0.83) in the test set, with C-indices of 0.77 and 0.75, respectively. The generative model performed best at 40 epochs with a learning rate of 5.00×10⁻⁵, reaching 86.6% accuracy, 95.7% recall, and an F1-score of 92.8%, significantly outperforming the traditional method. Conclusion: The survival prediction model developed in this study effectively predicts survival prognosis risk in EGFR-mutant lung cancer patients receiving oral third-generation targeted therapy, and offers a new approach to prognostic prediction.
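    The abstract reports C-indices of 0.77 and 0.75 but does not say how they were computed. As an illustration only, the following is a minimal sketch of Harrell's concordance index for right-censored survival data, assuming that a higher risk score means a shorter expected survival. The function name `concordance_index` and the toy data are hypothetical and not taken from the study; tied survival times are simply skipped, which is a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times:       observed follow-up time for each patient
    events:      1 if the event (death) was observed, 0 if censored
    risk_scores: model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # 'a' is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk for the earlier event: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the C-indices reported above.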

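Cite this article

陈子佳, 韩宇, 刘欣, 阿孜古丽, 张德政, 谢雁鸣, 王志飞. Construction and Validation of a Large Model-based Survival Prediction Model for Lung Cancer Patients with EGFR Mutations [J]. 世界中医药 (World Chinese Medicine), 2025, (08).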

History
  • Received: 2024-12-13
  • Published online: 2025-06-17