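Objective To analyze differentially expressed serum proteins between lung cancer patients and controls using a liquid chip time-of-flight mass spectrometry system, and to screen for protein markers of lung cancer. Methods Serum samples from 105 lung cancer patients and 90 controls [44 healthy individuals and 46 patients with benign lung diseases (BLDs)] were randomly divided into a training group (60 lung cancer and 60 control samples) and a validation group [45 lung cancer samples (13 early-stage, 32 advanced) and 30 control samples (15 healthy and 15 BLDs)]. All 195 serum samples were analyzed with ClinProt and its associated analysis software ClinProTools, combined with a genetic algorithm (GA) and other biostatistical and bioinformatic methods. The total ion current (TIC) spectra were normalized and smoothed to remove chemical and electrophysical noise; proteins that differed between the groups were identified, the magnitude of each difference was calculated, and the proteins were ranked in descending order of difference; GA was then used for a preliminary evaluation of the sensitivity and specificity of each differential protein, and a discriminant model was built and validated. Results Comparison of the serum protein expression profiles of the lung cancer and control samples revealed 98 differential protein peaks. The proteins with mass-to-charge ratios (m/z) of 1 865.81 u and 4 054 u differed most between the two groups, and both were more abundant in the spectra of the lung cancer group than in those of the control group. When the samples were plotted in a coordinate system with the 1 865.81 u protein on the X-axis and the 4 054 u protein on the Y-axis (each coordinate representing the abundance of the corresponding protein), the region in which the two groups overlapped was small, indicating that these two proteins distinguish lung cancer from controls (healthy individuals and BLDs) well. When GA was applied to the training-group data to build a discriminant model, the system selected a diagnostic model composed of seven characteristic peaks with m/z of 3 192.08, 2 862.55, 4 643.49, 5 336.64, 3 954.88, 9 288.98 and 4 209.64 u. In cross-validation with the validation-group data, the model correctly recognized 82.22% (37/45) of the lung cancer samples and 80.00% (24/30) of the control samples, and 76.92% (10/13) of the early-stage lung cancer samples. Identification of the protein peaks at m/z 1 778, 1 865 and 4 209 u showed that the first two were both C3f, whereas the 4 209 u peak was eukaryotic peptide chain release factor. Conclusion The serum protein expression profiles of lung cancer patients differ from those of healthy individuals and BLD patients; ClinProTools combined with GA and other bioinformatic methods is a promising approach for screening diagnostic protein markers of lung cancer.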
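The recognition rates quoted for the validation group follow directly from the reported counts. The sketch below is only an arithmetic check, not the authors' ClinProTools workflow; the function name `recognition_rate` and the dictionary layout are illustrative assumptions, and the only data used are the counts stated in the abstract.

```python
def recognition_rate(correct: int, total: int) -> float:
    """Return the percentage of correctly classified samples, rounded to 2 decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return round(100 * correct / total, 2)


# Counts reported for the validation group: (correctly recognized, group size).
validation_counts = {
    "lung cancer": (37, 45),
    "control": (24, 30),
    "early lung cancer": (10, 13),
}

rates = {
    name: recognition_rate(correct, total)
    for name, (correct, total) in validation_counts.items()
}

# Consistency check on the cohort split described in the Methods.
training = {"lung cancer": 60, "control": 60}
validation = {"lung cancer": 45, "control": 30}
cohort_total = sum(training.values()) + sum(validation.values())

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate:.2f}%")
    print(f"total samples: {cohort_total}")
```

With lung cancer treated as the positive class, the lung cancer recognition rate corresponds to sensitivity and the control recognition rate to specificity on the validation set.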
Objective To analyze differentially expressed serum proteins in lung cancer patients and controls with a liquid chip and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) system, so as to screen for protein markers of lung cancer. Methods Serum samples from 105 lung cancer patients and 90 controls [44 healthy individuals and 46 patients with benign lung diseases (BLDs)] were randomly divided into a training group (60 lung cancer and 60 control samples) and a validation group [45 lung cancer samples (13 early-stage and 32 advanced) and 30 control samples (15 healthy and 15 BLDs)]. The 195 serum samples were analyzed with ClinProt and its associated analysis software ClinProTools, together with a genetic algorithm (GA) and other biostatistical methods. The total ion current (TIC) spectra were normalized and smoothed to eliminate chemical and electrophysical noise, the proteins that differed between the groups were identified and the differences calculated, and the proteins were then ranked in descending order of difference. GA was used to evaluate the sensitivity and specificity of the differential proteins, and a discriminant model was established and validated. Results A total of 98 differential protein peaks were found when the serum protein profiles of the lung cancer patients and the control group were compared. The proteins with m/z values of 1 865.81 u and 4 054 u differed most between the two groups, and the abundance of both proteins was higher in the lung cancer group than in the control group. A coordinate system was established with the 1 865.81 u protein on the X-axis and the 4 054 u protein on the Y-axis (the values represent protein abundance); the small region of overlap between the groups showed that these two proteins distinguish lung cancer from the control group (healthy individuals and BLDs) well. Using GA on the training-group data to construct the model, the system obtained a diagnostic model of seven peaks with m/z values of 3 192.08, 2 862.55, 4 643.49, 5 336.64, 3 954.88, 9 288.98 and 4 209.64 u. And