multi_cox.r: multivariate Cox analysis of gene expression


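Usage:

Run a multivariate Cox regression on the expression of several specified genes, build a prognostic model from them, and evaluate that model: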


$Rscript $scriptdir/multi_cox.r -h
usage: /share/nas1/huangls/test/TCGA_immu/scripts/multi_cox.r
       [-h] -i data -t time -e event -v variate [variate ...]
       [-P predict.time [predict.time ...]] [-c cut.score] [-s seed]
       [-o outdir] [-p prefix]
multi variate cox regression analysis using gene expression
optional arguments:
  -h, --help            show this help message and exit
  -i data, --data data  input data file path[required]
  -t time, --time time  set suvival time column name [required]
  -e event, --event event
                        set event column name must 0 or 1 code format
                        [required]
  -v variate [variate ...], --variate variate [variate ...]
                        variate for cox analysis [required]
  -P predict.time [predict.time ...], --predict.time predict.time [predict.time ...]
                        Time point to draw the ROC curve [default 365 1095
                        1825]
  -c cut.score, --cut.score cut.score
                        set cut score value to divide high and low risk groups
                        [default median]
  -s seed, --seed seed  set random seed [default 2021]
  -o outdir, --outdir outdir
                        output file directory [default cwd]
  -p prefix, --prefix prefix
                        out file name prefix [default cox]

Example:



Rscript $scriptdir/multi_cox.r -i imm.unicox.metadata-exp.tsv -e EVENT -t TIME \
    -v PDGFRL CXCR4 PAK3 CSF1R PDCD1 -P 365 1095 1825 \
    -o multicox   -p  multicox

Parameter description:

-i  input file containing the survival data and the gene expression values, one row per sample:


barcode                       TIME  EVENT  FGR       CD38      ITGAL     CX3CL1    CEACAM21  MATK      CD79B     MMP25
TCGA-B7-A5TK-01A-12R-A36D-31  288   0      16.34408  86.86772  40.26903  603.0132  1.868536  2.28342   3.453198  13.72829
TCGA-BR-7959-01A-11R-2343-13  1010  0      11.96739  15.79451  7.358566  26.91353  2.571917  0.864116  1.879957  3.451148
TCGA-IN-8462-01A-11R-2343-13  572   0      5.350846  3.111342  3.769125  20.22238  0.610839  0.519776  2.822192  1.106563
TCGA-CG-4443-01A-01R-1157-13  912   0      1.53802   0.862955  2.37351   19.04097  1.092127  0.760348  1.926592  0.878735
TCGA-KB-A93J-01A-11R-A39E-31  1124  0      15.24016  13.30473  8.08591   14.15295  3.483559  3.192951  3.651742  10.43186
TCGA-HU-A4H3-01A-21R-A251-31  882   0      6.261761  2.675173  7.025886  4.050271  0.584159  1.039336  1.979214  2.312993
TCGA-RD-A8MV-01A-11R-A36D-31  3720  0      27.07415  20.15885  34.91309  34.71821  4.113112  2.615557  16.51946  17.72674
TCGA-VQ-A91X-01A-12R-A414-31  289   1      1.062341  0.752018  2.380513  4.415815  0.518142  0.212197  1.239203  0.582114
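
As a rough sketch of what a multivariate Cox fit on a table of this shape could look like (this is not the code of multi_cox.r, which is not shown here), the example below simulates a small data frame with the same TIME, EVENT, and gene columns and fits it with coxph from the survival package. The simulated values, the choice of three genes, and the column names of the result table are assumptions made for illustration.

```r
library(survival)

# Simulated stand-in for the input table: one row per sample, with
# TIME (days), EVENT (0 = censored, 1 = event) and one column per gene.
# All values are made up for illustration.
set.seed(2021)
n <- 200
dat <- data.frame(
  barcode = sprintf("S%03d", seq_len(n)),
  FGR     = rexp(n, rate = 0.2),
  CD38    = rexp(n, rate = 0.3),
  ITGAL   = rexp(n, rate = 0.1)
)
# Survival times depend on FGR and CD38 so the model has a signal to find.
true_lp   <- 0.15 * dat$FGR - 0.10 * dat$CD38
dat$TIME  <- ceiling(rexp(n, rate = exp(true_lp) / 1000))
dat$EVENT <- rbinom(n, 1, 0.7)

# With a real file one would read it instead, for example:
# dat <- read.delim("imm.unicox.metadata-exp.tsv", check.names = FALSE)

# Build Surv(TIME, EVENT) ~ gene1 + gene2 + ... from the gene names
genes <- c("FGR", "CD38", "ITGAL")
form  <- as.formula(paste("Surv(TIME, EVENT) ~", paste(genes, collapse = " + ")))
fit   <- coxph(form, data = dat)

# Hazard ratios with 95% confidence intervals and Wald p-values
ci  <- exp(confint(fit))
res <- data.frame(
  HR      = exp(coef(fit)),
  lower95 = ci[, 1],
  upper95 = ci[, 2],
  p       = summary(fit)$coefficients[, "Pr(>|z|)"]
)
print(res)
```

Building the formula from a character vector mirrors how a list of gene names passed on the command line (as with -v) can be turned into a model without hard-coding the genes.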


Results:


Building the prognostic model:

The risk score was calculated with the following formula:

$$\text{risk score} = \sum_{i=1}^{n} \text{coef}_i \times \text{Expr}_i$$

where $\text{Expr}_i$ represents the expression level of gene $i$, $\text{coef}_i$ represents the regression coefficient of gene $i$ in the signature, and $n$ is the number of genes in the signature.
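
A minimal sketch of the risk score formula above, assuming the coefficients come from a coxph fit. The two gene names G1 and G2 and all simulated values are hypothetical. The score is the coefficient-weighted sum of the expression values, which differs from the centered linear predictor returned by predict() only by a constant.

```r
library(survival)

set.seed(2021)
n <- 150
# Hypothetical expression values for two signature genes
expr <- data.frame(G1 = rnorm(n, mean = 5), G2 = rnorm(n, mean = 3))
time   <- ceiling(rexp(n, rate = exp(0.5 * expr$G1 - 0.3 * expr$G2) / 5000))
status <- rbinom(n, 1, 0.6)

fit <- coxph(Surv(time, status) ~ G1 + G2, data = expr)

# risk score = sum over genes of coef_i * Expr_i, as a matrix-vector product
risk_score <- as.numeric(as.matrix(expr[, names(coef(fit))]) %*% coef(fit))

# predict(type = "lp") gives the same ranking; it subtracts a constant
# because the linear predictor is centered
lp <- predict(fit, type = "lp")
offset <- risk_score - lp
print(head(data.frame(risk_score, lp)))
```

Because the two quantities differ only by a constant shift, a median split gives the same high- and low-risk groups with either one.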


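Risk score

A risk score is computed for each sample from the model, and the samples are split into high- and low-risk groups at the median risk score. Three plots are then drawn: a scatter plot of the risk score distribution, a scatter plot of survival time, and a heatmap of signature gene expression.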
attachments-2021-07-pixMJ7Ct60dd1ab017657.png
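
The grouping step can be sketched as follows. The helper split_by_risk is hypothetical, the scores are simulated, and how multi_cox.r treats samples that fall exactly on the cut-off is an assumption here (they are placed in the low-risk group).

```r
# Hypothetical helper: split samples into risk groups at a cut-off.
# The cut-off defaults to the median score; a fixed value can be passed
# instead, similar in spirit to the -c option.
split_by_risk <- function(score, cut = median(score)) {
  factor(ifelse(score > cut, "high", "low"), levels = c("low", "high"))
}

set.seed(2021)
risk_score <- rnorm(101)            # made-up scores for 101 samples
group <- split_by_risk(risk_score)

# Sort samples by score, the usual x-axis order shared by the three panels
ord <- order(risk_score)
sorted <- data.frame(
  rank  = seq_along(ord),
  score = risk_score[ord],
  group = group[ord]
)
print(table(group))
```

Keeping one shared sample order means the risk score panel, the survival time panel, and the heatmap columns line up sample by sample.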

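Prognostic difference predicted by the model

Prognosis of the high- and low-risk groups is compared by plotting Kaplan-Meier survival curves and using the log-rank test to check whether survival differs between the two groups.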

attachments-2021-07-kmtIVrSl60dd1ace16f74.png
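
A minimal sketch of this comparison with the survival package, on simulated data in which the high-risk group is given twice the hazard of the low-risk group. The group sizes, hazards, and censoring rate are assumptions chosen only to make the example run.

```r
library(survival)

set.seed(2021)
n <- 200
group <- factor(rep(c("low", "high"), each = n / 2), levels = c("low", "high"))
# Simulated survival times: the high-risk group has a doubled hazard
time   <- rexp(n, rate = ifelse(group == "high", 2, 1) / 1000)
status <- rbinom(n, 1, 0.7)        # 1 = event, 0 = censored

# Kaplan-Meier estimate per group
km <- survfit(Surv(time, status) ~ group)

# Log-rank test; the p-value comes from a chi-square distribution
# with (number of groups - 1) degrees of freedom
lr <- survdiff(Surv(time, status) ~ group)
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

print(km)
cat("log-rank p-value:", format.pval(p_value), "\n")
```

Calling plot(km, col = c("blue", "red")) would draw the two curves with base graphics; the survminer package mentioned in this article offers ggsurvplot for a more polished figure.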




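Evaluating the model's predictive performance

A model can be judged on two aspects: discrimination and calibration. Discrimination reflects how well the model separates patients who experience the event from those who do not. Calibration is how closely the predicted probability of the outcome agrees with the probability actually observed. Discrimination is assessed with the area under the ROC curve (AUC) or the C-statistic, and calibration with a calibration plot. The ROC curves for the model are shown below: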


attachments-2021-07-S5ZQDzHc60dd1a799c093.png

To assess the predictive ability of the XXXX-based risk signature, we generated time-dependent receiver operating characteristic (ROC) curves and calculated the area under the curve (AUC) with the R package "survivalROC" for 1-year, 3-year, and 5-year overall survival (OS). The Kaplan-Meier, log-rank, ROC curve, and calibration analyses were performed and visualized with the "survivalROC", "rms", "survival", and "survminer" packages.
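
A hedged sketch of the time-dependent AUC calculation with survivalROC, using the default time points of the -P option (365, 1095, and 1825 days). The marker values and survival times are simulated, and method = "KM" is one of the estimators the package offers; which estimator multi_cox.r uses is not stated here. The package is a separate CRAN install, so the example skips the calculation if it is missing.

```r
set.seed(2021)
n <- 300
marker <- rnorm(n)                                  # stand-in for the risk score
time   <- rexp(n, rate = exp(marker) / 1500)        # higher marker, shorter survival
status <- rbinom(n, 1, 0.7)                         # 1 = event, 0 = censored

predict_times <- c(365, 1095, 1825)                 # 1, 3 and 5 years in days

if (requireNamespace("survivalROC", quietly = TRUE)) {
  # One ROC curve per time point; $AUC holds the area under that curve
  auc <- sapply(predict_times, function(t) {
    survivalROC::survivalROC(
      Stime = time, status = status, marker = marker,
      predict.time = t, method = "KM"
    )$AUC
  })
  names(auc) <- paste0(predict_times / 365, "-year")
  print(round(auc, 3))
} else {
  auc <- NULL
  message("survivalROC is not installed; skipping the AUC calculation")
}
```

The object returned by survivalROC also carries the FP and TP vectors, so plotting TP against FP for each time point would reproduce a figure like the one above.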


Course for obtaining and using the script: https://study.163.com/course/introduction/1211864801.htm?share=1&shareId=1030291076

  • Published 2021-08-30 16:54
  • Category: TCGA

omicsgene