Database URL: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/
If ANNOVAR is installed, the database can be downloaded directly:
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar clinvar_20180603 ./
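The download provides several annotation fields. The ANNOVAR table contains five of them: CLNALLELEID, CLNDN, CLNDISDB, CLNREVSTAT, and CLNSIG. The corresponding definitions from the ClinVar VCF header (where CLNALLELEID appears as ALLELEID) are:
ALLELEID ="the ClinVar Allele ID"
CLNDN ="ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB"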
CLNDNINCL ="For included Variant : ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB"
CLNDISDB ="Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN"
CLNDISDBINCL ="For included Variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN"
CLNHGVS ="Top-level (primary assembly, alt, or patch) HGVS expression."
CLNREVSTAT ="ClinVar review status for the Variation ID"
CLNSIG ="Clinical significance for this single variant"
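To see how these tags sit in the INFO column of a ClinVar VCF record, here is a minimal sketch that pulls one tag out with awk. The helper name get_info_tag and the sample record are made up for illustration; they are not part of ClinVar or ANNOVAR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one INFO tag for each VCF line on stdin.
# Usage: get_info_tag TAG < file.vcf
get_info_tag() {
    awk -F'\t' -v tag="$1" '
        /^#/ { next }                       # skip VCF header lines
        {
            n = split($8, kv, ";")          # column 8 is INFO: TAG=value;TAG=value;...
            for (i = 1; i <= n; i++) {
                eq = index(kv[i], "=")
                if (eq > 0 && substr(kv[i], 1, eq - 1) == tag) {
                    print substr(kv[i], eq + 1)
                }
            }
        }'
}

# Made-up record in ClinVar VCF layout (all values are illustrative only).
sample_record=$(printf '1\t12345\t99999\tG\tA\t.\t.\tALLELEID=11111;CLNDN=Example_disease;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Pathogenic')

printf '%s\n' "$sample_record" | get_info_tag CLNSIG
printf '%s\n' "$sample_record" | get_info_tag CLNDN
```

On a real file, something like zcat clinvar_20180805.vcf.gz | get_info_tag CLNSIG | sort | uniq -c would give a rough tally of the clinical significance values.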
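To build a newer release yourself (here clinvar_20180805) from the ClinVar VCF, using vt and ANNOVAR's prepare_annovar_user.pl:
# Put ANNOVAR and vt on the PATH (adjust to your own install locations)
export PATH=/share/work/biosoft/annovar/2018Apr16/annovar:/share/work/biosoft/vt/vt-0.57721/:$PATH
# Download the GRCh37 ClinVar VCF and its tabix index
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_20180805.vcf.gz
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_20180805.vcf.gz.tbi
# Split multiallelic records into one line per alternate allele
vt decompose clinvar_20180805.vcf.gz -o temp.split.vcf
prepare_annovar_user.pl -dbtype clinvar_preprocess2 temp.split.vcf -out temp.split2.vcf
# Normalize variant representation (left-align and trim) against the reference genome
vt normalize temp.split2.vcf -r ../../GRCH37/Homo_sapiens.GRCh37.dna.toplevel.fa -o temp.norm.vcf -w 2000000
# Convert the normalized VCF into an ANNOVAR table
prepare_annovar_user.pl -dbtype clinvar2 temp.norm.vcf -out hg19_clinvar_20180805.txt
# Optional: index the table (write the previous step to hg19_clinvar_20180805_raw.txt first)
#index_annovar.pl hg19_clinvar_20180805_raw.txt -out hg19_clinvar_20180805.txt -comment comment_20180805.txt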
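After the table is built, a quick sanity check can catch truncated rows. The sketch below assumes the output is tab-delimited with five coordinate columns (Chr, Start, End, Ref, Alt) followed by the five annotation columns, i.e. 10 columns per row; that layout, the helper name count_bad_rows, and the demo rows are assumptions for illustration, so check them against your own file.

```shell
#!/usr/bin/env bash
# Hypothetical check: count data rows whose column count differs from the expected one.
# Usage: count_bad_rows FILE [EXPECTED_COLUMNS]   (default assumed here: 10)
count_bad_rows() {
    awk -F'\t' -v want="${2:-10}" '
        /^#/ { next }                # skip the header/comment lines
        NF != want { bad++ }         # row has an unexpected number of columns
        END { print bad + 0 }' "$1"
}

# Demo on a made-up table (values are illustrative only).
demo=$(mktemp)
printf '#Chr\tStart\tEnd\tRef\tAlt\tCLNALLELEID\tCLNDN\tCLNDISDB\tCLNREVSTAT\tCLNSIG\n' > "$demo"
printf '1\t12345\t12345\tG\tA\t11111\tExample_disease\tOMIM:NNNNNN\tcriteria_provided,_single_submitter\tPathogenic\n' >> "$demo"
printf '1\t22345\t22345\tC\tT\t11112\tExample_disease\n' >> "$demo"   # deliberately truncated row
count_bad_rows "$demo"   # prints 1
rm -f "$demo"
```

For the real table this would be run as count_bad_rows hg19_clinvar_20180805.txt; a result of 0 means every data row has the expected number of columns.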