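I successfully generated the gVCF files, but I am stuck at the database import step when merging them. Because the gVCF files exceeded the range that the idx index can handle, I built CSI indexes instead. The database import command still fails to run because it requires idx-format indexes. What should I do?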


2 answers

rzx

Use the IndexFeatureFile tool in GATK to build the index:

gatk --java-options "-Xmx50g" IndexFeatureFile -I **.g.vcf.gz

omicsgene - Bioinformatics
Expertise: resequencing, genetics and evolution, transcriptomics, GWAS

The problem lies in the compressed (g.vcf.gz) files for large genomes (chromosomes longer than about 500 Mbp): the tbi index format can only handle chromosomes up to 2^29 bp (about 537 Mbp). Indexing with csi is the option for genomes with such large chromosomes, but unfortunately CombineGVCFs and GenomicsDBImport do not accept it.

The only solution I found is to work with uncompressed gVCF files (g.vcf).

Indexing with tabix is mandatory for gzipped VCF files (g.vcf.gz).
If the files are uncompressed, tbi indexing is not required, and it seems that GenomicsDBImport and CombineGVCFs work fine with uncompressed gVCF files.

Of course, the .idx index is still necessary, but .tbi is not required for uncompressed VCF files.
Reference: https://gatk.broadinstitute.org/hc/en-us/community/posts/4407400443803-GenomicsDBimport-and-CombineGVCF-does-not-show-variants-at-500-Mbp-onwards-although-gvcf-files-from-HapolypeCaller-report-variants
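
To see in advance which chromosomes are affected, the contig lengths can be compared against the tbi limit of 2^29 = 536870912 bp. The following is only a minimal sketch: it assumes a samtools faidx-style index of the reference (a `.fai` file with the contig name in column 1 and its length in column 2), and the function name `list_long_contigs` and the file name `ref.fa.fai` are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the name and length of every contig longer than the tbi
# coordinate limit (2^29 bp = 536870912 bp).
# Assumption: the input is a samtools-faidx style .fai file
# (column 1 = contig name, column 2 = contig length in bp).
list_long_contigs() {
    local fai="$1"
    awk -v limit=536870912 '$2 > limit { print $1 "\t" $2 }' "$fai"
}

# Example call (hypothetical file name):
#   list_long_contigs ref.fa.fai
```

If this prints nothing, every contig fits within the tbi limit and the compressed gVCFs with tbi indexes should not hit this particular problem.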



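For chromosomes longer than about 500 Mbp, the tbi index that GATK requires for compressed gVCFs does not work. The workaround:

1. Decompress the g.vcf.gz file:

gunzip demo.g.vcf.gz

2. Build an index for the decompressed VCF file:

gatk --java-options "-Xmx50g" IndexFeatureFile -I demo.g.vcf

3. Import the decompressed g.vcf into the database.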
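
The three steps above could be chained roughly as follows. This is a sketch under assumptions, not a tested pipeline: the workspace path and interval are placeholders you must supply, the helper names `run` and `import_uncompressed_gvcf` are invented here, and the GenomicsDBImport call uses only its standard `-V`, `--genomicsdb-workspace-path`, and `-L` arguments. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of the decompress -> index -> import workflow.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

import_uncompressed_gvcf() {
    local gz="$1"          # compressed gVCF, e.g. demo.g.vcf.gz
    local workspace="$2"   # placeholder: GenomicsDB workspace directory
    local interval="$3"    # placeholder: interval to import, e.g. a contig name
    local vcf="${gz%.gz}"  # name of the file after decompression

    # 1. Decompress the gVCF.
    run gunzip "$gz"
    # 2. Build the .idx index for the uncompressed file.
    run gatk --java-options "-Xmx50g" IndexFeatureFile -I "$vcf"
    # 3. Import the uncompressed gVCF into GenomicsDB.
    run gatk --java-options "-Xmx50g" GenomicsDBImport \
        -V "$vcf" \
        --genomicsdb-workspace-path "$workspace" \
        -L "$interval"
}

# Example call (hypothetical names):
#   DRY_RUN=1 import_uncompressed_gvcf demo.g.vcf.gz my_database chr1
```

With several samples, each decompressed and indexed gVCF would be passed to GenomicsDBImport with its own `-V` argument in a single call, rather than calling the function once per sample.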

