GATK GenomicsDBImport fails while merging GVCFs into a workspace: "Unexpected compressed block length: 1 for /work/GATK/BN14.g.vcf.gz"

Hello, I am running the code from the resequencing analysis course, and every GVCF has a complete, matching .tbi index file.

gatk --java-options "-Xmx128g" GenomicsDBImport \
  -L chr.list --tmp-dir $tmpdir -R $REF \
  --batch-size 40 --reader-threads 40 \
  --max-num-intervals-to-import-in-parallel 40 \
  --genomicsdb-workspace-path db \
  --sample-name-map cohort.sample_map

The error output:

Using GATK jar /share/work/biosoft/GATK/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar

Running:

    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx128g -jar /share/work/biosoft/GATK/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar GenomicsDBImport -L chr.list --tmp-dir /work/tmp -R /work/ref/GCF_023373825.1_ASM2337382v1_genomic.fasta --batch-size 40 --reader-threads 40 --max-num-intervals-to-import-in-parallel 40 --genomicsdb-workspace-path db --sample-name-map cohort.sample_map

10:53:02.111 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/work/biosoft/GATK/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so

10:53:02.155 INFO  GenomicsDBImport - ------------------------------------------------------------

10:53:02.159 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.4.0.0

10:53:02.160 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/

10:53:02.160 INFO  GenomicsDBImport - Executing as root@1fea4655ae23 on Linux v5.14.0-284.11.1.el9_2.x86_64 amd64

10:53:02.160 INFO  GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v19.0.1+10-21

10:53:02.160 INFO  GenomicsDBImport - Start Date/Time: March 14, 2024 at 10:53:02 AM CST

10:53:02.160 INFO  GenomicsDBImport - ------------------------------------------------------------

10:53:02.160 INFO  GenomicsDBImport - ------------------------------------------------------------

10:53:02.161 INFO  GenomicsDBImport - HTSJDK Version: 3.0.5

10:53:02.161 INFO  GenomicsDBImport - Picard Version: 3.0.0

10:53:02.161 INFO  GenomicsDBImport - Built for Spark Version: 3.3.1

10:53:02.162 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2

10:53:02.162 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

10:53:02.162 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

10:53:02.162 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

10:53:02.163 INFO  GenomicsDBImport - Deflater: IntelDeflater

10:53:02.163 INFO  GenomicsDBImport - Inflater: IntelInflater

10:53:02.163 INFO  GenomicsDBImport - GCS max retries/reopens: 20

10:53:02.163 INFO  GenomicsDBImport - Requester pays: disabled

10:53:02.164 INFO  GenomicsDBImport - Initializing engine

10:53:02.425 INFO  IntervalArgumentCollection - Processing 530326621 bp from intervals

10:53:02.426 WARN  GenomicsDBImport - A large number of intervals were specified. Using more than 100 intervals in a single import is not recommended and can cause performance to suffer. If GVCF data only exists within those intervals, performance can be improved by aggregating intervals with the merge-input-intervals argument.

10:53:02.490 INFO  GenomicsDBImport - Done initializing engine

10:53:02.847 INFO  GenomicsDBLibLoader - GenomicsDB native library version : 1.4.4-ce4e1b9

10:53:02.848 INFO  GenomicsDBImport - Vid Map JSON file will be written to /work/GATK/db/vidmap.json

10:53:02.848 INFO  GenomicsDBImport - Callset Map JSON file will be written to /work/GATK/db/callset.json

10:53:02.849 INFO  GenomicsDBImport - Complete VCF Header will be written to /work/GATK/db/vcfheader.vcf

10:53:02.849 INFO  GenomicsDBImport - Importing to workspace - /work/GATK/db

10:53:02.849 WARN  GenomicsDBImport - GenomicsDBImport cannot use multiple VCF reader threads for initialization when the number of intervals is greater than 1. Falling back to serial VCF reader initialization.

10:53:05.169 INFO  GenomicsDBImport - Shutting down engine

[March 14, 2024 at 10:53:05 AM CST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.05 minutes.

Runtime.totalMemory()=7683964928

***********************************************************************


A USER ERROR has occurred: Failed to create reader from file:///work/GATK/BN14.g.vcf.gz because of the following error:

        Unable to parse header with error: java.io.IOException: Unexpected compressed block length: 1 for /work/GATK/BN14.g.vcf.gz, for input source: file:///work/GATK/BN14.g.vcf.gz


***********************************************************************

Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.


What could be causing this? Searching online, some people suggested the compressed files themselves may be the problem and recommended decompressing the GVCFs, but I have 560 samples and the decompressed files would be far too large, so I have not tried that. Is there another solution? Thank you!

1 Answer

omicsgene - Bioinformatics
Specialties: resequencing, population genetics and evolution, transcriptomics, GWAS

This error means the BGZF compressed stream inside BN14.g.vcf.gz is truncated or corrupt, which usually happens when the job that wrote the file was interrupted. Check whether this file was generated correctly in the previous step; if it was not, regenerate BN14.g.vcf.gz (and its .tbi index).
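You do not need to decompress anything to disk to find the bad files: `gzip -t` streams each file and exits non-zero on a truncated or corrupt block, using no extra space. A minimal sketch for checking all 560 samples, assuming cohort.sample_map uses the standard two-column GenomicsDBImport format (sample name, tab, path):

```shell
#!/usr/bin/env bash
# Stream-test every GVCF listed in the sample map for compressed-stream
# corruption. No files are written; gzip -t only decompresses in memory.
set -u

while IFS=$'\t' read -r sample path; do
    # A truncated or corrupt BGZF/gzip stream makes gzip -t exit non-zero.
    if ! gzip -t "$path" 2>/dev/null; then
        echo "CORRUPT: $sample $path"
    fi
done < cohort.sample_map
```

Any file this flags (such as BN14.g.vcf.gz here) should be regenerated from the HaplotypeCaller step and re-indexed; files that pass can be left untouched.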
