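For low-coverage sequencing, aim for a minimum depth of 4X; when more sequencing is affordable, 10X or higher is recommended to ensure quality. Sites with a depth above 1000X should not be kept.
For example: 1) this pig study reports that the SNP error rate rises sharply when depth falls below 4X: https://link.springer.com/article/10.1186/s12859-019-3164-z
2) SNP sites with extremely high depth may lie in repetitive regions of the genome and are best removed: https://www.nature.com/articles/nbt.2053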
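The depth window described above can be sketched as a small shell helper. This is a minimal sketch, not the author's method: the function name filter_by_depth is hypothetical, and it assumes the thresholds are applied to the INFO/DP annotation (total depth at the site), which the text does not specify. Per-sample depth would need the FORMAT/DP field instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep VCF records whose INFO/DP lies in [min_dp, max_dp].
# Assumption: depth is read from INFO/DP (site-level total depth).
# Reads an uncompressed VCF on stdin and writes the filtered VCF to stdout.
filter_by_depth() {
  local min_dp="$1" max_dp="$2"
  awk -F'\t' -v lo="$min_dp" -v hi="$max_dp" '
    /^#/ { print; next }                 # pass header lines through unchanged
    {
      dp = ""
      n = split($8, info, ";")           # column 8 is the INFO field
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^DP=/) dp = substr(info[i], 4)
      # drop records without DP, or with DP outside the window
      if (dp != "" && dp + 0 >= lo && dp + 0 <= hi) print
    }'
}
```

A possible invocation, with illustrative file names: gzip -dc snps.vcf.gz | filter_by_depth 4 1000 > snps_depth.vcf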
GATK filtering command lines:
SNP
gatk VariantFiltration \
    -V snps.vcf.gz \
    -filter "QD < 2.0" --filter-name "QD2" \
    -filter "QUAL < 30.0" --filter-name "QUAL30" \
    -filter "SOR > 3.0" --filter-name "SOR3" \
    -filter "FS > 60.0" --filter-name "FS60" \
    -filter "MQ < 40.0" --filter-name "MQ40" \
    -filter "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" \
    -filter "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" \
    -O snps_filtered.vcf.gz
INDEL
gatk VariantFiltration \
    -V indels.vcf.gz \
    -filter "QD < 2.0" --filter-name "QD2" \
    -filter "QUAL < 30.0" --filter-name "QUAL30" \
    -filter "FS > 200.0" --filter-name "FS200" \
    -filter "ReadPosRankSum < -20.0" --filter-name "ReadPosRankSum-20" \
    -O indels_filtered.vcf.gz
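Note that VariantFiltration only annotates the FILTER column with the names of the failed filters; it does not delete any records. To obtain a file containing only the passing variants, the GATK-native route is, to my knowledge, SelectVariants with --exclude-filtered. The sketch below shows the same idea in plain awk so the logic is visible. The function name keep_pass is hypothetical, and it assumes passing records are marked PASS in column 7.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep header lines and records whose FILTER column is PASS.
# Assumption: the input was already annotated by VariantFiltration,
# so records that failed a filter carry names such as QD2 or FS60 in column 7.
# Reads an uncompressed VCF on stdin and writes the result to stdout.
keep_pass() {
  awk -F'\t' '/^#/ || $7 == "PASS"'
}
```

A possible invocation, with illustrative file names: gzip -dc snps_filtered.vcf.gz | keep_pass > snps_pass.vcf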
If you would like to run the filtering yourself, this video course walks through the procedure: https://bdtcd.xetslk.com/s/1VQOjQ