我分染色体执行GATK硬过滤的时候出现发现输出文件显著小于原文件,报错内容如下
(base) [jychu@localhost chr_hardf gatk VariantFiltration -R /public/jychu/refs/Gallus_gallus.GRCg6a.dna.toplevel.fa -V chr2-1_typed.snp.vcf --filter-expression " QUAL < 30.0 || QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filter-name "my_snp_filters" -O chr2_typed.snp.filter2.vcf
Using GATK jar /public/jychu/soft/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /public/jychu/soft/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar VariantFiltration -R /public/jychu/refs/Gallus_gallus.GRCg6a.dna.toplevel.fa -V chr2-1_typed.snp.vcf --filter-expression QUAL < 30.0 || QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 --filter-name my_snp_filters -O chr2_typed.snp.filter2.vcf
10:32:26.101 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/public/jychu/soft/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 12, 2021 10:32:26 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
10:32:26.456 INFO VariantFiltration - ------------------------------------------------------------
10:32:26.456 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.1.9.0
10:32:26.456 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
10:32:26.457 INFO VariantFiltration - Executing as jychu@localhost.localdomain on Linux v3.10.0-1062.el7.x86_64 amd64
10:32:26.457 INFO VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
10:32:26.457 INFO VariantFiltration - Start Date/Time: April 12, 2021 10:32:26 AM CST
10:32:26.457 INFO VariantFiltration - ------------------------------------------------------------
10:32:26.457 INFO VariantFiltration - ------------------------------------------------------------
10:32:26.458 INFO VariantFiltration - HTSJDK Version: 2.23.0
10:32:26.458 INFO VariantFiltration - Picard Version: 2.23.3
10:32:26.458 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:32:26.458 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:32:26.458 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:32:26.458 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:32:26.458 INFO VariantFiltration - Deflater: IntelDeflater
10:32:26.459 INFO VariantFiltration - Inflater: IntelInflater
10:32:26.459 INFO VariantFiltration - GCS max retries/reopens: 20
10:32:26.459 INFO VariantFiltration - Requester pays: disabled
10:32:26.459 INFO VariantFiltration - Initializing engine
10:32:26.969 INFO FeatureManager - Using codec VCFCodec to read file file:///public/jychu/chicken_body_size/project/chr_hardfilter/chr2-1_typed.snp.vcf
10:32:27.008 INFO VariantFiltration - Done initializing engine
10:32:27.076 INFO ProgressMeter - Starting traversal
10:32:27.076 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
10:32:27.097 INFO VariantFiltration - Shutting down engine
[April 12, 2021 10:32:27 AM CST] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2081947648
java.lang.NumberFormatException: For input string: "nan"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at org.apache.commons.jexl2.JexlArithmetic.toDouble(JexlArithmetic.java:1016)
at org.apache.commons.jexl2.JexlArithmetic.compare(JexlArithmetic.java:699)
at org.apache.commons.jexl2.JexlArithmetic.lessThan(JexlArithmetic.java:774)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:967)
at org.apache.commons.jexl2.parser.ASTLTNode.jjtAccept(ASTLTNode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:1283)
at org.apache.commons.jexl2.parser.ASTOrNode.jjtAccept(ASTOrNode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:1274)
at org.apache.commons.jexl2.parser.ASTOrNode.jjtAccept(ASTOrNode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:1274)
at org.apache.commons.jexl2.parser.ASTOrNode.jjtAccept(ASTOrNode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:1274)
at org.apache.commons.jexl2.parser.ASTOrNode.jjtAccept(ASTOrNode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:1274)
at org.apache.commons.jexl2.parser.ASTOrNode.jjtAccept(ASTOrNode.java:18)
at org.apache.commons.jexl2.Interpreter.interpret(Interpreter.java:232)
at org.apache.commons.jexl2.ExpressionImpl.evaluate(ExpressionImpl.java:65)
at htsjdk.variant.variantcontext.JEXLMap.evaluateExpression(JEXLMap.java:186)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:95)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:15)
at htsjdk.variant.variantcontext.VariantContextUtils.match(VariantContextUtils.java:338)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.matchesFilter(VariantFiltration.java:453)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.filter(VariantFiltration.java:407)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.apply(VariantFiltration.java:354)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
java.lang.NumberFormatException: For input string: "nan"
是关键
我用下面的命令查找了一下nan字符
grep "nan" chr2-1_typed.snp.vcf | less -S
2 52143528 rs736396988 T C 2.63197e+07 PASS AC=831;AF=0.85;AN=978;BaseQRankSum=1.44;DB;DP=866067;ExcessHet=-0;FS=0;InbreedingCoeff=0.4636;MLEAC=831;MLEAF=0.85;MQ=nan;MQRankSum=0;QD=31.94;ReadPosRankSum=0.149;SOR=0.652 GT:AD:DP:GQ:PGT:PID:PL:PS 0/1:176,260:436:99:.:.:6698,0,4122:. 0/1:187,292:479:99:.:.:7870,0,4743:. 0/0:139,0:139:99:.:.:0,120,1800:. 0/0:177,0:177:99:.:.:0,120,1800:. 0/0:111,0:111:99:.:.:0,101,1800:. 0/1:130,170:300:99:.:.:4614,0,3301:. 0/1:165,242:407:99:.:.:6602,0,4168:. 0/1:171,235:406:99:.:.:6287,0,4354:. 0/0:46,0:46:99:.:.:0,120,1800:. 1/1:2,424:426:99:.:.:13712,1221,0:. 1/1:4,3109:3115:99:.:.:104093,9185,0:. 1/1:2,3473:3483:99:.:.:115239,10331,0:. 1|1:5,3330:3341:99:1|1:52143462_A_G:113349,9837,0:52143462 1/1:0,3592:3603:99:.:.:118507,10756,0:. 0/1:1337,2419:3764:99:.:.:67702,0,32559:. 0/1:1340,2225:3568:99:.:.:62309,0,32972:. 1/1:0,787:788:99:.:.:28052,2367,0:. 1/1:0,638:639:99:.:.:23114,1919,0:. 1/1:0,898:898:99:.:.:32572,2701,0:. 1/1:2,838:840:99:.:.:30124,2482,0:. 1/1:1,759:761:99:.:.:26779,2245,0:. 1/1:0,839:841:99:.:.:29775,2524,0:. 1/1:0,718:719:99:.:.:25869,2160,0:. 1/1:0,748:751:99:.:.:26837,2250,0:. 1/1:0,857:862:99:.:.:30621,2578,0:. 1/1:1,808:823:99:.:.:28874,2394,0:. 1/1:1,735:739:99:.:.:26633,2176,0:. 1/1:0,699:703:99:.:.:25202,2103,0:. 1/1:0,619:625:99:.:.:22230,1861,0:. 1/1:1,773:780:99:.:.:27609,2319,0:. 1/1:0,681:684:99:.:.:24409,2047,0:. 1/1:1,685:689:99:.:.:24773,2026,0:. 1/1:0,929:932:99:.:.:33397,2793,0:.
我认为是不能识别这个字符,所以我把它替换为0
sed -i "s/nan/0/g" chr2-1_typed.snp.vcf
- 结果运行成功!