fastq文件基本信息统计工具
阅读原文时间:2023年07月10日阅读:3

之前写的一个小工具,写的很简陋,名字取的也很随意就叫skr,哈哈。主要是fq转fa、合并多个染色体的vcf文件等,功能不多(主要是C写起来太操蛋了T_T),通常我也只用来统计fastq文件信息:

这里给出工具地址:https://github.com/sharkLoc/skrTools

usage:

Program: skr

Usage: skr [options]

fq2fa      translate fastq file to fasta  
fqstat     summary statistics of fastq file  
mergeVcf   merge vcf files from list  
statVcf    summary statistics of vcf file  
makewind   make bed from a list file

统计fastq文件信息:

输出read的平均长度,GC含量,总read数量和总的碱基数量,当然还包括ATGC和N碱基的数量和百分比,最后就是Q20和Q30结果。

skr fqstat -i xx1.fq.gz -I xx2.fq.gz

输出文件:

Iterm reads_1.fq reads_2.fq
read average length: 150 150
read GC content(%): 48.42 48.48
total read Count: 34946389 34946389
total base Count: 5241958350 5241958350

base A Count: 1352284833(25.80%) 1342903044(25.62%)
base C Count: 1270459966(24.24%) 1246706604(23.78%)
base G Count: 1267522866(24.18%) 1294357728(24.69%)
base T Count: 1351401800(25.78%) 1357986115(25.91%)
base N Count: 288885(0.01%) 4859(0.00%)

Number of base calls with quality value of 20 or higher (Q20+) (%) 5113248711(97.54%) 5092440219(97.15%)
Number of base calls with quality value of 30 or higher (Q30+) (%) 4886887711(93.23%) 4832524601(92.19%)