最近在学习snakemake 用于生信流程管理,现在用一个snakemake 来完成小任务:将在某一文件夹下的多个bam文件截取一部分,然后建立索引,在提取出fastq序列,最后比对回基因组。
需要两个文件,一个配置文件config.yaml和snakemake文件。
config.yaml 文件内容用空格分隔,主要是要用到的软件,也可以包含一些流程中要用的参数
1 TMP_DIR: /home/share/work/TMP
2
3 HG19: /home/share/BioSoft/hg19_bwa/ucsc.hg19.fasta
4
5 BWA: /home/share/BioSoft/bin/bwa
snakemake文件:我一般都以 .py 做后缀名,主流程写在这里
1 configfile: "./config.yaml"
2
3 hg19=config["HG19"]
4 tmp=config["TMP_DIR"]
5 Bwa=config["BWA"]
6
7 SAMPLES,=glob_wildcards("bam/{sample}.05.mapped.bam")
8
9 rule all:
10 input:
11 expand("out/{sample}.bam.bai",sample=SAMPLES),
12 expand("out/bwa/{sample}.sam",sample=SAMPLES)
13
14 rule split:
15 input:
16 "bam/{sample}.05.mapped.bam"
17 output:
18 "out/{sample}.bam"
19 log:
20 "log/{sample}.log"
21 shell:
22 "samtools view {input} chr7:3240000-5260000 -b -o {output} 2> {log}"
23
24 rule index:
25 input:
26 "out/{sample}.bam"
27 output:
28 "out/{sample}.bam.bai"
29 shell:
30 "samtools index {input} {output}"
31
32 rule fq:
33 input:
34 "out/{sample}.bam"
35 output:
36 r1="out/{sample}_r1.fq",
37 r2="out/{sample}_r2.fq",
38 params:
39 null="/dev/null"
40 shell:
41 "samtools fastq -N {input} -1 {output.r1} -2 {output.r2} -0 {params.null} -s {params.null}"
42
43 rule bwa:
44 input:
45 r1="out/{sample}_r1.fq",
46 r2="out/{sample}_r2.fq"
47 output:
48 "out/bwa/{sample}.sam"
49 params:
50 RG=r"'@RG\tID:{sample}\tPL:illumina\tSM:{sample}'",
51 ref=hg19,
52 bwa=Bwa
53 threads: 3
54 log:
55 "log/bwa/{sample}.log"
56 shell:
57 "{params.bwa} mem -t {threads} -M -R {params.RG} {params.ref} {input.r1} {input.r2} -o {output}"
写完后伪运行下,检查各个步骤代码
1 snakemake -np -s snakemake.py
检查没有问题之后投递计算节点,任务结束后还可以做个流程图:
1 snakemake --dag -s snakemake.py |dot -Tpdf > pipeline.pdf
手机扫一扫
移动阅读更方便
你可能感兴趣的文章