Location: /mnt/x110/wus/BP_new/BWA_mapping/classification/new_pipline

Scripts to be merged:

Goal of this reorganization: combine all the steps into a single step, then add command-line options.

Basic design:

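Script 1 originally extracted the reads whose CIGAR is not purely M. It is now merged with step 1.2 (adding the header), which needs a new option: the reference FASTA file to build the header from.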
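The notes do not show how add_header is implemented. The following is a minimal sketch that assumes the header is built from the reference FASTA: it reads each sequence's name and length and writes @HD/@SQ lines ahead of the headerless records. The function names `fasta_lengths` and `add_header` are illustrative. The result is still SAM, so converting to BAM would be a separate step. Alternatively, `samtools view -bt ref.fa.fai -o out.bam in.sam` takes the names and lengths from the .fai index and writes BAM directly.

```python
def fasta_lengths(fasta_path):
    """Return [(name, length), ...] for each sequence, in file order."""
    lengths = []
    name, length = None, 0
    with open(fasta_path) as fa:
        for line in fa:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    lengths.append((name, length))
                # The @SQ name is the first word of the FASTA header line
                name, length = line[1:].split()[0], 0
            elif line:
                length += len(line)
    if name is not None:
        lengths.append((name, length))
    return lengths


def add_header(sam_noheader, fasta_path, out_sam):
    """Write out_sam: @HD/@SQ lines built from the FASTA, then the headerless records."""
    with open(out_sam, "w") as out:
        out.write("@HD\tVN:1.6\tSO:unsorted\n")
        for name, length in fasta_lengths(fasta_path):
            out.write(f"@SQ\tSN:{name}\tLN:{length}\n")
        with open(sam_noheader) as src:
            for line in src:
                out.write(line)
```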

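Merging with script 2 requires deleting the intermediate files and adding an output filename prefix or suffix, while keeping labels such as onepair in the names. The main job of the new option is to add this prefix/suffix to the output filenames.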

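Merging with script 3: add a flag that turns on sorting and indexing of the output, so that the files can be loaded in IGV. This step creates its own folder. If -o is given, the folder goes under the -o directory; otherwise it goes in the current working directory. The step also gets its own option, -O, for its output location.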
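The sketch below follows one reading of the folder rule above. When -O is given, it names the sort/index folder directly. Otherwise a subfolder is created under -o, or under the current directory if -o is absent. The subfolder name `sorted_indexed` and both function names are hypothetical. The commands use `samtools sort -o` and `samtools index`, which produce the coordinate-sorted BAM plus .bai index that IGV expects.

```python
import os


def sort_index_dir(out_dir=None, sort_dir=None, cwd=None):
    """Choose the folder for sorted/indexed BAMs.

    -O (sort_dir) is used as-is when given; otherwise a subfolder is placed
    under -o (out_dir), or under the current working directory.
    """
    if sort_dir:
        return sort_dir
    base = out_dir or cwd or os.getcwd()
    return os.path.join(base, "sorted_indexed")  # hypothetical folder name


def sort_index_commands(bam_path, target_dir):
    """Return samtools commands that coordinate-sort bam_path and index the result."""
    name = os.path.splitext(os.path.basename(bam_path))[0]
    sorted_bam = os.path.join(target_dir, name + ".sorted.bam")
    return [
        ["samtools", "sort", "-o", sorted_bam, bam_path],
        ["samtools", "index", sorted_bam],
    ]
```

To run the step, call `os.makedirs(target_dir, exist_ok=True)` first, then pass each command list to `subprocess.run(cmd, check=True)`.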

Output directory (-o): if the folder does not exist, create it automatically; if the path leading to it does not exist, raise an error.
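The following sketch implements the -o rule. It assumes that "path does not exist" means the parent directory is missing. `os.mkdir` creates only the last path component, so a missing parent raises an error instead of being created silently, as `os.makedirs` would do. `prepare_out_dir` is a hypothetical name.

```python
import os


def prepare_out_dir(out_dir):
    """Create out_dir if it is missing; raise if its parent directory does not exist."""
    out_dir = os.path.abspath(out_dir)
    if os.path.isdir(out_dir):
        return out_dir
    parent = os.path.dirname(out_dir)
    if not os.path.isdir(parent):
        raise FileNotFoundError(f"parent directory does not exist: {parent}")
    os.mkdir(out_dir)  # creates only the final directory level
    return out_dir
```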

Options: -i /mnt/x110/wus/BP_new/BWA_mapping/SRR20150106_mapped

-I /mnt/S30/database/tair10/tair10.fa

-a SRR123

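Note: in Python, a variable + 'string' concatenates into a new string.

Status: the scripts are merged, but sort and index does not have its own directory yet and can only write to the current directory. -o is not implemented either, so the script must currently be run from the script directory, and its output is saved there as well.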

Minimum options for extracting BP reads into BAM files: -i, -I, -a

python get_bp_final.py -i /mnt/x110/wus/BP_new/BWA_mapping/SRR20150106_mapped -I /mnt/S30/database/tair10/tair10.fa -a SRR123

Result:

The intermediate files were not deleted.
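The following sketch shows one way to clean up intermediate files. It assumes the intermediate paths are collected in a list, such as the fixed names kept in id. The `finally` clause deletes them whether the step succeeds or fails. The `keep` flag is a hypothetical debugging switch, not an option of the existing script.

```python
import os


def remove_intermediates(paths):
    """Delete intermediate files, ignoring any that are already gone."""
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass


def run_with_cleanup(step, intermediates, keep=False):
    """Run step(), then delete the intermediate files unless keep is True.

    Because cleanup sits in `finally`, it also runs if step() raises.
    """
    try:
        return step()
    finally:
        if not keep:
            remove_intermediates(intermediates)
```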

Minimum options for running only sort and index: -a, -I, -S

python get_bp_final.py -a SRR123 -I /mnt/S30/database/tair10/tair10.fa -S

A few errors came up along the way. Lesson learned: when testing, wrap more code in try/except and print more error information.
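The sketch below applies that lesson to external tools such as samtools. It runs the command, and on failure it prints the command, exit code, and captured stderr before re-raising, so the traceback is kept. `run_step` is a hypothetical helper, and `capture_output` requires Python 3.7 or later.

```python
import subprocess
import sys


def run_step(cmd):
    """Run cmd (a list of strings); print diagnostics on failure, then re-raise."""
    try:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    except FileNotFoundError:
        print(f"[error] program not found: {cmd[0]}", file=sys.stderr)
        raise
    except subprocess.CalledProcessError as err:
        print(f"[error] exit code {err.returncode}: {' '.join(cmd)}", file=sys.stderr)
        print(err.stderr, file=sys.stderr)
        raise
    return result.stdout
```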

Result:

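Next: merge the steps that extract reads located in introns and that extract mate reads.

Reviewing the earlier workflow to recover the scripts that were used and the preprocessing of the input files:

First, remove duplicates: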

Command: sort -t ';' -k 2,2 -u /mnt/x110/wus/BP_new/BWA_mapping/Arabidopsis_thaliana.TAIR10.47.intron.gff3 >look

Check: look contains entries on both the + and - strands, and the original Arabidopsis_thaliana.TAIR10.47.intron.gff3 does contain duplicates.
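For reference, here is a rough Python equivalent of the dedup command. It keeps the first line for each value of the second ';'-separated field. Unlike `sort -u`, it preserves input order rather than sorting by the key, and the duplicate it keeps may differ from the one `sort` keeps. `dedupe_by_field` is an illustrative name.

```python
def dedupe_by_field(lines, sep=";", field=2):
    """Keep the first line seen for each value of the 1-based `field`.

    Lines with fewer fields use an empty key, as `sort -k` does.
    """
    seen = set()
    kept = []
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        key = parts[field - 1] if len(parts) >= field else ""
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept
```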

Next, rename Mt and Pt to numbers and sort by start position.

Commands:

sed 's/^Mt/7/g' look>look_mt7

sed 's/^Pt/6/g' look_mt7>look_mt7_pt6

sort -t ' ' -k1n -k4n look_mt7_pt6>llook_mt7_pt6_sort_1,4
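Here is a Python sketch of the rename-and-sort step. It assumes the standard tab-separated GFF3 layout (seqid in column 1, start in column 4) and that every seqid is numeric once Mt and Pt are renamed. Note that the shell command splits on a space (`-t ' '`). If the file is tab-delimited, as GFF3 normally is, `-k4n` may not pick up the start column, so the output order is worth checking. `rename_and_sort` is a hypothetical name.

```python
def rename_and_sort(lines):
    """Map seqid Mt->7 and Pt->6, then sort numerically by seqid and start."""
    rows = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] == "Mt":
            cols[0] = "7"
        elif cols[0] == "Pt":
            cols[0] = "6"
        rows.append(cols)
    # Column 1 = chromosome, column 4 = start (0-based indices 0 and 3)
    rows.sort(key=lambda c: (int(c[0]), int(c[3])))
    return ["\t".join(c) + "\n" for c in rows]
```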

Argument:

/mnt/x110/wus/BP_new/BWA_mapping/Arabidopsis_thaliana.TAIR10.47.intron.gff3

Step 1 tested and working.

Purpose: preprocess the intron GFF file.

Step 2:

The first line of the file is a comment, not a GFF record, which caused an error.

Fix:

Skip that line with next(f).
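`next(f)` skips exactly one line. GFF3 files often start with several '#' lines, for example `##gff-version 3` followed by `##sequence-region` directives. Skipping every comment and blank line is therefore more robust, as in this sketch. `read_gff_records` is a hypothetical name.

```python
def read_gff_records(path):
    """Yield each GFF record as a list of columns.

    Lines starting with '#' (comments and ## directives) and blank lines are skipped.
    """
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith("#"):
                continue
            yield line.rstrip("\n").split("\t")
```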

Resolved: the script is finished, which also takes care of script 3.

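Step 4 problem: there were no default arguments, so the parameters had to be entered by hand, but none had been given.

Default arguments were set, but it still failed. Note:

After a long search, the cause turned out to be a return statement inside the loop, so the function returned after the first iteration.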
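Here is a minimal illustration of that bug, using two hypothetical functions that collect intron start positions. In `starts_buggy`, `return` is indented inside the `for` loop, so only the first record is processed.

```python
def starts_buggy(records):
    starts = []
    for rec in records:
        starts.append(int(rec[3]))
        return starts  # bug: returns during the first iteration


def starts_fixed(records):
    starts = []
    for rec in records:
        starts.append(int(rec[3]))
    return starts  # returns only after every record has been processed
```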

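The write call only took effect once it was moved outside the try block.

In the end, only the one_pair results need to be kept.

Improved: added comments and fixed the names of the intermediate files. What remains is removing redundant code.

Cleanup: list the input, intermediate, and output files of the pipeline. Intermediate files get fixed names, stored in id. Input files are given variable names.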
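The following sketch illustrates the id idea: a single function builds every output path from -o, the sample name (-a), and an optional prefix/suffix, so the same paths can be passed to later steps. These notes do not name the five BP types, so the labels used in the test are placeholders. The with_header name follows the -o + filename + with_header pattern from these notes. All function names are hypothetical.

```python
import os

# Fixed intermediate file name taken from these notes
NOHEADER_SAM = "temporary_mapq_nozero_file_noheader.sam"


def intermediate_paths(out_dir, sample):
    """Full paths of the fixed-name intermediate files under -o."""
    return {
        "noheader": os.path.join(out_dir, NOHEADER_SAM),
        "with_header": os.path.join(out_dir, sample + "_with_header.bam"),
    }


def get_id(out_dir, sample, bp_types, prefix="", suffix=""):
    """Map each BP type label to <out_dir>/<prefix><sample>_<type><suffix>.sam."""
    return {
        bp_type: os.path.join(out_dir, f"{prefix}{sample}_{bp_type}{suffix}.sam")
        for bp_type in bp_types
    }
```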

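First half:

1. Input: the complete SAM file straight from alignment. First extract the reads whose CIGAR is not all M and save them as temporary_mapq_nozero_file_noheader.sam. Open issue: where this output file is written.

2. Add the header. Input: the FASTA file. Output: a BAM file with the header, kept as an intermediate file (there is only one file here). The file location is again the issue; the name should be -o + filename + with_header. The point of this step is adding the header, not converting to BAM.

3. Get the BP files. Input: the intermediate file plus the file name of each output. It was undecided whether to pass the output location separately or fold it into id (id[ ]). These files are not sorted afterwards, so they are not inputs to a sort step, and -o + id + with_header alone would do. However, they are inputs to step 8, which would then also need -o + id + with_header, so it is simpler to store the full path directly as id_address. Output: five types of BP files. This means removing out_put_suffix, so that option no longer needs to be supplied. (An id function is needed to hold the intermediate file names of the five BP file types.)

(4. Change: make add-header and sort-and-index two separate functions.)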
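The following is a sketch of step 1. It assumes "not all M" means the CIGAR string contains any operation other than M. It also drops unmapped records (CIGAR `*`), which is an assumption. The output file name mentions MAPQ, but the notes describe only the CIGAR filter, so this sketch applies no MAPQ filter. Header lines are skipped because step 2 adds them back. `filter_not_all_m` is a hypothetical name.

```python
import re

# A CIGAR made only of M operations, e.g. "100M" (or "60M40M")
ALL_M = re.compile(r"^(?:\d+M)+$")


def filter_not_all_m(sam_in, sam_out):
    """Copy records whose CIGAR is not all-M to sam_out, without header lines.

    Unmapped records (CIGAR '*') are dropped too. Returns the number written.
    """
    written = 0
    with open(sam_in) as src, open(sam_out, "w") as dst:
        for line in src:
            if line.startswith("@"):
                continue  # header lines are added back in step 2
            cigar = line.split("\t")[5]
            if cigar == "*" or ALL_M.match(cigar):
                continue
            dst.write(line)
            written += 1
    return written
```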

------

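(5. Preprocess intron.gff. Input: intron.gff; output: look_mt7_pt6_sortstart, kept as an intermediate file.

6. From the intermediate file, extract longest_intron_in_all_chrom.gff3. Input: defaults to the output of the previous step. Output: longest_intron_in_all_chrom.gff3.

### Steps 5 and 6 preprocess the input files.)

7. Build the intron list (intronlist).

8. Get the reads located in introns. Input: the BAM ids output by step 3, looping over id[ ]. Output: os.path.basename(bam_id)+"_seq_in_intron_3.sam"

9. Add the header to the SAM files from step 8 and convert them to BAM. Input: by default, the previous step's id[ ]. Output: .bam files.

10. Get the mates (called "meta" in the script). Input: the previous step's BAM id[ ]. Output: 'meta_'+name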
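The following is a sketch of steps 7 and 8. It assumes a read counts as "in an intron" when its leftmost position (SAM column 4, POS) falls inside an intron interval. The script's actual criterion may differ, for example by requiring the whole read to fit inside the intron. GFF coordinates are 1-based and inclusive, like SAM POS. All function names are hypothetical.

```python
import bisect
from collections import defaultdict


def build_intron_list(records):
    """Group GFF intron records (column lists) into sorted (start, end) lists per chromosome."""
    introns = defaultdict(list)
    for cols in records:
        introns[cols[0]].append((int(cols[3]), int(cols[4])))
    for intervals in introns.values():
        intervals.sort()
    return dict(introns)


def in_intron(introns, chrom, pos):
    """True if 1-based pos lies in any intron on chrom (both ends inclusive)."""
    intervals = introns.get(chrom, [])
    starts = [start for start, _ in intervals]
    i = bisect.bisect_right(starts, pos)  # intervals[:i] all start at or before pos
    return any(end >= pos for _, end in intervals[:i])


def reads_in_introns(sam_lines, introns):
    """Yield header lines plus records whose POS (column 4) falls inside an intron."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        cols = line.split("\t")
        if in_intron(introns, cols[2], int(cols[3])):
            yield line
```

If step 6 leaves no overlapping introns on a chromosome, only `intervals[i - 1]` needs checking. The `any(...)` form stays correct even when introns overlap.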

-------------

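Workflow summary done; now reorganizing:

1. Utility functions: write, add_header, sort_and_index, get_id

2. File preprocessing:

2.1 Preprocess intron.gff. Input: intron.gff; output: look_mt7_pt6_sortstart, kept as an intermediate file.

2.2 From the intermediate file, extract longest_intron_in_all_chrom.gff3. Input: defaults to the output of the previous step. Output: longest_intron_in_all_chrom.gff3.

2.3 Build the intron list (intronlist).

3. Main pipeline

1. Input: the complete SAM file straight from alignment. First extract the reads whose CIGAR is not all M and save them as temporary_mapq_nozero_file_noheader.sam. Issue: where this output file is written. Planned fix: prepend the -o path to the file name passed to write.

1.2. Add the header. Input: the FASTA file. Output: a BAM file with the header, kept as an intermediate file (only one file here). The file location is again the issue; the name should be -o + filename + with_header. The point is adding the header, not converting to BAM.

2. Get the BP files. Input: the intermediate file, with the output file names in id[ ]. These names are planned to be stored directly as id_address for use by later steps (step 4). Output: the five types of BP files, written under the path-prefixed names given as input.

2.1 Add the header. Input: path-prefixed id[ ]; output: path-prefixed BAM id[ ].

3. Get the reads located in introns. Input: the BAM ids output by step 2.1, processed in a loop. Output: os.path.basename(bam_id)+"_seq_in_intron_3.sam"

3.1. Add the header and convert to BAM. Input: by default, the previous step's id[ ], still path-prefixed. Output: .bam id[ ]

4. Get the mates ("meta"). Input: the previous step's BAM ids, id[0:2]. Output: 'meta_'+name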
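The following is a sketch of step 4. It reads "meta" as the mate of each selected read, as in the earlier "extract mate reads" step. It collects the read names from a one_pair file, then scans the full SAM for records with the same QNAME but the other segment. FLAG bits 0x40 (first segment) and 0x80 (last segment) distinguish the two segments. Secondary or supplementary alignments of the mate are also returned. `extract_mates` is a hypothetical name.

```python
def extract_mates(full_sam_lines, selected_lines):
    """Return records from the full SAM that are the other segment of a selected read."""
    # QNAME -> set of segment bits (0x40/0x80) already present among the selected reads
    wanted = {}
    for line in selected_lines:
        if line.startswith("@"):
            continue
        cols = line.split("\t")
        wanted.setdefault(cols[0], set()).add(int(cols[1]) & 0xC0)
    mates = []
    for line in full_sam_lines:
        if line.startswith("@"):
            continue
        cols = line.split("\t")
        segs = wanted.get(cols[0])
        if segs is not None and (int(cols[1]) & 0xC0) not in segs:
            mates.append(line)
    return mates
```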

Arguments:

SAM: /mnt/x110/wus/BP_new/BWA_mapping/SRR20150106_mapped

FASTA: /mnt/S30/database/tair10/tair10.fa

Intron GFF: /mnt/x110/wus/BP_new/BWA_mapping/Arabidopsis_thaliana.TAIR10.47.intron.gff3

Command:

python full.py -i /mnt/x110/wus/BP_new/BWA_mapping/SRR20150106_mapped -f /mnt/S30/database/tair10/tair10.fa -I /mnt/x110/wus/BP_new/BWA_mapping/Arabidopsis_thaliana.TAIR10.47.intron.gff3 -o /mnt/x110/wus/BP_new/BWA_mapping/test_full/te
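The following sketch shows how the four options in this command could be declared with argparse. The `dest` names, help texts, and which options are required are assumptions. The -o default of the current directory follows the folder rule stated at the top of these notes.

```python
import argparse


def build_parser():
    """Options used in the full.py command above."""
    parser = argparse.ArgumentParser(description="BP read pipeline (sketch)")
    parser.add_argument("-i", dest="sam", required=True,
                        help="SAM file straight from BWA mapping")
    parser.add_argument("-f", dest="fasta", required=True,
                        help="reference genome FASTA, e.g. tair10.fa")
    parser.add_argument("-I", dest="intron_gff", required=True,
                        help="intron GFF3 file")
    parser.add_argument("-o", dest="out_dir", default=".",
                        help="output directory (default: current directory)")
    return parser
```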

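The last part has been tested, and the pipeline is fully merged.

Intermediate files are not deleted yet, and the mate ("meta") output has not been checked.

Location: /mnt/x110/wus/BP_new/BWA_mapping/test_full

Results location: /mnt/x110/wus/BP_new/BWA_mapping/test_full/te

Command: python full.py -i /mnt/x110/wus/BP_new/BWA_mapping/SRR20150106_mapped -f /mnt/S30/database/tair10/tair10.fa -I /mnt/x110/wus/BP_new/BWA_mapping/Arabidopsis_thaliana.TAIR10.47.intron.gff3 -o /mnt/x110/wus/BP_new/BWA_mapping/test_full/te

Input/output: given only the freshly aligned SAM file, the reference genome FASTA file, and the intron GFF file, the pipeline outputs SAM files of the five read types located in introns, plus SAM files of the mate reads for the two one_pair types.

Reorganize the workflow and merge the scripts into one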
