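Background: print the file structure under the current directory as a tree, and append a description after each file.
When I first got this request, I thought the bash command tree would be all it takes, like this: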
cd ~/04.workflow/08.scRNA_yanyt/03.reports_out/src  # enter the directory whose structure should be shown
tree results/  # show the file structure
tree results > test.txt  # save the file structure to a file
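Later Xiaopang opened a file-listing page that someone else had built and told me that every file also needs a matching description after it, so this would take some code. Here is my Python implementation: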
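import os
import re

import pandas as pd


def dfs_showdir(path, depth, annoText):
    """Print the tree under path and append each entry's description from annoText."""
    if depth == 0:
        print("|--" + path)
    for item in os.listdir(path):
        # Skip version-control, IDE and cache directories.
        if item in ['.git', '.idea', '__pycache__']:
            continue
        # Regex again: strip the digits from the name, because Xiaopang's folders
        # hold many numbered look-alikes such as fplot1.png / fplot2.png.
        pattern_item = re.sub("[0-9]", "", item)
        key = pattern_item.split(".")[0]
        matches = annoText[annoText["files"] == key]["description"].tolist()
        # Use a placeholder instead of raising IndexError when nothing is registered.
        description = matches[0] if matches else "***nodescription"
        # Print one line of the tree.
        print((" " * (depth + 2)) + "|--" + item + " " * 4 + description)
        new_item = path + '/' + item
        # Recurse into subdirectories, passing the annotation table along.
        if os.path.isdir(new_item):
            dfs_showdir(new_item, depth + 1, annoText)


if __name__ == '__main__':
    # Description table for 01.Data_filter
    annoText_1 = pd.DataFrame()
    annoText_1["files"] = ["Feature_ber", "hist", "pearplot", "Variable-ex", "cycleplot"]
    annoText_1["description"] = ["gene count vs. sequencing depth correlation file", "sequencing depth distribution file", "quality control information file", "highly variable gene visualization file", "cell cycle file"]
    # Description table for 02.cell_cluster
    annoText_2 = pd.DataFrame()
    annoText_2["files"] = ["umap", "group", "person", "elbowplot", "jackstrawplot", "type_heatmap", "cell_type", "fplot", "vln", "celltype_singleR", "Rplots", "clusterplot", "allmarker"]
    annoText_2["description"] = ["umap clustering visualization file", "umap clustering visualization file", "umap clustering visualization file", "PCA principal component visualization file", "PCA dimensionality reduction file", "cell similarity file", "cell type annotation visualization file", "cluster differential gene expression visualization (umap) file", "cluster differential gene expression visualization (violin plot) file", "singleR cell type vs. cluster mapping table file", "***", "***", "differentially expressed gene file"]
    # Description table for 03.DEG_enrichment
    annoText_3 = pd.DataFrame()
    annoText_3["files"] = ["sig_dge_all"]
    annoText_3["description"] = ["***"]
    my_path = "./03.reports_out/src/results/"
    print("The pipeline results directory is {}, which contains {}.".format(my_path, os.listdir(my_path)))
    print("root:[" + my_path + "]")
    # Each result folder is printed with its own description table.
    tables = {"01.Data_filter": annoText_1, "02.cell_cluster": annoText_2, "03.DEG_enrichment": annoText_3}
    for i, table in tables.items():
        the_path = my_path + i
        dfs_showdir(the_path, 0, table)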
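The description lookup in the script above (strip the digits, take the part before the first dot, then search the table) can be tried on its own. The following is only a minimal sketch that swaps the pandas DataFrame for a plain dict; the helper name `describe`, the `DESCRIPTIONS` table and the `***nodescription` fallback are assumptions made for illustration, not part of the original script.

```python
import re

# Hypothetical subset of the annotation table, keyed by the digit-free base name.
DESCRIPTIONS = {
    "fplot": "cluster differential gene expression visualization (umap) file",
    "vln": "cluster differential gene expression visualization (violin plot) file",
}


def describe(filename, table, default="***nodescription"):
    """Return the description registered for filename, or default if none exists."""
    # Same key derivation as the script: drop digits, keep the text before the first dot.
    key = re.sub("[0-9]", "", filename).split(".")[0]
    return table.get(key, default)


print(describe("fplot1.png", DESCRIPTIONS))
print(describe("fplot2.png", DESCRIPTIONS))
print(describe("01.Data_filter", DESCRIPTIONS))
```

Note that a name such as `01.Data_filter` reduces to an empty key (the digits vanish and the text before the first dot is empty), so numbered folders will likely always fall through to the default under this scheme.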
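Here is what the output looks like:
Then the requirement grew: besides describing each file, different colors were needed as well. I first tried coloring the output of Python's built-in print, like this: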
print("\33[31m"+"this is a test"+"\33[0m")  # 31 is red
print("\33[33m"+"this is a test"+"\33[0m")  # 33 is yellow
print("\33[34m"+"this is a test"+"\33[0m")  # 34 is blue
print("\33[32m"+"this is a test"+"\33[0m")  # 32 is green
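The print calls above all follow one pattern: an escape character, a color code, the text, then a reset. A small helper makes that explicit. This is only a sketch; the function name `colorize` and the constant names are made up here for illustration.

```python
# Color codes used in the examples above.
RED, GREEN, YELLOW, BLUE = 31, 32, 33, 34


def colorize(text, code):
    """Wrap text in an ANSI color sequence and reset the color afterwards."""
    # "\033" is the ESC character, the same character the "\33" literal produces.
    return "\033[{}m{}\033[0m".format(code, text)


for code in (RED, YELLOW, BLUE, GREEN):
    print(colorize("this is a test", code))
```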
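At first it looked as if this only works when printing to the terminal and the colors cannot be saved to a file. Running the script directly on the Linux bash command line with
python test.py
shows the colors, but running
python test.py>test.txt
to save the output to a file seemed to lose them. So I moved on to a second approach, installing a dedicated module: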
pip install colorama
from colorama import Fore, Back, Style
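Style.RESET_ALL  # clear all settings and return to the default color
The available parameters (as listed in the colorama documentation) are:
Fore: BLACK, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, WHITE, RESET
Back: BLACK, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, WHITE, RESET
Style: DIM, NORMAL, BRIGHT, RESET_ALL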
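A short usage sketch for the import above: the colorama constants are plain strings, so they are concatenated around the text the same way as the raw codes. The helper name `paint` is hypothetical, and the fallback branch is an assumption added only so the snippet also runs where colorama is not installed.

```python
try:
    from colorama import Fore, Style
    RED, BLUE, RESET = Fore.RED, Fore.BLUE, Style.RESET_ALL
except ImportError:
    # Assumption: fall back to the raw escape strings, which hold the same values.
    RED, BLUE, RESET = "\x1b[31m", "\x1b[34m", "\x1b[0m"


def paint(text, color):
    """Prefix text with a color constant and reset the style afterwards."""
    return color + text + RESET


print(paint("this is a test", RED))
print(paint("this is a test", BLUE))
```

On a Linux terminal this should print one red and one blue line. On older Windows consoles colorama's initialization call may be needed first, which is not shown here.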
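Then it finally clicked: this has nothing to do with Python at all. Whether text shows up in color is controlled by the Linux terminal. For example, if I type this directly into the terminal
echo -e "\e[34mThe pipeline results directory is ./03.reports_out/src/results/, which contains ['04.Trajectory', '03.DEG_enrichment', '01.Data_filter', 'sce.rds', '02.cell_cluster'].\e[0m"
the text color changes right away. Python merely inserts into the string the specific characters that the Linux terminal recognizes as color codes, such as
\e[34m
So why are they no longer recognized once the output is written to a txt file? That depends on the command used to view the file. With
less test.txt
the color codes in the text are not interpreted, while with
cat test.txt
they are recognized again. So the question becomes: how can
less
be made to recognize the colors too? Is there some way to trick the Linux terminal so that when running less
it thinks it is writing the standard output of cat
instead? I could not figure it out. Here is a link from someone else, which I did not dare to try: 怎样把Linux命令行带颜色的输出保存到文件? (How to save colored Linux command-line output to a file?) - 知乎 (zhihu.com)
I felt pretty helpless...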
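To check the point above that the color codes do end up in the file, the sketch below writes a colored string to a temporary file, reads it back, and then strips the codes with a regular expression. The names `ANSI_SGR` and `strip_ansi` are hypothetical, and the pattern only covers the color sequences used in this post, not every terminal escape sequence. For viewing such a file, the less manual documents a `-R` option that passes color escape sequences through to the terminal, so `less -R test.txt` is probably worth a try.

```python
import os
import re
import tempfile

# Matches color sequences such as "\x1b[34m" and the reset "\x1b[0m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text):
    """Remove ANSI color sequences so the text reads cleanly in any viewer."""
    return ANSI_SGR.sub("", text)


colored = "\x1b[34mthis is a test\x1b[0m"

# Write the colored string to a file and read it back unchanged.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "test.txt")
    with open(target, "w", encoding="utf-8") as handle:
        handle.write(colored)
    with open(target, encoding="utf-8") as handle:
        saved = handle.read()

print(repr(saved))        # the escape characters are still in the file
print(strip_ansi(saved))  # plain text without color codes
```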
Refusing to believe that this requirement could not be met, I used Python to build an HTML page for the display instead:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import re

import dominate
import pandas as pd
from dominate.tags import *


def dfs_showdir(path, depth, annoText):
    """Add one top-level folder and its direct children to the enclosing dominate context."""
    if depth == 0:
        with li():
            span(path, cls="folder", style="color:blue")
            with ul():
                for item in os.listdir(path):
                    # Skip version-control, IDE and cache directories.
                    if item in ['.git', '.idea', '__pycache__']:
                        continue
                    # Strip digits so that numbered files share one description key.
                    pattern_item = re.sub("[0-9]", "", item)
                    key = pattern_item.split(".")[0]
                    if key in annoText["files"].tolist():
                        with li():
                            span((" " * (depth + 2)) + "|--" + item + " " * 4 + annoText[annoText["files"] == key]["description"].tolist()[0], cls="file", style="color:red")
                    else:
                        # No description registered for this file.
                        with li():
                            span((" " * (depth + 2)) + "|--" + item + " " * 4 + "***nodescription", cls="files", style="color:yellow")


def main():
    # Description table for 01.Data_filter
    annoText_1 = pd.DataFrame()
    annoText_1["files"] = ["Feature_ber", "hist", "pearplot", "Variable-ex", "cycleplot"]
    annoText_1["description"] = ["gene count vs. sequencing depth correlation file", "sequencing depth distribution file", "quality control information file", "highly variable gene visualization file", "cell cycle file"]
    # Description table for 02.cell_cluster
    annoText_2 = pd.DataFrame()
    annoText_2["files"] = ["umap", "group", "person", "elbowplot", "jackstrawplot", "type_heatmap", "cell_type",
                           "fplot", "vln", "celltype_singleR", "Rplots", "clusterplot", "allmarker"]
    annoText_2["description"] = ["umap clustering visualization file", "umap clustering visualization file", "umap clustering visualization file", "PCA principal component visualization file", "PCA dimensionality reduction file", "cell similarity file",
                                 "cell type annotation visualization file", "subcluster-specific highly expressed gene plot (umap) file", "subcluster-specific highly expressed gene plot (violin plot) file", "singleR cell type vs. cluster mapping table file", "***", "***", "differentially expressed gene file"]
    # Description table for 03.DEG_enrichment
    annoText_3 = pd.DataFrame()
    annoText_3["files"] = ["sig_dge_all"]
    annoText_3["description"] = ["***"]
    # Description table for 04.Trajectory
    annoText_4 = pd.DataFrame()
    annoText_4["files"] = ["test"]
    annoText_4["description"] = ["***"]
    html_root = dominate.document(lang="en", doctype="<!DOCTYPE html>", title="this is a test")
    # The page relies on the jquery.treeview assets for the collapsible tree.
    with html_root.head:
        meta(name="viewport", content="width=device-width, initial-scale=1.0")
        link(rel="stylesheet", href="css/jquery.treeview.css")
        script(src="js/jquery.min.js")
        script(src="js/jquery.treeview.js", type="text/javascript")
        script(type="text/javascript", src="js/myjs1.js")
    my_path = "./src/results/"
    with html_root.body:
        with div(id="main"):
            with ul(id="treeview", cls="filetree"):
                for i in ["01.Data_filter", "02.cell_cluster", "03.DEG_enrichment", "04.Trajectory"]:
                    the_path = my_path + i
                    if i == "01.Data_filter":
                        dfs_showdir(the_path, 0, annoText_1)
                    if i == "02.cell_cluster":
                        dfs_showdir(the_path, 0, annoText_2)
                    if i == "03.DEG_enrichment":
                        dfs_showdir(the_path, 0, annoText_3)
                    if i == "04.Trajectory":
                        dfs_showdir(the_path, 0, annoText_4)
    # Write as UTF-8 explicitly so non-ASCII text survives on any platform.
    with open('E:/***/01.资料/05.html_test/05.files_test/src/test.html', 'w', encoding='utf-8') as f:
        f.write(html_root.render())


if __name__ == '__main__':
    main()
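The HTML script above only lists the direct children of each top-level folder. If nested folders ever need to show up as well, the same `ul`/`li` structure can be produced recursively. The sketch below is an assumption-laden illustration, not the script's method: it builds plain HTML strings with the standard library instead of dominate, and the function name `render_tree` and its `describe` callback are invented for this example.

```python
import html
import os
import tempfile


def render_tree(path, describe):
    """Return nested <ul>/<li> markup for everything under path."""
    items = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            # A folder becomes a labelled <li> that contains its own <ul>.
            items.append('<li><span class="folder">{}</span>{}</li>'.format(
                html.escape(name), render_tree(full, describe)))
        else:
            # A file becomes a leaf <li> with its description appended.
            items.append('<li><span class="file">{} {}</span></li>'.format(
                html.escape(name), html.escape(describe(name))))
    return "<ul>" + "".join(items) + "</ul>"


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "plots"))
    open(os.path.join(demo_root, "plots", "fplot1.png"), "w").close()
    open(os.path.join(demo_root, "sce.rds"), "w").close()
    print(render_tree(demo_root, lambda name: "***nodescription"))
```

The `folder` and `file` class names mirror the ones the script passes to its `span` calls, so the resulting markup should be usable with the same stylesheet, though that has not been checked here.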
Open the generated page and take a look:
And look, clicking an icon even collapses the folder:
Summary: file names should follow a consistent convention, typically
letters_digits.extension
Building this habit early matters a lot.
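The naming convention can also be checked mechanically. The sketch below is one possible reading of "letters_digits.extension" as a regular expression; the exact pattern and the helper name `follows_convention` are assumptions, and a real project may well want to allow more characters.

```python
import re

# Assumed reading of the convention: letters, an underscore, digits, a dot, an extension.
NAME_PATTERN = re.compile(r"[A-Za-z]+_[0-9]+\.[A-Za-z0-9]+")


def follows_convention(filename):
    """Return True when filename looks like letters_digits.extension."""
    return NAME_PATTERN.fullmatch(filename) is not None


for name in ["fplot_1.png", "fplot1.png", "01.Data_filter"]:
    print(name, follows_convention(name))
```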