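Running the Xialab Training Environment
Contents
I. Data location
II. Environment setup
- Log in to your own account (zzh's is used as the example here)
- List the available conda environments
- Try to activate the "training" environment (an error appears)
- Log out and log back in
- Run a command from an installed package to check the setup
- Switch back to your own conda environment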
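I. Data location
The data is stored in /home/tmp/data. Copy it, or symlink it, into your own home directory. Future training data will also be added to this directory.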
(base) [zaohai_zeng@localhost ~]$ ln -s /home/tmp/data/
(base) [zaohai_zeng@localhost ~]$ cd data/
(base) [zaohai_zeng@localhost data]$ ll -htr
total 298M
-rw-r--r--. 1 root xialab 62M Jul 10 17:55 embryophyta_odb9.tar.gz
-rw-r--r--. 1 root xialab 102M Jul 10 17:55 eudicotyledons_odb10.tar.gz
-rw-r--r--. 1 root xialab 13M Jul 10 17:55 eukaryota_odb9.tar.gz
-r-xr-xr-x. 1 root xialab 117M Jul 10 18:00 Arabidopsis_thaliana.genome.fa
-r-xr-xr-x. 1 root xialab 4.8M Jul 10 18:00 Arabidopsis_thaliana.genome.gff3
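The linking step above can also be scripted so that it is safe to run more than once. The following is a minimal sketch, not part of the original instructions: the function name link_training_data and the default destination ($HOME/data) are assumptions made for illustration.

```shell
# Hypothetical helper: create a symlink to the shared data directory,
# skipping the step if something already exists at the destination.
link_training_data() {
    local src="${1:-/home/tmp/data}"   # shared directory named in the text
    local dest="${2:-$HOME/data}"      # assumed destination in your home

    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ] || [ -L "$dest" ]; then
        echo "already present, leaving untouched: $dest"
        return 0
    fi
    ln -s "$src" "$dest" && echo "linked $dest -> $src"
}
```

Calling link_training_data with no arguments uses the defaults. Because the data files are read-only and owned by root, a symlink is usually enough unless you need a private copy.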
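II. Environment setup
I have already tested the accounts of Zeng Zaohai and Zhang Yanqing, and both work. Other members should follow the steps below; if something goes wrong, we will work out a fix then.
1. Log in to your own account (zzh's is used as the example here)
Choose 3 to enter the home directory of user zzh.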
------------------------------------
-- Welcome to Terminal Menu --
------------------------------------
[1] > Start New Selection.(UTF-8 character)
[2] > Start New Selection.(GBK character)
History sessions:
[3] < 192.168.253.186 zaohai_zeng SSH UTF-8
[q] < Quit.
Choice: 3
Prepare to login to the target device, Please wait a second.
Last login: Wed Jul 10 12:05:41 2019 from 192.168.253.92
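2. List the available conda environments
Run the following command to list the conda environments. "flye_test" and "training" are the two environments needed for the training.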
(base) [zaohai_zeng@localhost ~]$ /opt/anaconda/bin/conda info --envs
# conda environments:
#
* /home/zaohai_zeng/miniconda2
/home/zaohai_zeng/miniconda2/envs/py2
base /opt/anaconda
flye_test /opt/anaconda/envs/flye_test
training /opt/anaconda/envs/training
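Rather than reading the listing by eye, the check for the two required environments can be automated. This is a small sketch under stated assumptions: the function name env_listed is hypothetical, and it only relies on the listing format shown above (comment lines starting with "#", the environment name in the first column, and "*" marking the active environment).

```shell
# Hypothetical helper: succeed if the named environment appears in a
# "conda info --envs" listing read from standard input.
env_listed() {
    # Skip comment lines, drop the "*" marker of the active environment,
    # then compare the first field (the environment name) exactly.
    awk -v want="$1" '
        /^#/ { next }
        { gsub(/\*/, "") }
        $1 == want { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, /opt/anaconda/bin/conda info --envs | env_listed training returns success when the training environment is listed. Environments that appear only as a path (such as the miniconda2 ones above) have no name column and are never matched by name.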
3. Try to activate the "training" environment (an error appears)
Trying to activate the "training" environment produces the following error:
(base) [zaohai_zeng@localhost ~]$ /opt/anaconda/bin/conda activate training
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run
$ conda init <SHELL_NAME>
Currently supported shells are:
- bash
- fish
- tcsh
- xonsh
- zsh
- powershell
See 'conda init --help' for more information and options.
IMPORTANT: You may need to close and restart your shell after running 'conda init'.
As the message suggests, run /opt/anaconda/bin/conda init, which writes the conda setup into your .bashrc file:
(base) [zaohai_zeng@localhost ~]$ /opt/anaconda/bin/conda init
no change /opt/anaconda/condabin/conda
no change /opt/anaconda/bin/conda
no change /opt/anaconda/bin/conda-env
no change /opt/anaconda/bin/activate
no change /opt/anaconda/bin/deactivate
no change /opt/anaconda/etc/profile.d/conda.sh
no change /opt/anaconda/etc/fish/conf.d/conda.fish
no change /opt/anaconda/shell/condabin/Conda.psm1
no change /opt/anaconda/shell/condabin/conda-hook.ps1
no change /opt/anaconda/lib/python3.7/site-packages/xontrib/conda.xsh
no change /opt/anaconda/etc/profile.d/conda.csh
modified /home/zaohai_zeng/.bashrc
==> For changes to take effect, close and re-open your current shell. <==
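To confirm that conda init really modified the startup file, you can look for the marker comment that opens the block it writes (the same "# >>> conda initialize >>>" line that appears in .bashrc). A minimal sketch follows; the function name has_conda_init is an assumption for illustration.

```shell
# Hypothetical helper: report whether a shell startup file contains the
# block that "conda init" writes (identified by its opening marker line).
has_conda_init() {
    local rc_file="${1:-$HOME/.bashrc}"
    grep -q '^# >>> conda initialize >>>' "$rc_file" 2>/dev/null
}
```

Used as has_conda_init && echo "init block found", it checks ~/.bashrc by default; pass another file name to check a different startup file.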
4. Log out and log back in
(base) [zaohai_zeng@localhost ~]$ exit
logout
------------------------------------
-- Welcome to Terminal Menu --
------------------------------------
[1] > Start New Selection.(UTF-8 character)
[2] > Start New Selection.(GBK character)
History sessions:
[3] < 192.168.253.186 zaohai_zeng SSH UTF-8
[q] < Quit.
Choice: 3
Prepare to login to the target device, Please wait a second.
Last login: Wed Jul 10 12:09:19 2019 from 192.168.253.92
(base) [zaohai_zeng@localhost ~]$
After logging back in, run conda activate training. If the name in parentheses in front of [user] (here, user is zaohai_zeng@localhost) changes from base to training, you have entered the environment successfully.
(base) [zaohai_zeng@localhost ~]$ which conda
/opt/anaconda/bin/conda
(base) [zaohai_zeng@localhost ~]$ conda info --envs
# conda environments:
#
/home/zaohai_zeng/miniconda2
/home/zaohai_zeng/miniconda2/envs/py2
base * /opt/anaconda
flye_test /opt/anaconda/envs/flye_test
training /opt/anaconda/envs/training
(base) [zaohai_zeng@localhost ~]$ conda activate training
(training) [zaohai_zeng@localhost ~]$
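The steps above cover an interactive login. In a non-interactive script, bash typically does not read .bashrc, so conda activate may fail with the same CommandNotFoundError. One common workaround is to source conda.sh directly; the path used below is the one listed in the conda init output. This is a sketch under those assumptions, and the function name activate_env is hypothetical.

```shell
# Hypothetical helper for scripts: load conda's shell functions from the
# shared installation, then activate the requested environment.
activate_env() {
    local env_name="${1:-training}"
    local conda_sh="${2:-/opt/anaconda/etc/profile.d/conda.sh}"

    if [ ! -f "$conda_sh" ]; then
        echo "conda.sh not found: $conda_sh" >&2
        return 1
    fi
    . "$conda_sh"                      # defines the conda shell function
    conda activate "$env_name" || return 1
    # conda sets CONDA_DEFAULT_ENV to the name of the active environment
    echo "active environment: ${CONDA_DEFAULT_ENV:-unknown}"
}
```

Calling activate_env near the top of a job script should leave the rest of the script running inside the training environment.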
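5. Run a command from a package installed in the environment. If output like the following appears, congratulations: your environment is set up.
For example, the augustus command: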
(training) [zaohai_zeng@localhost ~]$ augustus
AUGUSTUS (3.3.2) is a gene prediction tool
written by M. Stanke, O. Keller, S. König, L. Gerischer and L. Romoth.
usage:
augustus [parameters] --species=SPECIES queryfilename
'queryfilename' is the filename (including relative path) to the file containing the query sequence(s)
in fasta format.
SPECIES is an identifier for the species. Use --species=help to see a list.
parameters:
--strand=both, --strand=forward or --strand=backward
--genemodel=partial, --genemodel=intronless, --genemodel=complete, --genemodel=atleastone or --genemodel=exactlyone
partial : allow prediction of incomplete genes at the sequence boundaries (default)
intronless : only predict single-exon genes like in prokaryotes and some eukaryotes
complete : only predict complete genes
atleastone : predict at least one complete gene
exactlyone : predict exactly one complete gene
--singlestrand=true
predict genes independently on each strand, allow overlapping genes on opposite strands
This option is turned off by default.
--hintsfile=hintsfilename
When this option is used the prediction considering hints (extrinsic information) is turned on.
hintsfilename contains the hints in gff format.
--AUGUSTUS_CONFIG_PATH=path
path to config directory (if not specified as environment variable)
--alternatives-from-evidence=true/false
report alternative transcripts when they are suggested by hints
--alternatives-from-sampling=true/false
report alternative transcripts generated through probabilistic sampling
--sample=n
--minexonintronprob=p
--minmeanexonintronprob=p
--maxtracks=n
For a description of these parameters see section 4 of README.TXT.
--proteinprofile=filename
When this option is used the prediction will consider the protein profile provided as parameter.
The protein profile extension is described in section 7 of README.TXT.
--progress=true
show a progressmeter
--gff3=on/off
output in gff3 format
--predictionStart=A, --predictionEnd=B
A and B define the range of the sequence for which predictions should be found.
--UTR=on/off
predict the untranslated regions in addition to the coding sequence. This currently works only for a subset of species.
--noInFrameStop=true/false
Do not report transcripts with in-frame stop codons. Otherwise, intron-spanning stop codons could occur. Default: false
--noprediction=true/false
If true and input is in genbank format, no prediction is made. Useful for getting the annotated protein sequences.
--uniqueGeneId=true/false
If true, output gene identifyers like this: seqname.gN
For a complete list of parameters, type "augustus --paramlist".
An exhaustive description can be found in the file README.TXT.
For example, the braker.pl command:
(training) [zaohai_zeng@localhost ~]$ braker.pl
braker.pl Pipeline for predicting genes with GeneMark-ET and AUGUSTUS with
RNA-Seq
SYNOPSIS
braker.pl [OPTIONS] --genome=genome.fa --bam=rnaseq.bam
INPUT FILE OPTIONS
--genome=genome.fa fasta file with DNA sequences
--bam=rnaseq.bam bam file with spliced alignments from
RNA-Seq
--hints=hints.gff Alternatively to calling braker.pl with a
bam file, it is possible to call it with a
file that contains introns extracted from
RNA-Seq (or other data) in gff format.
This flag also allows the usage of hints
from additional extrinsic sources for gene
prediction with AUGUSTUS. To consider such
additional extrinsic information, you need
to use the flag --extrinsicCfgFiles to
specify parameters for all sources in the
hints file (including the source "E" for
intron hints from RNA-Seq).
--prot_seq=prot.fa A protein sequence file in multiple fasta
format. This file will be used to generate
protein hints for AUGUSTUS by running one
of the three alignment tools Exonerate
(--prg=exonerate), Spaln (--prg=spaln) or
GenomeThreader (--prg=gth). Default is
GenomeThreader if the tool is not
specified. Currently, hints from protein
sequences are only used in the prediction
step with AUGUSTUS.
--prot_aln=prot.aln Alignment file generated from aligning
protein sequences against the genome with
either Exonerate (--prg=exonerate), or
Spaln (--prg=spaln), or GenomeThreader
(--prg=gth).
To prepare alignment file, run Spaln2 with
the following command:
spaln -O0 ... > spalnfile
To prepare alignment file, run Exonerate
with the following command:
exonerate --model protein2genome \
--showtargetgff T ... > exfile
To prepare alignment file, run
GenomeThreader with the following command:
gth -genomic genome.fa -protein \
protein.fa -gff3out \
-skipalignmentout ... -o gthfile
A valid option prg=... must be specified
in combination with --prot_aln. Generating
tool will not be guessed.
Currently, hints from protein alignment
files are only used in the prediction step
with AUGUSTUS.
--AUGUSTUS_ab_initio output ab initio predictions by AUGUSTUS
in addition to predictions with hints by
AUGUSTUS
FREQUENTLY USED OPTIONS
--species=sname Species name. Existing species will not be
overwritten. Uses Sp_1 etc., if no species
is assigned
--softmasking Softmasking option for soft masked genome
files. (Disabled by default.)
--esmode Run GeneMark-ES (genome sequence only) and
train AUGUSTUS on long genes predicted by
GeneMark-ES. Final predictions are ab initio
--epmode Run GeneMark-EP with intron hints provided
from protein data. This mode is not
comptabile with using the aligners
GenomeThreader, Exonerate and Spaln within
braker.pl because etpmode and epmode require
a large database of proteins and such
mapping should be done outside of braker.pl
e.g. on a cluster.
--etpmode Run GeneMark-ETP with hints provided from
proteins and RNA-Seq data. This mode is not
compatible with using the aligners
GenomeThreader, Exonerate and Spaln within
braker.pl because etpmode and epmode require
a large database of proteins and such
mapping should be done outside of braker.pl
e.g. on a cluster.
--gff3 Output in GFF3 format (default is gtf
format)
--cores Specifies the maximum number of cores that
can be used during computation. Be aware:
optimize_augustus.pl will use max. 8
cores; augustus will use max. nContigs in
--genome=file cores.
--workingdir=/path/to/wd/ Set path to working directory. In the
working directory results and temporary
files are stored
--nice Execute all system calls within braker.pl
and its submodules with bash "nice"
(default nice value)
--alternatives-from-evidence=true Output alternative transcripts based on
explicit evidence from hints (default is
true).
--crf Execute CRF training for AUGUSTUS;
resulting parameters are only kept for
final predictions if they show higher
accuracy than HMM parameters.
--keepCrf keep CRF parameters even if they are not
better than HMM parameters
--UTR=on create UTR training examples from RNA-Seq
coverage data; requires options
--bam=rnaseq.bam and --softmasking.
Alternatively, if UTR parameters already
exist, training step will be skipped and
those pre-existing parameters are used.
--prg=gth|exonerate|spaln Alignment tool gth (GenomeThreader),
exonerate (Exonerate) or Spaln2
(spaln) that will be used to generate
protein alignments that will be the
basis for hints generation for gene
prediction with AUGUSTUS (if specified
in combination with --prot_seq) or that
was used to externally generate an
alignment file with the commands listed in
description of --prot_aln (if used in
combination with --prot_aln).
--gth2traingenes Generate training gene structures for
AUGUSTUS from GenomeThreader alignments.
(These genes can either be used for
training AUGUSTUS alone with
--trainFromGth; or in addition to
GeneMark-ET training genes if also a
bam-file is supplied.)
--trainFromGth No GeneMark-Training, train AUGUSTUS from
GenomeThreader alignments
--version Print version number of braker.pl
--help Print this help message
CONFIGURATION OPTIONS (TOOLS CALLED BY BRAKER)
--AUGUSTUS_CONFIG_PATH=/path/ Set path to config directory of AUGUSTUS
(if not specified as environment
variable). BRAKER1 will assume that the
directories ../bin and ../scripts of
AUGUSTUS are located relative to the
AUGUSTUS_CONFIG_PATH. If this is not the
case, please specify AUGUSTUS_BIN_PATH
(and AUGUSTUS_SCRIPTS_PATH if required).
The braker.pl commandline argument
--AUGUSTUS_CONFIG_PATH has higher priority
than the environment variable with the
same name.
--AUGUSTUS_BIN_PATH=/path/ Set path to the AUGUSTUS directory that
contains binaries, i.e. augustus and
etraining. This variable must only be set
if AUGUSTUS_CONFIG_PATH does not have
../bin and ../scripts of AUGUSTUS relative
to its location i.e. for global AUGUSTUS
installations. BRAKER1 will assume that
the directory ../scripts of AUGUSTUS is
located relative to the AUGUSTUS_BIN_PATH.
If this is not the case, please specify
--AUGUSTUS_SCRIPTS_PATH.
--AUGUSTUS_SCRIPTS_PATH=/path/ Set path to AUGUSTUS directory that
contains scripts, i.e. splitMfasta.pl.
This variable most only be set if
AUGUSTUS_CONFIG_PATH or AUGUSTUS_BIN_PATH
do not contains the ../scripts directory
of AUGUSTUS relative to their location,
i.e. for special cases of a global
AUGUSTUS installation.
--BAMTOOLS_PATH=/path/to/ Set path to bamtools (if not specified as
environment BAMTOOLS_PATH variable). Has
higher priority than the environment
variable.
--GENEMARK_PATH=/path/to/ Set path to GeneMark-ET (if not specified
as environment GENEMARK_PATH variable).
Has higher priority than environment
variable.
--SAMTOOLS_PATH=/path/to/ Optionally set path to samtools (if not
specified as environment SAMTOOLS_PATH
variable) to fix BAM files automatically,
if necessary. Has higher priority than
environment variable.
--ALIGNMENT_TOOL_PATH=/path/to/tool Set path to alignment tool
(GenomeThreader, Spaln, or Exonerate) if
not specified as environment
ALIGNMENT_TOOL_PATH variable. Has higher
priority than environment variable.
--BLAST_PATH=/path/to/blastall Set path to NCBI blastall and formatdb
executables if not specified as
environment variable. Has higher priority
than environment variable.
--PYTHON3_PATH=/path/to Set path to python3 executable (if not
specified as envirnonment variable and if
executable is not in your $PATH).
EXPERT OPTIONS
--augustus_args="--some_arg=bla" One or several command line arguments to
be passed to AUGUSTUS, if several
arguments are given, separated by
whitespace, i.e.
"--first_arg=sth --second_arg=sth".
--overwrite Overwrite existing files (except for
species parameter files)
--skipGeneMark-ES Skip GeneMark-ES and use provided
GeneMark-ES output (e.g. provided with
--geneMarkGtf=genemark.gtf)
--skipGeneMark-ET Skip GeneMark-ET and use provided
GeneMark-ET output (e.g. provided with
--geneMarkGtf=genemark.gtf)
--skipGeneMark-EP Skip GeneMark-EP and use provided
GeneMark-EP output (e.g. provided with
--geneMarkGtf=genemark.gtf)
--skipGeneMark-ETP Skip GeneMark-ETP and use provided
GeneMark-ETP output (e.g. provided with
--geneMarkGtf=genemark.gtf)
--geneMarkGtf=file.gtf If skipGeneMark-ET is used, braker will by
default look in the working directory in
folder GeneMarkET for an already existing
gtf file. Instead, you may provide such a
file from another location. If geneMarkGtf
option is set, skipGeneMark-ES/ET/EP/ETP is
automatically also set.
--rounds The number of optimization rounds used in
optimize_augustus.pl (default 5)
--skipAllTraining Skip GeneMark-EX (training and
prediction), skip AUGUSTUS training, only
runs AUGUSTUS with pre-trained and already
existing parameters (not recommended).
Hints from input are still generated.
This option automatically sets
--useexisting to true.
--useexisting Use the present config and parameter files
if they exist for 'species'
--filterOutShort It may happen that a "good" training gene,
i.e. one that has intron support from
RNA-Seq in all introns predicted by
GeneMark, is in fact too short. This flag
will discard such genes that have
supported introns and a neighboring
RNA-Seq supported intron upstream of the
start codon within the range of the
maximum CDS size of that gene and with a
multiplicity that is at least as high as
20% of the average intron multiplicity of
that gene.
--skipOptimize Skip optimize parameter step (not
recommended).
--skipGetAnnoFromFasta Skip calling the python3 script
getAnnoFastaFromJoingenes.py from the
AUGUSTUS tool suite. This script requires
python3, biopython and re (regular
expressions) to be installed. It produces
coding sequence and protein FASTA files
from AUGUSTUS gene predictions and provides
information about genes with in-frame stop
codons. If you enable this flag, these files
will not be produced and python3 and
the required modules will not be necessary
for running braker.pl.
--fungus GeneMark-ET option: run algorithm with
branch point model (most useful for fungal
genomes)
--rnaseq2utr_args=params Expert option: pass alternative parameters
to rnaseq2utr as string, default parameters:
-r 76 -v 100 -n 15 -i 0.7 -m 0.3 -w 70
-c 100 -p 0.5
--eval=reference.gtf Reference set to evaluate predictions
against (using the eval package)
--AUGUSTUS_hints_preds=s File with AUGUSTUS hints predictions; will
use this file as basis for UTR training;
only UTR training and prediction is
performed if this option is given.
--flanking_DNA=n Size of flanking region, must only be
specified if --AUGUSTUS_hints_preds is given
(for UTR training in a separate braker.pl
run that builds on top of an existing run)
--verbosity=n 0 -> run braker.pl quiet (no log)
1 -> only log warnings
2 -> also log configuration
3 -> log all major steps
4 -> very verbose, log also small steps
--downsampling_lambda=d The distribution of introns in training
gene structures generated by GeneMark-EX
has a huge weight on single-exon and
few-exon genes. Specifying the lambda
parameter of a poisson distribution will
make braker call a script for downsampling
of training gene structures according to
their number of introns distribution, i.e.
genes with none or few exons will be
downsampled, genes with many exons will be
kept. Default value is 2.
If you want to avoid downsampling, you have
to specify 0.
DEVELOPMENT OPTIONS (PROBABLY STILL DYSFUNCTIONAL)
--splice_sites=patterns list of splice site patterns for UTR
prediction; default: GTAG, extend like this:
--splice_sites=GTAG,ATAC,...
--extrinsicCfgFiles=file1,file2,... Depending on the mode in which braker.pl
is executed, it may require one ore several
extrinsicCfgFiles. Don't use this option
unless you know what you are doing!
--stranded=+,-,+,-,... If UTRs are trained, i.e.~strand-specific
bam-files are supplied and coverage
information is extracted for gene prediction,
create stranded ep hints. The order of
strand specifications must correspond to the
order of bam files. Possible values are
+, -, .
If stranded data is provided, ONLY coverage
data from the stranded data is used to
generate UTR examples! Coverage data from
unstranded data is used in the prediction
step, only.
The stranded label is applied to coverage
data, only. Intron hints are generated
from all libraries treated as "unstranded"
(because splice site filtering eliminates
intron hints from the wrong strand, anyway).
--optCfgFile=ppx.cfg Optional custom config file for AUGUSTUS
for running PPX (currently not
implemented)
DESCRIPTION
Example:
braker.pl [OPTIONS] --genome=genome.fa --species=speciesname \
--bam=accepted_hits.bam
braker.pl [OPTIONS] --genome=genome.fa --species=speciesname \
--hints=rnaseq.gff
To run with protein data from remote species and GeneMark-EP:
braker.pl [OPTIONS] --genome=genome.fa --hints=proteinintrons.gff --epmode=1
To run with protein data from a very closely related species:
braker.pl [OPTIONS] --genome=genome.fa --prot_seq=proteins.fa --prg=gth \
--gth2traingenes --trainFromGth
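Running each tool and reading its full help text works, but a quicker check is to ask the shell whether the commands resolve at all. The sketch below is one way to do that; the function name check_tools is an assumption, and it only tests that a command is on PATH, not that the tool runs correctly.

```shell
# Hypothetical helper: check that each named command is on PATH.
# Prints one line per command; returns non-zero if any is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool -> $(command -v "$tool")"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

After conda activate training, running check_tools augustus braker.pl should report both commands as ok if the environment is set up as described.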
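6. How to switch back to your own conda environment
① Switch to your own conda environment temporarily (the switch is lost when you exit the shell and log in again)
For example, after logging back in you are in the environment named base under /opt/anaconda, and you want to switch to your own environment named py2 at /home/zaohai_zeng/miniconda2/envs/py2.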
(base) [zaohai_zeng@localhost ~]$ which conda
/opt/anaconda/bin/conda
(base) [zaohai_zeng@localhost ~]$ conda info --envs
# conda environments:
#
/home/zaohai_zeng/miniconda2
/home/zaohai_zeng/miniconda2/envs/py2
base * /opt/anaconda
busco /opt/anaconda/envs/busco
flye_test /opt/anaconda/envs/flye_test
training /opt/anaconda/envs/training
Find the conda command that belongs to your own conda installation (usually under /path/to/miniconda2/bin). For this user:
(base) [zaohai_zeng@localhost ~]$ ls /home/zaohai_zeng/miniconda2/bin/conda
/home/zaohai_zeng/miniconda2/bin/conda
List the conda environments that belong to the current user:
(base) [zaohai_zeng@localhost ~]$ /home/zaohai_zeng/miniconda2/bin/conda info --envs
# conda environments:
#
base /home/zaohai_zeng/miniconda2
py2 /home/zaohai_zeng/miniconda2/envs/py2
Then run source /home/zaohai_zeng/miniconda2/bin/activate py2 to switch into the py2 environment:
(base) [zaohai_zeng@localhost ~]$ source /home/zaohai_zeng/miniconda2/bin/activate py2
(py2) [zaohai_zeng@localhost ~]$ which python
~/miniconda2/envs/py2/bin/python
(py2) [zaohai_zeng@localhost ~]$ python --version
Python 2.7.13 :: Continuum Analytics, Inc.
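If you switch back and forth often, the temporary switch can be wrapped in a small function that first checks that the activate script exists. This is only a sketch: the function name use_own_env is hypothetical, the prefix and environment name are the example values from this page, and it assumes a bash shell in which the activate script accepts the environment name as its argument.

```shell
# Hypothetical helper: activate an environment from a personal conda
# installation by sourcing that installation's own activate script.
use_own_env() {
    local prefix="${1:-$HOME/miniconda2}"   # example prefix from this page
    local env_name="${2:-py2}"              # example environment name
    local activate="$prefix/bin/activate"

    if [ ! -f "$activate" ]; then
        echo "no activate script at: $activate" >&2
        return 1
    fi
    # Must be sourced, not executed, so the change applies to this shell.
    . "$activate" "$env_name"
}
```

Like the manual command, this only affects the current shell session.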
② Permanently remove the /opt/anaconda environment from your login shell
Open your .bashrc file:
vim ~/.bashrc
Then delete this block:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/anaconda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/opt/anaconda/etc/profile.d/conda.sh" ]; then
. "/opt/anaconda/etc/profile.d/conda.sh"
else
export PATH="/opt/anaconda/bin:$PATH"
fi
fi
unset __conda_setup
# <<< conda initialize <<<
Log out and log back in, and your shell will permanently use your own conda environment again.
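The manual deletion in step ② can also be done with sed, using the two marker comments that delimit the block. The following is a minimal sketch rather than an official conda procedure; the function name remove_conda_init is an assumption. Note that it removes every block delimited by these markers, so check the file first if your own conda installation has also run conda init.

```shell
# Hypothetical helper: delete the "conda init" block from a startup file,
# keeping a backup copy with a .bak suffix next to it.
remove_conda_init() {
    local rc_file="${1:-$HOME/.bashrc}"

    if [ ! -f "$rc_file" ]; then
        echo "no such file: $rc_file" >&2
        return 1
    fi
    cp -p "$rc_file" "$rc_file.bak" || return 1
    # Delete every line from the opening marker through the closing marker,
    # reading from the backup and rewriting the original file.
    sed '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' \
        "$rc_file.bak" > "$rc_file"
}
```

If the result is not what you expected, restore the original with cp ~/.bashrc.bak ~/.bashrc before logging out.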