Table of Contents
- Introduction
- Installation Prerequisites
- Installation and Configuration
- Basic Usage
- Core Applications
  - Structure Prediction
  - Protein Design
  - Docking
  - Loop Modeling
  - Membrane Proteins
  - Antibody Design
  - De novo Protein Design
  - RNA Modeling
- Advanced Features
  - Rosetta Scripts
  - PyRosetta
  - Fold and Dock
  - Enzyme Design
  - Symmetry Modeling
- Input File Formats
- Output Analysis
- Troubleshooting
- Resources and Community
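Introduction
Rosetta is a software suite for computational modeling and design of protein structures, developed by the Baker lab at the University of Washington. It provides modules for protein structure prediction, protein design, molecular docking, antibody design, and more. Rosetta is widely used in biomedical research, drug design, and protein engineering.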
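Installation Prerequisites
1. Obtain a license
- Academic license:
  - Visit the Rosetta Commons website
  - Fill out the academic license application form
  - Wait for the approval email, which usually arrives within 1-3 business days
- Commercial license:
  - Contact Rosetta Design Group
  - Ask about pricing and license terms
2. System requirements
- Recommended hardware:
  - Processor: Intel i5 or an AMD equivalent, or better
  - Memory: 8 GB minimum, 16 GB or more recommended
  - Storage: at least 20 GB of free space
  - Operating system: Windows 10/11 (64-bit)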
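3. Preparing a Linux environment on Windows
Rosetta is designed primarily for Unix/Linux environments. There are three main ways to run it on Windows:
Method 1: Install WSL (Windows Subsystem for Linux) [recommended]
- Enable WSL:
  - Open PowerShell as administrator
  - Run:
    dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
  - Run:
    dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
  - Restart the computer
- Install WSL2:
  - Download the WSL2 Linux kernel update package
  - Install the update package
  - In PowerShell, set WSL2 as the default version:
    wsl --set-default-version 2
- Install a Linux distribution:
  - Open the Microsoft Store
  - Search for and install Ubuntu 20.04 LTS
  - Launch Ubuntu and set a username and password
Method 2: Install a VirtualBox virtual machine
- Download and install VirtualBox
- Download the Ubuntu 20.04 LTS ISO
- Create a new virtual machine in VirtualBox with at least 4 GB of memory and a 50 GB disk
- Install Ubuntu
Method 3: Use Docker
- Install Docker Desktop for Windows
- Obtain a Rosetta Docker image (this requires contacting a Rosetta administrator)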
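Installation and Configuration
The following steps assume a WSL (Ubuntu) environment, the recommended approach for Windows users:
1. Install dependencies
# Update the package lists and installed packages
sudo apt update
sudo apt upgrade -y
# Install build tools and required libraries
sudo apt install -y build-essential scons zlib1g-dev libxml2-dev python-is-python3 python3-dev python3-pip git cmake
2. Download and unpack Rosetta
# Copy the downloaded source archive to your home directory
cp /mnt/c/Users/YourUsername/Downloads/rosetta_src_20XX.XX.XXXXX.tgz ~/
# Unpack the source archive
tar -xvzf rosetta_src_20XX.XX.XXXXX.tgz
# Create a symbolic link (optional)
ln -s rosetta_src_20XX.XX.XXXXX rosetta
3. Compile Rosetta
# Enter the source directory
cd ~/rosetta/main/source
# Build with scons (4 parallel jobs here; adjust to the number of CPU cores)
./scons.py -j4 mode=release bin
Compilation usually takes 1-2 hours, depending on the machine.
4. Set environment variables
Edit the ~/.bashrc file:
nano ~/.bashrc
Add the following lines:
# Rosetta environment variables
export ROSETTA3=~/rosetta
export PATH=$PATH:~/rosetta/main/source/bin
export ROSETTA_DB=~/rosetta/main/database
Apply the changes:
source ~/.bashrc
5. Test the installation
# Print the help text to confirm that the executable runs
~/rosetta/main/source/bin/score_jd2.default.linuxgccrelease -database ~/rosetta/main/database -help
If the command prints the Rosetta version information and the help text, the installation succeeded.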
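Basic Usage
1. Directory layout
The main Rosetta directories are:
- bin/: compiled executables
- database/: score functions, rotamer libraries, and other data
- demos/: demos and examples
- scripts/: utility scripts
- src/: source code
2. Command-line structure
Rosetta commands generally follow this format:
application_name.executable_type.platform -database path_to_database [options]
For example:
score_jd2.default.linuxgccrelease -database ~/rosetta/main/database -s input.pdb
where:
- score_jd2: the application name
- default: the build variant
- linuxgccrelease: the platform string, made up of the operating system (linux), the compiler (gcc), and the build mode (release)
- -database: the flag that specifies the database path
- -s: the flag that specifies the input structure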
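The naming scheme above can be illustrated with a small sketch that splits an executable name into its three dot-separated parts. The `RosettaExecutable` type and the `parse_executable_name` helper are hypothetical names introduced here for illustration; they are not part of Rosetta.

```python
from typing import NamedTuple


class RosettaExecutable(NamedTuple):
    """Hypothetical container for the three parts of an executable name."""
    application: str  # e.g. "score_jd2"
    build: str        # e.g. "default"
    platform: str     # e.g. "linuxgccrelease"


def parse_executable_name(name: str) -> RosettaExecutable:
    """Split 'application.build.platform' into its three parts."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'application.build.platform', got {name!r}")
    return RosettaExecutable(*parts)


exe = parse_executable_name("score_jd2.default.linuxgccrelease")
print(exe.application, exe.build, exe.platform)
```

A check like this can help catch a mistyped executable name in a wrapper script before a long job is submitted.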
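3. Common command-line options
- -database: path to the Rosetta database
- -s: input structure file
- -in:file:fasta: input sequence file
- -nstruct: number of structures to generate
- -out:file:scorefile: name of the output score file
- -out:path: output directory
- -out:pdb: write output as PDB files
- -ex1, -ex2: sample extra rotamers for the chi1 and chi2 side-chain angles
- -use_input_sc: include the side-chain conformations of the input structure during packing
- -relax:fast: use the fast relax protocol
- -overwrite: overwrite existing output files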
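When many runs share the same options, it can be convenient to assemble the command line in a script. The following is a minimal sketch, not an official Rosetta tool: `build_command` and its dictionary convention (`True` for a bare flag, a list for multiple values) are assumptions made for this example. Only flags already listed above are used.

```python
import shlex
from pathlib import Path


def build_command(executable, database, options):
    """Return an argument list suitable for subprocess.run().

    options maps a flag to its value: True emits a bare flag, False or None
    skips the flag, a list or tuple emits several values, and anything else
    is converted to a string.
    """
    cmd = [str(executable), "-database", str(database)]
    for flag, value in options.items():
        if value is False or value is None:
            continue
        cmd.append(flag)
        if value is True:
            continue
        if isinstance(value, (list, tuple)):
            cmd.extend(str(v) for v in value)
        else:
            cmd.append(str(value))
    return cmd


rosetta = Path.home() / "rosetta" / "main"
cmd = build_command(
    rosetta / "source" / "bin" / "score_jd2.default.linuxgccrelease",
    rosetta / "database",
    {"-s": "input.pdb", "-out:file:scorefile": "scores.sc", "-overwrite": True},
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` avoids shell quoting problems; note that `~` is not expanded outside a shell, which is why the sketch builds absolute paths with `Path.home()`.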
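Core Applications
Structure Prediction
1. Ab initio structure prediction
Predict a protein's three-dimensional structure from its amino acid sequence:
# Prepare a FASTA file (example.fasta)
echo ">target_protein" > example.fasta
echo "MKVSHPLLMGMAFAYDIILCLTIFMGPDLLNSA" >> example.fasta
# Run the ab initio prediction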
~/rosetta/main/source/bin/AbinitioRelax.default.linuxgccrelease \
-database ~/rosetta/main/database \
-in:file:fasta example.fasta \
-abinitio:relax \
-nstruct 5 \
-out:file:silent abinitio_results.out
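Option notes:
- -abinitio:relax: relax each model after the prediction
- -nstruct 5: generate 5 predicted structures
- -out:file:silent: write the results to a silent file, Rosetta's compact multi-model output format
2. Fragment assembly
Ab initio prediction requires fragment libraries, which are passed to AbinitioRelax with -in:file:frag3 and -in:file:frag9. They can be obtained from the Robetta server or generated locally:
Using the Robetta server:
- Go to the Robetta server
- Submit the sequence and download the fragment libraries
- Place the fragment libraries (3-mer and 9-mer) in the working directory
Generating fragments locally: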
~/rosetta/main/source/bin/fragment_picker.default.linuxgccrelease \
-database ~/rosetta/main/database \
-in:file:fasta example.fasta \
-frags:n_frags 200 \
-frags:n_candidates 2000
3. Comparative modeling
When a homologous template is available, comparative modeling can be used:
~/rosetta/main/source/bin/comparative_modeling.default.linuxgccrelease \
-database ~/rosetta/main/database \
-in:file:fasta target.fasta \
-in:file:alignment alignment.aln \
-in:file:template_pdb template.pdb \
-loops:build_terminal_loops \
-nstruct 10
4. Hybrid modeling
When partial structural information is available:
~/rosetta/main/source/bin/RosettaCM.default.linuxgccrelease \
-database ~/rosetta/main/database \
-cm:aln alignment.aln \
-cm:align_formats grishin \
-in:file:fasta target.fasta \
-out:nstruct 10
Protein Design
1. Fixed-backbone design
Keep the backbone fixed and optimize the side chains:
# Create a resfile that specifies the design positions
cat > design.resfile << EOF
NATAA
start
10 A ALLAA
11 A PIKAA YFWL
12 A NATRO
EOF
# Run the design
~/rosetta/main/source/bin/fixbb.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s input.pdb \
-resfile design.resfile \
-ex1 -ex2 \
-nstruct 10
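Resfile notes:
- NATAA: the default behavior; keep the native amino acid and allow repacking
- ALLAA: allow all 20 amino acids
- PIKAA YFWL: allow only the four amino acids Y, F, W, and L
- NATRO: do not repack; keep the native rotamer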
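For designs with many positions, the resfile can be generated from a script instead of being typed by hand. The sketch below reproduces the example resfile shown earlier; `make_resfile` and `pikaa` are hypothetical helper names introduced for this illustration.

```python
CANONICAL_AAS = set("ACDEFGHIKLMNPQRSTVWY")


def pikaa(allowed):
    """Build a PIKAA command from one-letter amino acid codes."""
    letters = "".join(allowed).upper()
    invalid = set(letters) - CANONICAL_AAS
    if invalid:
        raise ValueError(f"not canonical amino acids: {sorted(invalid)}")
    return f"PIKAA {letters}"


def make_resfile(positions, default="NATAA"):
    """positions is a list of (residue_number, chain, command) tuples."""
    lines = [default, "start"]
    for resnum, chain, command in positions:
        lines.append(f"{resnum} {chain} {command}")
    return "\n".join(lines) + "\n"


text = make_resfile([
    (10, "A", "ALLAA"),
    (11, "A", pikaa("YFWL")),
    (12, "A", "NATRO"),
])
print(text, end="")
```

Validating the one-letter codes up front is a design choice of this sketch; it catches typos before the resfile reaches Rosetta.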
2. Side-chain repacking
Optimize side-chain conformations without changing the amino acid identities:
~/rosetta/main/source/bin/fixbb.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s input.pdb \
-suffix _repacked \
-packing:ex1 -packing:ex2 \
-packing:repack_only
3. Relax
Lower the energy of the structure through rounds of side-chain repacking and minimization:
~/rosetta/main/source/bin/relax.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s input.pdb \
-relax:fast \
-out:suffix _relaxed \
-nstruct 5
Docking
1. Protein-protein docking
~/rosetta/main/source/bin/docking_protocol.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s complex.pdb \
-partners A_B \
-dock_ppk \
-out:file:scorefile docking_scores.sc \
-nstruct 100
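Option notes:
- -partners A_B: the chains to dock, with the two partners separated by an underscore
- -dock_ppk: run in docking prepack mode, which repacks the side chains of the partners
- -nstruct 100: generate 100 docked models
2. Local docking
When the approximate binding site is already known: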
~/rosetta/main/source/bin/docking_protocol.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s complex.pdb \
-partners A_B \
-docking:local_refine \
-out:file:scorefile local_docking.sc \
-nstruct 50
3. Protein-ligand docking
# First prepare the ligand parameter files
~/rosetta/main/source/scripts/python/public/molfile_to_params.py \
-n LIG -p LIG --conformers-in-one-file ligand.mol2
# Run the docking
~/rosetta/main/source/bin/ligand_docking.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s protein_with_ligand.pdb \
-extra_res_fa LIG.params \
-nstruct 100 \
-out:suffix _dock
Loop Modeling
1. Basic loop modeling
# Create a loops file that lists the loop regions to rebuild
cat > loops.txt << EOF
LOOP 25 35 0 0 0
LOOP 78 86 0 0 0
EOF
# Run loop modeling
~/rosetta/main/source/bin/loopmodel.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s input.pdb \
-loops:input_loops loops.txt \
-loops:remodel quick_ccd \
-loops:refine refine_ccd \
-nstruct 10
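Loops file format:
- Column 1: the LOOP keyword
- Column 2: first residue of the loop
- Column 3: last residue of the loop
- Columns 4-6: cut point, skip rate, and extend flag (all usually 0)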
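Before a long loop-modeling run, it can be worth checking the loops file for typos. The following sketch parses the format described above; the last three columns are read as cut point, skip rate, and extend flag, following the Rosetta loop-file convention. The `Loop` dataclass and `parse_loops` function are hypothetical names for this example, and the handling of `#` comments and of the extend flag (any value other than 0 counts as set) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    start: int
    end: int
    cutpoint: int = 0
    skip_rate: float = 0.0
    extend: bool = False


def parse_loops(text):
    """Parse loops-file text into a list of Loop records."""
    loops = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        fields = line.split()
        if fields[0] != "LOOP" or len(fields) < 3:
            raise ValueError(f"line {lineno}: expected 'LOOP start end ...'")
        start, end = int(fields[1]), int(fields[2])
        if start > end:
            raise ValueError(f"line {lineno}: start {start} is after end {end}")
        cutpoint = int(fields[3]) if len(fields) > 3 else 0
        if cutpoint and not start <= cutpoint <= end:
            raise ValueError(f"line {lineno}: cut point {cutpoint} is outside the loop")
        skip_rate = float(fields[4]) if len(fields) > 4 else 0.0
        extend = len(fields) > 5 and fields[5] != "0"
        loops.append(Loop(start, end, cutpoint, skip_rate, extend))
    return loops


example = "LOOP 25 35 0 0 0\nLOOP 78 86 0 0 0\n"
for loop in parse_loops(example):
    print(loop.start, loop.end, loop.end - loop.start + 1)
```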
2. Loop refinement
~/rosetta/main/source/bin/loopmodel.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s input.pdb \
-loops:input_loops loops.txt \
-loops:refine refine_kic \
-ex1 -ex2 \
-nstruct 20
3. Template-free loop modeling
~/rosetta/main/source/bin/loopmodel.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s input.pdb \
-loops:input_loops loops.txt \
-loops:extended true \
-loops:build_initial \
-loops:remodel perturb_ccd \
-nstruct 50
Membrane Proteins
1. Membrane protein relax
# Create a span file that defines the transmembrane regions
cat > membrane.span << EOF
TM region 10 30
TM region 50 70
TM region 90 110
EOF
# Run membrane protein relax
~/rosetta/main/source/bin/mp_relax.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s membrane_protein.pdb \
-mp:setup:spanfiles membrane.span \
-mp:scoring:hbond \
-relax:fast \
-nstruct 10
2. Membrane protein design
~/rosetta/main/source/bin/mp_design.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s membrane_protein.pdb \
-mp:setup:spanfiles membrane.span \
-resfile design.resfile \
-mp:lipids:composition DLPC \
-nstruct 10
3. Membrane protein assembly
~/rosetta/main/source/bin/mp_fold.default.linuxgccrelease \
-database ~/rosetta/main/database \
-in:file:fasta membrane_protein.fasta \
-mp:setup:spans_from_file membrane.span \
-mp:assembly:num_components 2 \
-nstruct 50
Antibody Design
1. Antibody modeling
# Create the antibody sequence and template files
# ...
# Run antibody modeling
~/rosetta/main/source/bin/antibody.default.linuxgccrelease \
-database ~/rosetta/main/database \
-fasta antibody.fasta \
-antibody:numbering_scheme kabat \
-antibody:auto_generate_kink_constraint \
-nstruct 10
2. CDR loop refinement
~/rosetta/main/source/bin/antibody.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s antibody.pdb \
-antibody:refine_cdr h3 \
-antibody:cdr_dihedral_constraints \
-nstruct 100
3. Antigen-antibody docking
~/rosetta/main/source/bin/snugdock.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s antibody_antigen_complex.pdb \
-antibody:snug_loops \
-partners A_BC \
-nstruct 50
De novo Protein Design
1. Backbone generation
~/rosetta/main/source/bin/rosetta_scripts.default.linuxgccrelease \
-database ~/rosetta/main/database \
-parser:protocol denovo_design.xml \
-out:file:silent denovo_backbones.silent \
-out:nstruct 1000
Example denovo_design.xml:
<ROSETTASCRIPTS>
<FILTERS>
<ScoreFilter name="score_filter" scorefxn="ref2015" threshold="10"/>
</FILTERS>
<MOVERS>
<BackboneGenerator name="bb_gen" ss="LHEELLLEEE"
legacy="0" n_samples="100" n_tries="100" rmsd="1.0"/>
</MOVERS>
<PROTOCOLS>
<Add mover="bb_gen"/>
<Add filter="score_filter"/>
</PROTOCOLS>
</ROSETTASCRIPTS>
2. Sequence design
~/rosetta/main/source/bin/fixbb.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s denovo_backbone.pdb \
-design \
-ex1 -ex2 \
-use_input_sc \
-nstruct 10
3. Complete de novo design workflow
~/rosetta/main/source/bin/rosetta_scripts.default.linuxgccrelease \
-database ~/rosetta/main/database \
-parser:protocol full_denovo.xml \
-nstruct 100
RNA Modeling
1. RNA folding
~/rosetta/main/source/bin/rna_denovo.default.linuxgccrelease \
-database ~/rosetta/main/database \
-fasta rna.fasta \
-nstruct 100 \
-out:file:silent rna_folded.out
2. RNA design
~/rosetta/main/source/bin/rna_design.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s rna_structure.pdb \
-rna_design:sequence_constraints sequence.constraints \
-nstruct 100
3. RNA-protein docking
~/rosetta/main/source/bin/rna_protein_dock.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s rna_protein_complex.pdb \
-partners P_R \
-nstruct 100
Advanced Features
Rosetta Scripts
Rosetta Scripts is an XML-based interface that lets users build custom Rosetta protocols:
~/rosetta/main/source/bin/rosetta_scripts.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s input.pdb \
-parser:protocol my_protocol.xml \
-nstruct 10
Example script (my_protocol.xml):
<ROSETTASCRIPTS>
<SCOREFXNS>
<ScoreFunction name="ref15" weights="ref2015"/>
<ScoreFunction name="ref15_soft" weights="ref2015_soft"/>
</SCOREFXNS>
<RESIDUE_SELECTORS>
<Chain name="chainA" chains="A"/>
<Chain name="chainB" chains="B"/>
<Interface name="interface" chain1="chainA" chain2="chainB" cutoff="8.0"/>
</RESIDUE_SELECTORS>
<TASKOPERATIONS>
<InitializeFromCommandline name="init"/>
<RestrictToRepacking name="repack_only"/>
<ReadResfile name="design_interface" filename="interface_design.resfile"/>
</TASKOPERATIONS>
<FILTERS>
<Ddg name="ddg" scorefxn="ref15" threshold="-15"/>
<Sasa name="sasa" threshold="800"/>
</FILTERS>
<MOVERS>
<Docking name="dock" score_high="ref15"/>
<PackRotamersMover name="design" scorefxn="ref15" task_operations="init,design_interface"/>
<MinMover name="minimize" scorefxn="ref15" chi="1" bb="1"/>
</MOVERS>
<PROTOCOLS>
<Add mover="dock"/>
<Add mover="design"/>
<Add mover="minimize"/>
<Add filter="ddg"/>
<Add filter="sasa"/>
</PROTOCOLS>
<OUTPUT scorefxn="ref15"/>
</ROSETTASCRIPTS>
PyRosetta
PyRosetta is the Python interface to Rosetta; it lets users call Rosetta functionality from Python scripts:
1. Installing PyRosetta
# Install inside WSL
pip install pyrosetta -f https://username:password@graylab.jhu.edu/download/PyRosetta4/latest/repository/linux/release/PyRosetta-4.subset.all.linux.wheel
Note: replace username and password with the username and password issued with your own license.
2. Basic PyRosetta script example
# save as simple_pyrosetta.py
import pyrosetta
from pyrosetta.rosetta.protocols.minimization_packing import PackRotamersMover
# Initialize PyRosetta
pyrosetta.init()
# Load the protein
pose = pyrosetta.pose_from_pdb("input.pdb")
# Create the score function
scorefxn = pyrosetta.get_fa_scorefxn()
# Score the input structure
score = scorefxn(pose)
print(f"Initial score: {score}")
# Create a PackerTask that only repacks (no sequence changes)
task = pyrosetta.standard_packer_task(pose)
task.restrict_to_repacking()
# Repack the side chains
packer = PackRotamersMover(scorefxn, task)
packer.apply(pose)
# Score again
score = scorefxn(pose)
print(f"After repacking score: {score}")
# Save the result
pose.dump_pdb("output_repacked.pdb")
Run the script:
python simple_pyrosetta.py
3. Advanced PyRosetta script example
# save as advanced_pyrosetta.py
import pyrosetta
from pyrosetta.rosetta.protocols.rosetta_scripts import XmlObjects
# Initialize
pyrosetta.init()
# Load the protein
pose = pyrosetta.pose_from_pdb("input.pdb")
# Define the XML script
xml_script = """
<ROSETTASCRIPTS>
<SCOREFXNS>
<ScoreFunction name="ref15" weights="ref2015"/>
</SCOREFXNS>
<MOVERS>
<FastRelax name="relax" scorefxn="ref15"/>
</MOVERS>
<PROTOCOLS>
<Add mover="relax"/>
</PROTOCOLS>
</ROSETTASCRIPTS>
"""
# Build the objects defined in the XML and apply the FastRelax mover to the pose
xml_objects = XmlObjects.create_from_string(xml_script)
relax = xml_objects.get_mover("relax")
relax.apply(pose)
# Save the result
pose.dump_pdb("output_relaxed.pdb")
Fold and Dock
Predicts the structure of a multi-subunit protein complex by folding and docking the subunits simultaneously:
~/rosetta/main/source/bin/fold_and_dock.default.linuxgccrelease \
-database ~/rosetta/main/database \
-in:file:fasta complex.fasta \
-in:file:native native.pdb \
-fold_and_dock:symmetry symm.def \
-out:nstruct 100
Example symm.def (C2-symmetric dimer):
symmetry_name C2
subunits 2
number_of_interfaces 1
E = 2*VRT0001 + 1*(VRT0001:VRT0002)
anchor_residue 1
virtual_coordinates_start
xyz VRT0001 0,0,0 0,0,1 0,1,0
xyz VRT0002 0,0,0 0,0,-1 0,-1,0
virtual_coordinates_stop
Enzyme Design
Design enzymes with a specific catalytic function:
~/rosetta/main/source/bin/enzyme_design.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s scaffold.pdb \
-enzdes:cstfile constraints.cst \
-enzdes:design_min_cycles 2 \
-ex1 -ex2 \
-nstruct 50
Example constraints.cst:
# Constrain the interaction between the substrate and the active-site residues
CST::BEGIN
TEMPLATE:: ATOM_MAP: 1 atom_name: C1 C2 O1
TEMPLATE:: ATOM_MAP: 2 atom_name: OG CB CA
CONSTRAINT:: distanceAB: 3.0 0.2 100.0 0
CONSTRAINT:: angle_A: 109.5 5.0 100.0 360.0
CONSTRAINT:: angle_B: 109.5 5.0 100.0 360.0
CONSTRAINT:: torsion_A: 180.0 15.0 100.0 360.0
CONSTRAINT:: torsion_AB: 180.0 15.0 100.0 360.0
CONSTRAINT:: torsion_B: 180.0 15.0 100.0 360.0
CST::END
Symmetry Modeling
Symmetric protein design:
# Create the symmetry definition file
cat > C3.symm << EOF
symmetry_name C3
subunits 3
number_of_interfaces 1
E = 3*VRT0001 + 3*(VRT0001:VRT0001)
anchor_residue 1
virtual_coordinates_start
xyz VRT0001 0,0,0 0,0,1 1,0,0
virtual_coordinates_stop
EOF
# Run symmetric protein design
~/rosetta/main/source/bin/rosetta_scripts.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s monomer.pdb \
-parser:protocol symmetry_design.xml \
-symmetry:symmet
-nstruct 50
Example symmetry_design.xml (the task operations are declared before the movers that reference them):
```xml
<ROSETTASCRIPTS>
<SCOREFXNS>
<ScoreFunction name="ref15" weights="ref2015" symmetric="1"/>
</SCOREFXNS>
<TASKOPERATIONS>
<InitializeFromCommandline name="init"/>
<DesignInterfaceOperation name="ifcdes" interface_weight="1.0" design_chain1="1" design_chain2="1"/>
</TASKOPERATIONS>
<MOVERS>
<SetupForSymmetry name="setupsymm" definition="C3.symm"/>
<SymPackRotamersMover name="symdes" scorefxn="ref15" task_operations="init,ifcdes"/>
<SymMinMover name="symmin" scorefxn="ref15"/>
</MOVERS>
<PROTOCOLS>
<Add mover="setupsymm"/>
<Add mover="symdes"/>
<Add mover="symmin"/>
</PROTOCOLS>
<OUTPUT scorefxn="ref15"/>
</ROSETTASCRIPTS>
```
## Input File Formats
### 1. PDB files
Standard Protein Data Bank format files containing atomic coordinates and structural information:
ATOM 1 N MET A 1 27.340 24.430 2.614 1.00 19.66 A N
ATOM 2 CA MET A 1 26.266 25.413 2.842 1.00 18.57 A C
...
### 2. FASTA files
Amino acid or nucleotide sequence files. The header line starts with ">":
>protein_name
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
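A quick way to check a FASTA file before handing it to Rosetta is to parse it and confirm the record names and sequence lengths. This is a minimal sketch; `read_fasta` is a hypothetical helper written for this example, not a Rosetta or PyRosetta function.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each record name to its sequence."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            if not name:
                raise ValueError("empty FASTA header")
            if name in records:
                raise ValueError(f"duplicate record name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            records[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in records.items()}


example = ">target_protein\nMKVSHPLLMGMAFAYD\nIILCLTIFMGPDLLNSA\n"
for record_name, sequence in read_fasta(example).items():
    print(record_name, len(sequence))
```

The parser joins wrapped sequence lines, so files with one long line and files wrapped at a fixed width give the same result.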
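### 3. Resfile
Specifies which residues are designed or repacked:
NATAA # default: repack while keeping the native amino acid type
start
10 A ALLAA # position 10 allows any amino acid
11 A PIKAA YFWL # position 11 allows only these four amino acids
12 A NATRO # position 12 is left unchanged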
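### 4. Constraint Files (.cst)
Define structural constraints:
AtomPair CA 10 CA 20 HARMONIC 10.0 0.5 # restrain the CA-CA distance between residues 10 and 20 to 10 Å (standard deviation 0.5 Å)
Angle CA 10 CA 15 CA 20 CIRCULARHARMONIC 2.094 0.087 # angle restraint of 120° with a standard deviation of 5°, given in radians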
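When many distance restraints are needed, the AtomPair lines can be written by a script. The sketch below covers only the AtomPair/HARMONIC form shown above; `atom_pair_constraints` is a hypothetical helper name, and the default standard deviation of 0.5 Å simply mirrors the example line.

```python
def atom_pair_constraints(pairs, sd=0.5, atom="CA"):
    """Return constraint-file text with one AtomPair HARMONIC line per pair.

    pairs is a list of (residue1, residue2, distance_in_angstrom) tuples.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    lines = []
    for res1, res2, distance in pairs:
        if res1 == res2:
            raise ValueError(f"residue {res1} is paired with itself")
        if distance <= 0:
            raise ValueError(f"distance must be positive, got {distance}")
        lines.append(
            f"AtomPair {atom} {res1} {atom} {res2} HARMONIC {distance:.1f} {sd:.1f}"
        )
    return "\n".join(lines) + "\n"


print(atom_pair_constraints([(10, 20, 10.0), (15, 42, 7.5)]), end="")
```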
### 5. Fragment files
Fragment libraries used for structure prediction (.200.9mers, .200.3mers):
position: 1 neighbors: 200
1 A1 PHE 1c3y_A 56 A 56 0.479 147.000 -127.000 175.000 -77.000 144.000 ...
...
### 6. Loops files
Specify the protein loop regions to model:
LOOP 25 35 0 0 0
LOOP 78 86 0 0 0
### 7. Blueprint files
Used for de novo backbone design:
1 A . ABEGO
2 A . ABEGO
3 A . ABEGO
...
## Output Analysis
### 1. Score files (.sc)
Contain the total score and the individual energy terms for every output model:
SCORE: score fa_atr fa_rep fa_sol fa_intra_rep fa_elec pro_close hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_fa13 rama omega fa_dun p_aa_pp yhh_planarity ref allatom_rms gdtmm gdtmm1.0 gdtmm2.0 gdtmm3.0 gdtmm4.0 gdtmm5.0 gdtmm0.5 gdtmm1_hires gdtmm0.5_hires gdthc irms maxsub maxsub2.0 rms sc_value time description
SCORE: -93.451 -367.213 58.402 195.307 0.402 -35.582 0.030 -22.613 -9.633 -9.059 -9.053 0.000 -4.066 -0.756 124.442 -8.748 0.000 -5.312 0.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 125 125 0.000 0.000 9.12 S_00000001
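Analyzing a score file:
```bash
# Sort the models by total score (column 2), skipping the header line
grep "^SCORE:" scores.sc | grep -v "description" | sort -nk2 > sorted_scores.txt
# Extract the total score and one energy term by column (column 7 is fa_elec in the header above)
awk '{print $2, $7}' sorted_scores.txt > fa_elec.txt
```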
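Selecting columns by position is fragile because the set of score terms changes between protocols. As an alternative, the sketch below reads the header line and looks columns up by name. `parse_scorefile` is a hypothetical helper for this example; it assumes the layout shown above, with `SCORE:` at the start of each line and a header line naming the columns.

```python
def parse_scorefile(text):
    """Return one dict per model, keyed by the column names in the header."""
    header = None
    rows = []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields
            continue
        if fields == header:  # skip repeated header lines
            continue
        row = {}
        for name, value in zip(header, fields):
            if name == "description":
                row[name] = value  # model tag, kept as text
                continue
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = value
        rows.append(row)
    return rows


sample = (
    "SCORE: score fa_atr fa_rep description\n"
    "SCORE: -93.451 -367.213 58.402 S_00000001\n"
    "SCORE: -101.200 -380.000 55.100 S_00000002\n"
)
ranked = sorted(parse_scorefile(sample), key=lambda row: row["score"])
for row in ranked:
    print(row["description"], row["score"], row["fa_rep"])
```

Depending on the application, the total may be labeled `score` or `total_score`, so the sort key should be taken from the header of the actual file.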
2. Structure analysis
Visualizing with PyMOL
# Launch PyMOL and load the structures
pymol output_*.pdb
# Run these commands inside PyMOL
align all, model_1
rms_cur model_*, model_1
spectrum b, rainbow
cartoon putty
show sticks, resn ALA+GLU+...
Analyzing with PyRosetta
# save as analyze_structures.py
import pyrosetta
from pyrosetta.rosetta.core.scoring import CA_rmsd
from pyrosetta.rosetta.core.scoring.hbonds import HBondSet, fill_hbond_set
from pyrosetta.rosetta.core.scoring.sasa import SasaCalc
# Initialize
pyrosetta.init()
# Load the structures
pose1 = pyrosetta.pose_from_pdb("model_1.pdb")
pose2 = pyrosetta.pose_from_pdb("model_2.pdb")
# Compute the CA RMSD
rmsd = CA_rmsd(pose1, pose2)
print(f"CA RMSD: {rmsd:.3f} Å")
# Compute the per-residue SASA
sasa_calc = SasaCalc()
sasa_calc.calculate(pose1)
residue_sasa = sasa_calc.get_residue_sasa()
for i in range(1, pose1.size() + 1):
    print(f"Residue {i}: {residue_sasa[i]:.2f} Ų")
# Analyze the hydrogen-bond network
pose1.update_residue_neighbors()
hbonds = HBondSet()
fill_hbond_set(pose1, False, hbonds)  # False: do not calculate derivatives
for i in range(1, hbonds.nhbonds() + 1):
    hb = hbonds.hbond(i)
    print(f"H-bond {i}: Donor {hb.don_res()} - Acceptor {hb.acc_res()}, Energy: {hb.energy():.2f}")
Batch analysis of multiple structures
# Score multiple structures with Rosetta
~/rosetta/main/source/bin/score_jd2.default.linuxgccrelease \
-database ~/rosetta/main/database \
-in:file:s model_*.pdb \
-out:file:scorefile all_scores.sc
# Compute the RMSD matrix
~/rosetta/main/source/bin/rms_analysis.default.linuxgccrelease \
-database ~/rosetta/main/database \
-in:file:s model_*.pdb \
-out:file:o rmsd_matrix.txt
3. Cluster analysis
# Cluster the models with Rosetta
~/rosetta/main/source/bin/cluster.default.linuxgccrelease \
-database ~/rosetta/main/database \
-in:file:s model_*.pdb \
-cluster:radius 2.0 \
-out:prefix clustered_
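To make the role of the cluster radius concrete, here is a simple greedy radius-based clustering over a precomputed RMSD matrix. It is an illustrative sketch under stated assumptions and not necessarily the algorithm the cluster application uses; `cluster_by_radius` and the toy distance values are hypothetical.

```python
def cluster_by_radius(names, dist, radius):
    """Greedy clustering: repeatedly pick the unassigned model with the most
    unassigned neighbours within `radius` as a cluster centre.

    dist is a symmetric matrix (list of lists) of pairwise RMSD values.
    Returns a list of (centre_name, sorted_member_names) tuples.
    """
    unassigned = set(range(len(names)))
    clusters = []
    while unassigned:
        def neighbours(i):
            return [j for j in unassigned if dist[i][j] <= radius]

        # Ties are broken in favour of the lowest index.
        centre = max(sorted(unassigned), key=lambda i: len(neighbours(i)))
        members = neighbours(centre)
        clusters.append((names[centre], sorted(names[j] for j in members)))
        unassigned -= set(members)
    return clusters


names = ["model_1", "model_2", "model_3", "model_4"]
dist = [
    [0.0, 1.0, 1.5, 5.0],
    [1.0, 0.0, 1.2, 5.5],
    [1.5, 1.2, 0.0, 6.0],
    [5.0, 5.5, 6.0, 0.0],
]
for centre, members in cluster_by_radius(names, dist, radius=2.0):
    print(centre, members)
```

With a radius of 2.0 Å the first three toy models fall into one cluster and the fourth forms its own; shrinking the radius splits clusters, which is the trade-off the `-cluster:radius` option controls.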
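Troubleshooting
1. Compilation errors
Problem: the scons build fails and reports missing dependencies
Solution:
# Install all required dependencies
sudo apt update
sudo apt install -y build-essential scons python3-dev zlib1g-dev libboost-all-dev libxml2-dev libxslt1-dev
Problem: the build runs out of memory
Solution:
# Reduce the number of parallel build jobs
./scons.py -j2 mode=release bin
# Or add swap space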
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
2. Runtime errors
Problem: the database cannot be found
Solution:
# Make sure the database path is specified correctly
-database ~/rosetta/main/database
Problem: "core dumped" error
Solution:
# Check the current resource limits
ulimit -a
# Remove the virtual memory limit
ulimit -v unlimited
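Problem: malformed input file
Solution:
# Check the PDB file format
~/rosetta/main/source/bin/check_pdb_file.default.linuxgccrelease -s input.pdb
# Clean the PDB file, keeping chain A (clean_pdb.py ships with the Rosetta tools, not in source/bin)
~/rosetta/tools/protein_tools/scripts/clean_pdb.py input.pdb A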
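3. Performance tuning
Problem: runs are slow
Solution:
# Run several independent jobs in parallel, each in its own directory
for i in {1..10}; do
  mkdir -p run_$i
  cd run_$i
  ~/rosetta/main/source/bin/AbinitioRelax.default.linuxgccrelease \
    -database ~/rosetta/main/database \
    -in:file:fasta ../protein.fasta \
    -nstruct 10 \
    -out:file:silent abinit_$i.out &
  cd ..
done
# Wait for all background jobs to finish
wait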
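The shell loop starts all ten jobs at once, which can oversubscribe a machine with fewer cores. A possible alternative is to cap the number of simultaneous processes from Python. This is a minimal sketch using only the standard library; `run_parallel` is a hypothetical helper, and the demo commands are placeholder Python one-liners standing in for real Rosetta command lines.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands, max_workers=4):
    """Run each command (an argument list) as its own process, at most
    max_workers at a time, and return the exit codes in input order."""
    def run_one(cmd):
        completed = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return completed.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


# Placeholder jobs; replace each entry with a Rosetta command as an argument list.
demo_commands = [[sys.executable, "-c", f"print({i})"] for i in range(3)]
exit_codes = run_parallel(demo_commands, max_workers=2)
print(exit_codes)
```

Threads are sufficient here because each worker only waits on an external process; a nonzero exit code in the returned list points to the run that failed.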
Problem: running out of disk space
Solution:
# Write the models to a silent file instead of individual PDB files
-out:file:silent results.out
# Extract specific models
~/rosetta/main/source/bin/extract_pdbs.default.linuxgccrelease \
-in:file:silent results.out \
-in:file:tags S_00001 S_00002
Resources and Community
1. Official resources
- Rosetta Commons website: https://www.rosettacommons.org/
- Documentation: https://www.rosettacommons.org/docs/latest/Home
- Tutorials: https://www.rosettacommons.org/demos/latest/Home
- RosettaCon: the annual conference for Rosetta users and developers
2. Community support
- Rosetta Forums: https://www.rosettacommons.org/forums
- Rosetta3 User Guide: https://www.rosettacommons.org/manuals/latest/rosetta3_user_guide/
- GitHub: https://github.com/RosettaCommons
3. Learning resources
- Online courses:
  - Coursera: Computational Methods for Protein Structure Prediction
  - edX: Protein Design and Ligand Design with Rosetta
- Recommended books:
  - "Protein Structure Prediction: A Practical Approach" by M.J.E. Sternberg
  - "Computational Protein Design" by Gert-Jan Bekker
4. Common third-party tools
- Structure visualization:
  - PyMOL: https://pymol.org/
  - UCSF Chimera: https://www.cgl.ucsf.edu/chimera/
  - VMD: https://www.ks.uiuc.edu/Research/vmd/
- Sequence analysis:
  - BLAST: https://blast.ncbi.nlm.nih.gov/
  - HMMER: http://hmmer.org/
  - Clustal Omega: https://www.ebi.ac.uk/Tools/msa/clustalo/
- Structure validation:
  - MolProbity: http://molprobity.biochem.duke.edu/
  - PROCHECK: https://www.ebi.ac.uk/thornton-srv/software/PROCHECK/
Advanced Application Examples
1. Protein interface optimization
# 1. First run the interface analyzer
~/rosetta/main/source/bin/InterfaceAnalyzer.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s complex.pdb \
-interface A_B \
-pack_input true \
-out:file:score_only interface_analysis.sc
# 2. Create a resfile for interface design
cat > interface_design.resfile << EOF
NATAA
start
# Choose the residues based on the interface analysis results
45 A ALLAA
46 A ALLAA
72 B PIKAA DEHKR
73 B PIKAA DEHKR
EOF
# 3. Run the interface design
~/rosetta/main/source/bin/rosetta_scripts.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s complex.pdb \
-parser:protocol interface_design.xml \
-nstruct 100
Example interface_design.xml:
<ROSETTASCRIPTS>
<SCOREFXNS>
<ScoreFunction name="ref15" weights="ref2015"/>
<ScoreFunction name="soft_rep" weights="ref2015" >
<Reweight scoretype="fa_rep" weight="0.5"/>
</ScoreFunction>
</SCOREFXNS>
<RESIDUE_SELECTORS>
<Chain name="chainA" chains="A"/>
<Chain name="chainB" chains="B"/>
<Interface name="interface" chain1="chainA" chain2="chainB" cutoff="8.0"/>
<Neighborhood name="neighbor" selector="interface" distance="4.0"/>
<Union name="design_residues" selectors="interface,neighbor"/>
</RESIDUE_SELECTORS>
<TASKOPERATIONS>
<InitializeFromCommandline name="init"/>
<ReadResfile name="res_file" filename="interface_design.resfile"/>
<OperateOnResidueSubset name="restrict_to_interface" selector="design_residues">
<RestrictToRepackingRLT/>
</OperateOnResidueSubset>
</TASKOPERATIONS>
<FILTERS>
<InterfaceHoles name="holes" jump="1" threshold="200"/>
<Ddg name="ddg_filter" scorefxn="ref15" threshold="-15" jump="1"/>
<ShapeComplementarity name="sc" jump="1" verbose="false" min_sc="0.65"/>
<PackStat name="pstat" threshold="0.58"/>
</FILTERS>
<MOVERS>
<PackRotamersMover name="design" scorefxn="soft_rep" task_operations="init,res_file,restrict_to_interface"/>
<MinMover name="minimize" scorefxn="ref15" chi="1" bb="1"/>
<FilterReportAsPDBInfo name="report_sc" filter="sc"/>
<FilterReportAsPDBInfo name="report_ddg" filter="ddg_filter"/>
</MOVERS>
<PROTOCOLS>
<Add mover="design"/>
<Add mover="minimize"/>
<Add filter="holes"/>
<Add filter="ddg_filter"/>
<Add filter="sc"/>
<Add filter="pstat"/>
<Add mover="report_sc"/>
<Add mover="report_ddg"/>
</PROTOCOLS>
<OUTPUT scorefxn="ref15"/>
</ROSETTASCRIPTS>
2. Stability-improving design
# 1. First relax the structure with Rosetta
~/rosetta/main/source/bin/relax.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s protein.pdb \
-relax:fast \
-out:suffix _relaxed
# 2. Create the stability design XML script
cat > stability_design.xml << EOF
<ROSETTASCRIPTS>
<SCOREFXNS>
<ScoreFunction name="ref15" weights="ref2015"/>
</SCOREFXNS>
<RESIDUE_SELECTORS>
<SecondaryStructure name="sheet" ss="E" />
<SecondaryStructure name="helix" ss="H" />
<SecondaryStructure name="loop" ss="L" />
<Layer name="surface" select_core="false" select_boundary="false" select_surface="true" core_cutoff="2.0" surface_cutoff="1.0"/>
<Layer name="boundary" select_core="false" select_boundary="true" select_surface="false"/>
<Layer name="core" select_core="true" select_boundary="false" select_surface="false"/>
</RESIDUE_SELECTORS>
<TASKOPERATIONS>
<InitializeFromCommandline name="init"/>
<RestrictToRepacking name="rtrp"/>
<OperateOnResidueSubset name="design_hydrophobic_core" selector="core">
<RestrictAbsentCanonicalAASRLT aas="ACFGILMPVWY"/>
</OperateOnResidueSubset>
<OperateOnResidueSubset name="design_boundary" selector="boundary">
<RestrictAbsentCanonicalAASRLT aas="ADEFGHIKLMNPQRSTVWY"/>
</OperateOnResidueSubset>
<OperateOnResidueSubset name="design_surface" selector="surface">
<RestrictAbsentCanonicalAASRLT aas="DEHKNPQRST"/>
</OperateOnResidueSubset>
<DesignRestrictions name="turn_glycine">
<Action residue_selector="loop" aas="G"/>
</DesignRestrictions>
<DesignRestrictions name="helix_favoring">
<Action residue_selector="helix" aas="AEKLMQR"/>
</DesignRestrictions>
<DesignRestrictions name="sheet_favoring">
<Action residue_selector="sheet" aas="FILVWY"/>
</DesignRestrictions>
</TASKOPERATIONS>
<FILTERS>
<PackStat name="packstat" threshold="0.58"/>
<BuriedUnsatHbonds name="buried_unsat_hbonds" cutoff="5"/>
<CavityVolume name="cavity" threshold="10"/>
<SSPrediction name="ss_prediction" threshold="0.8" use_svm="1"/>
<EnergyCutoff name="score_cut" scorefxn="ref15" cutoff="0" energy_type="total_score"/>
</FILTERS>
<MOVERS>
<FastDesign name="fast_design" scorefxn="ref15" task_operations="init,design_hydrophobic_core,design_boundary,design_surface,turn_glycine,helix_favoring,sheet_favoring" repeats="5"/>
<MinMover name="minimize" scorefxn="ref15" chi="1" bb="1"/>
</MOVERS>
<PROTOCOLS>
<Add mover="fast_design"/>
<Add mover="minimize"/>
<Add filter="packstat"/>
<Add filter="buried_unsat_hbonds"/>
<Add filter="cavity"/>
<Add filter="ss_prediction"/>
<Add filter="score_cut"/>
</PROTOCOLS>
<OUTPUT scorefxn="ref15"/>
</ROSETTASCRIPTS>
EOF
# 3. Run the stability design
~/rosetta/main/source/bin/rosetta_scripts.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s protein_relaxed.pdb \
-parser:protocol stability_design.xml \
-nstruct 100
3. Multistate design
# Create the multistate design XML script
cat > multistate_design.xml << EOF
<ROSETTASCRIPTS>
<SCOREFXNS>
<ScoreFunction name="ref15" weights="ref2015"/>
</SCOREFXNS>
<RESIDUE_SELECTORS>
<Chain name="chainA" chains="A"/>
<Chain name="chainB" chains="B"/>
<Chain name="chainC" chains="C"/>
<Interface name="AB_interface" chain1="chainA" chain2="chainB" cutoff="8.0"/>
<Interface name="AC_interface" chain1="chainA" chain2="chainC" cutoff="8.0"/>
<Union name="design_residues" selectors="AB_interface,AC_interface"/>
</RESIDUE_SELECTORS>
<TASKOPERATIONS>
<InitializeFromCommandline name="init"/>
<OperateOnResidueSubset name="design_interfaces" selector="design_residues">
<RestrictToRepackingRLT/>
</OperateOnResidueSubset>
<DisallowIfNonnative name="native_aa" disallow_aas="ACDEFGHIKLMNPQRSTVWY"/>
</TASKOPERATIONS>
<FILTERS>
<Ddg name="ddg_AB" scorefxn="ref15" threshold="-15" jump="1"/>
<Ddg name="ddg_AC" scorefxn="ref15" threshold="-15" jump="2"/>
</FILTERS>
<MOVERS>
<MakeMultimerMover name="make_multimer" states="2">
<State filename="AB_complex.pdb"/>
<State filename="AC_complex.pdb"/>
</MakeMultimerMover>
<MutateResidue name="mutate1" target="10" new_res="ALA"/>
<MultiStateDesign name="msd" scorefxn="ref15" task_operations="init,design_interfaces,native_aa">
<States>
<State state_name="AB" reference="true"/>
<State state_name="AC"/>
</States>
</MultiStateDesign>
</MOVERS>
<PROTOCOLS>
<Add mover="make_multimer"/>
<Add mover="msd"/>
<Add filter="ddg_AB"/>
<Add filter="ddg_AC"/>
</PROTOCOLS>
<OUTPUT scorefxn="ref15"/>
</ROSETTASCRIPTS>
EOF
# Run the multistate design
~/rosetta/main/source/bin/rosetta_scripts.default.linuxgccrelease \
-database ~/rosetta/main/database \
-parser:protocol multistate_design.xml \
-nstruct 50
4. Designing a metal-binding site
# 1. Create the metal-binding site design XML script
cat > metal_site_design.xml << EOF
<ROSETTASCRIPTS>
<SCOREFXNS>
<ScoreFunction name="ref15" weights="ref2015"/>
<ScoreFunction name="metal_score" weights="ref2015">
<Reweight scoretype="coordinate_constraint" weight="1.0"/>
<Reweight scoretype="atom_pair_constraint" weight="1.0"/>
<Reweight scoretype="angle_constraint" weight="1.0"/>
<Reweight scoretype="dihedral_constraint" weight="1.0"/>
</ScoreFunction>
</SCOREFXNS>
<RESIDUE_SELECTORS>
<ResiduePDBInfoHasLabel name="catalytic_residues" property="METAL_BINDING"/>
<Neighborhood name="shell1" selector="catalytic_residues" distance="8.0"/>
<Neighborhood name="shell2" selector="shell1" distance="8.0"/>
</RESIDUE_SELECTORS>
<TASKOPERATIONS>
<InitializeFromCommandline name="init"/>
<OperateOnResidueSubset name="design_catalytic" selector="catalytic_residues">
<RestrictAbsentCanonicalAASRLT aas="DEHKYCST"/>
</OperateOnResidueSubset>
<OperateOnResidueSubset name="design_shell1" selector="shell1">
<RestrictToRepackingRLT/>
</OperateOnResidueSubset>
<OperateOnResidueSubset name="design_shell2" selector="shell2">
<PreventRepackingRLT/>
</OperateOnResidueSubset>
</TASKOPERATIONS>
<FILTERS>
<MetalContact name="metal_contact" confidence="0.8" catalytic_residues="catalytic_residues"/>
<BuriedUnsatHbonds name="buried_unsat" jump_number="0" cutoff="5" residue_surface_cutoff="20.0" ignore_surface_res="true"/>
</FILTERS>
<MOVERS>
<AddConstraints name="add_cst">
<AddZincCoordinationConstraints name="zinc_cst" coordinating_residues="10,14,54,78" metal_id="1000" CA_distance_cutoff="6.0" coordinators="NE2 OE1 OE2 ND1"/>
</AddConstraints>
<PackRotamersMover name="design" scorefxn="metal_score" task_operations="init,design_catalytic,design_shell1,design_shell2"/>
<MinMover name="minimize" scorefxn="metal_score" chi="1" bb="1"/>
</MOVERS>
<PROTOCOLS>
<Add mover="add_cst"/>
<Add mover="design"/>
<Add mover="minimize"/>
<Add filter="metal_contact"/>
<Add filter="buried_unsat"/>
</PROTOCOLS>
<OUTPUT scorefxn="ref15"/>
</ROSETTASCRIPTS>
EOF
# 2. Run the metal-binding site design
~/rosetta/main/source/bin/rosetta_scripts.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s protein.pdb \
-parser:protocol metal_site_design.xml \
-nstruct 50
5. Design with noncanonical amino acids
# 1. Prepare the parameter file for the noncanonical amino acid
~/rosetta/main/source/scripts/python/public/ncaa_to_params.py \
-n UAA \
--no-pdb \
--clobber \
uaa.mol2
# 2. Create a design XML script that includes the noncanonical amino acid
cat > ncaa_design.xml << EOF
<ROSETTASCRIPTS>
<SCOREFXNS>
<ScoreFunction name="ref15" weights="ref2015"/>
</SCOREFXNS>
<RESIDUE_SELECTORS>
<Index name="target_res" resnums="10,25,42"/>
<Neighborhood name="nbr" selector="target_res" distance="8.0"/>
</RESIDUE_SELECTORS>
<TASKOPERATIONS>
<InitializeFromCommandline name="init"/>
<IncludeCurrent name="include_current"/>
<ExtraRotamersGeneric name="ex_rot" ex1="1" ex2="1" extrachi_cutoff="0"/>
<LimitAromaChi2 name="limit_aro" include_trp="true"/>
<OperateOnResidueSubset name="design_target" selector="target_res">
<RestrictAbsentCanonicalAASRLT aas="X"/>
</OperateOnResidueSubset>
<OperateOnResidueSubset name="repack_nbr" selector="nbr">
<RestrictToRepackingRLT/>
</OperateOnResidueSubset>
</TASKOPERATIONS>
<MOVERS>
<PackRotamersMover name="design" scorefxn="ref15" task_operations="init,include_current,ex_rot,limit_aro,design_target,repack_nbr"/>
<MinMover name="minimize" scorefxn="ref15" chi="1" bb="1"/>
</MOVERS>
<PROTOCOLS>
<Add mover="design"/>
<Add mover="minimize"/>
</PROTOCOLS>
<OUTPUT scorefxn="ref15"/>
</ROSETTASCRIPTS>
EOF
# 3. Run the noncanonical amino acid design
~/rosetta/main/source/bin/rosetta_scripts.default.linuxgccrelease \
-database ~/rosetta/main/database \
-s protein.pdb \
-parser:protocol ncaa_design.xml \
-extra_res_fa UAA.params \
-nstruct 50