Computing and visualizing the hydrophobicity profile of an amino acid sequence in R

The method used is the Kyte-Doolittle scale, displayed here as hydrophilicity (the score is negated before plotting).

Kyte & Doolittle Analysis
Kyte & Doolittle values for individual residues fall within a range of +4.5 to -4.5, with hydrophilic residues having a negative score. The most hydrophilic residue is arginine (-4.5); the most hydrophobic is isoleucine (+4.5). The graphic display uses the negated score, so values above the axis line are hydrophilic and values below the axis line are hydrophobic.
The Kyte & Doolittle scale is a composite hydrophobicity scale derived from the free energy changes of a water-vapor phase transition and from an analysis of buried side chains. Each plotted value is the average of the values of 5 adjacent residues and is assigned to the middle residue. The averaged values span approximately ±4 relative units.
Reference: Kyte, Jack & Russell F. Doolittle. 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157:105-132.
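To make the averaging step concrete, here is a minimal sketch on a toy peptide. The peptide `AILKRDE` and the variable names `centres` and `window_means` are illustrative assumptions, not part of the original analysis; only the seven scale values needed for the example are listed.

```r
# Kyte-Doolittle values for the residues that occur in the toy peptide
kd <- c(A = 1.8, I = 4.5, L = 3.8, K = -3.9, R = -4.5, D = -3.5, E = -3.5)

# Hypothetical 7-residue peptide, split into single residues
peptide <- strsplit("AILKRDE", "")[[1]]

window_size <- 5
half <- (window_size - 1) / 2   # number of residues on each side of the centre

# Only positions with a full window around them get a value
centres <- (half + 1):(length(peptide) - half)

# Mean score of the 5 residues centred on each position
window_means <- sapply(centres, function(ctr) {
  mean(kd[peptide[(ctr - half):(ctr + half)]])
})

# One row per window: centre position, centre residue, averaged score
data.frame(centre = centres, residue = peptide[centres], mean_kd = window_means)
```

A 7-residue peptide yields three windows, centred on residues 3, 4 and 5. The averaged score drops from the hydrophobic N-terminal end to the charged C-terminal end.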

First, define the hydropathy value of each amino acid.

kd_scale <- c(
  "A" = 1.8, "C" = 2.5, "D" = -3.5, "E" = -3.5,
  "F" = 2.8, "G" = -0.4, "H" = -3.2, "I" = 4.5,
  "K" = -3.9, "L" = 3.8, "M" = 1.9, "N" = -3.5,
  "P" = -1.6, "Q" = -3.5, "R" = -4.5, "S" = -0.8,
  "T" = -0.7, "V" = 4.2, "W" = -0.9, "Y" = -1.3
)
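A quick sanity check of the lookup table can catch typing mistakes before any sequence is processed. The sketch below repeats the table so that it runs on its own. The test vector `residues`, including the non-standard letter "X", is an assumption made for illustration.

```r
# Kyte-Doolittle scale, repeated here so the snippet is self-contained
kd_scale <- c(
  "A" = 1.8, "C" = 2.5, "D" = -3.5, "E" = -3.5,
  "F" = 2.8, "G" = -0.4, "H" = -3.2, "I" = 4.5,
  "K" = -3.9, "L" = 3.8, "M" = 1.9, "N" = -3.5,
  "P" = -1.6, "Q" = -3.5, "R" = -4.5, "S" = -0.8,
  "T" = -0.7, "V" = 4.2, "W" = -0.9, "Y" = -1.3
)

# All 20 standard amino acids should be present exactly once
length(kd_scale)
anyDuplicated(names(kd_scale))

# Extremes of the scale: arginine (lowest) and isoleucine (highest)
range(kd_scale)
names(kd_scale)[c(which.min(kd_scale), which.max(kd_scale))]

# Looking up by residue letter; a letter missing from the table gives NA
residues <- c("R", "I", "X")
unname(kd_scale[residues])
```

Because unknown letters such as "X" map to `NA`, any later averaging step needs `na.rm = TRUE` (or an explicit filter) to avoid `NA` results.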

Read the input sequence

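library(Biostrings)
# List the FASTA files in the working directory
# (pattern is a regular expression, not a shell glob)
myfiles <- list.files(pattern = "\\.fasta$")
myfiles
# Read the second file in the list; change the index to pick another file
protein_sequences <- readAAStringSet(myfiles[2])
# Inspect the sequence set
protein_sequences
# Convert the first sequence to a plain character string
aa_sequence <- as.character(protein_sequences[1])
# Split into single amino acids
aa_split <- strsplit(aa_sequence, "")[[1]]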
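If no FASTA file is at hand, the same `aa_split` vector can be built from a plain string with base R alone. This is a sketch under that assumption; the sequence below is a made-up placeholder, not a real protein from the original post.

```r
# Hypothetical sequence typed in directly (placeholder, for illustration only)
aa_sequence <- "mktay iakqr"

# Normalise: drop whitespace and convert to upper case
aa_sequence <- toupper(gsub("\\s", "", aa_sequence))

# Split into single amino acids, as in the FASTA-based version
aa_split <- strsplit(aa_sequence, "")[[1]]

# Basic checks: sequence length and residue composition
length(aa_split)
table(aa_split)
```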

Set the window size

window_size <- 5

# Compute the mean hydropathy of each sliding window
hydrophobicity_window <- sapply(
  1:(length(aa_split) - window_size + 1),
  function(i) {
    window <- aa_split[i:(i + window_size - 1)]  # residues in the current window
    mean(kd_scale[window], na.rm = TRUE)        # mean score, ignoring unknown residues
  }
)
# Print the result
hydrophobicity_window
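As a cross-check, the same moving average can be computed without an explicit loop using `stats::filter`, which with `sides = 2` returns a vector aligned to the centre residue of each window. This is a sketch, not the original post's method; the test sequence is a made-up placeholder. Unlike the loop, `stats::filter` has no `na.rm` option, so a non-standard residue would turn every window containing it into `NA`.

```r
# Kyte-Doolittle scale, repeated here so the snippet is self-contained
kd_scale <- c(
  "A" = 1.8, "C" = 2.5, "D" = -3.5, "E" = -3.5,
  "F" = 2.8, "G" = -0.4, "H" = -3.2, "I" = 4.5,
  "K" = -3.9, "L" = 3.8, "M" = 1.9, "N" = -3.5,
  "P" = -1.6, "Q" = -3.5, "R" = -4.5, "S" = -0.8,
  "T" = -0.7, "V" = 4.2, "W" = -0.9, "Y" = -1.3
)

# Hypothetical test sequence containing only standard residues
aa_split <- strsplit("MKTAYIAKQRQISFVKSHFSRQ", "")[[1]]
window_size <- 5

# Per-residue scores
scores <- unname(kd_scale[aa_split])

# Centred moving average: element k is the mean of residues k-2 .. k+2,
# and the first and last two elements are NA (no full window there)
centred <- as.numeric(
  stats::filter(scores, rep(1 / window_size, window_size), sides = 2)
)

# Loop version, indexed by the first residue of each window
loop_means <- sapply(
  1:(length(scores) - window_size + 1),
  function(i) mean(scores[i:(i + window_size - 1)])
)

# Both approaches should agree once the NA padding is removed
all.equal(centred[!is.na(centred)], loop_means)
```

The centred vector has the same length as the sequence, which makes it convenient when the x-axis of a plot should show residue numbers rather than window indices.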

Plot the sliding-window profile

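# The score is negated so that hydrophilic regions lie above the axis
plot(-hydrophobicity_window, 
     type = "l", 
     col = "blue", 
     xlab = "Window position", 
     ylab = "Mean hydrophilicity (negated Kyte-Doolittle score)", 
     main = "Sliding-window hydropathy analysis of the protein sequence")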

Mark the hydrophobicity peak

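abline(h = 0, col = "red", lty = 2)     # reference line at score = 0
abline(v = 36, col = "green", lty = 1)  # vertical marker at window position 36
# The curve shows the negated score, so negate the maximum to put the point on the curve
points(which.max(hydrophobicity_window), 
       -max(hydrophobicity_window), 
       col = "red", pch = 19)           # most hydrophobic window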