Consolidating the ideas

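The relatedness coefficient is calculated from the degree of allele sharing between two individuals. Because it describes a relationship between two individuals, it cannot be related directly to the heterozygosity of a single individual. For now I want to relate it to how much two individuals differ in their heterozygous sites. The relatedness coefficient itself is easy to compute. For the difference in heterozygous sites, pick out the heterozygous sites of each individual and then compute the proportion of sites that are not shared between the two. For example, if individual A has 100 heterozygous sites, individual B has 120, and 50 sites are shared, the proportion of non-shared sites is 1 - (2 × 50) / (100 + 120) ≈ 0.545. This proportion is then compared with the relatedness coefficient. Prediction: the higher the proportion of non-shared sites, the lower the relatedness coefficient, i.e. the two are negatively correlated. Then check the R² of the relationship.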
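The worked example above (100 and 120 heterozygous sites, 50 shared) can be sketched in Python as follows. The function name `het_site_dissimilarity` and the representation of a site as a `(chromosome, position)` tuple are assumptions made for illustration, not something fixed by these notes. The formula itself is one minus the Sørensen-Dice coefficient of the two site sets.

```python
def het_site_dissimilarity(sites_a: set, sites_b: set) -> float:
    """Proportion of heterozygous sites not shared by two individuals.

    Computes 1 - 2 * |A ∩ B| / (|A| + |B|), the formula from the notes.
    """
    total = len(sites_a) + len(sites_b)
    if total == 0:
        raise ValueError("both individuals have no heterozygous sites")
    shared = len(sites_a & sites_b)
    return 1 - 2 * shared / total


# Hypothetical site sets that reproduce the counts in the example:
# A has 100 sites, B has 120 sites, and 50 sites occur in both.
sites_a = {("chr1", pos) for pos in range(0, 100)}
sites_b = {("chr1", pos) for pos in range(50, 170)}

d = het_site_dissimilarity(sites_a, sites_b)
print(round(d, 4))  # 0.5455
```

Computing this value for every pair of individuals gives one number per pair, which is the same unit of observation as the relatedness coefficient, so the two can be correlated directly.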
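To preserve the genetic diversity of a population, the first requirement is to maintain genome-wide heterozygosity. Besides the coefficients that reflect heterozygosity directly, ROH and genetic load also reflect it indirectly. For example, to avoid inbreeding we have to consider the ROH of the individuals involved: we do not want long ROH segments to persist, and when pairing individuals we want as many opportunities as possible for long ROH segments to become short ROH segments in the offspring. That is the first goal. It is achieved indirectly, by combining the variant sites in the genome so that more heterozygous sites are produced. The second goal concerns genetic load: we want load sites not to become homozygous, and aim to keep each such site either free of the harmful mutation or carrying it only as a recessive, heterozygous load.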
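We can think of genome-wide heterozygosity as a ruler: a very long ruler that shows the heterozygosity of an individual at the global scale. Looking at the functional content of the genome lets us pay attention to more local behaviour. For example, we can reduce the number of long ROH segments in the genome while also avoiding pairings that would make genetic load homozygous.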

1. Correlation between heterozygosity and ROH number

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data from the Excel file
file_path = 'het_rohnumber.xlsx'
data = pd.read_excel(file_path)

# Calculate R² (squared Pearson correlation) for each species.
# Selecting the two columns first keeps the grouping column out of apply().
correlation_data = data.groupby('species')[['het', 'ROH(number)']].apply(lambda x: x['het'].corr(x['ROH(number)']) ** 2)

# Set the style for the plot
sns.set(style="whitegrid")

# Create a scatter plot with the R^2 values annotated
plt.figure(figsize=(10, 6))
scatter_plot = sns.scatterplot(data=data, x='het', y='ROH(number)', hue='species', palette='deep', s=100)

# Annotate the R^2 values on the plot
for species, r_squared in correlation_data.items():
    x_pos = data[data['species'] == species]['het'].mean()
    y_pos = data[data['species'] == species]['ROH(number)'].mean()
    plt.text(x_pos, y_pos, f'R² = {r_squared:.3f}', fontsize=12, ha='right')

# Set the title and labels
scatter_plot.set_title('Correlation between Heterozygosity and ROH Number by Species with R² values')
scatter_plot.set_xlabel('Heterozygosity')
scatter_plot.set_ylabel('ROH Number')

# Display the plot with legend
plt.legend(title='Species')

# Save the plot as a PDF file
output_file_path = 'het_roh_correlation.pdf'
plt.savefig(output_file_path, format='pdf')

# Show the plot
plt.show()

2. Correlation between heterozygosity and total ROH length

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data from the Excel file
file_path = 'het_rohlength.xlsx'
data = pd.read_excel(file_path)

# Calculate R² (squared Pearson correlation) for each species.
# Selecting the two columns first keeps the grouping column out of apply().
correlation_data = data.groupby('species')[['het', 'ROH(KB)']].apply(lambda x: x['het'].corr(x['ROH(KB)']) ** 2)

# Set the style for the plot
sns.set(style="whitegrid")

# Create a scatter plot with the R^2 values annotated
plt.figure(figsize=(8, 6))
scatter_plot = sns.scatterplot(data=data, x='het', y='ROH(KB)', hue='species', palette='deep', s=100)

# Annotate the R^2 values on the plot
for species, r_squared in correlation_data.items():
    x_pos = data[data['species'] == species]['het'].mean()
    y_pos = data[data['species'] == species]['ROH(KB)'].mean()
    plt.text(x_pos, y_pos, f'R² = {r_squared:.3f}', fontsize=12, ha='right')

# Set the title and labels
scatter_plot.set_title('Correlation between Heterozygosity and ROH Length by Species with R² values')
scatter_plot.set_xlabel('Heterozygosity')
scatter_plot.set_ylabel('ROH Length')

# Display the plot with legend
plt.legend(title='Species')

# Save the plot as a PDF file
output_file_path = 'het_length_correlation.pdf'
plt.savefig(output_file_path, format='pdf')

# Show the plot
plt.show()

3. Correlation between heterozygosity and KBAVG (average ROH length)

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data from the Excel file
file_path = 'het_AveKB.xlsx'
data = pd.read_excel(file_path)

# Calculate R² (squared Pearson correlation) for each species.
# Selecting the two columns first keeps the grouping column out of apply().
correlation_data = data.groupby('species')[['het', 'ROH(KBAVG)']].apply(lambda x: x['het'].corr(x['ROH(KBAVG)']) ** 2)

# Set the style for the plot
sns.set(style="whitegrid")

# Create a scatter plot with the R^2 values annotated
plt.figure(figsize=(8, 6))
scatter_plot = sns.scatterplot(data=data, x='het', y='ROH(KBAVG)', hue='species', palette='deep', s=100)

# Annotate the R^2 values on the plot
for species, r_squared in correlation_data.items():
    x_pos = data[data['species'] == species]['het'].mean()
    y_pos = data[data['species'] == species]['ROH(KBAVG)'].mean()
    plt.text(x_pos, y_pos, f'R² = {r_squared:.3f}', fontsize=12, ha='right')

# Set the title and labels
scatter_plot.set_title('Correlation between Heterozygosity and ROH(KBAVG) by Species with R² values')
scatter_plot.set_xlabel('Heterozygosity')
scatter_plot.set_ylabel('ROH Ave_Len')

# Display the plot with legend
plt.legend(title='Species')

# Save the plot as a PDF file
output_file_path = 'het_AveLen_correlation.pdf'
plt.savefig(output_file_path, format='pdf')

# Show the plot
plt.show()

4. Heterozygosity and Froh

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data from the Excel file
file_path = 'het_Froh.xlsx'
data = pd.read_excel(file_path)

# Calculate R² (squared Pearson correlation) for each species.
# Selecting the two columns first keeps the grouping column out of apply().
correlation_data = data.groupby('species')[['het', 'Froh']].apply(lambda x: x['het'].corr(x['Froh']) ** 2)

# Set the style for the plot
sns.set(style="whitegrid")

# Create a scatter plot with the R^2 values annotated
plt.figure(figsize=(8, 6))
scatter_plot = sns.scatterplot(data=data, x='het', y='Froh', hue='species', palette='deep', s=100)

# Annotate the R^2 values on the plot
for species, r_squared in correlation_data.items():
    x_pos = data[data['species'] == species]['het'].mean()
    y_pos = data[data['species'] == species]['Froh'].mean()
    plt.text(x_pos, y_pos, f'R² = {r_squared:.3f}', fontsize=12, ha='right')

# Set the title and labels
scatter_plot.set_title('Correlation between Heterozygosity and Froh by Species with R² values')
scatter_plot.set_xlabel('Heterozygosity')
scatter_plot.set_ylabel('Froh')

# Display the plot with legend
plt.legend(title='Species')

# Save the plot as a PDF file
output_file_path = 'het_Froh_correlation.pdf'
plt.savefig(output_file_path, format='pdf')

# Show the plot
plt.show()

5. Heterozygosity and Load

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data from the Excel file
file_path = 'het_Load.xlsx'
data = pd.read_excel(file_path)

# Calculate R² (squared Pearson correlation) for each species.
# Selecting the two columns first keeps the grouping column out of apply().
correlation_data = data.groupby('species')[['het', 'Load']].apply(lambda x: x['het'].corr(x['Load']) ** 2)

# Set the style for the plot
sns.set(style="whitegrid")

# Create a scatter plot with the R^2 values annotated
plt.figure(figsize=(8, 6))
scatter_plot = sns.scatterplot(data=data, x='het', y='Load', hue='species', palette='deep', s=100)

# Annotate the R^2 values on the plot
for species, r_squared in correlation_data.items():
    x_pos = data[data['species'] == species]['het'].mean()
    y_pos = data[data['species'] == species]['Load'].mean()
    plt.text(x_pos, y_pos, f'R² = {r_squared:.3f}', fontsize=12, ha='right')

# Set the title and labels
scatter_plot.set_title('Correlation between Heterozygosity and Load by Species with R² values')
scatter_plot.set_xlabel('Heterozygosity')
scatter_plot.set_ylabel('Load')

# Display the plot with legend
plt.legend(title='Species')

# Save the plot as a PDF file
output_file_path = 'het_Load_correlation.pdf'
plt.savefig(output_file_path, format='pdf')

# Show the plot
plt.show()

6. Heterozygosity and R (relatedness coefficient)

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data from the Excel file
file_path = 'het_R.xlsx'
data = pd.read_excel(file_path)

# Calculate R² (squared Pearson correlation) for each species.
# Selecting the two columns first keeps the grouping column out of apply().
correlation_data = data.groupby('species')[['het', 'Relationship']].apply(lambda x: x['het'].corr(x['Relationship']) ** 2)

# Set the style for the plot
sns.set(style="whitegrid")

# Create a scatter plot with the R^2 values annotated
plt.figure(figsize=(8, 6))
scatter_plot = sns.scatterplot(data=data, x='het', y='Relationship', hue='species', palette='deep', s=100)

# Annotate the R^2 values on the plot
for species, r_squared in correlation_data.items():
    x_pos = data[data['species'] == species]['het'].mean()
    y_pos = data[data['species'] == species]['Relationship'].mean()
    plt.text(x_pos, y_pos, f'R² = {r_squared:.3f}', fontsize=12, ha='right')

# Set the title and labels
scatter_plot.set_title('Correlation between Heterozygosity and R by Species with R² values')
scatter_plot.set_xlabel('Heterozygosity')
scatter_plot.set_ylabel('Relationship')

# Display the plot with legend
plt.legend(title='Species')

# Save the plot as a PDF file
output_file_path = 'het_R_correlation.pdf'
plt.savefig(output_file_path, format='pdf')

# Show the plot
plt.show()
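
The six scripts above share one computational step: the squared Pearson correlation between `het` and one other column, computed separately for each species. A minimal sketch of that step as a reusable helper is given below. The function name `r_squared_by_species` and the toy values are hypothetical and only illustrate the calculation; they are not real measurements.

```python
import pandas as pd


def r_squared_by_species(data: pd.DataFrame, x_col: str, y_col: str,
                         group_col: str = "species") -> pd.Series:
    """Return the squared Pearson correlation of x_col and y_col per group."""
    result = {}
    for name, group in data.groupby(group_col):
        # Series.corr defaults to the Pearson correlation coefficient r.
        result[name] = group[x_col].corr(group[y_col]) ** 2
    return pd.Series(result, name="r_squared")


# Toy values for illustration only.
toy = pd.DataFrame({
    "species": ["A", "A", "A", "B", "B", "B"],
    "het": [0.1, 0.2, 0.3, 0.1, 0.2, 0.3],
    "Froh": [0.3, 0.2, 0.1, 0.2, 0.1, 0.3],
})

r2 = r_squared_by_species(toy, "het", "Froh")
print(r2.round(2))
```

Because R² is the square of r, it discards the sign of the correlation. To check a predicted negative relationship, the sign of `group[x_col].corr(group[y_col])` has to be inspected before squaring.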