Steady Fair Share

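Steady fair shares are computed as weighted values: queues with different weights receive different steady fair shares, and a larger weight yields a larger share. Computing the steady fair share and the instantaneous fair share both come down to deciding, from the configured queue weights, how much resource to pre-allocate to each queue, while keeping that amount within the limits set by minShare and maxShare.
The goal is therefore to find a ratio R (weight-to-slots) that satisfies the following as closely as possible: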

R * (Queue1Weight + Queue2Weight + ... + QueueNWeight) <= totalResource
R * QueueWeight >= minShare
R * QueueWeight <= maxShare
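
As a minimal sketch of the first constraint, assume that no queue is limited by minShare or maxShare. In that case R is simply totalResource divided by the sum of the weights. The class and method names below are hypothetical and are not part of YARN.

```java
import java.util.Arrays;

// Hypothetical illustration, not YARN code: with no minShare/maxShare limit in
// effect, the weight-to-resource ratio R is totalResource / sum(weights).
class WeightRatioSketch {
    static double ratio(double[] weights, int totalResource) {
        double sum = 0;
        for (double w : weights) {
            sum += w;
        }
        return totalResource / sum;
    }

    static int[] shares(double[] weights, int totalResource) {
        double r = ratio(weights, totalResource);
        int[] result = new int[weights.length];
        for (int i = 0; i < weights.length; i++) {
            result[i] = (int) (r * weights[i]); // truncate to an int share
        }
        return result;
    }

    public static void main(String[] args) {
        // Weights 1:2:3 over 60 units give R = 10.
        System.out.println(Arrays.toString(shares(new double[] {1.0, 2.0, 3.0}, 60)));
        // prints [10, 20, 30]
    }
}
```

Once minShare or maxShare clamps some queue, R can no longer be computed in closed form like this, which is why the scheduler searches for it instead.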

private static int handleFixedFairShares(
      Collection<? extends Schedulable> schedulables,
      Collection<Schedulable> nonFixedSchedulables,
      boolean isSteadyShare, ResourceType type) {
    int totalResource = 0; // running sum of the shares assigned to fixed schedulables

    for (Schedulable sched : schedulables) {
      // For a fixed schedulable, fixedShare is its minShare or 0; otherwise it is negative
      int fixedShare = getFairShareIfFixed(sched, isSteadyShare, type);
      if (fixedShare < 0) {
        // Not fixed: maxShare and weight are both greater than 0 and, when computing
        // instantaneous shares, the Schedulable is active. Collect it in nonFixedSchedulables.
        nonFixedSchedulables.add(sched);
      } else {
        // Fixed schedulable, two cases:
        // isSteadyShare == true: we are computing steady fair shares, so its steady
        // fair share is set to fixedShare (only fixed schedulables are assigned fixedShare).
        // isSteadyShare == false: we are computing instantaneous fair shares, so its
        // fairShare (the instantaneous fair share) is set to fixedShare.
        setResourceValue(fixedShare,
            isSteadyShare
                ? ((FSQueue)sched).getSteadyFairShare()
                : sched.getFairShare(),
            type);
        totalResource = (int) Math.min((long)totalResource + (long)fixedShare,
            Integer.MAX_VALUE);
      }
    }
    // Return the sum of the shares assigned to all fixed schedulables
    return totalResource;
  }

Computing the fair share of a fixed schedulable

The resources of a fixed schedulable are not dynamic, so they can be computed directly:

private static int getFairShareIfFixed(Schedulable sched,
      boolean isSteadyShare, ResourceType type) {

    // If maxShare is <= 0, this is a fixed queue whose fixed share is 0
    if (getResourceValue(sched.getMaxShare(), type) <= 0) {
      return 0;
    }

    // When computing instantaneous shares, a queue with no running apps is
    // treated as a fixed queue whose fair share is 0
    if (!isSteadyShare &&
        (sched instanceof FSQueue) && !((FSQueue)sched).isActive()) {
      return 0;
    }

    // A weight <= 0 also marks a fixed queue; its fair share then
    // follows its minShare setting:
    // (minShare <= 0) ? 0 : minShare
    if (sched.getWeights().getWeight(type) <= 0) {
      int minShare = getResourceValue(sched.getMinShare(), type);
      return (minShare <= 0) ? 0 : minShare;
    }

    // maxShare > 0, weight > 0, and (isSteadyShare is true or the queue is active):
    // return -1 to mark a non-fixed queue
    return -1;
  }

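What makes a Schedulable a fixed schedulable?
For the steady fair share, a queue takes part in dividing up the resources, and is therefore non-fixed, whenever its maxShare is valid (greater than 0) and its weight is greater than 0. Because this definition depends only on the configured weight and maxShare, the classification stays constant while YARN is running. A queue with maxShare <= 0 gets a steady fair share of 0, and a queue with weight <= 0 gets its minShare (or 0 if minShare is not positive).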

For the instantaneous fair share, valid maxShare and weight settings are not enough: the queue must also have running apps. A queue with no running apps has an instantaneous fair share of 0 and takes no part in the division. Consequently, when all the apps on one queue finish, the other queues receive larger instantaneous fair shares.
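
The classification rules can be tried out with a simplified, self-contained sketch. It mirrors the branches of getFairShareIfFixed for a single resource type; the QueueConf class and its fields are hypothetical stand-ins for FSQueue and are not YARN types.

```java
// Hypothetical simplification of getFairShareIfFixed for one resource type.
class FixedShareSketch {
    static final class QueueConf {
        final int minShare;
        final int maxShare;
        final double weight;
        final boolean active; // stands in for FSQueue.isActive()

        QueueConf(int minShare, int maxShare, double weight, boolean active) {
            this.minShare = minShare;
            this.maxShare = maxShare;
            this.weight = weight;
            this.active = active;
        }
    }

    // Returns the fixed share, or -1 when the queue is non-fixed.
    static int fairShareIfFixed(QueueConf q, boolean isSteadyShare) {
        if (q.maxShare <= 0) {
            return 0; // no capacity allowed at all
        }
        if (!isSteadyShare && !q.active) {
            return 0; // idle queues get no instantaneous share
        }
        if (q.weight <= 0) {
            return Math.max(q.minShare, 0); // falls back to minShare
        }
        return -1;
    }

    public static void main(String[] args) {
        QueueConf idle = new QueueConf(0, 100, 1.0, false);
        System.out.println(fairShareIfFixed(idle, true));  // prints -1 (non-fixed for steady share)
        System.out.println(fairShareIfFixed(idle, false)); // prints 0 (fixed for instantaneous share)
    }
}
```

The same idle queue is non-fixed when steady shares are computed and fixed at 0 when instantaneous shares are computed, which is the only difference between the two passes.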

computeSharesInternal

  private static void computeSharesInternal(
      Collection<? extends Schedulable> allSchedulables,
      Resource totalResources, ResourceType type, boolean isSteadyShare) {

    Collection<Schedulable> schedulables = new ArrayList<Schedulable>();
    // Resources already assigned to fixed schedulables
    int takenResources = handleFixedFairShares(
        allSchedulables, schedulables, isSteadyShare, type);

    if (schedulables.isEmpty()) {
      return;
    }
    // Find an upper bound on R that we can use in our binary search. We start
    // at R = 1 and double it until we have either used all the resources or we
    // have met all Schedulables' max shares.
    int totalMaxShare = 0; // sum of maxShare over all non-fixed schedulables
    for (Schedulable sched : schedulables) { // accumulate maxShare, saturating at Integer.MAX_VALUE
      int maxShare = getResourceValue(sched.getMaxShare(), type);
      totalMaxShare = (int) Math.min((long)maxShare + (long)totalMaxShare,
          Integer.MAX_VALUE);
      if (totalMaxShare == Integer.MAX_VALUE) {
        break;
      }
    }

    // Cluster total minus the resources already taken by fixed schedulables:
    // this remainder is what can be distributed among the non-fixed ones
    int totalResource = Math.max((getResourceValue(totalResources, type) -
        takenResources), 0);
    // If the maxShares of the non-fixed schedulables sum to less than that
    // remainder, only the maxShare sum needs to be distributed
    totalResource = Math.min(totalMaxShare, totalResource);

    double rMax = 1.0;
    while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
        < totalResource) {
      rMax *= 2.0;
    }
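    // rMax is an upper bound, so R can be binary-searched between 0 and rMax. When the
    // search ends, right approximates the R at which planned usage matches totalResource.
    // COMPUTE_FAIR_SHARES_ITERATIONS caps the number of iterations.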
    // Perform the binary search for up to COMPUTE_FAIR_SHARES_ITERATIONS steps
    double left = 0;
    double right = rMax;
    for (int i = 0; i < COMPUTE_FAIR_SHARES_ITERATIONS; i++) {
      double mid = (left + right) / 2.0; // midpoint of the current interval
      int plannedResourceUsed = resourceUsedWithWeightToResourceRatio(
          mid, schedulables, type);
      if (plannedResourceUsed == totalResource) {
        right = mid;
        break;
      } else if (plannedResourceUsed < totalResource) {
        left = mid; // planned usage too low: raise the lower bound
      } else {
        right = mid; // planned usage too high: lower the upper bound
      }
    }
    // The search is done; right holds the converged R
    // Set the fair shares based on the value of R we've converged to
    for (Schedulable sched : schedulables) {
      if (isSteadyShare) { // computing steady fair shares
        setResourceValue(computeShare(sched, right, type),
            ((FSQueue) sched).getSteadyFairShare(), type); // derive this queue's steady fair share from R
      } else {
        setResourceValue( // otherwise derive this queue's (instantaneous) fair share from R
            computeShare(sched, right, type), sched.getFairShare(), type);
      }
      }
    }
  }
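
To see the search converge on concrete numbers, here is a self-contained sketch of the same doubling and bisection steps. It is a simplification under stated assumptions: it handles one resource type, skips the fixed-schedulable pass, and uses a hypothetical QueueSpec class in place of Schedulable. The iteration cap of 25 is an assumed stand-in for COMPUTE_FAIR_SHARES_ITERATIONS.

```java
import java.util.Arrays;
import java.util.List;

// Simplified sketch of the R search; QueueSpec is hypothetical, not a YARN type.
class FairShareSearchSketch {
    static final int ITERATIONS = 25; // assumed stand-in for COMPUTE_FAIR_SHARES_ITERATIONS

    static final class QueueSpec {
        final double weight;
        final int minShare;
        final int maxShare;

        QueueSpec(double weight, int minShare, int maxShare) {
            this.weight = weight;
            this.minShare = minShare;
            this.maxShare = maxShare;
        }
    }

    // Share of one queue for a given R, clamped to [minShare, maxShare].
    static int share(QueueSpec q, double r) {
        double s = q.weight * r;
        s = Math.max(s, q.minShare);
        s = Math.min(s, q.maxShare);
        return (int) s;
    }

    // Total resources handed out for a given R.
    static int used(List<QueueSpec> queues, double r) {
        int total = 0;
        for (QueueSpec q : queues) {
            total += share(q, r);
        }
        return total;
    }

    static int[] compute(List<QueueSpec> queues, int totalResource) {
        long totalMax = 0;
        for (QueueSpec q : queues) {
            totalMax += q.maxShare;
        }
        int target = (int) Math.min(totalMax, (long) totalResource);

        double rMax = 1.0; // double until the target is reached
        while (used(queues, rMax) < target) {
            rMax *= 2.0;
        }
        double left = 0;
        double right = rMax;
        for (int i = 0; i < ITERATIONS; i++) {
            double mid = (left + right) / 2.0;
            int planned = used(queues, mid);
            if (planned == target) {
                right = mid;
                break;
            } else if (planned < target) {
                left = mid;
            } else {
                right = mid;
            }
        }
        int[] result = new int[queues.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = share(queues.get(i), right);
        }
        return result;
    }

    public static void main(String[] args) {
        List<QueueSpec> queues = Arrays.asList(
            new QueueSpec(1.0, 0, 100),  // A
            new QueueSpec(1.0, 0, 20),   // B, capped by maxShare
            new QueueSpec(2.0, 0, 100)); // C
        System.out.println(Arrays.toString(compute(queues, 110))); // prints [30, 20, 60]
    }
}
```

Without the cap, the 1:1:2 weights would split 110 units as 27.5, 27.5 and 55. Because B is held at its maxShare of 20, the search settles on R = 30 and the remaining 90 units go to A and C in a 1:2 ratio. Since shares are truncated to integers, other inputs may not hit the target exactly, so the total handed out can differ from it by a small amount.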

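Queues in the YARN FairScheduler do not necessarily share the same weight. The weight represents the proportion of resources a queue is entitled to (although the fair share it finally receives is still bounded by minShare and maxShare), so it helps to be clear about what weighted fair allocation means. If all Schedulables have the same weight and identical minShare and maxShare, each obtains its fair share by splitting the cluster resources evenly. Once queues have different minShare and maxShare values, the amounts actually allocated diverge, because YARN satisfies each queue's minShare and never allocates more than its maxShare.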

We consider the allocation fair if there exists a value R that satisfies the following conditions:

If a Schedulable S has minShare > R * S.weight, meaning its weighted share would fall below its minShare, which it cannot accept, its fair share is its minShare.
If it has maxShare < R * S.weight, meaning its weighted share would exceed its maxShare and the excess would be wasted, its fair share is its maxShare.
Every other Schedulable receives R * S.weight.

Determining the resources a Schedulable receives

Given the current value of R, how do we determine the resources a Schedulable receives? This is done in ComputeFairShares.computeShare():

/**
   * If sched.getWeights().getWeight(type) * w2rRatio lies between minShare and maxShare, return it as is.
   * If it is less than minShare, use minShare.
   * If it is greater than maxShare, use maxShare.
   */
  private static int computeShare(Schedulable sched, double w2rRatio,
      ResourceType type) {
    double share = sched.getWeights().getWeight(type) * w2rRatio;
    share = Math.max(share, getResourceValue(sched.getMinShare(), type)); // at least minShare
    share = Math.min(share, getResourceValue(sched.getMaxShare(), type)); // at most maxShare
    return (int) share;
  }
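
The three cases can be checked with plain numbers. The helper below is a hypothetical restatement of the clamping in computeShare that takes raw values instead of a Schedulable. It also shows a consequence of the order of the two calls: since Math.min is applied last, maxShare wins if a queue is misconfigured with minShare greater than maxShare.

```java
// Hypothetical restatement of the clamping in computeShare with raw values.
class ClampSketch {
    static int clampShare(double weight, double ratio, int minShare, int maxShare) {
        double share = weight * ratio;
        share = Math.max(share, minShare); // raise to minShare if below it
        share = Math.min(share, maxShare); // cap at maxShare, applied last
        return (int) share;
    }

    public static void main(String[] args) {
        System.out.println(clampShare(2.0, 10.0, 5, 50));  // prints 20: within bounds
        System.out.println(clampShare(2.0, 10.0, 30, 50)); // prints 30: raised to minShare
        System.out.println(clampShare(2.0, 10.0, 5, 15));  // prints 15: capped at maxShare
        System.out.println(clampShare(1.0, 10.0, 50, 40)); // prints 40: maxShare wins over minShare
    }
}
```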

Instantaneous Fair Share: principle and computation

Summary

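In YARN, the steady fair share and the instantaneous fair share both represent the maximum resources currently allotted to a queue, the level its resource usage is not supposed to exceed, but the two differ.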

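The steady fair share is a static value: the maximum resources that, in theory, should be allocated to a queue, computed by YARN from each queue's minShare, maxShare, and weight. It does not depend on whether the queue currently has apps running, only on the configuration in fair-scheduler.xml (applied to the cluster's total resources).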

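The instantaneous fair share is different: it changes in real time with the state of the queues in the cluster. If no app is running on a queue, the queue is inactive and its instantaneous fair share is 0, and the remaining active queues divide the cluster resources among themselves. When a queue goes from active to inactive, the instantaneous fair shares of the remaining queues grow; conversely, when a queue goes from inactive to active, the instantaneous fair share of each of the other queues shrinks.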

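At run time, YARN therefore uses the instantaneous fair share as the maximum resources a queue can currently use. For example, when allocating a container for an ApplicationMaster, it must ensure that the configured maxAMShare is not exceeded: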


/**
 * Check whether this queue can run an ApplicationMaster.
 * @param amResource the resources required by the AM to be run
 * @return true if this queue can run
 */
public boolean canRunAppAM(Resource amResource) {
  float maxAMShare = // the queue's maxAMShare: the fraction of its fair share that may be used to run AMs
      scheduler.getAllocationConfiguration().getQueueMaxAMShare(getName());
  if (Math.abs(maxAMShare - -1.0f) < 0.0001) { // a configured value of -1.0f means no limit
    return true;
  }
  // Maximum resources usable for AMs in this queue: fair share * maxAMShare,
  // where the fair share is the instantaneous fair share
  Resource maxAMResource = Resources.multiply(getFairShare(), maxAMShare);
  Resource ifRunAMResource = Resources.add(amResourceUsage, amResource); // total AM usage of this queue if this AM were started
  return !policy
      .checkIfAMResourceUsageOverLimit(ifRunAMResource, maxAMResource);  // with the default FairSharePolicy, check whether starting this AM would exceed the maxAMShare limit
}

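The line Resource maxAMResource = Resources.multiply(getFairShare(), maxAMShare); takes the instantaneous fair share as the queue's current maximum usable resources and multiplies it by maxAMShare to obtain the maximum resources the queue may use to run ApplicationMasters.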
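
The arithmetic behind this check can be sketched with plain integers. The version below is a hypothetical simplification that tracks memory only (in MB) and assumes that "over limit" means strictly greater than the limit; it does not use the YARN Resource or SchedulingPolicy classes.

```java
// Hypothetical memory-only sketch of the maxAMShare check; not YARN code.
class AmShareSketch {
    static boolean canRunAppAM(int fairShareMb, float maxAMShare,
                               int amUsageMb, int amRequestMb) {
        if (Math.abs(maxAMShare - -1.0f) < 0.0001) {
            return true; // -1.0f means the AM share is unlimited
        }
        int maxAmMb = (int) (fairShareMb * maxAMShare); // AM budget of the queue
        int ifRunMb = amUsageMb + amRequestMb;           // AM usage if this AM starts
        return ifRunMb <= maxAmMb; // assumption: only usage above the budget is rejected
    }

    public static void main(String[] args) {
        // Instantaneous fair share 8192 MB, maxAMShare 0.5: the AM budget is 4096 MB.
        System.out.println(canRunAppAM(8192, 0.5f, 3072, 512));  // prints true (3584 <= 4096)
        System.out.println(canRunAppAM(8192, 0.5f, 3072, 2048)); // prints false (5120 > 4096)
    }
}
```

Because the budget scales with the instantaneous fair share, the same AM request can be accepted or rejected depending on how many other queues are active at that moment.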
————————————————
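Copyright notice: this is an original article by the CSDN blogger 「小昌昌的博客」, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/zhanyuanlin/article/details/72667293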

How Fair Share Computation Is Implemented