HPA Is Configured in K8s, So Why Didn't It Scale Up?

Author: 喬克  2025-09-05 01:22:00


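The Kubernetes Horizontal Pod Autoscaler (HPA) is a controller that automatically adjusts the number of Pod replicas based on observed CPU utilization or other custom metrics. When the business is busy it scales Pods out horizontally, but recently we hit a case where utilization had clearly exceeded the configured target (a target of 80%, with actual usage at 87%), yet no scale-up happened. Why?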


To find out why, let's dig into the source code.

1. Overall Architecture and Core Components of HPA

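The HPA implementation lives under k8s.io/kubernetes/pkg/controller/podautoscaler in Kubernetes and is made up mainly of the following components:

HorizontalController: the main controller; it watches HPA and Pod resources and coordinates scaling.
ReplicaCalculator: the core logic that computes the target replica count.
MetricsClient: fetches metric data (CPU, memory, custom metrics, etc.).
ScaleClient: modifies the replica count of workloads (Deployment, ReplicaSet, etc.).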

2. Source Entry Point: Starting the HPA Controller

The HPA controller is initialized when cmd/kube-controller-manager starts.

Run() in cmd/kube-controller-manager/controllermanager.go calls NewControllerDescriptors(), where the controller is registered:

func NewControllerDescriptors() map[string]*ControllerDescriptor {
 ...
 register(newHorizontalPodAutoscalerControllerDescriptor())
 ...
}

Then, in cmd/kube-controller-manager/autoscaling.go, the controller is finally started through startHPAControllerWithMetricsClient():

func newHorizontalPodAutoscalerControllerDescriptor() *ControllerDescriptor {
 return &ControllerDescriptor{
  name:     names.HorizontalPodAutoscalerController,
  aliases:  []string{"horizontalpodautoscaling"},
  initFunc: startHorizontalPodAutoscalerControllerWithRESTClient,
 }
}

func startHorizontalPodAutoscalerControllerWithRESTClient(ctx context.Context, controllerContext ControllerContext, controllerName string) (controller.Interface, bool, error) {

 ...
 return startHPAControllerWithMetricsClient(ctx, controllerContext, metricsClient)
}

func startHPAControllerWithMetricsClient(ctx context.Context, controllerContext ControllerContext, metricsClient metrics.MetricsClient) (controller.Interface, bool, error) {

 ...

 go podautoscaler.NewHorizontalController(
  ctx,
  hpaClient.CoreV1(),
  scaleClient,
  hpaClient.AutoscalingV2(),
  controllerContext.RESTMapper,
  metricsClient,
  controllerContext.InformerFactory.Autoscaling().V2().HorizontalPodAutoscalers(),
  controllerContext.InformerFactory.Core().V1().Pods(),
  controllerContext.ComponentConfig.HPAController.HorizontalPodAutoscalerSyncPeriod.Duration,
  controllerContext.ComponentConfig.HPAController.HorizontalPodAutoscalerDownscaleStabilizationWindow.Duration,
  controllerContext.ComponentConfig.HPAController.HorizontalPodAutoscalerTolerance,
  controllerContext.ComponentConfig.HPAController.HorizontalPodAutoscalerCPUInitializationPeriod.Duration,
  controllerContext.ComponentConfig.HPAController.HorizontalPodAutoscalerInitialReadinessDelay.Duration,
 ).Run(ctx, int(controllerContext.ComponentConfig.HPAController.ConcurrentHorizontalPodAutoscalerSyncs))
 return nil, true, nil
}

3. Core Controller Logic

The core logic of the controller lives in k8s.io/kubernetes/pkg/controller/podautoscaler, and its call chain is:

Run() -> worker() -> processNextWorkItem() -> reconcileKey() -> reconcileAutoscaler()

Most of the logic is implemented in reconcileAutoscaler, which proceeds as follows.
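To make the call chain concrete, here is a minimal stdlib-only sketch of the same worker pattern. It assumes a closed Go channel in place of the controller's rate-limited workqueue and a stub reconcileAutoscaler that only records the key; runWorker and the seen slice are hypothetical helpers for illustration. The real controller also re-queues failed keys and looks each HPA up in an informer cache.

```go
package main

import (
	"fmt"
	"strings"
)

// reconcileAutoscaler is a stub for the real per-HPA reconciliation.
// Hypothetical: it only records which object it was asked to handle.
func reconcileAutoscaler(namespace, name string, seen *[]string) error {
	*seen = append(*seen, namespace+"/"+name)
	return nil
}

// reconcileKey splits a "namespace/name" queue key before reconciling it.
func reconcileKey(key string, seen *[]string) error {
	parts := strings.SplitN(key, "/", 2)
	if len(parts) != 2 {
		return fmt.Errorf("invalid key %q", key)
	}
	return reconcileAutoscaler(parts[0], parts[1], seen)
}

// processNextWorkItem handles one key; it returns false once the queue is closed.
func processNextWorkItem(queue <-chan string, seen *[]string) bool {
	key, ok := <-queue
	if !ok {
		return false
	}
	if err := reconcileKey(key, seen); err != nil {
		// The real controller would re-queue the key with rate limiting here.
		fmt.Println("error:", err)
	}
	return true
}

// worker keeps processing items until the queue shuts down.
func worker(queue <-chan string, seen *[]string) {
	for processNextWorkItem(queue, seen) {
	}
}

// runWorker (hypothetical) fills a buffered channel, closes it, and drains it with one worker.
func runWorker(keys []string) []string {
	queue := make(chan string, len(keys))
	for _, k := range keys {
		queue <- k
	}
	close(queue)
	var seen []string
	worker(queue, &seen)
	return seen
}

func main() {
	fmt.Println(runWorker([]string{"default/web", "bad-key", "prod/api"}))
}
```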

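(1) It records monitoring metrics for the reconciliation with a.monitor.ObserveReconciliationResult(actionLabel, errorLabel, time.Since(start)).

(2) It deep-copies the HPA object and its status with hpaShared.DeepCopy() and hpa.Status.DeepCopy(), so that the shared informer cache is never modified.

(3) It then parses the target resource and resolves its REST mapping.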

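// Parse the API version of the scale target (error handling omitted)
targetGV, err := schema.ParseGroupVersion(hpa.Spec.ScaleTargetRef.APIVersion)

// Build the GroupKind from the parsed group and the target Kind
targetGK := schema.GroupKind{
    Group: targetGV.Group,
    Kind:  hpa.Spec.ScaleTargetRef.Kind,
}

// Look up the REST mappings
mappings, err := a.mapper.RESTMappings(targetGK)

// Fetch the Scale subresource
scale, targetGR, err := a.scaleForResourceMappings(ctx, hpa.Namespace, hpa.Spec.ScaleTargetRef.Name, mappings)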

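Where:

schema.ParseGroupVersion: parses the API version of the target resource
a.mapper.RESTMappings: retrieves the REST mapping information for the resource
a.scaleForResourceMappings: fetches the Scale subresource of the target resource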
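The parsing step can be illustrated with a dependency-free sketch. It reimplements the splitting rules of schema.ParseGroupVersion (a bare "v1" belongs to the core group, whose Group name is empty, while "apps/v1" splits on the slash) and then builds the GroupKind that is handed to the RESTMapper. The local types and the helper targetGroupKind are stand-ins written for this example, not the apimachinery API.

```go
package main

import (
	"fmt"
	"strings"
)

// GroupVersion and GroupKind mirror the shape of the apimachinery types;
// they are redefined locally so the sketch has no external dependencies.
type GroupVersion struct{ Group, Version string }
type GroupKind struct{ Group, Kind string }

// parseGroupVersion follows the splitting rules of schema.ParseGroupVersion.
func parseGroupVersion(gv string) (GroupVersion, error) {
	if gv == "" || gv == "/" {
		return GroupVersion{}, nil
	}
	switch strings.Count(gv, "/") {
	case 0:
		// "v1": core group, empty Group name
		return GroupVersion{Version: gv}, nil
	case 1:
		i := strings.Index(gv, "/")
		return GroupVersion{Group: gv[:i], Version: gv[i+1:]}, nil
	default:
		return GroupVersion{}, fmt.Errorf("unexpected GroupVersion string: %v", gv)
	}
}

// targetGroupKind (hypothetical helper) combines scaleTargetRef.apiVersion
// and scaleTargetRef.kind into the GroupKind used for the REST mapping lookup.
func targetGroupKind(apiVersion, kind string) (GroupKind, error) {
	gv, err := parseGroupVersion(apiVersion)
	if err != nil {
		return GroupKind{}, err
	}
	return GroupKind{Group: gv.Group, Kind: kind}, nil
}

func main() {
	gk, _ := targetGroupKind("apps/v1", "Deployment")
	fmt.Printf("%+v\n", gk) // {Group:apps Kind:Deployment}
}
```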

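(4) It runs the core metric computation to obtain the desired replica count.

// Compute the desired replica count from the metrics
metricDesiredReplicas, metricName, metricStatuses, metricTimestamp, err = a.computeReplicasForMetrics(ctx, hpa, scale, hpa.Spec.Metrics)

(5) It chooses a normalization strategy depending on whether Behavior is configured.

// Pick the normalization strategy depending on whether Behavior is configured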
if hpa.Spec.Behavior == nil {
    desiredReplicas = a.normalizeDesiredReplicas(hpa, key, currentReplicas, desiredReplicas, minReplicas)
} else {
    desiredReplicas = a.normalizeDesiredReplicasWithBehaviors(hpa, key, currentReplicas, desiredReplicas, minReplicas)
}

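(6) When scaling is required, it performs the scale operation.

// Update the Scale subresource, retrying on conflicts
err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
    scale.Spec.Replicas = desiredReplicas
    _, updateErr := a.scaleNamespacer.Scales(hpa.Namespace).Update(ctx, targetGR, scale, metav1.UpdateOptions{})
    // ... conflict-handling logic
    return updateErr
})

retry.RetryOnConflict provides the retry mechanism for concurrent-update conflicts; the actual update of the target resource is performed by a.scaleNamespacer.Scales().Update.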

(7) Finally, it updates the status and records events.

// Set the HPA status condition
setCondition(hpa, autoscalingv2.AbleToScale, v1.ConditionTrue, "SucceededRescale", "...")

// Record an event
a.eventRecorder.Eventf(hpa, v1.EventTypeNormal, "SuccessfulRescale", "New size: %d; reason: %s", desiredReplicas, rescaleReason)

// Store the scale event (used for Behavior calculations)
a.storeScaleEvent(hpa.Spec.Behavior, key, currentReplicas, desiredReplicas)

// Update the HPA status
a.setStatus(hpa, currentReplicas, desiredReplicas, metricStatuses, rescale)
err = a.updateStatusIfNeeded(ctx, hpaStatusOriginal, hpa)

That covers the main flow of reconcileAutoscaler. Its most important part is the replica count calculation, which is implemented in computeReplicasForMetrics.

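4. The Core Algorithm

Now let's take computeReplicasForMetrics apart to see exactly how it works.

(1) Pre-validation and initialization

// Parse the HPA selector so the target Pods can be identified correctly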
selector, err := a.validateAndParseSelector(hpa, scale.Status.Selector)
if err != nil {
    return -1, "", nil, time.Time{}, err
}

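// Read the replica counts from the target's Scale subresource
specReplicas := scale.Spec.Replicas      // desired replicas (spec)
statusReplicas := scale.Status.Replicas  // current replicas (status)

// Initialize the list of metric statuses
statuses = make([]autoscalingv2.MetricStatus, len(metricSpecs))

// Bookkeeping for invalid metrics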
invalidMetricsCount := 0
var invalidMetricError error
var invalidMetricCondition autoscalingv2.HorizontalPodAutoscalerCondition

(2) Loop over the metrics

for i, metricSpec := range metricSpecs {
    // Compute a proposed replica count for each metric
    replicaCountProposal, metricNameProposal, timestampProposal, condition, err := a.computeReplicasForMetric(ctx, hpa, metricSpec, specReplicas, statusReplicas, selector, &statuses[i])
    
    if err != nil {
        // Remember the condition and error of the first invalid metric
        if invalidMetricsCount <= 0 {
            invalidMetricCondition = condition
            invalidMetricError = err
        }
        invalidMetricsCount++
        continue
    }
    
    // Take the maximum: the largest proposal across metrics wins
    if replicas == 0 || replicaCountProposal > replicas {
        timestamp = timestampProposal
        replicas = replicaCountProposal
        metric = metricNameProposal
    }
}

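Each metric is evaluated by calling a.computeReplicasForMetric(ctx, hpa, metricSpec, specReplicas, statusReplicas, selector, &statuses[i]), which returns replicaCountProposal, metricNameProposal, timestampProposal, condition and err. Invalid metrics are skipped, and the largest proposal among the valid ones is kept.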

computeReplicasForMetric performs a different calculation for each metric type:

switch spec.Type {
case autoscalingv2.ObjectMetricSourceType:
    // Object metric
case autoscalingv2.PodsMetricSourceType:
    // Pods metric
case autoscalingv2.ResourceMetricSourceType:
    // Resource metric
case autoscalingv2.ContainerResourceMetricSourceType:
    // Container resource metric
case autoscalingv2.ExternalMetricSourceType:
    // External metric
default:
    // Unknown metric type: return an error
}

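Here we only walk through the object metric type, autoscalingv2.ObjectMetricSourceType. For an object metric, the calculation is done by a.computeStatusForObjectMetric.

computeStatusForObjectMetric first initializes a metric status that records the metric's current state:

// Initialize the metric status, which records the current state of the metric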
metricStatus := autoscalingv2.MetricStatus{
    Type: autoscalingv2.ObjectMetricSourceType,
    Object: &autoscalingv2.ObjectMetricStatus{
        DescribedObject: metricSpec.Object.DescribedObject,
        Metric: autoscalingv2.MetricIdentifier{
            Name:     metricSpec.Object.Metric.Name,
            Selector: metricSpec.Object.Metric.Selector,
        },
    },
}

It then calls a.tolerancesForHpa(hpa) to obtain the tolerances for this HPA. tolerancesForHpa is implemented as follows:

func (a *HorizontalController) tolerancesForHpa(hpa *autoscalingv2.HorizontalPodAutoscaler) Tolerances {
    // Start with the controller-wide default tolerance in both directions
    t := Tolerances{a.tolerance, a.tolerance}
    behavior := hpa.Spec.Behavior
    // Check whether the HPAConfigurableTolerance feature gate is enabled
    allowConfigurableTolerances := utilfeature.DefaultFeatureGate.Enabled(features.HPAConfigurableTolerance)
    // With no Behavior configured, or with the feature gate disabled, return the defaults
    if behavior == nil || !allowConfigurableTolerances {
        return t
    }
    // Custom per-direction tolerances override the defaults
    if behavior.ScaleDown != nil && behavior.ScaleDown.Tolerance != nil {
        t.scaleDown = behavior.ScaleDown.Tolerance.AsApproximateFloat64()
    }
    if behavior.ScaleUp != nil && behavior.ScaleUp.Tolerance != nil {
        t.scaleUp = behavior.ScaleUp.Tolerance.AsApproximateFloat64()
    }
    return t
}

The default tolerance is defined in pkg/controller/podautoscaler/config/v1alpha1/defaults.go. It is 0.1, i.e. a 10% tolerance:

if obj.HorizontalPodAutoscalerTolerance == 0 {
    obj.HorizontalPodAutoscalerTolerance = 0.1
}
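The tolerance logic fits in a few lines. This sketch mirrors the shape of the Tolerances type and the band check [1-scaleDown, 1+scaleUp] used by isWithin; tolerancesFor is a hypothetical, simplified stand-in for tolerancesForHpa in which a nil pointer means "no Behavior override, or feature gate disabled".

```go
package main

import "fmt"

// Tolerances mirrors the two-direction tolerance struct used by the controller.
type Tolerances struct {
	scaleDown float64
	scaleUp   float64
}

// isWithin reports whether usageRatio lies in [1-scaleDown, 1+scaleUp];
// inside that band the controller keeps the current replica count.
func (t Tolerances) isWithin(usageRatio float64) bool {
	return (1.0-t.scaleDown) <= usageRatio && usageRatio <= (1.0+t.scaleUp)
}

// tolerancesFor (hypothetical) starts from the global default and applies
// per-direction overrides when they are present.
func tolerancesFor(defaultTol float64, scaleDown, scaleUp *float64) Tolerances {
	t := Tolerances{scaleDown: defaultTol, scaleUp: defaultTol}
	if scaleDown != nil {
		t.scaleDown = *scaleDown
	}
	if scaleUp != nil {
		t.scaleUp = *scaleUp
	}
	return t
}

func main() {
	def := tolerancesFor(0.1, nil, nil)
	up := 0.05
	tight := tolerancesFor(0.1, nil, &up)
	fmt.Println(def.isWithin(1.0875), tight.isWithin(1.0875)) // true false
}
```

With a tighter scale-up tolerance of 0.05, a ratio of 1.0875 falls outside the band and would produce a new replica proposal.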

Once the tolerances are known, the controller computes separately for absolute-value targets (Value) and average-value targets (AverageValue):

if metricSpec.Object.Target.Type == autoscalingv2.ValueMetricType && metricSpec.Object.Target.Value != nil {
    // Absolute-value target: compute the replica count
    replicaCountProposal, usageProposal, timestampProposal, err := a.replicaCalc.GetObjectMetricReplicas(...)
    ...
} else if metricSpec.Object.Target.Type == autoscalingv2.AverageValueMetricType && metricSpec.Object.Target.AverageValue != nil {
    // Average-value target: compute the replica count
    replicaCountProposal, usageProposal, timestampProposal, err := a.replicaCalc.GetObjectPerPodMetricReplicas(...)
    ...
}

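For an absolute-value target, the usage ratio is computed as usageRatio := float64(usage) / float64(targetUsage). If the ratio is within the tolerance, the current replica count is kept; otherwise the desired count is replicaCountFloat := usageRatio * float64(readyPodCount), rounded up if it is not an integer.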

// GetObjectMetricReplicas

func (c *ReplicaCalculator) GetObjectMetricReplicas(currentReplicas int32, targetUsage int64, metricName string, tolerances Tolerances, namespace string, objectRef *autoscaling.CrossVersionObjectReference, selector labels.Selector, metricSelector labels.Selector) (replicaCount int32, usage int64, timestamp time.Time, err error) {
 // Fetch the current metric value
 usage, _, err = c.metricsClient.GetObjectMetric(metricName, namespace, objectRef, metricSelector)
 if err != nil {
  return 0, 0, time.Time{}, fmt.Errorf("unable to get metric %s: %v on %s %s/%s", metricName, objectRef.Kind, namespace, objectRef.Name, err)
 }
 // Compute the usage ratio
 usageRatio := float64(usage) / float64(targetUsage)
 // Compute the desired replica count
 replicaCount, timestamp, err = c.getUsageRatioReplicaCount(currentReplicas, usageRatio, tolerances, namespace, selector)
 return replicaCount, usage, timestamp, err
}

func (c *ReplicaCalculator) getUsageRatioReplicaCount(currentReplicas int32, usageRatio float64, tolerances Tolerances, namespace string, selector labels.Selector) (replicaCount int32, timestamp time.Time, err error) {
 // When the current replica count is non-zero
 if currentReplicas != 0 {
  // If the usage ratio is within tolerance, return the current replica count unchanged
  if tolerances.isWithin(usageRatio) {
   return currentReplicas, timestamp, nil
  }
  // Count the ready Pods
  readyPodCount := int64(0)
  readyPodCount, err = c.getReadyPodsCount(namespace, selector)
  if err != nil {
   return 0, time.Time{}, fmt.Errorf("unable to calculate ready pods: %s", err)
  }
  // Compute the desired replica count
  replicaCountFloat := usageRatio * float64(readyPodCount)
  // Cap the result at the maximum int32 value
  if replicaCountFloat > math.MaxInt32 {
   replicaCount = math.MaxInt32
  } else {
   // Round up
   replicaCount = int32(math.Ceil(replicaCountFloat))
  }
 } else {
  // When the current replica count is 0, use the usage ratio directly, rounded up
  replicaCount = int32(math.Ceil(usageRatio))
 }

 return replicaCount, timestamp, err
}
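The arithmetic of getUsageRatioReplicaCount can be checked without a cluster. In this sketch the ready-Pod count is passed in as a parameter (an assumption standing in for getReadyPodsCount), and usageRatioReplicaCount is a hypothetical name. Note that the proposal is ratio × ready Pods, not ratio × current replicas, so Pods that are not yet ready pull the result down.

```go
package main

import (
	"fmt"
	"math"
)

// Tolerances is a local copy of the two-direction tolerance band.
type Tolerances struct{ scaleDown, scaleUp float64 }

func (t Tolerances) isWithin(r float64) bool {
	return (1.0-t.scaleDown) <= r && r <= (1.0+t.scaleUp)
}

// usageRatioReplicaCount reproduces the arithmetic of getUsageRatioReplicaCount,
// with readyPods supplied by the caller instead of listed from the cluster.
func usageRatioReplicaCount(currentReplicas int32, usageRatio float64, tol Tolerances, readyPods int64) int32 {
	if currentReplicas == 0 {
		// Scaling from zero: the ratio itself, rounded up, is the replica count.
		return int32(math.Ceil(usageRatio))
	}
	if tol.isWithin(usageRatio) {
		return currentReplicas // inside the dead band: no change
	}
	f := usageRatio * float64(readyPods)
	if f > math.MaxInt32 {
		return math.MaxInt32
	}
	return int32(math.Ceil(f))
}

func main() {
	tol := Tolerances{0.1, 0.1}
	// Object metric 1500 against a target Value of 1000, with 4 ready Pods.
	fmt.Println(usageRatioReplicaCount(4, 1500.0/1000.0, tol, 4)) // 6
	fmt.Println(usageRatioReplicaCount(4, 1050.0/1000.0, tol, 4)) // 4 (within tolerance)
	fmt.Println(usageRatioReplicaCount(0, 2.3, tol, 0))           // 3
}
```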

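For an average-value target, the usage ratio is usageRatio := float64(usage) / (float64(targetAverageUsage) * float64(replicaCount)), that is, usage ratio = actual metric value / (target average value × current replica count). If the ratio falls outside the tolerance, the replica count is recomputed as math.Ceil(actual metric value / target average value); otherwise it stays unchanged.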

func (c *ReplicaCalculator) GetObjectPerPodMetricReplicas(statusReplicas int32, targetAverageUsage int64, metricName string, tolerances Tolerances, namespace string, objectRef *autoscaling.CrossVersionObjectReference, metricSelector labels.Selector) (replicaCount int32, usage int64, timestamp time.Time, err error) {
 // Fetch the current metric value
 usage, timestamp, err = c.metricsClient.GetObjectMetric(metricName, namespace, objectRef, metricSelector)
 if err != nil {
  return 0, 0, time.Time{}, fmt.Errorf("unable to get metric %s: %v on %s %s/%s", metricName, objectRef.Kind, namespace, objectRef.Name, err)
 }

 // Start from the current replica count
 replicaCount = statusReplicas
 // Compute the usage ratio
 usageRatio := float64(usage) / (float64(targetAverageUsage) * float64(replicaCount))
 if !tolerances.isWithin(usageRatio) {
  // Recompute the replica count
  replicaCount = int32(math.Ceil(float64(usage) / float64(targetAverageUsage)))
 }
 // Compute the average usage per Pod
 usage = int64(math.Ceil(float64(usage) / float64(statusReplicas)))
 return replicaCount, usage, timestamp, nil
}
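A worked version of the average-value path, with the metrics client replaced by a plain usage argument. perPodReplicas is a hypothetical name, and the queue-length numbers in main are illustrative only. The second call shows how a total only 5% above target (630 against 6 × 100) leaves the replica count alone.

```go
package main

import (
	"fmt"
	"math"
)

// Tolerances is a local copy of the two-direction tolerance band.
type Tolerances struct{ scaleDown, scaleUp float64 }

func (t Tolerances) isWithin(r float64) bool {
	return (1.0-t.scaleDown) <= r && r <= (1.0+t.scaleUp)
}

// perPodReplicas reproduces GetObjectPerPodMetricReplicas without the metrics
// client: usage is the object's total value, targetAvg the AverageValue target.
// It returns the proposed replica count and the per-Pod average usage.
func perPodReplicas(statusReplicas int32, targetAvg, usage int64, tol Tolerances) (int32, int64) {
	replicas := statusReplicas
	ratio := float64(usage) / (float64(targetAvg) * float64(replicas))
	if !tol.isWithin(ratio) {
		replicas = int32(math.Ceil(float64(usage) / float64(targetAvg)))
	}
	avg := int64(math.Ceil(float64(usage) / float64(statusReplicas)))
	return replicas, avg
}

func main() {
	tol := Tolerances{0.1, 0.1}
	// e.g. a queue length of 900 against an AverageValue of 100, with 6 Pods.
	fmt.Println(perPodReplicas(6, 100, 900, tol)) // 9 150
	fmt.Println(perPodReplicas(6, 100, 630, tol)) // 6 105
}
```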

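(3) Return an error if the metrics are invalid; otherwise return the desired replica count

// Return an error if every metric is invalid, or if some are invalid and the result would scale down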
if invalidMetricsCount >= len(metricSpecs) || (invalidMetricsCount > 0 && replicas < specReplicas) {
    setCondition(hpa, invalidMetricCondition.Type, invalidMetricCondition.Status, 
        invalidMetricCondition.Reason, "%s", invalidMetricCondition.Message)
    return -1, "", statuses, time.Time{}, invalidMetricError
}

// Mark scaling as active
setCondition(hpa, autoscalingv2.ScalingActive, v1.ConditionTrue, "ValidMetricFound", 
    "the HPA was able to successfully calculate a replica count from %s", metric)

// Return the desired replica count
return replicas, metric, statuses, timestamp, invalidMetricError
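The guard at the end is easy to misread, so here it is isolated as a pure function. metricsBlockScaling is a hypothetical name; its parameters correspond to invalidMetricsCount, len(metricSpecs), replicas and specReplicas. The asymmetry looks intended to be conservative: a missing metric might be hiding load, so the HPA still scales up on the metrics it can read but refuses to scale down.

```go
package main

import "fmt"

// metricsBlockScaling reproduces the final check in computeReplicasForMetrics:
// fail if every metric is invalid, or if some are invalid and the valid ones
// would scale the target down.
func metricsBlockScaling(invalid, total int, replicas, specReplicas int32) bool {
	return invalid >= total || (invalid > 0 && replicas < specReplicas)
}

func main() {
	// Two metrics, one invalid, target currently at 5 replicas.
	fmt.Println(metricsBlockScaling(1, 2, 8, 5)) // false: scale-up proceeds
	fmt.Println(metricsBlockScaling(1, 2, 3, 5)) // true: scale-down refused
	fmt.Println(metricsBlockScaling(2, 2, 0, 5)) // true: no valid metric at all
}
```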

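This tolerance is what explains why the metric reached 87% without triggering a scale-up.

As introduced above, the default tolerance is 0.1, i.e. 10%: while the current usage stays within ±10% of the target, no scaling is triggered. The tolerance check is (1.0-t.scaleDown) <= usageRatio && usageRatio <= (1.0+t.scaleUp), so:

// Usage ratio
usageRatio = actual / target = 87% / 80% = 1.0875
// With the default tolerance of 0.1, the tolerance band is [0.9, 1.1]
// 0.9 ≤ 1.0875 ≤ 1.1
// The ratio is inside the band, so no scale-up is triggered
// A scale-up only happens once usage exceeds 80% × 1.1 = 88%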
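The same check, run for a few utilization values. decide is a hypothetical helper that assumes every Pod is ready and applies the default 0.1 tolerance in both directions; the real CPU path (GetResourceReplicas) also handles unready and missing Pods, which this sketch ignores.

```go
package main

import (
	"fmt"
	"math"
)

// decide applies a symmetric tolerance band to a utilization metric and returns
// the proposed replica count. Simplification: every current Pod is assumed ready.
func decide(current int32, utilization, target, tolerance float64) int32 {
	ratio := utilization / target
	if (1.0-tolerance) <= ratio && ratio <= (1.0+tolerance) {
		return current // inside the dead band: keep the current replica count
	}
	return int32(math.Ceil(ratio * float64(current)))
}

func main() {
	const target, tol = 80.0, 0.1
	fmt.Println("scale-up threshold (%):", target*(1+tol)) // 88
	fmt.Println(decide(4, 87, target, tol))                // 4: 1.0875 is inside [0.9, 1.1]
	fmt.Println(decide(4, 90, target, tol))                // 5: 1.125 × 4 = 4.5, rounded up
}
```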

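5. Constraint Mechanisms

HPA scaling is not unlimited either. To avoid frequent scaling, there are several constraints beyond the tolerance.

They are implemented mainly in a.normalizeDesiredReplicas or a.normalizeDesiredReplicasWithBehaviors. The differences between the two:

normalizeDesiredReplicas performs basic normalization, while normalizeDesiredReplicasWithBehaviors applies the more advanced behavior policies.
To use normalizeDesiredReplicasWithBehaviors, hpa.Spec.Behavior must be configured, for example: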

behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
  scaleDown:
    policies:
    - type: Percent
      value: 100
      periodSeconds: 300

Below we use normalizeDesiredReplicas as the example. Its source code is:

func (a *HorizontalController) normalizeDesiredReplicas(hpa *autoscalingv2.HorizontalPodAutoscaler, key string, currentReplicas int32, prenormalizedDesiredReplicas int32, minReplicas int32) int32 {
 // Stabilize the recommended replica count
 stabilizedRecommendation := a.stabilizeRecommendation(key, prenormalizedDesiredReplicas)
 // If the stabilized recommendation differs from the raw one, stabilization took effect; set the condition that reflects the current ability to scale
 if stabilizedRecommendation != prenormalizedDesiredReplicas {
  setCondition(hpa, autoscalingv2.AbleToScale, v1.ConditionTrue, "ScaleDownStabilized", "recent recommendations were higher than current one, applying the highest recent recommendation")
 } else {
  setCondition(hpa, autoscalingv2.AbleToScale, v1.ConditionTrue, "ReadyForNewScale", "recommended size matches current size")
 }

 // Apply the rules so the final replica count stays within [minReplicas, maxReplicas] and the scale-up limit
 desiredReplicas, reason, message := convertDesiredReplicasWithRules(currentReplicas, stabilizedRecommendation, minReplicas, hpa.Spec.MaxReplicas)

 // If the final count equals the stabilized recommendation, no limit was applied
 // If it differs, some limit kicked in (min/max replicas, scale-up rate limit, etc.)
 if desiredReplicas == stabilizedRecommendation {
  setCondition(hpa, autoscalingv2.ScalingLimited, v1.ConditionFalse, reason, "%s", message)
 } else {
  setCondition(hpa, autoscalingv2.ScalingLimited, v1.ConditionTrue, reason, "%s", message)
 }

 return desiredReplicas
}

convertDesiredReplicasWithRules uses calculateScaleUpLimit to compute the scale-up limit:

func calculateScaleUpLimit(currentReplicas int32) int32 {
    return int32(math.Max(scaleUpLimitFactor*float64(currentReplicas), scaleUpLimitMinimum))
}

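Where:

scaleUpLimitFactor = 2.0 (scale-up factor)
scaleUpLimitMinimum = 4.0 (minimum scale-up limit)

The calculation is:

Scale-up limit = max(current replicas × 2, 4)
For example, with 2 current replicas the limit is max(2×2, 4) = 4
With 10 current replicas the limit is max(10×2, 4) = 20

Suppose there are currently 5 replicas and the HPA is configured with a minimum of 2 and a maximum of 20. The scale-up limit is max(5×2, 4) = 10, so:

Desired 1: returns 2 (minimum limit), reason "TooFewReplicas"
Desired 8: returns 8 (within range), reason "DesiredWithinRange"
Desired 15: returns 10 (scale-up limit), reason "ScaleUpLimit"
Desired 25: returns 10 (scale-up limit), reason "ScaleUpLimit"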

This function is an important part of HPA's safe-scaling mechanism: it ensures that scaling meets business demand without destabilizing the system.
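The example table can be reproduced with a compact re-implementation of the clamping rules. calculateScaleUpLimit and its constants are as quoted above; convertDesiredReplicasWithRules here is our own reconstruction from the behavior described in this section and returns only the reason string. The "TooManyReplicas" reason for the case where maxReplicas is the binding limit is assumed to match upstream.

```go
package main

import (
	"fmt"
	"math"
)

const (
	scaleUpLimitFactor  = 2.0
	scaleUpLimitMinimum = 4.0
)

// calculateScaleUpLimit returns max(current × 2, 4).
func calculateScaleUpLimit(currentReplicas int32) int32 {
	return int32(math.Max(scaleUpLimitFactor*float64(currentReplicas), scaleUpLimitMinimum))
}

// convertDesiredReplicasWithRules clamps desired to [minR, maxAllowed], where
// maxAllowed is the smaller of maxR and the scale-up limit (reconstruction).
func convertDesiredReplicasWithRules(current, desired, minR, maxR int32) (int32, string) {
	limit := calculateScaleUpLimit(current)
	maxAllowed, maxReason := maxR, "TooManyReplicas" // assumed reason name
	if maxR > limit {
		maxAllowed, maxReason = limit, "ScaleUpLimit"
	}
	switch {
	case desired < minR:
		return minR, "TooFewReplicas"
	case desired > maxAllowed:
		return maxAllowed, maxReason
	default:
		return desired, "DesiredWithinRange"
	}
}

func main() {
	// Current 5 replicas, min 2, max 20, as in the example above.
	for _, d := range []int32{1, 8, 15, 25} {
		got, reason := convertDesiredReplicasWithRules(5, d, 2, 20)
		fmt.Println(d, "->", got, reason)
	}
}
```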

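6. Final Thoughts

Through the source code we have seen how HPA works, and why, with the HPA target set to 80%, actual usage of 87% did not trigger a scale-up.

The direct cause is the tolerance, but in other scenarios scaling can also be held back by a cooldown (stabilization) window, or even by metric-collection delays that make the metrics inaccurate, among other factors. To use HPA well, we should:

Monitor HPA status: use kubectl describe hpa to check Conditions and Current Metrics
Set reasonable targets: avoid setting the target utilization too high (e.g. >75%)
Enable Behavior configuration: fine-tune the pace of scaling up and down
Combine logs and events: watch for changes in conditions such as ScalingActive and ScalingLimited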

Responsible editor: 武曉燕  Source: 運維開發故事