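The Gateway Tripped Me Up! A Year Lost to This 404 Problem?
Recently a colleague asked me to help track down a "weird" bug that, by his account, had plagued the team for more than a year without being resolved. After taking over, I spent some time locating the root cause. Today I want to share the troubleshooting process and the solution.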
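Problem description
My colleague's setup is Spring Cloud Gateway 3.0.1 + JDK 8, integrated with Nacos for dynamic route configuration. The problem: every time the route configuration in Nacos is modified, API requests through the gateway start returning 404 errors, and restarting the gateway restores normal behavior.
My first reaction on hearing this: after the Nacos configuration is updated, some cached data inside the gateway is probably not being refreshed in time. With that hypothesis in mind, I started digging.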
Environment setup
First I prepared three backend service instances on ports 8103, 12040, and 12041, and configured the corresponding gateway routes in Nacos: xiaofu-8103, xiaofu-12040, and xiaofu-12041. All three belong to the same weight group, xiaofu-group, which gives weight-based load balancing.
- id: xiaofu-8103
  uri: http://127.0.0.1:8103/
  predicates:
    - Weight=xiaofu-group, 2
    - Path=/test/version1/**
  filters:
    - RewritePath=/test/version1/(?<segment>.*),/$\{segment}
- id: xiaofu-12040
  uri: http://127.0.0.1:12040/
  predicates:
    - Weight=xiaofu-group, 1
    - Path=/test/version1/**
  filters:
    - RewritePath=/test/version1/(?<segment>.*),/$\{segment}
- id: xiaofu-12041
  uri: http://127.0.0.1:12041/
  predicates:
    - Weight=xiaofu-group, 2
    - Path=/test/version1/**
  filters:
    - RewritePath=/test/version1/(?<segment>.*),/$\{segment}
I used JMeter to send requests continuously and, to make log tracing easier, added a random number to the parameters of every request.
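To make the weights 2, 1, 2 above concrete, here is a minimal stand-alone sketch of weighted selection. It is not the gateway's implementation (the gateway works with normalized ranges and a random number); the class name WeightShare and the integer "ticket" scheme are assumptions made purely for illustration. Each route owns as many tickets as its weight, so out of 5 tickets the three routes should receive roughly 40%, 20%, and 40% of the traffic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WeightShare {

    /**
     * Picks a route for a ticket in [0, totalWeight). Each route owns as many
     * consecutive tickets as its weight, in declaration order.
     */
    static String choose(Map<String, Integer> weights, int ticket) {
        if (ticket < 0) {
            throw new IllegalArgumentException("ticket out of range: " + ticket);
        }
        int upperBound = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            upperBound += entry.getValue();
            if (ticket < upperBound) {
                return entry.getKey();
            }
        }
        throw new IllegalArgumentException("ticket out of range: " + ticket);
    }

    /** The three routes of xiaofu-group with the weights from the route config. */
    static Map<String, Integer> sampleWeights() {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("xiaofu-8103", 2);
        weights.put("xiaofu-12040", 1);
        weights.put("xiaofu-12041", 2);
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = sampleWeights();
        int total = weights.values().stream().mapToInt(Integer::intValue).sum();
        // Walk every ticket once to show which route owns it.
        for (int ticket = 0; ticket < total; ticket++) {
            System.out.println(ticket + " -> " + choose(weights, ticket));
        }
    }
}
```

In a real load test the ticket would be drawn at random per request, which is why all three instances show log output when the weights are healthy.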
With everything ready, I started the JMeter loop and saw log output on all three instances, which shows that the gateway's load balancing was working.
Troubleshooting
To get more detailed logs, I set the gateway's log level to TRACE.
With JMeter running, I changed route attributes of the three instances at random (uri, port, predicates, filters). No request failed, and the gateway console showed the updated route attributes, so the Nacos configuration change had been synchronized to the gateway successfully.
Next I tried removing one instance, xiaofu-12041. JMeter requests immediately started returning 404 errors: the problem was reproduced!
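Looking at the gateway console log, I was surprised to find that the route configuration of the deleted instance xiaofu-12041 was still present, and was even being chosen to handle requests.
That is the root of the problem: although the route was deleted in Nacos, the gateway still uses the old route data when it actually load balances.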
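Digging further, I found that the old route data was also present in the route weight information (the Weights attr).
At this point the problem was essentially confirmed: when computing instance weights and load balancing, the gateway uses stale cached data.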
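Why a stale weight entry turns into a 404, rather than a request to a dead backend, can be illustrated with a small model. As far as I understand the Weight predicate (WeightRoutePredicateFactory), a route matches only when its own ID equals the route ID chosen for its group. The sketch below is a simplified stand-in for that check, not the gateway's code; the class StaleChoice and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class StaleChoice {

    /** Mimics the Weight predicate: only the route chosen for the group matches. */
    static Optional<String> match(List<String> liveRouteIds, String chosenRouteId) {
        return liveRouteIds.stream()
                .filter(id -> id.equals(chosenRouteId))
                .findFirst();
    }

    /** No matching route means the gateway has nowhere to send the request. */
    static int status(List<String> liveRouteIds, String chosenRouteId) {
        return match(liveRouteIds, chosenRouteId).isPresent() ? 200 : 404;
    }

    public static void main(String[] args) {
        // xiaofu-12041 has been deleted from the route config ...
        List<String> live = Arrays.asList("xiaofu-8103", "xiaofu-12040");
        // ... but the stale weight cache can still choose it.
        for (String chosen : Arrays.asList("xiaofu-8103", "xiaofu-12040", "xiaofu-12041")) {
            System.out.println("chosen=" + chosen + " -> HTTP " + status(live, chosen));
        }
    }
}
```

Under this model only the requests for which the deleted route is chosen fail, which fits the observation that 404s appear intermittently after a route is removed.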
Source code analysis
Reading the source, I found a filter dedicated to computing weights: WeightCalculatorWebFilter. Internally it maintains a groupWeights variable that stores the route weight information.
When a configuration change event occurs, the addWeightConfig(WeightConfig weightConfig) method is executed to add the weight configuration.
@Override
public void onApplicationEvent(ApplicationEvent event) {
    if (event instanceof PredicateArgsEvent) {
        handle((PredicateArgsEvent) event);
    }
    else if (event instanceof WeightDefinedEvent) {
        addWeightConfig(((WeightDefinedEvent) event).getWeightConfig());
    }
    else if (event instanceof RefreshRoutesEvent && routeLocator != null) {
        if (routeLocatorInitialized.compareAndSet(false, true)) {
            routeLocator.ifAvailable(locator -> locator.getRoutes().blockLast());
        }
        else {
            routeLocator.ifAvailable(locator -> locator.getRoutes().subscribe());
        }
    }
}
The comment inside addWeightConfig states it explicitly: the method only creates a new GroupWeightConfig rather than modifying an existing one.
This means it can only add or overwrite route weights; it cannot clean up the weight information of routes that have been deleted.
void addWeightConfig(WeightConfig weightConfig) {
    String group = weightConfig.getGroup();
    GroupWeightConfig config;
    // only create new GroupWeightConfig rather than modify
    // and put at end of calculations. This avoids concurency problems
    // later during filter execution.
    if (groupWeights.containsKey(group)) {
        config = new GroupWeightConfig(groupWeights.get(group));
    }
    else {
        config = new GroupWeightConfig(group);
    }
    final AtomicInteger index = new AtomicInteger(0);
    // .... omitted ....
    if (log.isTraceEnabled()) {
        log.trace("Recalculated group weight config " + config);
    }
    // only update after all calculations
    groupWeights.put(group, config);
}
Solution
With the root cause identified, the solution became clear.
At first I suspected a Spring Cloud Gateway version issue and upgraded to 4.1.0, but the problem was still there.
So the only option is to refresh the cache manually: listen for Nacos route configuration change events, fetch the latest route configuration, and update the weight data in groupWeights.
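Stripped of Spring and reflection, the idea can be sketched with a plain map. This is a simplified model under stated assumptions, not the gateway's code: WeightCacheModel, addWeight, and sync are hypothetical names, and a nested Map stands in for GroupWeightConfig. addWeight copies and overwrites in the same add-only way as addWeightConfig, while sync shows the intended repair: rebuild the cache from whatever the current route configuration declares.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WeightCacheModel {

    // group -> (routeId -> weight); stands in for the filter's groupWeights field
    final Map<String, Map<String, Integer>> groupWeights = new ConcurrentHashMap<>();

    /** Copy-then-put: entries are added or overwritten, never removed. */
    void addWeight(String group, String routeId, int weight) {
        Map<String, Integer> copy = new LinkedHashMap<>(
                groupWeights.getOrDefault(group, new LinkedHashMap<>()));
        copy.put(routeId, weight);
        groupWeights.put(group, copy);
    }

    /** Hypothetical repair: replace every group with what the current config declares. */
    void sync(Map<String, Map<String, Integer>> current) {
        // Drop groups that no longer exist at all.
        groupWeights.keySet().retainAll(current.keySet());
        // Replace the remaining groups wholesale so deleted routes disappear.
        current.forEach((group, weights) ->
                groupWeights.put(group, new LinkedHashMap<>(weights)));
    }

    public static void main(String[] args) {
        WeightCacheModel cache = new WeightCacheModel();
        cache.addWeight("xiaofu-group", "xiaofu-8103", 2);
        cache.addWeight("xiaofu-group", "xiaofu-12040", 1);
        cache.addWeight("xiaofu-group", "xiaofu-12041", 2);

        // xiaofu-12041 is deleted; a refresh replays only the remaining routes.
        cache.addWeight("xiaofu-group", "xiaofu-8103", 2);
        cache.addWeight("xiaofu-group", "xiaofu-12040", 1);
        System.out.println("after refresh: " + cache.groupWeights.get("xiaofu-group").keySet());

        Map<String, Integer> remaining = new LinkedHashMap<>();
        remaining.put("xiaofu-8103", 2);
        remaining.put("xiaofu-12040", 1);
        Map<String, Map<String, Integer>> current = new LinkedHashMap<>();
        current.put("xiaofu-group", remaining);
        cache.sync(current);
        System.out.println("after sync: " + cache.groupWeights.get("xiaofu-group").keySet());
    }
}
```

In the model, the deleted route survives the refresh and only disappears after sync, which is the behavior the real fix has to reproduce against the filter's private state.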
Below is the code for the solution I implemented:
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.gateway.event.RefreshRoutesEvent;
import org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter;
import org.springframework.cloud.gateway.route.RouteDefinitionLocator;
import org.springframework.cloud.gateway.support.WeightConfig;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.event.EventListener;

@Slf4j
@Configuration
public class WeightCacheRefresher {

    @Autowired
    private WeightCalculatorWebFilter weightCalculatorWebFilter;

    @Autowired
    private RouteDefinitionLocator routeDefinitionLocator;

    @Autowired
    private ApplicationEventPublisher publisher;

    /**
     * Listens for route refresh events and synchronizes the weight cache.
     */
    @EventListener(RefreshRoutesEvent.class)
    public void onRefreshRoutes() {
        log.info("Route refresh event detected, synchronizing the weight cache");
        syncWeightCache();
    }

    /**
     * Synchronizes the weight cache with the current route configuration.
     */
    public void syncWeightCache() {
        try {
            // Access the private groupWeights field via reflection
            Field groupWeightsField = WeightCalculatorWebFilter.class.getDeclaredField("groupWeights");
            groupWeightsField.setAccessible(true);
            // Read the current groupWeights value
            @SuppressWarnings("unchecked")
            Map<String, Object> groupWeights = (Map<String, Object>) groupWeightsField.get(weightCalculatorWebFilter);
            if (groupWeights == null) {
                log.warn("groupWeights cache not found");
                return;
            }
            log.info("Current groupWeights cache: {}", groupWeights.keySet());
            // Collect the weight groups and route IDs of all current routes
            final Set<String> currentRouteIds = new HashSet<>();
            final Map<String, Map<String, Integer>> currentGroupRouteWeights = new HashMap<>();
            routeDefinitionLocator.getRouteDefinitions()
                    .collectList()
                    .subscribe(definitions -> {
                        definitions.forEach(def -> {
                            currentRouteIds.add(def.getId());
                            def.getPredicates().stream()
                                    .filter(predicate -> predicate.getName().equals("Weight"))
                                    .forEach(predicate -> {
                                        // Shortcut-style args are stored under generated keys
                                        Map<String, String> args = predicate.getArgs();
                                        String group = args.getOrDefault("_genkey_0", "unknown");
                                        int weight = Integer.parseInt(args.getOrDefault("_genkey_1", "0"));
                                        // Record the routes currently in each group and their weights
                                        currentGroupRouteWeights.computeIfAbsent(group, k -> new HashMap<>())
                                                .put(def.getId(), weight);
                                    });
                        });
                        log.info("Route IDs in the current route config: {}", currentRouteIds);
                        log.info("Weight groups in the current route config: {}", currentGroupRouteWeights);
                        // Check each weight group: drop missing routes, update routes whose weight changed
                        Set<String> groupsToRemove = new HashSet<>();
                        Set<String> groupsToUpdate = new HashSet<>();
                        for (String group : groupWeights.keySet()) {
                            if (!currentGroupRouteWeights.containsKey(group)) {
                                // The whole weight group no longer exists
                                groupsToRemove.add(group);
                                log.info("Weight group [{}] no longer exists in the route config and will be removed", group);
                                continue;
                            }
                            // Route IDs and weights currently configured for this group
                            Map<String, Integer> configuredRouteWeights = currentGroupRouteWeights.get(group);
                            // Cached weight config for this group
                            Object groupWeightConfig = groupWeights.get(group);
                            try {
                                // Access the weights field of GroupWeightConfig
                                Field weightsField = groupWeightConfig.getClass().getDeclaredField("weights");
                                weightsField.setAccessible(true);
                                @SuppressWarnings("unchecked")
                                LinkedHashMap<String, Integer> weights = (LinkedHashMap<String, Integer>) weightsField.get(groupWeightConfig);
                                // Route IDs that must be removed
                                Set<String> routesToRemove = weights.keySet().stream()
                                        .filter(routeId -> !configuredRouteWeights.containsKey(routeId))
                                        .collect(Collectors.toSet());
                                // Route IDs whose weight has changed
                                Set<String> routesWithWeightChange = new HashSet<>();
                                for (Map.Entry<String, Integer> entry : weights.entrySet()) {
                                    String routeId = entry.getKey();
                                    Integer cachedWeight = entry.getValue();
                                    if (configuredRouteWeights.containsKey(routeId)) {
                                        Integer configuredWeight = configuredRouteWeights.get(routeId);
                                        if (!cachedWeight.equals(configuredWeight)) {
                                            routesWithWeightChange.add(routeId);
                                            log.info("Weight of route [{}] changed from {} to {}", routeId, cachedWeight, configuredWeight);
                                        }
                                    }
                                }
                                // Newly added route IDs
                                Set<String> newRoutes = configuredRouteWeights.keySet().stream()
                                        .filter(routeId -> !weights.containsKey(routeId))
                                        .collect(Collectors.toSet());
                                if (!routesToRemove.isEmpty() || !routesWithWeightChange.isEmpty() || !newRoutes.isEmpty()) {
                                    log.info("Weight group [{}] changed: removed {}, weight changed {}, added {}",
                                            group, routesToRemove, routesWithWeightChange, newRoutes);
                                    // On any change, recalculate the weights of the whole group
                                    groupsToUpdate.add(group);
                                }
                                // First remove the routes that were deleted
                                for (String routeId : routesToRemove) {
                                    weights.remove(routeId);
                                }
                                // If no routes remain in the group, remove the whole group
                                if (weights.isEmpty()) {
                                    groupsToRemove.add(group);
                                    log.info("Weight group [{}] has no remaining routes, the whole group will be removed", group);
                                }
                            } catch (Exception e) {
                                log.error("Error while processing weight group [{}]", group, e);
                            }
                        }
                        // Remove weight groups that are no longer needed
                        for (String group : groupsToRemove) {
                            groupWeights.remove(group);
                            log.info("Removed weight group: {}", group);
                        }
                        // Rebuild the weight groups that need recalculation
                        for (String group : groupsToUpdate) {
                            try {
                                // Route IDs and weights currently configured for this group
                                Map<String, Integer> configuredRouteWeights = currentGroupRouteWeights.get(group);
                                // Remove the old weight group config
                                groupWeights.remove(group);
                                log.info("Removed weight group [{}] so it can be recalculated", group);
                                // Create a WeightConfig per route and call the package-private addWeightConfig
                                Method addWeightConfigMethod = WeightCalculatorWebFilter.class.getDeclaredMethod("addWeightConfig", WeightConfig.class);
                                addWeightConfigMethod.setAccessible(true);
                                for (Map.Entry<String, Integer> entry : configuredRouteWeights.entrySet()) {
                                    String routeId = entry.getKey();
                                    Integer weight = entry.getValue();
                                    WeightConfig weightConfig = new WeightConfig(routeId);
                                    weightConfig.setGroup(group);
                                    weightConfig.setWeight(weight);
                                    addWeightConfigMethod.invoke(weightCalculatorWebFilter, weightConfig);
                                    log.info("Added weight config for route [{}]: group [{}], weight {}", routeId, group, weight);
                                }
                            } catch (Exception e) {
                                log.error("Error while recalculating weight group [{}]", group, e);
                            }
                        }
                        log.info("Weight cache synchronized, currently cached weight groups: {}", groupWeights.keySet());
                    });
        } catch (Exception e) {
            log.error("Failed to synchronize the weight cache", e);
        }
    }
}
With this in place, every update to the Nacos route configuration triggers the change event listener, which then refreshes the local route weight data with the latest instance data.
I searched around online and found no official fix, so perhaps we are simply using it the wrong way; otherwise such an obvious bug would surely have been fixed by someone long ago!


























