问题
最近遇到了一个奇怪的问题,在线上发布应用时,有一部分机器能正常运行,有一部分起不来。代码部署的都是同一套,应用是tomcat中部署的SpringBoot war包,应用日志看不到,查看tomcat日志,发现报错信息为:
Exception in thread "main" java.lang.IllegalArgumentException: LoggerFactory is not a Logback LoggerContext but Logback is on the classpath. Either remove Logback or the competing implementation (class org.apache.logging.slf4j.Log4jLoggerFactory loaded from ......WEB-INFO/lib/log4j-slf4j-impl-2.13.3.jar). If you are using WebLogic you will need to add 'org.slf4j' to prefer-application-packages in WEB-INF/weblogic.xml: org.apache.logging.slf4j.Log4jLoggerFactory
at org.springframework.util.Assert.instanceCheckFailed(Assert.java:699)
at org.springframework.util.Assert.isInstanceOf(Assert.java:599)
at org.springframework.boot.logging.logback.LogbackLoggingSystem.getLoggerContext(LogbackLoggingSystem.java:284)
at org.springframework.boot.logging.logback.LogbackLoggingSystem.beforeInitialize(LogbackLoggingSystem.java:104)
......
看提示信息是日志框架冲突了,我们用的是logback却在新版本中无意中引入了log4j的jar包,排除掉log4j的相关引用后都可以正常发布运行了!
问题虽然找到原因也解决了,但是为什么这个问题会引发异常还不是很清楚,为什么有的机器可以有的不行也很奇怪,正好也学习下日志框架。
观察到有一段SLF4J的日志,提示有多个日志实现框架,报错的机器Actual binding都是log4j,不报错的是logback。
启动成功的日志:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:../ch/qos/logback/logback-classic/1.2.3/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:.../org/apache/logging/log4j/log4j-slf4j-impl/2.13.3/log4j-slf4j-impl-2.13.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
启动失败的日志:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:.../org/apache/logging/log4j/log4j-slf4j-impl/2.13.3/log4j-slf4j-impl-2.13.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:../ch/qos/logback/logback-classic/1.2.3/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
就先研究下SLF4J。
SLF4J
slf4j是日志接口层,提供了日志的统一接口,可以有不同的日志实现。
slf4j通过LoggerFactory.getLogger方法返回Logger接口供上层使用,我们通过源码来看下是如何实现绑定不同的日志框架的。以下代码来自slf4j-api-1.7.25:
public static Logger getLogger(Class<?> clazz) {
Logger logger = getLogger(clazz.getName());
......下面省略非主流程......
}
public static Logger getLogger(String name) {
ILoggerFactory iLoggerFactory = getILoggerFactory();
return iLoggerFactory.getLogger(name);
}
使用的工厂模式来创建日志对象,那关键就是看如何获取的不同日志工厂了。
public static ILoggerFactory getILoggerFactory() {
if (INITIALIZATION_STATE == UNINITIALIZED) {
synchronized (LoggerFactory.class) {
if (INITIALIZATION_STATE == UNINITIALIZED) {
INITIALIZATION_STATE = ONGOING_INITIALIZATION;
performInitialization();
}
}
}
......下面省略非主流程......
}
private final static void performInitialization() {
bind();
if (INITIALIZATION_STATE == SUCCESSFUL_INITIALIZATION) {
versionSanityCheck();
}
}
private final static void bind() {
try {
Set<URL> staticLoggerBinderPathSet = null;
// skip check under android, see also
// http://jira.qos.ch/browse/SLF4J-328
if (!isAndroid()) {
staticLoggerBinderPathSet = findPossibleStaticLoggerBinderPathSet();
reportMultipleBindingAmbiguity(staticLoggerBinderPathSet);
}
// the next line does the binding
StaticLoggerBinder.getSingleton();
INITIALIZATION_STATE = SUCCESSFUL_INITIALIZATION;
reportActualBinding(staticLoggerBinderPathSet);
......下面省略非主流程......
}
}
可以看到,绑定日志实现类的流程在bind方法里面,reportMultipleBindingAmbiguity和reportActualBinding就是打印上面SLF4J日志的地方了。但是具体是如何绑定的呢?
// the next line does the binding
StaticLoggerBinder.getSingleton();
纳尼?StaticLoggerBinder这个类的实现不存在slf4j的jar包中啊,为什么调用一下获取的单例方法就能够绑定日志框架了?
这里是通过java的类加载机制,在slf4j框架中不仅定义了绑定接口(LoggerFactoryBinder),还固定了它的实现类必须得是org.slf4j.impl.StaticLoggerBinder,也就是说,需要绑定slf4j的日志实现框架,不仅要实现LoggerFactoryBinder接口,而且还必须得定义org.slf4j.impl.StaticLoggerBinder这个类来实现,这样在slf4j绑定的时候就可以通过类名来加载这个类了,哪个先被类加载器加载到,就绑定的那个实现框架。不知道这里为啥这样设计,一般定义好接口,获取外部可拓展的实现类时,都是通过java SPI机制来实现的,SLF4J这样设计猜测可能是出于性能考虑。
原因
上面看到SLF4J绑定日志框架是看先加载到哪个类,而绑定到logback时应用正常启动,绑定到log4j时应用启动失败。而失败堆栈是springboot打印出来的,猜测应该是springboot适配日志系统的问题。
出错处代码在LogbackLoggingSystem中:
private LoggerContext getLoggerContext() {
ILoggerFactory factory = StaticLoggerBinder.getSingleton().getLoggerFactory();
Assert.isInstanceOf(LoggerContext.class, factory,
String.format(
"LoggerFactory is not a Logback LoggerContext but Logback is on "
+ "the classpath. Either remove Logback or the competing "
+ "implementation (%s loaded from %s). If you are using "
+ "WebLogic you will need to add 'org.slf4j' to "
+ "prefer-application-packages in WEB-INF/weblogic.xml",
factory.getClass(), getLocation(factory)));
return (LoggerContext) factory;
}
也就是springboot在获得LoggingSystem取到的是LogbackLoggingSystem,所以在LoggerBinder绑定Logback的时候正常,而在绑定Log4j的时候会抛异常。这就是只有一部分机器启动失败的原因了。那么为什么LoggingSystem会是LogbackLoggingSystem呢,再看下获取LoggingSystem的代码(spring-boot-2.3.2.RELEASE)
public static LoggingSystem get(ClassLoader classLoader) {
String loggingSystem = System.getProperty(SYSTEM_PROPERTY);
if (StringUtils.hasLength(loggingSystem)) {
if (NONE.equals(loggingSystem)) {
return new NoOpLoggingSystem();
}
return get(classLoader, loggingSystem);
}
return SYSTEMS.entrySet().stream().filter((entry) -> ClassUtils.isPresent(entry.getKey(), classLoader))
.map((entry) -> get(classLoader, entry.getValue())).findFirst()
.orElseThrow(() -> new IllegalStateException("No suitable logging system located"));
}
可以看到是先从系统属性中取loggingSystem,没有的话再从SYSTEMS变量中遍历当前类存在的。SYSTEMS是这样初始化的:
private static final Map<String, String> SYSTEMS;
static {
Map<String, String> systems = new LinkedHashMap<>();
systems.put("ch.qos.logback.core.Appender", "org.springframework.boot.logging.logback.LogbackLoggingSystem");
systems.put("org.apache.logging.log4j.core.impl.Log4jContextFactory",
"org.springframework.boot.logging.log4j2.Log4J2LoggingSystem");
systems.put("java.util.logging.LogManager", "org.springframework.boot.logging.java.JavaLoggingSystem");
SYSTEMS = Collections.unmodifiableMap(systems);
}
由于没有设置SYSTEM_PROPERTY的系统属性,而SYSTEMS中第一项是logback,同时logback的jar包也在类路径中,所以LoggingSystem每次取到的都是LogbackLoggingSystem了。至此问题出现的原因已找到,同时也找到了另外一种修改的方式,即通过设置系统属性来让springboot不使用日志系统输出日志:
System.setProperty(LoggingSystem.class.getName(), "none");
经测试果然也能避免异常的发生。
猜测
现在问题的原因基本已找到,但还有一个最根本的问题没有解决,那就是,为什么有的机器是先加载的Log4j的StaticLoggerBinder类,有的先加载的是Logback中的。这个问题暂时还没有找到答案,只能有一些大概的猜测,哪位大佬如果有想法可以留言交流下。
以下分析tomcat版本为8.0.30:
Tomcat中Web应用的类加载器使用的是WebappClassLoader来加载WEB-INF/lib中的jar包,在数据完全一致的时候为什么会出现不一样的结果呢?查看WebappClassLoader加载类的流程,发现是将lib中的jar包封装成WebResourceSet存到WebResourceRoot中,而加载lib中的jar的代码在StandardRoot.processWebInfLib:
protected void processWebInfLib() throws LifecycleException {
WebResource[] possibleJars = listResources("/WEB-INF/lib", false);
for (WebResource possibleJar : possibleJars) {
if (possibleJar.isFile() && possibleJar.getName().endsWith(".jar")) {
createWebResourceSet(ResourceSetType.CLASSES_JAR,
"/WEB-INF/classes", possibleJar.getURL(), "/");
}
}
}
jar包加载的顺序是listResources决定的,里面会调用File.list()方法来获取目录中的所有文件,而java中这个方法的注释有说:
There is no guarantee that the name strings in the resulting array
* will appear in any specific order; they are not, in particular,
* guaranteed to appear in alphabetical order.
...还有一些别的注释...
public String[] list() {
SecurityManager security = System.getSecurityManager();
if (security != null) {
security.checkRead(path);
}
if (isInvalid()) {
return null;
}
return fs.list(this);
}
翻译过来也就是说不保证返回的顺序,尤其是不保证安字母顺序返回,我猜测这就是加载类顺序部固定的原因,但是也have no guarantee😂