Checking the monitoring today, I noticed a few more InfluxDB read-failure exceptions. The logs show entries like this:
Failed to query InfluxDB java.io.InterruptedIOException
Looking at the nearby log lines, reads from PostgreSQL were failing as well:
2025-06-23 02:10:28.452
The error may exist in com/accuenergy/api/data/dao/DataFowardMapper.java (best guess)
The error may involve com.accuenergy.api.data.dao.DataFowardMapper.updateById
The error occurred while executing an update
Cause: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLException: interrupt
    at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:196)
    at jdk.internal.reflect.GeneratedMethodAccessor496.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:427)
    ... 20 more
Caused by: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLException: interrupt
    at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:83)
    at org.mybatis.spring.transaction.SpringManagedTransaction.openConnection(SpringManagedTransaction.java:80)
    at org.mybatis.spring.transaction.SpringManagedTransaction.getConnection(SpringManagedTransaction.java:67)
The logs show that both the InfluxDB read and the PostgreSQL read failed with interrupt-related exceptions (InterruptedIOException and SQLException: interrupt). Possible causes:
- the queries were too slow and timed out
- the worker thread was interrupted by another thread
The database latency metrics look normal, which rules out the first cause.
That leaves interruption. But interrupted by whom?
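To make the mechanism concrete, here is a minimal, self-contained sketch (plain JDK, no InfluxDB or JDBC involved) of a worker thread being interrupted while blocked. Client libraries generally re-wrap the resulting InterruptedException, which is how an interrupt can surface as java.io.InterruptedIOException on the InfluxDB/HTTP side and as java.sql.SQLException: interrupt while waiting on the JDBC connection pool.

import java.util.concurrent.CountDownLatch;

public class InterruptDemo {
    public static void main(String[] args) throws Exception {
        // Stands in for a blocking call: waiting for a pooled connection or an HTTP response.
        CountDownLatch neverReleased = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            try {
                neverReleased.await();
            } catch (InterruptedException e) {
                // Client libraries usually re-wrap this checked exception:
                // an HTTP-based client may surface java.io.InterruptedIOException,
                // a JDBC connection pool may surface java.sql.SQLException("interrupt").
                System.out.println("worker interrupted while blocked: " + e);
            }
        }, "data-forward-worker");

        worker.start();
        Thread.sleep(500);
        worker.interrupt();   // what "killing" a running job effectively does to its thread
        worker.join();
    }
}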
Going through the logs, these interrupt exceptions all occur exactly on 10-minute boundaries, which points to the scheduled data-forwarding task.
That task is triggered by xxl-job.
My first guess was that the xxl-job execution had timed out, but the job timeout is configured very generously at 10,000 seconds, so a timeout cannot be the cause.
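For context, the forwarding job is a regular xxl-job handler. The sketch below only illustrates its shape; the handler name, cron expression, and method body are assumptions rather than the real code (it assumes xxl-job 2.3.x, where handlers are no-arg void methods annotated with @XxlJob):

import com.xxl.job.core.handler.annotation.XxlJob;
import org.springframework.stereotype.Component;

@Component
public class DataForwardJob {

    // Hypothetical handler name; the real job is registered in the xxl-job admin
    // with a 10-minute cron (e.g. 0 0/10 * * * ?) and a 10,000 s timeout.
    @XxlJob("dataForwardJobHandler")
    public void forward() throws Exception {
        // 1. query the latest readings from InfluxDB
        // 2. forward them, then update the forwarding status in PostgreSQL
        //    via DataFowardMapper.updateById(...)
    }
}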
Checking the scheduling logs:
block strategy effect:Cover Early [job running, killed]
Sure enough, there are entries showing the job being killed.
Why was it killed? Because our blocking strategy (阻塞处理策略) is configured as "Cover Early" (覆盖之前调度).
With that strategy, when the next trigger fires while the previous run is still executing, the previous job thread is stopped so the new run can start.
Given our business requirements, a new trigger should wait until the previous run has finished, so we changed the strategy to "Serial execution" (单机串行).
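The change itself is just a per-job setting in the xxl-job admin console (阻塞处理策略); no code change is needed. To illustrate the difference in semantics, here is a minimal sketch, not xxl-job's actual implementation, of "Cover Early" versus "Serial execution" on a single job thread:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockStrategyDemo {
    // One worker thread per job, mirroring the single job thread on an executor.
    private static final ExecutorService jobThread = Executors.newSingleThreadExecutor();
    private static Future<?> running;

    // "Cover Early" (覆盖之前调度): kill the running task, then start the new trigger.
    static synchronized void triggerCoverEarly(Runnable task) {
        if (running != null && !running.isDone()) {
            // cancel(true) interrupts the worker thread, which is exactly what
            // surfaces downstream as InterruptedIOException / SQLException: interrupt.
            running.cancel(true);
        }
        running = jobThread.submit(task);
    }

    // "Serial execution" (单机串行): just submit; the single-thread executor
    // queues the new trigger behind the one that is still running.
    static synchronized void triggerSerial(Runnable task) {
        running = jobThread.submit(task);
    }
}

xxl-job also offers a third strategy, 丢弃后续调度 (Discard Later), which drops the new trigger instead of queueing it; serial execution fits our case because each forwarding run should still happen, just after the previous one completes.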