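CTS/GTS Issue Analysis 13
Issue Analysis
This is not the first time this issue has appeared; see CTS Issue Analysis 10. At that time there were more urgent problems, so the investigation stopped at the observation that a large number of CompatibilityTestSuite objects were being held in memory, which made the retry fail.
Now that it has happened again, it is worth investigating properly so that it does not recur.
Retry command: run retry --retry 0 --shard-count 2 -s 7c6252f -s 7c62472
Error log from the terminal: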
java.lang.OutOfMemoryError: GC overhead limit exceeded
Dumping heap to java_pid26338.hprof ...
Heap dump file created [5553157593 bytes in 101.829 secs]
01-29 16:09:47 E/CommandScheduler: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap.newNode(HashMap.java:1747)
at java.util.HashMap.putVal(HashMap.java:631)
at java.util.HashMap.put(HashMap.java:612)
at java.util.HashSet.add(HashSet.java:220)
at java.util.AbstractCollection.addAll(AbstractCollection.java:344)
at com.android.tradefed.config.OptionSetter.setFieldValue(OptionSetter.java:452)
at com.android.tradefed.config.OptionSetter.setFieldValue(OptionSetter.java:549)
at com.android.tradefed.config.OptionCopier.copyOptions(OptionCopier.java:49)
at com.android.tradefed.config.OptionCopier.copyOptionsNoThrow(OptionCopier.java:60)
at com.android.tradefed.testtype.suite.ITestSuite.split(ITestSuite.java:662)
at com.android.compatibility.common.tradefed.testtype.retry.RetryFactoryTest.split(RetryFactoryTest.java:122)
at com.android.tradefed.invoker.shard.ShardHelper.shardTest(ShardHelper.java:123)
at com.android.tradefed.invoker.shard.ShardHelper.shardConfig(ShardHelper.java:30)
at com.android.tradefed.invoker.shard.StrictShardHelper.shardConfig(StrictShardHelper.java:51)
at com.android.tradefed.invoker.InvocationExecution.shardConfig(InvocationExecution.java:149)
at com.android.tradefed.invoker.TestInvocation.invoke(TestInvocation.java:656)
at com.android.tradefed.command.CommandScheduler$InvocationThread.run(CommandScheduler.java:1357)
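First, the stack trace shows the call path at the moment of failure; following it lets us find out why so much memory is consumed.
How the data structures are organized in a multi-device retry
From the earlier analysis we know that the problem is caused by a large number of CompatibilityTestSuite objects, each holding a large number of exclude entries. So let us follow the stack and walk through how the CTS-related data structures are built during a multi-device retry.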
tools/tradefederation/core/src/com/android/tradefed/invoker/shard/ShardHelper.java
65 /**
66 * Attempt to shard the configuration into sub-configurations, to be re-scheduled to run on
67 * multiple resources in parallel.
68 *
69 * <p>A successful shard action renders the current config empty, and invocation should not
70 * proceed.
71 *
72 * @see IShardableTest
73 * @see IRescheduler
74 * @param config the current {@link IConfiguration}.
75 * @param context the {@link IInvocationContext} holding the tests information.
76 * @param rescheduler the {@link IRescheduler}
77 * @return true if test was sharded. Otherwise return <code>false</code>
78 */
79 @Override
80 public boolean shardConfig(
81 IConfiguration config, IInvocationContext context, IRescheduler rescheduler) {
82 List<IRemoteTest> shardableTests = new ArrayList<IRemoteTest>();
83 boolean isSharded = false;
84 Integer shardCount = config.getCommandOptions().getShardCount();
85 for (IRemoteTest test : config.getTests()) {
86 isSharded |= shardTest(shardableTests, test, shardCount, context); // shardTest splits the test for the retry. At this point test holds very little: only the known failures from cts-known-failures.xml, stored in its exclude list
87 }
88 if (!isSharded) {
89 return false;
90 }
91 // shard this invocation!
92 // create the TestInvocationListener that will collect results from all the shards,
93 // and forward them to the original set of listeners (minus any ISharddableListeners)
94 // once all shards complete
95 int expectedShard = shardableTests.size();
96 if (shardCount != null) {
97 expectedShard = Math.min(shardCount, shardableTests.size());
98 }
99 ShardMasterResultForwarder resultCollector =
100 new ShardMasterResultForwarder(buildMasterShardListeners(config), expectedShard);
101
102 resultCollector.invocationStarted(context);
103 synchronized (shardableTests) {
104 // When shardCount is available only create 1 poller per shard
105 // TODO: consider aggregating both case by picking a predefined shardCount if not
106 // available (like 4) for autosharding.
107 if (shardCount != null) {
108 // We shuffle the tests for best results: avoid having the same module sub-tests
109 // contiguously in the list.
110 Collections.shuffle(shardableTests);
111 int maxShard = Math.min(shardCount, shardableTests.size());
112 CountDownLatch tracker = new CountDownLatch(maxShard);
113 for (int i = 0; i < maxShard; i++) {
114 IConfiguration shardConfig = config.clone();
115 shardConfig.setTest(new TestsPoolPoller(shardableTests, tracker));
116 rescheduleConfig(shardConfig, config, context, rescheduler, resultCollector);
117 }
118 } else {
119 CountDownLatch tracker = new CountDownLatch(shardableTests.size());
120 for (IRemoteTest testShard : shardableTests) {
121 CLog.i("Rescheduling sharded config...");
122 IConfiguration shardConfig = config.clone();
123 if (config.getCommandOptions().shouldUseDynamicSharding()) {
124 shardConfig.setTest(new TestsPoolPoller(shardableTests, tracker));
125 } else {
126 shardConfig.setTest(testShard);
127 }
128 rescheduleConfig(shardConfig, config, context, rescheduler, resultCollector);
129 }
130 }
131 }
132 // clean up original builds
133 for (String deviceName : context.getDeviceConfigNames()) {
134 config.getDeviceConfigByName(deviceName)
135 .getBuildProvider()
136 .cleanUp(context.getBuildInfo(deviceName));
137 }
138 return true;
139 }
196 /**
197 * Attempt to shard given {@link IRemoteTest}.
198 *
199 * @param shardableTests the list of {@link IRemoteTest}s to add to
200 * @param test the {@link IRemoteTest} to shard
201 * @param shardCount attempted number of shard, can be null.
202 * @param context the {@link IInvocationContext} of the current invocation.
203 * @return <code>true</code> if test was sharded
204 */
205 private static boolean shardTest(
206 List<IRemoteTest> shardableTests,
207 IRemoteTest test,
208 Integer shardCount,
209 IInvocationContext context) {
210 boolean isSharded = false;
211 if (test instanceof IShardableTest) {
212 // inject device and build since they might be required to shard.
213 if (test instanceof IBuildReceiver) {
214 ((IBuildReceiver) test).setBuild(context.getBuildInfos().get(0));
215 }
216 if (test instanceof IDeviceTest) {
217 ((IDeviceTest) test).setDevice(context.getDevices().get(0));
218 }
219 if (test instanceof IMultiDeviceTest) {
220 ((IMultiDeviceTest) test).setDeviceInfos(context.getDeviceBuildMap());
221 }
222 if (test instanceof IInvocationContextReceiver) {
223 ((IInvocationContextReceiver) test).setInvocationContext(context);
224 }
225 // set some properties on the test
226 IShardableTest shardableTest = (IShardableTest) test;
227 Collection<IRemoteTest> shards = null;
228 // Give the shardCount hint to tests if they need it.
229 if (shardCount != null) { // shardCount is set when a multi-device retry specifies --shard-count
230 shards = shardableTest.split(shardCount); // calls RetryFactoryTest.split
231 } else {
232 shards = shardableTest.split();
233 }
234 if (shards != null) {
235 shardableTests.addAll(shards);
236 isSharded = true;
237 }
238 }
239 if (!isSharded) {
240 shardableTests.add(test);
241 }
242 return isSharded;
243 }
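When a shard count is given, shardConfig above creates min(shardCount, shardableTests.size()) shard configs, and each of them wraps the same shardableTests list in a TestsPoolPoller. The following is only a rough sketch of that shared-pool idea. SharedPoolSketch and its methods are hypothetical names made up for illustration, not Tradefed classes, and the round-robin draining is an assumption that stands in for real devices polling concurrently.

```java
import java.util.List;

// Hypothetical illustration of a shared test pool; not the real TestsPoolPoller.
class SharedPoolSketch {
    /** Number of pollers for a given shard count hint (may be null) and pool size. */
    static int pollerCount(Integer shardCount, int poolSize) {
        return shardCount == null ? poolSize : Math.min(shardCount, poolSize);
    }

    /** A poller takes the next unit from the shared pool, or null when it is empty. */
    static String poll(List<String> pool) {
        synchronized (pool) {
            return pool.isEmpty() ? null : pool.remove(0);
        }
    }

    /** Drains the pool round-robin and returns how many units each poller ran. */
    static int[] drain(List<String> pool, int pollers) {
        int[] ran = new int[pollers];
        int i = 0;
        while (poll(pool) != null) {
            ran[i % pollers]++;
            i++;
        }
        return ran;
    }
}
```

The point that matters for this analysis is that every unit in the pool stays reachable until some poller takes it, so all of the split suites are alive in memory at the same time.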
test/suite_harness/common/host-side/tradefed/src/com/android/compatibility/common/tradefed/testtype/retry/RetryFactoryTest.java
180 @Override
181 public Collection<IRemoteTest> split(int shardCountHint) {
182 try {
183 CompatibilityTestSuite test = loadSuite();
184 return test.split(shardCountHint); // note the two statements above: this is where the data structures are built
185 } catch (DeviceNotAvailableException e) {
186 CLog.e("Failed to shard the retry run.");
187 CLog.e(e);
188 }
189 return null;
190 }
loadSuite creates a CompatibilityTestSuite:
192 /**
193 * Helper to create a {@link CompatibilityTestSuite} from previous results.
194 */
195 private CompatibilityTestSuite loadSuite() throws DeviceNotAvailableException {
196 // Create a compatibility test and set it to run only what we want.
197 CompatibilityTestSuite test = createTest();
198
199 CompatibilityBuildHelper buildHelper = new CompatibilityBuildHelper(mBuildInfo);
200 // Create the helper with all the options needed.
201 RetryFilterHelper helper = createFilterHelper(buildHelper); // create a RetryFilterHelper
202 // TODO: we have access to the original command line, we should accommodate more re-run
203 // scenario like when the original cts.xml config was not used.
204 helper.validateBuildFingerprint(mDevice);
205 helper.setCommandLineOptionsFor(test);
206 helper.setCommandLineOptionsFor(this);
207 helper.populateRetryFilters(); // this is where the exclude entries are added
208
209 try {
210 OptionSetter setter = new OptionSetter(test);
211 for (String moduleArg : mModuleArgs) {
212 setter.setOptionValue("compatibility:module-arg", moduleArg);
213 }
214 for (String testArg : mTestArgs) {
215 setter.setOptionValue("compatibility:test-arg", testArg);
216 }
217 } catch (ConfigurationException e) {
218 throw new RuntimeException(e);
219 }
220
221 test.setIncludeFilter(helper.getIncludeFilters());
222 test.setExcludeFilter(helper.getExcludeFilters());
223 test.setDevice(mDevice);
224 test.setBuild(mBuildInfo);
225 test.setAbiName(mAbiName);
226 test.setPrimaryAbiRun(mPrimaryAbiRun);
227 test.setSystemStatusChecker(mStatusCheckers);
228 test.setInvocationContext(mContext);
229 test.setConfiguration(mMainConfiguration);
230 // reset the retry id - Ensure that retry of retry does not throw
231 test.resetRetryId();
232 test.isRetry();
233 // clean the helper
234 helper.tearDown();
235 return test;
236 }
test/suite_harness/common/host-side/tradefed/src/com/android/compatibility/common/tradefed/util/RetryFilterHelper.java
72 /**
73 * Constructor for a {@link RetryFilterHelper}.
74 *
75 * @param build a {@link CompatibilityBuildHelper} describing the build.
76 * @param sessionId The ID of the session to retry.
77 * @param subPlan The name of a subPlan to be used. Can be null.
78 * @param includeFilters The include module filters to apply
79 * @param excludeFilters The exclude module filters to apply
80 * @param abiName The name of abi to use. Can be null.
81 * @param moduleName The name of the module to run. Can be null.
82 * @param testName The name of the test to run. Can be null.
83 * @param retryType The type of results to retry. Can be null.
84 */
85 public RetryFilterHelper(CompatibilityBuildHelper build, int sessionId, String subPlan,
86 Set<String> includeFilters, Set<String> excludeFilters, String abiName,
87 String moduleName, String testName, RetryType retryType) {
88 this(build, sessionId);
89 mSubPlan = subPlan;
90 mIncludeFilters.addAll(includeFilters);
91 mExcludeFilters.addAll(excludeFilters);
92 mAbiName = abiName;
93 mModuleName = moduleName;
94 mTestName = testName;
95 mRetryType = retryType;
96 }
At this point mExcludeFilters still contains only the known failures recorded in cts-known-failures.xml; the key step is populateRetryFilters.
183 /**
184 * Populate mRetryIncludes and mRetryExcludes based on the options and the result set for
185 * this instance of RetryFilterHelper.
186 */
187 public void populateRetryFilters() {
188 mRetryIncludes = new HashSet<>(mIncludeFilters); // reset for each population
189 mRetryExcludes = new HashSet<>(mExcludeFilters); // reset for each population
190 if (RetryType.CUSTOM.equals(mRetryType)) {
191 Set<String> customIncludes = new HashSet<>(mIncludeFilters);
192 Set<String> customExcludes = new HashSet<>(mExcludeFilters);
193 if (mSubPlan != null) { // a retry normally does not specify a subplan, so this branch is not taken
194 ISubPlan retrySubPlan = SubPlanHelper.getSubPlanByName(mBuild, mSubPlan);
195 customIncludes.addAll(retrySubPlan.getIncludeFilters());
196 customExcludes.addAll(retrySubPlan.getExcludeFilters());
197 }
198 // If includes were added, only use those includes. Also use excludes added directly
199 // or by subplan. Otherwise, default to normal retry.
200 if (!customIncludes.isEmpty()) {
201 mRetryIncludes.clear();
202 mRetryIncludes.addAll(customIncludes);
203 mRetryExcludes.addAll(customExcludes);
204 return;
205 }
206 }
207 // remove any extra filtering options
208 // TODO(aaronholden) remove non-plan includes (e.g. those in cts-vendor-interface)
209 // TODO(aaronholden) remove non-known-failure excludes
210 mModuleName = null;
211 mTestName = null;
212 mSubPlan = null;
213 populateFiltersBySubPlan();
214 populatePreviousSessionFilters();
215 }
So execution reaches populateFiltersBySubPlan:
217 /* Generation of filters based on previous sessions is implemented thoroughly in SubPlanHelper,
218 * and retry filter generation is just a subset of the use cases for the subplan retry logic.
219 * Use retry type to determine which result types SubPlanHelper targets. */
220 public void populateFiltersBySubPlan() {
221 SubPlanHelper retryPlanCreator = new SubPlanHelper();
222 retryPlanCreator.setResult(getResult());
223 if (RetryType.FAILED.equals(mRetryType)) {
224 // retry only failed tests
225 retryPlanCreator.addResultType(SubPlanHelper.FAILED);
226 } else if (RetryType.NOT_EXECUTED.equals(mRetryType)){
227 // retry only not executed tests
228 retryPlanCreator.addResultType(SubPlanHelper.NOT_EXECUTED);
229 } else {
230 // retry both failed and not executed tests
231 retryPlanCreator.addResultType(SubPlanHelper.FAILED);
232 retryPlanCreator.addResultType(SubPlanHelper.NOT_EXECUTED);
233 }
234 try {
235 ISubPlan retryPlan = retryPlanCreator.createSubPlan(mBuild); // the include and exclude lists built by SubPlanHelper end up in the CompatibilityTestSuite
236 mRetryIncludes.addAll(retryPlan.getIncludeFilters());
237 mRetryExcludes.addAll(retryPlan.getExcludeFilters());
238 } catch (ConfigurationException e) {
239 throw new RuntimeException ("Failed to create subplan for retry", e);
240 }
241 }
test/suite_harness/common/host-side/tradefed/src/com/android/compatibility/common/tradefed/result/SubPlanHelper.java
createSubPlan is the key point: it extracts information from the report being retried into the include list (mIncludeFilters) and the exclude list (mExcludeFilters).
206 /**
207 * Create a subplan derived from a result.
208 * <p/>
209 * {@link Option} values must be set before this is called.
210 * @param buildHelper
211 * @return subplan
212 * @throws ConfigurationException
213 */
214 public ISubPlan createSubPlan(CompatibilityBuildHelper buildHelper)
215 throws ConfigurationException {
216 setupFields(buildHelper);
217 ISubPlan subPlan = new SubPlan();
218
219 // add filters from previous session to track which tests must run
220 subPlan.addAllIncludeFilters(mIncludeFilters);
221 subPlan.addAllExcludeFilters(mExcludeFilters);
222 if (mLastSubPlan != null) {
223 ISubPlan lastSubPlan = SubPlanHelper.getSubPlanByName(buildHelper, mLastSubPlan);
224 subPlan.addAllIncludeFilters(lastSubPlan.getIncludeFilters());
225 subPlan.addAllExcludeFilters(lastSubPlan.getExcludeFilters());
226 }
227 if (mModuleName != null) {
228 addIncludeToSubPlan(subPlan, new TestFilter(mAbiName, mModuleName, mTestName));
229 }
230 Set<TestStatus> statusesToRun = getStatusesToRun();
231 for (IModuleResult module : mResult.getModules()) {
232 if (shouldRunModule(module)) {
233 TestFilter moduleInclude =
234 new TestFilter(module.getAbi(), module.getName(), null /*test*/);
235 if (shouldRunEntireModule(module)) {
236 // include entire module
237 addIncludeToSubPlan(subPlan, moduleInclude); // every case in the module failed
238 } else if (mResultTypes.contains(NOT_EXECUTED) && !module.isDone()) {
239 // add module include and test excludes
240 addIncludeToSubPlan(subPlan, moduleInclude);
241 for (ICaseResult caseResult : module.getResults()) {
242 for (ITestResult testResult : caseResult.getResults()) {
243 if (!statusesToRun.contains(testResult.getResultStatus())) {
244 TestFilter testExclude = new TestFilter(module.getAbi(),
245 module.getName(), testResult.getFullName());
246 addExcludeToSubPlan(subPlan, testExclude); // the module did not finish (done = false)
247 }
248 }
249 }
250 } else {
251 // Not-executed tests should not be rerun and/or this module is completed
252 // In any such case, it suffices to add includes for each test to rerun
253 for (ICaseResult caseResult : module.getResults()) {
254 for (ITestResult testResult : caseResult.getResults()) {
255 if (statusesToRun.contains(testResult.getResultStatus())) {
256 TestFilter testInclude = new TestFilter(module.getAbi(),
257 module.getName(), testResult.getFullName());
258 addIncludeToSubPlan(subPlan, testInclude); // the module finished, but some cases failed
259 }
260 }
261 }
262 }
263 } else {
264 // module should not run, exclude entire module
265 TestFilter moduleExclude =
266 new TestFilter(module.getAbi(), module.getName(), null /*test*/);
267 addExcludeToSubPlan(subPlan, moduleExclude); // a module in which everything passed
268 }
269 }
270 return subPlan;
271 }
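At this point it is clear why CompatibilityTestSuite holds so many exclude entries. CtsDeqpTestCases did not complete, and it was interrupted shortly before finishing, so nearly all of its tests already had a result and each of them becomes its own exclude filter. This module alone has about 350,000 cases (counting only v7a or only v8a).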
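The effect of the "module not done" branch of createSubPlan can be reproduced in a few lines. This is a simplified model, not the real SubPlanHelper: RetryExcludeSketch, its Status enum, and the test names are hypothetical, and the "abi module test" string format is assumed purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the "module not done" branch of createSubPlan.
class RetryExcludeSketch {
    enum Status { PASS, FAIL, NOT_EXECUTED }

    /** Emits one exclude per test whose status is not in statusesToRun. */
    static Set<String> excludesFor(String abi, String module,
            Map<String, Status> results, Set<Status> statusesToRun) {
        Set<String> excludes = new LinkedHashSet<>();
        for (Map.Entry<String, Status> e : results.entrySet()) {
            if (!statusesToRun.contains(e.getValue())) {
                // Illustrative "abi module test" format, assumed for this sketch.
                excludes.add(abi + " " + module + " " + e.getKey());
            }
        }
        return excludes;
    }

    /** Builds results for a module where `passed` tests passed and one failed. */
    static Map<String, Status> interruptedModule(int passed) {
        Map<String, Status> results = new LinkedHashMap<>();
        for (int i = 0; i < passed; i++) {
            results.put("dEQP#test" + i, Status.PASS);
        }
        results.put("dEQP#failing", Status.FAIL);
        return results;
    }
}
```

With interruptedModule(350000) and a retry of FAIL plus NOT_EXECUTED, the returned set has 350,000 entries: the closer the module was to finishing when it was interrupted, the more excludes the retry carries.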
The rest of the CompatibilityTestSuite initialization is not the focus here and is skipped; next, the logic of test.split(shardCountHint).
tools/tradefederation/core/src/com/android/tradefed/testtype/suite/ITestSuite.java
621 /** {@inheritDoc} */
622 @Override
623 public Collection<IRemoteTest> split(int shardCountHint) {
624 if (shardCountHint <= 1 || mIsSharded) {
625 // cannot shard or already sharded
626 return null;
627 }
628
629 LinkedHashMap<String, IConfiguration> runConfig = loadAndFilter();
630 if (runConfig.isEmpty()) {
631 CLog.i("No config were loaded. Nothing to run.");
632 return null;
633 }
634 injectInfo(runConfig);
635
636 // We split individual tests on double the shardCountHint to provide better average.
637 // The test pool mechanism prevent this from creating too much overhead.
638 List<ModuleDefinition> splitModules =
639 ModuleSplitter.splitConfiguration(
640 runConfig, shardCountHint, mShouldMakeDynamicModule);
641 runConfig.clear();
642 runConfig = null;
643 // create an association of one ITestSuite <=> one ModuleDefinition as the smallest
644 // execution unit supported.
645 List<IRemoteTest> splitTests = new ArrayList<>();
646 for (ModuleDefinition m : splitModules) {
647 ITestSuite suite = createInstance();
648 OptionCopier.copyOptionsNoThrow(this, suite);
649 suite.mIsSharded = true;
650 suite.mDirectModule = m;
651 splitTests.add(suite);
652 }
653 // return the list of ITestSuite with their ModuleDefinition assigned
654 return splitTests;
655 }
First, the logic of loadAndFilter:
261 private LinkedHashMap<String, IConfiguration> loadAndFilter() {
262 LinkedHashMap<String, IConfiguration> runConfig = loadTests();
263 if (runConfig.isEmpty()) {
264 CLog.i("No config were loaded. Nothing to run.");
265 return runConfig;
266 }
267 if (mModuleMetadataIncludeFilter.isEmpty() && mModuleMetadataExcludeFilter.isEmpty()) {
268 return runConfig;
269 }
270 LinkedHashMap<String, IConfiguration> filteredConfig = new LinkedHashMap<>();
271 for (Entry<String, IConfiguration> config : runConfig.entrySet()) {
272 if (!filterByConfigMetadata(
273 config.getValue(),
274 mModuleMetadataIncludeFilter,
275 mModuleMetadataExcludeFilter)) {
276 // if the module config did not pass the metadata filters, it's excluded
277 // from execution.
278 continue;
279 }
280 if (!filterByRunnerType(config.getValue(), mAllowedRunners)) {
281 // if the module config did not pass the runner type filter, it's excluded from
282 // execution.
283 continue;
284 }
285 filterPreparers(config.getValue(), mAllowedPreparers);
286 filteredConfig.put(config.getKey(), config.getValue());
287 }
288 runConfig.clear();
289 return filteredConfig;
290 }
tools/tradefederation/core/src/com/android/tradefed/testtype/suite/BaseTestSuite.java
loadTests first reorganizes mIncludeFilters and mExcludeFilters into mIncludeFiltersParsed and mExcludeFiltersParsed.
133 /** {@inheritDoc} */
134 @Override
135 public LinkedHashMap<String, IConfiguration> loadTests() {
136 try {
137 File testsDir = getTestsDir();
138 setupFilters(testsDir);
139 Set<IAbi> abis = getAbis(getDevice());
140
141 // Create and populate the filters here
142 SuiteModuleLoader.addFilters(mIncludeFilters, mIncludeFiltersParsed, abis);
143 SuiteModuleLoader.addFilters(mExcludeFilters, mExcludeFiltersParsed, abis); // parsed into <String, List> pairs: the key is the module name, the list holds its tests
144
145 CLog.d(
146 "Initializing ModuleRepo\nABIs:%s\n"
147 + "Test Args:%s\nModule Args:%s\nIncludes:%s\nExcludes:%s",
148 abis, mTestArgs, mModuleArgs, mIncludeFiltersParsed, mExcludeFiltersParsed);
149 mModuleRepo =
150 createModuleLoader(
151 mIncludeFiltersParsed, mExcludeFiltersParsed, mTestArgs, mModuleArgs);
152 // Actual loading of the configurations.
153 return loadingStrategy(abis, testsDir, mSuitePrefix, mSuiteTag); // fetch the configs of the modules that will run
154 } catch (DeviceNotAvailableException | FileNotFoundException e) {
155 throw new RuntimeException(e);
156 }
157 }
159 /**
160 * Default loading strategy will load from the resources and the tests directory. Can be
161 * extended or replaced.
162 *
163 * @param abis The set of abis to run against.
164 * @param testsDir The tests directory.
165 * @param suitePrefix A prefix to filter the resource directory.
166 * @param suiteTag The suite tag a module should have to be included. Can be null.
167 * @return A list of loaded configuration for the suite.
168 */
169 public LinkedHashMap<String, IConfiguration> loadingStrategy(
170 Set<IAbi> abis, File testsDir, String suitePrefix, String suiteTag) {
171 LinkedHashMap<String, IConfiguration> loadedConfigs = new LinkedHashMap<>();
172 // Load configs that are part of the resources
173 if (!mSkipJarLoading) {
174 loadedConfigs.putAll(
175 getModuleLoader().loadConfigsFromJars(abis, suitePrefix, suiteTag));
176 }
177
178 // Load the configs that are part of the tests dir
179 if (mConfigPatterns.isEmpty()) {
180 // If no special pattern was configured, use the default configuration patterns we know
181 mConfigPatterns.add(".*\\.config");
182 mConfigPatterns.add(".*\\.xml");
183 }
184 loadedConfigs.putAll(
185 getModuleLoader()
186 .loadConfigsFromDirectory(
187 testsDir, abis, suitePrefix, suiteTag, mConfigPatterns));
188 return loadedConfigs;
189 }
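The filter reorganization done at the start of loadTests (flat filter strings turned into a map keyed by module) can be pictured with the sketch below. It is not SuiteModuleLoader.addFilters: FilterGroupingSketch is a hypothetical name, the "abi module [test]" string format is assumed, and the sketch assumes every filter carries an ABI, which the real code does not require.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of grouping flat filter strings by module.
class FilterGroupingSketch {
    /** Groups "abi module [test]" strings under the key "abi module". */
    static Map<String, List<String>> group(Iterable<String> filters) {
        Map<String, List<String>> parsed = new LinkedHashMap<>();
        for (String filter : filters) {
            String[] parts = filter.trim().split("\\s+", 3);
            if (parts.length < 2) {
                continue; // malformed for this sketch: no module name
            }
            String key = parts[0] + " " + parts[1];
            List<String> tests = parsed.computeIfAbsent(key, k -> new ArrayList<>());
            if (parts.length == 3) {
                tests.add(parts[2]); // a test-level filter; module-level filters add nothing
            }
        }
        return parsed;
    }
}
```

Grouping changes the shape of the data but not its size: 350,000 test-level excludes for one module still become a single list with 350,000 elements.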
tools/tradefederation/core/src/com/android/tradefed/testtype/suite/ModuleSplitter.java
Then splitConfiguration is called:
56 /**
57 * Create a List of executable unit {@link ModuleDefinition}s based on the map of configuration
58 * that was loaded.
59 *
60 * @param runConfig {@link LinkedHashMap} loaded from {@link ITestSuite#loadTests()}.
61 * @param shardCount a shard count hint to help with sharding.
62 * @return List of {@link ModuleDefinition}
63 */
64 public static List<ModuleDefinition> splitConfiguration(
65 LinkedHashMap<String, IConfiguration> runConfig,
66 int shardCount,
67 boolean dynamicModule) {
68 if (dynamicModule) {
69 // We maximize the sharding for dynamic to reduce time difference between first and
70 // last shard as much as possible. Overhead is low due to our test pooling.
71 shardCount *= 2;
72 }
73 List<ModuleDefinition> runModules = new ArrayList<>();
74 for (Entry<String, IConfiguration> configMap : runConfig.entrySet()) {
75 // Check that it's a valid configuration for suites, throw otherwise.
76 ValidateSuiteConfigHelper.validateConfig(configMap.getValue());
77
78 createAndAddModule(
79 runModules,
80 configMap.getKey(),
81 configMap.getValue(),
82 shardCount,
83 dynamicModule); // create a ModuleDefinition from the module name, config, and shard count
84 }
85 return runModules;
86 }
88 private static void createAndAddModule(
89 List<ModuleDefinition> currentList,
90 String moduleName,
91 IConfiguration config,
92 int shardCount,
93 boolean dynamicModule) {
94 // If this particular configuration module is declared as 'not shardable' we take it whole
95 // but still split the individual IRemoteTest in a pool.
96 if (config.getConfigurationDescription().isNotShardable()
97 || (!dynamicModule
98 && config.getConfigurationDescription().isNotStrictShardable())) {
99 for (int i = 0; i < config.getTests().size(); i++) {
100 if (dynamicModule) {
101 ModuleDefinition module =
102 new ModuleDefinition(
103 moduleName,
104 config.getTests(),
105 clonePreparersMap(config),
106 clonePreparers(config.getMultiTargetPreparers()),
107 config);
108 currentList.add(module);
109 } else {
110 addModuleToListFromSingleTest(
111 currentList, config.getTests().get(i), moduleName, config);
112 }
113 }
114 return;
115 }
116
117 // If configuration is possibly shardable we attempt to shard it.
118 for (IRemoteTest test : config.getTests()) {
119 if (test instanceof IShardableTest) {
120 Collection<IRemoteTest> shardedTests = ((IShardableTest) test).split(shardCount);
121 if (shardedTests != null) {
122 // Test did shard we put the shard pool in ModuleDefinition which has a polling
123 // behavior on the pool.
124 if (dynamicModule) {
125 for (int i = 0; i < shardCount; i++) {
126 ModuleDefinition module =
127 new ModuleDefinition(
128 moduleName,
129 shardedTests,
130 clonePreparersMap(config),
131 clonePreparers(config.getMultiTargetPreparers()),
132 config);
133 currentList.add(module);
134 }
135 } else {
136 // We create independent modules with each sharded test.
137 for (IRemoteTest moduleTest : shardedTests) {
138 addModuleToListFromSingleTest(
139 currentList, moduleTest, moduleName, config);
140 }
141 }
142 continue;
143 }
144 }
145 // test is not shardable or did not shard
146 addModuleToListFromSingleTest(currentList, test, moduleName, config);
147 }
148 }
Once the ModuleDefinition list has been created, split continues based on it:
646 for (ModuleDefinition m : splitModules) {
647 ITestSuite suite = createInstance();
648 OptionCopier.copyOptionsNoThrow(this, suite); // note: the options of the CompatibilityTestSuite created above are copied here
649 suite.mIsSharded = true;
650 suite.mDirectModule = m; // the new suite gets the ModuleDefinition just created as its mDirectModule
651 splitTests.add(suite); // the list of CompatibilityTestSuite objects
652 }
The splitTests list here is the list of CompatibilityTestSuite objects that the hprof shows causing the failure.
tools/tradefederation/core/src/com/android/tradefed/config/OptionCopier.java
54 /**
55 * Identical to {@link #copyOptions(Object, Object)} but will log instead of throw if exception
56 * occurs.
57 */
58 public static void copyOptionsNoThrow(Object source, Object dest) {
59 try {
60 copyOptions(source, dest);
61 } catch (ConfigurationException e) {
62 CLog.e(e);
63 }
64 }
32 /**
33 * Copy the values from {@link Option} fields in <var>origObject</var> to <var>destObject</var>
34 *
35 * @param origObject the {@link Object} to copy from
36 * @param destObject the {@link Object} tp copy to
37 * @throws ConfigurationException if options failed to copy
38 */
39 public static void copyOptions(Object origObject, Object destObject)
40 throws ConfigurationException {
41 Collection<Field> origFields = OptionSetter.getOptionFieldsForClass(origObject.getClass());
42 Map<String, Field> destFieldMap = getFieldOptionMap(destObject);
43 for (Field origField : origFields) {
44 final Option option = origField.getAnnotation(Option.class);
45 Field destField = destFieldMap.remove(option.name());
46 if (destField != null) {
47 Object origValue = OptionSetter.getFieldValue(origField,
48 origObject);
49 OptionSetter.setFieldValue(option.name(), destObject, destField, origValue);
50 }
51 }
52 }
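In the end a large number of CompatibilityTestSuite objects are copied (when many modules need to be retried), and each of them holds a large number of exclude entries (350,000); together these produce the error in the log.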
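The stack trace shows AbstractCollection.addAll being called from OptionSetter.setFieldValue, which suggests that collection-typed options are copied element by element into the new suite. The sketch below models only that behaviour. CopyAmplificationSketch and its nested Suite class are hypothetical stand-ins, not the real OptionCopier or CompatibilityTestSuite.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of how per-suite option copies multiply the exclude entries.
class CopyAmplificationSketch {
    /** Stand-in for a suite with one collection-typed option field. */
    static class Suite {
        final Set<String> excludeFilters = new HashSet<>();
    }

    /** Element-wise copy: the destination gets its own set with its own entries. */
    static Suite copyOf(Suite source) {
        Suite dest = new Suite();
        dest.excludeFilters.addAll(source.excludeFilters);
        return dest;
    }

    /** One copied suite per split unit, as in the loop at the end of split. */
    static List<Suite> split(Suite source, int units) {
        List<Suite> shards = new ArrayList<>();
        for (int i = 0; i < units; i++) {
            shards.add(copyOf(source));
        }
        return shards;
    }

    /** Total number of set entries held across all copies. */
    static long totalEntries(List<Suite> shards) {
        long total = 0;
        for (Suite s : shards) {
            total += s.excludeFilters.size();
        }
        return total;
    }
}
```

The filter strings themselves are shared between copies, but every copy allocates its own hash table and one node per entry. As an illustration with an assumed count of 100 split units (the real number is not given here), 350,000 excludes would turn into 35,000,000 set entries, which is consistent with the allocation failing inside HashMap.newNode.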
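Summary
- While the very large CtsDeqpTestCases module was running, an adb disconnection or a similar interruption stopped it just before it finished, leaving done = false. On retry, a large number of exclude entries are therefore recorded in the CompatibilityTestSuite.
- In a multi-device retry the CompatibilityTestSuite is copied, which amplifies the problem further and causes the failure.
- Temporary workaround: run the CtsDeqpTestCases module separately. Even if it is interrupted, retrying a report that contains only CtsDeqpTestCases does not trigger the copying. As things stand, testing CtsDeqpTestCases on its own keeps the issue from recurring, and Google permits this.
- Recommend that Google modify the CTS framework, for example by removing exclude entries that a retry does not need, or by sharing a single exclude list instance when CompatibilityTestSuite is copied. (It is best for Google to fix this: Google knows this logic better and has a dedicated team iterating on it.)
- The first patch submitted to Google is only one possible approach and not a very good one; the recommendation remains that Google fix this issue.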
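To make the two suggested directions concrete, here is a small sketch of each. It is neither the patch submitted to Google nor any actual framework change. ExcludePruningSketch and its methods are hypothetical, and the "abi module [test]" filter format is assumed for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketches of the two mitigation ideas; not a real Tradefed change.
class ExcludePruningSketch {
    /** Idea 1: keep only the excludes that target the given abi and module. */
    static Set<String> excludesForModule(Set<String> all, String abi, String module) {
        String exact = abi + " " + module;   // module-level exclude
        String prefix = exact + " ";         // test-level excludes of that module
        Set<String> kept = new HashSet<>();
        for (String filter : all) {
            if (filter.equals(exact) || filter.startsWith(prefix)) {
                kept.add(filter);
            }
        }
        return kept;
    }

    /** Idea 2: hand every copy the same read-only view instead of a fresh set. */
    static Set<String> sharedView(Set<String> all) {
        return Collections.unmodifiableSet(all);
    }
}
```

With the first idea, only the suite that runs CtsDeqpTestCases would carry the large exclude set. With the second, all suites would reference one set, at the cost of making it read-only for each of them.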