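Note: This article is based on Android 8.1.
ART Object Allocation: The Preparation Stage of Memory Allocation
This chapter analyzes how the ART virtual machine in Android 8.1 allocates memory when an object is created. This section covers the setup that allocation depends on and the chain of entry-point jumps that leads to the allocator.
We begin with the Thread class.
The Thread class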
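Thread::Init() performs all per-thread initialization. For example, InitCpu() sets up CPU-related state, and InitTlsEntryPoints() initializes the thread's entry-point tables, i.e. the jump tables that compiled code and JNI use to call into runtime functions. Older ART versions split these tables into four: interpreter_entrypoints_ for the interpreter, jni_entrypoints_ for JNI calls, portable_entrypoints_ for native code generated by the Portable backend, and quick_entrypoints_ for native code generated by the Quick backend. In Android 8.1, as InitTlsEntryPoints() below shows, only jni_entrypoints and quick_entrypoints remain in tlsPtr_. Because the tables live inside the Thread object, code reaches an entry point by loading it from a fixed offset relative to the current Thread.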
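To make "a fixed offset from the Thread" concrete, here is a minimal sketch. FakeThread, FakeQuickEntryPoints, FakeAlloc and LoadEntryPoint are hypothetical stand-ins, not ART types; real compiled code typically keeps the current Thread* in a register and emits the load at a constant offset directly instead of calling a helper like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, heavily simplified stand-ins for ART's QuickEntryPoints and Thread.
using AllocFn = void* (*)(std::size_t);

struct FakeQuickEntryPoints {
  AllocFn pAllocObjectResolved;
  AllocFn pAllocObjectInitialized;
};

struct FakeThread {
  int tid;
  FakeQuickEntryPoints quick_entrypoints;  // the table lives inside the thread object
};

// Byte offset of one slot from the start of FakeThread, known at compile time.
constexpr std::size_t kAllocResolvedOffset =
    offsetof(FakeThread, quick_entrypoints) +
    offsetof(FakeQuickEntryPoints, pAllocObjectResolved);

// Reads the function pointer stored `offset` bytes past `self`.
AllocFn LoadEntryPoint(const FakeThread* self, std::size_t offset) {
  AllocFn fn;
  std::memcpy(&fn, reinterpret_cast<const unsigned char*>(self) + offset, sizeof(fn));
  return fn;
}

// A dummy allocator to place in the table: succeeds only for small requests.
void* FakeAlloc(std::size_t n) {
  static unsigned char buffer[64];
  return n <= sizeof(buffer) ? buffer : nullptr;
}
```

Because the offset is a compile-time constant, the caller needs only the Thread pointer to find any entry point, which is the property the ART layout relies on.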
Thread::Init():
bool Thread::Init(ThreadList* thread_list, JavaVMExt* java_vm, JNIEnvExt* jni_env_ext) {
  // This function does all the initialization that must be run by the native thread it applies to.
  // (When we create a new thread from managed code, we allocate the Thread* in Thread::Create so
  // we can handshake with the corresponding native thread when it's ready.) Check this native
  // thread hasn't been through here already...
  CHECK(Thread::Current() == nullptr);
  // Set pthread_self_ ahead of pthread_setspecific, that makes Thread::Current function, this
  // avoids pthread_self_ ever being invalid when discovered from Thread::Current().
  tlsPtr_.pthread_self = pthread_self();
  CHECK(is_started_);
  SetUpAlternateSignalStack();
  if (!InitStackHwm()) {
    return false;
  }
  InitCpu();
  InitTlsEntryPoints();
  RemoveSuspendTrigger();
  InitCardTable();
  InitTid();
  interpreter::InitInterpreterTls(this);
  ……
  thread_list->Register(this);
  return true;
}
Thread::InitTlsEntryPoints():
void Thread::InitTlsEntryPoints() {
  // Insert a placeholder so we can easily tell if we call an unimplemented entry point.
  uintptr_t* begin = reinterpret_cast<uintptr_t*>(&tlsPtr_.jni_entrypoints);
  uintptr_t* end = reinterpret_cast<uintptr_t*>(
      reinterpret_cast<uint8_t*>(&tlsPtr_.quick_entrypoints) + sizeof(tlsPtr_.quick_entrypoints));
  for (uintptr_t* it = begin; it != end; ++it) {
    *it = reinterpret_cast<uintptr_t>(UnimplementedEntryPoint);
  }
  InitEntryPoints(&tlsPtr_.jni_entrypoints, &tlsPtr_.quick_entrypoints);
}
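The placeholder trick above (fill every slot first, then overwrite the known ones) can be sketched as follows. This is a simplified, hypothetical model: the table is a std::array indexed by an enum rather than ART's structs walked as raw uintptr_t slots, and UnimplementedEntryPointSketch and RealAllocEntry are made-up functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using EntryFn = int (*)();

// Placeholder installed in every slot first (stands in for UnimplementedEntryPoint).
int UnimplementedEntryPointSketch() { return -1; }
// A "real" entry point installed afterwards.
int RealAllocEntry() { return 42; }

enum EntryPointIndex : std::size_t {
  kAllocObjectResolved,
  kAllocObjectInitialized,
  kDlsymLookup,
  kNumEntryPoints
};

using EntryPointTable = std::array<EntryFn, kNumEntryPoints>;

void InitTableSketch(EntryPointTable* table) {
  table->fill(UnimplementedEntryPointSketch);       // step 1: placeholder everywhere
  (*table)[kAllocObjectResolved] = RealAllocEntry;  // step 2: install known entries
}
```

Any slot that the second step forgets still points at the placeholder, so a call through it fails in a recognizable way instead of jumping to garbage.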
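The entrypoints directory
InitTlsEntryPoints() first fills every slot from jni_entrypoints through quick_entrypoints with the placeholder UnimplementedEntryPoint, so a call to a missing entry point is easy to detect, and then calls InitEntryPoints(), passing in the addresses of the two tables. InitEntryPoints() is implemented separately for each CPU architecture; here is the ARM64 version (/art/runtime/arch/arm64/entrypoints_init_arm64.cc):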
void InitEntryPoints(JniEntryPoints* jpoints, QuickEntryPoints* qpoints) {
  DefaultInitEntryPoints(jpoints, qpoints);
  ……
}
It calls DefaultInitEntryPoints() (/art/runtime/entrypoints/quick/quick_default_init_entrypoints.h):
static void DefaultInitEntryPoints(JniEntryPoints* jpoints, QuickEntryPoints* qpoints) {
  // JNI
  jpoints->pDlsymLookup = art_jni_dlsym_lookup_stub;
  // Alloc
  ResetQuickAllocEntryPoints(qpoints, /* is_marking */ true);
  ……
}
We only care about the Alloc part, which calls ResetQuickAllocEntryPoints().
Location: /art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc
static bool entry_points_instrumented = false;
static gc::AllocatorType entry_points_allocator = gc::kAllocatorTypeDlMalloc;
void SetQuickAllocEntryPointsAllocator(gc::AllocatorType allocator) {
  entry_points_allocator = allocator;
}
void ResetQuickAllocEntryPoints(QuickEntryPoints* qpoints, bool is_marking) {
#if !defined(__APPLE__) || !defined(__LP64__)
  switch (entry_points_allocator) {
    case gc::kAllocatorTypeDlMalloc: {
      SetQuickAllocEntryPoints_dlmalloc(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeRosAlloc: {
      SetQuickAllocEntryPoints_rosalloc(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeBumpPointer: {
      CHECK(kMovingCollector);
      SetQuickAllocEntryPoints_bump_pointer(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeTLAB: {
      CHECK(kMovingCollector);
      SetQuickAllocEntryPoints_tlab(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeRegion: {
      CHECK(kMovingCollector);
      SetQuickAllocEntryPoints_region(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeRegionTLAB: {
      CHECK(kMovingCollector);
      if (is_marking) {
        SetQuickAllocEntryPoints_region_tlab(qpoints, entry_points_instrumented);
      } else {
        // Not marking means we need no read barriers and can just use the normal TLAB case.
        SetQuickAllocEntryPoints_tlab(qpoints, entry_points_instrumented);
      }
      return;
    }
    default:
      break;
  }
#else
  UNUSED(qpoints);
  UNUSED(is_marking);
#endif
  UNIMPLEMENTED(FATAL);
  UNREACHABLE();
}
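entry_points_allocator records the type of memory allocator in use. Its initial value, kAllocatorTypeDlMalloc, selects the DlMalloc allocator entry points, and SetQuickAllocEntryPointsAllocator() changes it. On configurations that use the CMS collector (the default before Android 8.0) it is usually kAllocatorTypeRosAlloc; with the concurrent copying (CC) collector that is the default in Android 8.x it is usually kAllocatorTypeRegionTLAB.
SetQuickAllocEntryPointsAllocator() is called from Heap::ChangeAllocator() when the heap switches allocators, and ChangeAllocator() is in turn called from Heap::ChangeCollector() when the garbage-collection strategy changes.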
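The selection logic of ResetQuickAllocEntryPoints() can be modeled in a few lines. This is a hypothetical sketch: instead of installing function pointers it returns the suffix of the SetQuickAllocEntryPoints_* function that ART would call, and the names AllocatorType, sketch_allocator, SetAllocatorSketch and SelectEntryPointSuffix are made up for illustration.

```cpp
#include <cassert>
#include <string>

enum class AllocatorType { kDlMalloc, kRosAlloc, kBumpPointer, kTLAB, kRegion, kRegionTLAB };

// File-level state, starting from the same default as entry_points_allocator.
static AllocatorType sketch_allocator = AllocatorType::kDlMalloc;

// Counterpart of SetQuickAllocEntryPointsAllocator(): only records the choice.
void SetAllocatorSketch(AllocatorType allocator) { sketch_allocator = allocator; }

// Counterpart of the switch in ResetQuickAllocEntryPoints().
std::string SelectEntryPointSuffix(bool is_marking) {
  switch (sketch_allocator) {
    case AllocatorType::kDlMalloc:    return "_dlmalloc";
    case AllocatorType::kRosAlloc:    return "_rosalloc";
    case AllocatorType::kBumpPointer: return "_bump_pointer";
    case AllocatorType::kTLAB:        return "_tlab";
    case AllocatorType::kRegion:      return "_region";
    case AllocatorType::kRegionTLAB:
      // Outside the marking phase no read barrier is needed, so the plain
      // TLAB entry points suffice (mirrors the comment in the ART source).
      return is_marking ? "_region_tlab" : "_tlab";
  }
  return "";  // not reached for valid enum values
}
```

Recording the allocator and resetting the tables are separate steps, which is why changing the allocator alone has no effect until the entry points are reset.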
The code above calls SetQuickAllocEntryPoints_ followed by an allocator-specific suffix. Where are these functions defined? They are generated by a macro in /art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc:
#define GENERATE_ENTRYPOINTS(suffix) \
extern "C" void* art_quick_alloc_array_resolved##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved8##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved16##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved32##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved64##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_object_resolved##suffix(mirror::Class* klass); \
extern "C" void* art_quick_alloc_object_initialized##suffix(mirror::Class* klass); \
extern "C" void* art_quick_alloc_object_with_checks##suffix(mirror::Class* klass); \
extern "C" void* art_quick_alloc_string_from_bytes##suffix(void*, int32_t, int32_t, int32_t); \
extern "C" void* art_quick_alloc_string_from_chars##suffix(int32_t, int32_t, void*); \
extern "C" void* art_quick_alloc_string_from_string##suffix(void*); \
extern "C" void* art_quick_alloc_array_resolved##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved8##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved16##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved32##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved64##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_object_resolved##suffix##_instrumented(mirror::Class* klass); \
extern "C" void* art_quick_alloc_object_initialized##suffix##_instrumented(mirror::Class* klass); \
extern "C" void* art_quick_alloc_object_with_checks##suffix##_instrumented(mirror::Class* klass); \
extern "C" void* art_quick_alloc_string_from_bytes##suffix##_instrumented(void*, int32_t, int32_t, int32_t); \
extern "C" void* art_quick_alloc_string_from_chars##suffix##_instrumented(int32_t, int32_t, void*); \
extern "C" void* art_quick_alloc_string_from_string##suffix##_instrumented(void*); \
void SetQuickAllocEntryPoints##suffix(QuickEntryPoints* qpoints, bool instrumented) { \
  if (instrumented) { \
    qpoints->pAllocArrayResolved = art_quick_alloc_array_resolved##suffix##_instrumented; \
    qpoints->pAllocArrayResolved8 = art_quick_alloc_array_resolved8##suffix##_instrumented; \
    qpoints->pAllocArrayResolved16 = art_quick_alloc_array_resolved16##suffix##_instrumented; \
    qpoints->pAllocArrayResolved32 = art_quick_alloc_array_resolved32##suffix##_instrumented; \
    qpoints->pAllocArrayResolved64 = art_quick_alloc_array_resolved64##suffix##_instrumented; \
    qpoints->pAllocObjectResolved = art_quick_alloc_object_resolved##suffix##_instrumented; \
    qpoints->pAllocObjectInitialized = art_quick_alloc_object_initialized##suffix##_instrumented; \
    qpoints->pAllocObjectWithChecks = art_quick_alloc_object_with_checks##suffix##_instrumented; \
    qpoints->pAllocStringFromBytes = art_quick_alloc_string_from_bytes##suffix##_instrumented; \
    qpoints->pAllocStringFromChars = art_quick_alloc_string_from_chars##suffix##_instrumented; \
    qpoints->pAllocStringFromString = art_quick_alloc_string_from_string##suffix##_instrumented; \
  } else { \
    qpoints->pAllocArrayResolved = art_quick_alloc_array_resolved##suffix; \
    qpoints->pAllocArrayResolved8 = art_quick_alloc_array_resolved8##suffix; \
    qpoints->pAllocArrayResolved16 = art_quick_alloc_array_resolved16##suffix; \
    qpoints->pAllocArrayResolved32 = art_quick_alloc_array_resolved32##suffix; \
    qpoints->pAllocArrayResolved64 = art_quick_alloc_array_resolved64##suffix; \
    qpoints->pAllocObjectResolved = art_quick_alloc_object_resolved##suffix; \
    qpoints->pAllocObjectInitialized = art_quick_alloc_object_initialized##suffix; \
    qpoints->pAllocObjectWithChecks = art_quick_alloc_object_with_checks##suffix; \
    qpoints->pAllocStringFromBytes = art_quick_alloc_string_from_bytes##suffix; \
    qpoints->pAllocStringFromChars = art_quick_alloc_string_from_chars##suffix; \
    qpoints->pAllocStringFromString = art_quick_alloc_string_from_string##suffix; \
  } \
}
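The macro relies on the preprocessor's ## token-pasting operator: the suffix is glued onto every identifier, so one macro body yields a separately named setter per allocator. The ART file appears to instantiate it once per allocator (for example GENERATE_ENTRYPOINTS(_rosalloc)), which is how a name like SetQuickAllocEntryPoints_rosalloc comes into existence. The miniature below shows the mechanism; MiniEntryPoints, GENERATE_MINI_ENTRYPOINTS and the alloc_object_resolved_* functions are hypothetical.

```cpp
#include <cassert>

struct MiniEntryPoints {
  int (*pAllocObjectResolved)();
};

// Stand-ins for the per-allocator assembly stubs.
int alloc_object_resolved_rosalloc() { return 1; }
int alloc_object_resolved_rosalloc_instrumented() { return 2; }
int alloc_object_resolved_tlab() { return 3; }
int alloc_object_resolved_tlab_instrumented() { return 4; }

// One body, pasted with a different suffix per allocator.
#define GENERATE_MINI_ENTRYPOINTS(suffix) \
  void SetMiniEntryPoints##suffix(MiniEntryPoints* qpoints, bool instrumented) { \
    if (instrumented) { \
      qpoints->pAllocObjectResolved = alloc_object_resolved##suffix##_instrumented; \
    } else { \
      qpoints->pAllocObjectResolved = alloc_object_resolved##suffix; \
    } \
  }

GENERATE_MINI_ENTRYPOINTS(_rosalloc)  // defines SetMiniEntryPoints_rosalloc
GENERATE_MINI_ENTRYPOINTS(_tlab)      // defines SetMiniEntryPoints_tlab
```

For example, SetMiniEntryPoints_tlab(&q, true) installs alloc_object_resolved_tlab_instrumented into q.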
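Take pAllocObjectResolved as an example. With the RosAlloc allocator it points to the assembly stub art_quick_alloc_object_resolved_rosalloc, whose slow path uses a bl instruction to call the C function artAllocObjectFromCodeResolvedRosAlloc. As the declarations above show, the stub takes a single argument, klass (the mirror::Class* of the object to allocate), passed in the first argument register (r0 on ARM, x0 on ARM64); the stub also passes the current Thread* to the C function as its second argument.
C functions such as artAllocObjectFromCodeResolvedRosAlloc are not written by hand; they are generated by the following macros (/art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc):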
#define GENERATE_ENTRYPOINTS_FOR_ALLOCATOR_INST(suffix, suffix2, instrumented_bool, allocator_type) \
extern "C" mirror::Object* artAllocObjectFromCodeWithChecks##suffix##suffix2( \
    mirror::Class* klass, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  return artAllocObjectFromCode<false, true, instrumented_bool, allocator_type>(klass, self); \
} \
extern "C" mirror::Object* artAllocObjectFromCodeResolved##suffix##suffix2( \
    mirror::Class* klass, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  return artAllocObjectFromCode<false, false, instrumented_bool, allocator_type>(klass, self); \
} \
extern "C" mirror::Object* artAllocObjectFromCodeInitialized##suffix##suffix2( \
    mirror::Class* klass, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  return artAllocObjectFromCode<true, false, instrumented_bool, allocator_type>(klass, self); \
} \
extern "C" mirror::Array* artAllocArrayFromCodeResolved##suffix##suffix2( \
    mirror::Class* klass, int32_t component_count, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  ScopedQuickEntrypointChecks sqec(self); \
  return AllocArrayFromCodeResolved<instrumented_bool>(klass, component_count, self, \
                                                       allocator_type); \
} \
extern "C" mirror::String* artAllocStringFromBytesFromCode##suffix##suffix2( \
    mirror::ByteArray* byte_array, int32_t high, int32_t offset, int32_t byte_count, \
    Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  ScopedQuickEntrypointChecks sqec(self); \
  StackHandleScope<1> hs(self); \
  Handle<mirror::ByteArray> handle_array(hs.NewHandle(byte_array)); \
  return mirror::String::AllocFromByteArray<instrumented_bool>(self, byte_count, handle_array, \
                                                               offset, high, allocator_type); \
} \
extern "C" mirror::String* artAllocStringFromCharsFromCode##suffix##suffix2( \
    int32_t offset, int32_t char_count, mirror::CharArray* char_array, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  StackHandleScope<1> hs(self); \
  Handle<mirror::CharArray> handle_array(hs.NewHandle(char_array)); \
  return mirror::String::AllocFromCharArray<instrumented_bool>(self, char_count, handle_array, \
                                                               offset, allocator_type); \
} \
extern "C" mirror::String* artAllocStringFromStringFromCode##suffix##suffix2( /* NOLINT */ \
    mirror::String* string, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  StackHandleScope<1> hs(self); \
  Handle<mirror::String> handle_string(hs.NewHandle(string)); \
  return mirror::String::AllocFromString<instrumented_bool>(self, handle_string->GetLength(), \
                                                            handle_string, 0, allocator_type); \
}
#define GENERATE_ENTRYPOINTS_FOR_ALLOCATOR(suffix, allocator_type) \
  GENERATE_ENTRYPOINTS_FOR_ALLOCATOR_INST(suffix, Instrumented, true, allocator_type) \
  GENERATE_ENTRYPOINTS_FOR_ALLOCATOR_INST(suffix, , false, allocator_type)
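GENERATE_ENTRYPOINTS_FOR_ALLOCATOR expands the _INST macro twice: once with suffix2 = Instrumented and instrumented_bool = true, and once with an empty suffix2 and false. An invocation such as GENERATE_ENTRYPOINTS_FOR_ALLOCATOR(RosAlloc, gc::kAllocatorTypeRosAlloc) would therefore yield both artAllocObjectFromCodeResolvedRosAllocInstrumented and artAllocObjectFromCodeResolvedRosAlloc. The hypothetical miniature below reproduces that pattern (an empty macro argument, token pasting, and compile-time flags forwarded to a template). AllocSketch and the GEN_* macros are made-up names, and the return values merely encode which flags reached the template.

```cpp
#include <cassert>

enum class Alloc { kRosAlloc, kTLAB };

// Stand-in for artAllocObjectFromCode<...>: the flags are template parameters,
// so each generated wrapper gets its own specialized copy.
template <bool kInstrumented, Alloc kAllocator>
int AllocSketch() {
  return (kInstrumented ? 100 : 0) + static_cast<int>(kAllocator);
}

#define GEN_INST(suffix, suffix2, instrumented_bool, allocator) \
  extern "C" int AllocFromCode##suffix##suffix2() { \
    return AllocSketch<instrumented_bool, allocator>(); \
  }

// Same shape as GENERATE_ENTRYPOINTS_FOR_ALLOCATOR: one instrumented and one
// plain variant (the second call passes an empty suffix2).
#define GEN_FOR_ALLOCATOR(suffix, allocator) \
  GEN_INST(suffix, Instrumented, true, allocator) \
  GEN_INST(suffix, , false, allocator)

GEN_FOR_ALLOCATOR(RosAlloc, Alloc::kRosAlloc)  // AllocFromCodeRosAllocInstrumented, AllocFromCodeRosAlloc
GEN_FOR_ALLOCATOR(TLAB, Alloc::kTLAB)          // AllocFromCodeTLABInstrumented, AllocFromCodeTLAB
```

Passing the allocator as a template argument lets the compiler fold checks such as allocator_type == gc::kAllocatorTypeTLAB away in each generated function.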
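The three object-allocation variants (WithChecks, Resolved and Initialized) all end up in artAllocObjectFromCode() (/art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc), while the array and string variants take their own paths: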
static constexpr bool kUseTlabFastPath = true;
template <bool kInitialized,
          bool kFinalize,
          bool kInstrumented,
          gc::AllocatorType allocator_type>
static ALWAYS_INLINE inline mirror::Object* artAllocObjectFromCode(
    mirror::Class* klass,
    Thread* self) REQUIRES_SHARED(Locks::mutator_lock_) {
  ScopedQuickEntrypointChecks sqec(self);
  DCHECK(klass != nullptr);
  if (kUseTlabFastPath && !kInstrumented && allocator_type == gc::kAllocatorTypeTLAB) {
    if (kInitialized || klass->IsInitialized()) {
      if (!kFinalize || !klass->IsFinalizable()) {
        size_t byte_count = klass->GetObjectSize();
        byte_count = RoundUp(byte_count, gc::space::BumpPointerSpace::kAlignment);
        mirror::Object* obj;
        if (LIKELY(byte_count < self->TlabSize())) {
          obj = self->AllocTlab(byte_count);
          DCHECK(obj != nullptr) << "AllocTlab can't fail";
          obj->SetClass(klass);
          if (kUseBakerReadBarrier) {
            obj->AssertReadBarrierState();
          }
          QuasiAtomic::ThreadFenceForConstructor();
          return obj;
        }
      }
    }
  }
  if (kInitialized) {
    return AllocObjectFromCodeInitialized<kInstrumented>(klass, self, allocator_type);
  } else if (!kFinalize) {
    return AllocObjectFromCodeResolved<kInstrumented>(klass, self, allocator_type);
  } else {
    return AllocObjectFromCode<kInstrumented>(klass, self, allocator_type);
  }
}
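This function does the following:
First, it checks whether the TLAB fast path applies: the fast path is enabled (kUseTlabFastPath), the entry point is not instrumented, the allocator is kAllocatorTypeTLAB, the class is initialized, the class is not finalizable (checked only when kFinalize is set), and the object size, rounded up to the bump-pointer alignment, is smaller than the thread's remaining TLAB space. A TLAB (thread-local allocation buffer) is a chunk of heap owned by a single thread, so allocating from it needs no synchronization with other threads; the object is carved out by Thread::AllocTlab(), which simply advances the thread's allocation pointer.
Otherwise, it chooses a helper based on the template parameters kInitialized and kFinalize. If the class is known to be initialized (kInitialized), it calls AllocObjectFromCodeInitialized(); otherwise, if no finalizer check is required (!kFinalize), it calls AllocObjectFromCodeResolved(); in the remaining case it calls AllocObjectFromCode().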
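The TLAB fast path reduces to "round up, check the remaining space, bump a pointer". A minimal sketch under stated assumptions: TlabSketch is a hypothetical stand-in for the thread's TLAB fields, and the 8-byte alignment is an assumed value for BumpPointerSpace::kAlignment. Returning nullptr here corresponds to falling through to the slower helpers described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed alignment, standing in for gc::space::BumpPointerSpace::kAlignment.
constexpr std::size_t kAlignmentSketch = 8;

constexpr std::size_t RoundUpSketch(std::size_t x, std::size_t n) {
  return (x + n - 1) / n * n;
}

// The thread owns [pos, end); no other thread touches it, so no locking is needed.
struct TlabSketch {
  std::uint8_t* pos;
  std::uint8_t* end;

  std::size_t Remaining() const { return static_cast<std::size_t>(end - pos); }

  // Same shape as the fast path: round the size up, check it fits, bump.
  void* TryAlloc(std::size_t object_size) {
    std::size_t byte_count = RoundUpSketch(object_size, kAlignmentSketch);
    if (byte_count >= Remaining()) {  // strict "<" as in byte_count < self->TlabSize()
      return nullptr;                 // caller falls back to the slow path
    }
    void* result = pos;
    pos += byte_count;
    return result;
  }
};
```

For a 64-byte buffer, requesting 12 bytes returns the buffer start and leaves 48 bytes, because the request is rounded up to 16.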
Let's look at AllocObjectFromCodeResolved() (/art/runtime/entrypoints/entrypoint_utils-inl.h):
// Given the context of a calling Method and a resolved class, create an instance.
template <bool kInstrumented>
ALWAYS_INLINE
inline mirror::Object* AllocObjectFromCodeResolved(mirror::Class* klass,
                                                   Thread* self,
                                                   gc::AllocatorType allocator_type) {
  DCHECK(klass != nullptr);
  bool slow_path = false;
  klass = CheckClassInitializedForObjectAlloc(klass, self, &slow_path);
  if (UNLIKELY(slow_path)) {
    if (klass == nullptr) {
      return nullptr;
    }
    gc::Heap* heap = Runtime::Current()->GetHeap();
    // Pass in false since the object cannot be finalizable.
    // CheckClassInitializedForObjectAlloc can cause thread suspension which means we may now be
    // instrumented.
    return klass->Alloc</*kInstrumented*/true, false>(self, heap->GetCurrentAllocator()).Ptr();
  }
  // Pass in false since the object cannot be finalizable.
  return klass->Alloc<kInstrumented, false>(self, allocator_type).Ptr();
}
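CheckClassInitializedForObjectAlloc() checks whether klass has been initialized. If it has, slow_path stays false and klass is returned unchanged. If not, it sets slow_path to true and initializes the class, which may run the class's static initializer and suspend the thread; if initialization fails, it returns nullptr with an exception pending. On the slow path, because another thread may have switched the allocator while this one was suspended, the code re-reads the current allocator from the heap and uses the instrumented variant. In either case, the object is allocated by calling klass->Alloc().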
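The slow_path protocol can be modeled as follows. ClassSketch, AllocatorSketch, CheckInitializedSketch and ChooseAllocatorSketch are hypothetical; "initializing" a class just flips a flag, and an allocator switch during initialization is simulated to show why the caller re-reads the current allocator instead of trusting the one it was given.

```cpp
#include <cassert>

struct ClassSketch {
  bool initialized;
  bool init_will_fail;
};

enum class AllocatorSketch { kRosAlloc, kRegionTLAB };

// Stands in for the heap's current allocator.
static AllocatorSketch g_current_allocator = AllocatorSketch::kRosAlloc;

ClassSketch* CheckInitializedSketch(ClassSketch* klass, bool* slow_path) {
  if (!klass->initialized) {
    *slow_path = true;                 // tell the caller to re-check
    if (klass->init_will_fail) {
      return nullptr;                  // ART leaves an exception pending here
    }
    klass->initialized = true;         // stands in for running the static initializer
    // Simulate another thread switching allocators while we were suspended.
    g_current_allocator = AllocatorSketch::kRegionTLAB;
  }
  return klass;
}

// Picks the allocator the allocation would use; returns false on failure.
bool ChooseAllocatorSketch(ClassSketch* klass, AllocatorSketch passed_in,
                           AllocatorSketch* out) {
  bool slow_path = false;
  klass = CheckInitializedSketch(klass, &slow_path);
  if (slow_path) {
    if (klass == nullptr) return false;
    *out = g_current_allocator;        // like heap->GetCurrentAllocator()
    return true;
  }
  *out = passed_in;                    // fast case: the caller's allocator is still valid
  return true;
}
```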
Class::Alloc() (/art/runtime/mirror/class-inl.h):
template<bool kIsInstrumented, bool kCheckAddFinalizer>
inline ObjPtr<Object> Class::Alloc(Thread* self, gc::AllocatorType allocator_type) {
  CheckObjectAlloc();
  gc::Heap* heap = Runtime::Current()->GetHeap();
  const bool add_finalizer = kCheckAddFinalizer && IsFinalizable();
  if (!kCheckAddFinalizer) {
    DCHECK(!IsFinalizable());
  }
  // Note that the this pointer may be invalidated after the allocation.
  ObjPtr<Object> obj =
      heap->AllocObjectWithAllocator<kIsInstrumented, false>(self,
                                                             this,
                                                             this->object_size_,
                                                             allocator_type,
                                                             VoidFunctor());
  if (add_finalizer && LIKELY(obj != nullptr)) {
    heap->AddFinalizerReference(self, &obj);
    if (UNLIKELY(self->IsExceptionPending())) {
      // Failed to allocate finalizer reference, it means that the whole allocation failed.
      obj = nullptr;
    }
  }
  return obj.Ptr();
}
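CheckObjectAlloc() checks that the class is a valid type to instantiate.
heap->AllocObjectWithAllocator() allocates memory for the object.
If kCheckAddFinalizer is set, the class is finalizable (it overrides finalize()), and the allocation succeeded, heap->AddFinalizerReference(self, &obj) is called. This goes through the add() method of FinalizerReference.java, which creates a FinalizerReference for the object and adds it to a linked list; once the GC finds the object unreachable, the finalizer daemon thread runs its finalize() method. If registering the finalizer reference fails, the whole allocation is treated as failed and nullptr is returned.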
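The order of operations in Class::Alloc() can be sketched with a toy heap. HeapSketch, ClassInfoSketch, ObjectSketch and ClassAllocSketch are hypothetical; the vector of finalizer references stands in for the list that FinalizerReference.add() maintains, and no real memory management or GC is modeled.

```cpp
#include <cassert>
#include <deque>
#include <vector>

struct ObjectSketch { int class_id; };

struct ClassInfoSketch {
  int id;
  bool instantiable;
  bool finalizable;
};

struct HeapSketch {
  bool out_of_memory = false;
  bool finalizer_registration_fails = false;
  std::deque<ObjectSketch> objects;           // deque keeps element addresses stable
  std::vector<ObjectSketch*> finalizer_refs;  // stands in for the FinalizerReference list

  ObjectSketch* AllocObject(int class_id) {   // ~ AllocObjectWithAllocator
    if (out_of_memory) return nullptr;
    objects.push_back(ObjectSketch{class_id});
    return &objects.back();
  }
  bool AddFinalizerReference(ObjectSketch* obj) {
    if (finalizer_registration_fails) return false;
    finalizer_refs.push_back(obj);
    return true;
  }
};

template <bool kCheckAddFinalizer>
ObjectSketch* ClassAllocSketch(const ClassInfoSketch& klass, HeapSketch* heap) {
  assert(klass.instantiable);                             // 1) ~ CheckObjectAlloc()
  const bool add_finalizer = kCheckAddFinalizer && klass.finalizable;
  ObjectSketch* obj = heap->AllocObject(klass.id);        // 2) allocate
  if (add_finalizer && obj != nullptr) {                  // 3) register finalizer
    if (!heap->AddFinalizerReference(obj)) {
      obj = nullptr;                                      // whole allocation fails
    }
  }
  return obj;
}
```

Registering the finalizer only after a successful allocation means a failed allocation never leaves a dangling entry in the finalizer list.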
At this point, object allocation moves into the heap. The next section covers how the heap allocates the memory.
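Summary
The Thread class initializes a per-thread table of runtime entry points; compiled code reaches each entry point through a fixed offset from the current Thread.
Thread::InitTlsEntryPoints() calls InitEntryPoints(), passing in the addresses of the thread's JNI and Quick entry-point tables. InitEntryPoints() is implemented separately for each CPU architecture; for ARM64, see /art/runtime/arch/arm64/entrypoints_init_arm64.cc.
entry_points_allocator records the type of memory allocator in use. It starts as kAllocatorTypeDlMalloc and is changed through SetQuickAllocEntryPointsAllocator(). With the CC collector that is the default in Android 8.x it is usually kAllocatorTypeRegionTLAB, while CMS-based configurations usually use kAllocatorTypeRosAlloc.
artAllocObjectFromCode() (/art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc) first tries the TLAB fast path, then dispatches to a different helper depending on whether the class is known to be initialized and whether a finalizer check is needed.
Apart from the TLAB fast path, every route ends in heap->AllocObjectWithAllocator(), which performs the actual allocation.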