The basic structure of bytecode
The Java code of an ordinary class:
public class Parent {
public void sayHi(){
}
}
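Compiler theory describes compilation as translating a program written in one language (the source language) into an equivalent program written in another language (the target language). For Java, that means translating Java source code into bytecode. To see how bytecode describes the simple class above, run javap -v Parent.class
and inspect the resulting structure: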
public class Parent
minor version: 0
major version: 52
flags: ACC_PUBLIC, ACC_SUPER
Constant pool: // the constant pool
#1 = Methodref #3.#11 // java/lang/Object."<init>":()V
#2 = Class #12 // Parent
#3 = Class #13 // java/lang/Object
#4 = Utf8 <init>
#5 = Utf8 ()V
#6 = Utf8 Code
#7 = Utf8 LineNumberTable
#8 = Utf8 sayHi
#9 = Utf8 SourceFile
#10 = Utf8 Parent.java
#11 = NameAndType #4:#5 // "<init>":()V
#12 = Utf8 Parent
#13 = Utf8 java/lang/Object
{ // the method table
public Parent();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 1: 0
public void sayHi();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=0, locals=1, args_size=1
0: return
LineNumberTable:
line 3: 0
}
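Here, Constant pool is the class file's constant pool, which can be thought of as the resource repository of the class file. It mainly holds two kinds of constants: literals and symbolic references. Literals are close to the notion of a constant at the Java language level, such as text strings and the values of constants declared final. Symbolic references are a compiler-theory concept. In this example the entries of type Utf8 are the literals, and all the other entries are symbolic references.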
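The version numbers and the constant pool that javap prints come straight from the first bytes of the class file. The sketch below reads that header for a class that is already on the runtime's class path. The class name ClassHeaderDemo and the method readHeader are made up for illustration, and the code assumes the class's .class file can be fetched as a resource.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassHeaderDemo {
    /**
     * Returns {magic, minor_version, major_version, constant_pool_count}
     * read from the start of the given class's .class file.
     */
    public static int[] readHeader(Class<?> c) {
        // Resource name relative to the class's own package, e.g. "Object.class".
        String fullName = c.getName();
        String resource = fullName.substring(fullName.lastIndexOf('.') + 1) + ".class";
        try (InputStream in = c.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();              // always 0xCAFEBABE
            int minor = data.readUnsignedShort();    // "minor version" in javap
            int major = data.readUnsignedShort();    // "major version" in javap
            int cpCount = data.readUnsignedShort();  // number of constant pool entries + 1
            return new int[] {magic, minor, major, cpCount};
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] h = readHeader(Object.class);
        System.out.printf("magic=%08X minor=%d major=%d constant_pool_count=%d%n",
                h[0], h[1], h[2], h[3]);
    }
}
```

The major version printed depends on the JDK that compiled the class; the listing above shows 52, which corresponds to Java 8.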
Constant pool:
#2 = Class #12 // Parent
#3 = Class #13 // java/lang/Object
#11 = NameAndType #4:#5 // "<init>":()V
#12 = Utf8 Parent
#13 = Utf8 java/lang/Object
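Start with this group of constants. #2 and #3 belong to the first kind of symbolic reference: fully qualified names of classes and interfaces. #2 is the fully qualified name of this class (Parent), and #3 is the fully qualified name of its superclass (java/lang/Object). This is how the bytecode records the inheritance relationship.
Next, look at constant #1, which is another kind of symbolic reference: the name and descriptor of a method.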
#1 = Methodref #3.#11 // java/lang/Object."<init>":()V
#3 = Class #13 // java/lang/Object
#4 = Utf8 <init>
#5 = Utf8 ()V
#11 = NameAndType #4:#5 // "<init>":()V
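Method symbolic references are more interesting. #1 is composed from #3.#11, and expanding it gives #3.#11 = Class(#13).NameAndType(#4:#5) = java/lang/Object."<init>":()V. In words: a symbolic reference to the method named <init> in class java/lang/Object (the superclass), which takes no parameters and returns void.
When the JVM needs to call a method, all it has to go on is a symbolic reference such as java/lang/Object."<init>":()V, so the reference must be precise and unambiguous. In this example, the fully qualified class name, the method name, the return type, and the parameter list together identify exactly one method.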
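To make the owner.name:descriptor form concrete, the sketch below builds the same kind of string from Java types using java.lang.invoke.MethodType, which can print a method descriptor. The class DescriptorDemo and the helper symbolicName are illustrative names, and the output omits the quotes javap puts around <init>.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    /** Formats owner, method name and descriptor as owner.name:descriptor. */
    public static String symbolicName(Class<?> owner, String name,
                                      Class<?> returnType, Class<?>... params) {
        // MethodType knows how to render a JVM method descriptor such as "()V".
        String descriptor = MethodType.methodType(returnType, params).toMethodDescriptorString();
        // Class files use '/' instead of '.' in class names.
        return owner.getName().replace('.', '/') + "." + name + ":" + descriptor;
    }

    public static void main(String[] args) {
        // prints java/lang/Object.<init>:()V
        System.out.println(symbolicName(Object.class, "<init>", void.class));
        // prints java/io/PrintStream.println:(Ljava/lang/String;)V
        System.out.println(symbolicName(java.io.PrintStream.class, "println", void.class, String.class));
    }
}
```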
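Where are method symbolic references used? In the method table below there is a Code section, which holds the bytecode instructions compiled from the Java code.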
public Parent();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 1: 0
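As the method name shows, this is the no-argument constructor Parent() that the compiler generated automatically. The instruction invokespecial #1 calls the method referenced by constant #1, that is, the superclass method java/lang/Object."<init>":()V. invokespecial is covered in more detail later; for now, treat it simply as an instruction that calls a method.
The relationship between <init>() and constructors
Look more closely at this instruction. <init> is a method generated by the compiler, known as the instance constructor (the JVM specification calls it the instance initialization method). Oddly, the constructor Parent() only calls the superclass's <init> and never calls an <init> of its own. So how does <init> relate to the constructor? Does one contain the other? This can be checked with an experiment: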
public class Parent1 {
public Parent1(){
System.out.println("hello world");
}
public void sayHi(){
Parent1 parent = new Parent1();
}
}
Give class Parent1 an explicit no-argument constructor and a member method, then run javap -v on it:
public void sayHi();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=2, locals=2, args_size=1
0: new #5 // class Parent1
3: dup
4: invokespecial #6 // Method "<init>":()V
7: astore_1
8: return
In sayHi(), new Parent1() results only in a call to Parent1's <init> method; there is no separate call to the no-argument constructor Parent1(). So <init> must include the execution of the constructor:
public Parent1();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=2, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
7: ldc #3 // String hello world
9: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
12: return
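Now look at the explicitly declared constructor Parent1(): the call to the superclass's <init> comes before the constructor's own logic (printing hello world).
The conclusion is that <init> is invoked when an object is instantiated and contains the constructor's logic, and that in the bytecode the constructor explicitly calls the superclass's <init> before any of its other logic. This matches what we already know about Java.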
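The ordering can also be observed from plain Java without reading bytecode. In this small sketch (class names are made up for illustration), each constructor appends to a shared log, and the superclass entry always comes first because the subclass constructor begins with the call to the superclass's <init>.

```java
public class InitOrderDemo {
    // Shared log so the order of constructor bodies can be inspected.
    static final StringBuilder LOG = new StringBuilder();

    static class Base {
        Base() {
            LOG.append("Base;");
        }
    }

    static class Derived extends Base {
        Derived() {
            // The compiler places the call to Base's <init> here, before this body.
            LOG.append("Derived;");
        }
    }

    /** Creates one Derived instance and returns the order in which the bodies ran. */
    public static String run() {
        LOG.setLength(0);
        new Derived();
        return LOG.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints Base;Derived;
    }
}
```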
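How symbolic references are resolved into direct references
The examples so far should give a basic picture of class structure and bytecode. One question remains: the constant pool in the bytecode holds only symbolic references, so how does the JVM get from a symbolic reference to an actual address?
This is one way Java differs from C. In C, by the time the .c source files have been compiled to .o files and linked, references have been bound to allocated virtual addresses, whereas a reference in a Java class file is only a descriptive symbol and carries no final memory layout information. That raises two questions: how does the Java compiler make sure the virtual machine can locate exactly what a symbolic reference points to, and when does the virtual machine turn a symbolic reference into a direct reference?
Symbolic reference: a set of symbols that describes the referenced target. The symbols can be literals of any form, as long as they locate the target unambiguously when used. The target of a symbolic reference does not have to be loaded into memory yet.
Direct reference: a pointer straight to the target, a relative offset, or a handle that leads to the target indirectly. If a direct reference exists, the target must already be present in memory.
The first question was answered above: a symbolic reference carries enough information for the JVM to find the target when it is actually needed. Given, for example, "java/io/PrintStream.println:(Ljava/lang/String;)V", the virtual machine converts it into a direct reference. This process is called resolution, and its timing is not fixed.
The virtual machine specification does not say exactly when resolution must happen. It only requires that the symbolic references used by the following bytecode instructions be resolved before the instruction executes: anewarray, checkcast, getfield, getstatic, instanceof, invokedynamic, invokeinterface, invokespecial, invokestatic, invokevirtual, ldc, ldc_w, multianewarray, new, putfield, putstatic. An implementation is therefore free to resolve the constant pool's symbols as soon as the class is loaded, or to wait until a symbol is about to be used.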
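As a loose analogy at the Java level (not how HotSpot does it internally), the java.lang.invoke API can turn the same three pieces of text, class name, method name and descriptor, into a callable MethodHandle. The class ResolveSketch and its methods are hypothetical helpers written for this illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ResolveSketch {
    /** Looks up a public virtual method from a textual (class, name, descriptor) triple. */
    public static MethodHandle resolveVirtual(String internalClassName, String name, String descriptor) {
        try {
            // "java/lang/String" -> "java.lang.String"; loading the class is the first step.
            Class<?> owner = Class.forName(internalClassName.replace('/', '.'));
            // Parse the descriptor text, e.g. "()I", into parameter and return types.
            MethodType type = MethodType.fromMethodDescriptorString(
                    descriptor, ResolveSketch.class.getClassLoader());
            // The handle plays the role of the "direct reference" in this analogy.
            return MethodHandles.publicLookup().findVirtual(owner, name, type);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Invokes the handle; the first argument is the receiver object. */
    public static Object call(MethodHandle handle, Object... args) {
        try {
            return handle.invokeWithArguments(args);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle length = resolveVirtual("java/lang/String", "length", "()I");
        System.out.println(call(length, "hello")); // prints 5
    }
}
```

If the class or method named by the text does not exist, the lookup fails at this step, which mirrors the idea that a symbolic reference only has to be valid at the moment it is resolved.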
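The class loading process consists of loading, linking, and initialization. Resolution is one part of linking and is the focus of this article, which reads through the HotSpot source code to help explain it. The source code is located under the path shown in the figure below.
Start with when a class gets linked. Below is part of the class initialization function. Its first step is to link the class, and initialization continues only after linking completes, which matches expectations (link first, then initialize).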
void InstanceKlass::initialize_impl(instanceKlassHandle this_oop, TRAPS) {
// Make sure klass is linked (verified) before initialization
// A class could already be verified, since it has been reflected upon.
this_oop->link_class(CHECK);
...
}
While we are here, what is InstanceKlass, and how does it relate to a class?
// An InstanceKlass is the VM level representation of a Java class.
// It contains all information needed for at class at execution runtime.
// InstanceKlass layout:
// [C++ vtbl pointer ] Klass
// [subtype cache ] Klass
// [instance size ] Klass
// [java mirror ] Klass
// [super ] Klass
// [access_flags ] Klass
// [name ] Klass
// [first subklass ] Klass
// [next sibling ] Klass
// [array klasses ]
// [methods ]
// [local interfaces ]
// [transitive interfaces ]
// [fields ]
// [constants ]
// [class loader ]
// [source file name ]
// [inner classes ]
// [static field size ]
// [nonstatic field size ]
// [static oop fields size ]
// [nonstatic oop maps size ]
// [has finalize method ]
// [deoptimization mark bit ]
// [initialization state ]
// [initializing thread ]
// [Java vtable length ]
// [oop map cache (stack maps) ]
// [EMBEDDED Java vtable ] size in words = vtable_len
// [EMBEDDED nonstatic oop-map blocks] size in words = nonstatic_oop_map_size
// The embedded nonstatic oop-map blocks are short pairs (offset, length)
// indicating where oops are located in instances of this klass.
// [EMBEDDED implementor of the interface] only exist for interface
// [EMBEDDED host klass ] only exist for an anonymous class (JSR 292 en
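InstanceKlass stores the Java type's name, its inheritance relationships, the interfaces it implements, field information, method information, a pointer to the runtime constant pool, the embedded virtual method table (vtable) and interface method table (itable), and a record of which positions in an object hold pointers the GC cares about (oop map), among other things.
It is for the VM's internal use and is not exposed directly to the Java level; an InstanceKlass is not an instance of java.lang.Class.
Now look at the class linking logic: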
bool InstanceKlass::link_class_impl(
instanceKlassHandle this_oop, bool throw_verifyerror, TRAPS) {
// check for error state
if (this_oop->is_in_error_state()) {
ResourceMark rm(THREAD);
THROW_MSG_(vmSymbols::java_lang_NoClassDefFoundError(),
this_oop->external_name(), false);
}
// return if already verified
if (this_oop->is_linked()) {
return true;
}
// Timing
// timer handles recursion
assert(THREAD->is_Java_thread(), "non-JavaThread in link_class_impl");
JavaThread* jt = (JavaThread*)THREAD;
// link super class before linking this class
instanceKlassHandle super(THREAD, this_oop->super());
if (super.not_null()) {
if (super->is_interface()) { // check if super class is an interface
ResourceMark rm(THREAD);
Exceptions::fthrow(
THREAD_AND_LOCATION,
vmSymbols::java_lang_IncompatibleClassChangeError(),
"class %s has interface %s as super class",
this_oop->external_name(),
super->external_name()
);
return false;
}
link_class_impl(super, throw_verifyerror, CHECK_false);
}
// link all interfaces implemented by this class before linking this class
Array<Klass*>* interfaces = this_oop->local_interfaces();
int num_interfaces = interfaces->length();
for (int index = 0; index < num_interfaces; index++) {
HandleMark hm(THREAD);
instanceKlassHandle ih(THREAD, interfaces->at(index));
link_class_impl(ih, throw_verifyerror, CHECK_false);
}
// in case the class is linked in the process of linking its superclasses
if (this_oop->is_linked()) {
return true;
}
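The comments are fairly clear and the overall flow is easy to trace; a flowchart makes it more intuitive.
Note that no resolution step appears in the linking process, because the timing of resolution is not fixed. The resolution logic is handled by the LinkResolver class, which is analyzed later. For now, focus on the step that initializes the vtable and the itable. The vtable is also known as the virtual method table, and the code below shows how it is initialized. This is an important step in how Java implements overriding.
Virtual method table & overriding (source analysis)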
void klassVtable::initialize_vtable(bool checkconstraints, TRAPS) {
// Note: Arrays can have intermediate array supers. Use java_super to skip them.
KlassHandle super (THREAD, klass()->java_super());
int nofNewEntries = 0;
if (PrintVtables && !klass()->oop_is_array()) {
ResourceMark rm(THREAD);
tty->print_cr("Initializing: %s", _klass->name()->as_C_string());
}
#ifdef ASSERT
oop* end_of_obj = (oop*)_klass() + _klass()->size();
oop* end_of_vtable = (oop*)&table()[_length];
assert(end_of_vtable <= end_of_obj, "vtable extends beyond end");
#endif
if (Universe::is_bootstrapping()) {
// just clear everything
for (int i = 0; i < _length; i++) table()[i].clear();
return;
}
int super_vtable_len = initialize_from_super(super);
if (klass()->oop_is_array()) {
assert(super_vtable_len == _length, "arrays shouldn't introduce new methods");
} else {
assert(_klass->oop_is_instance(), "must be InstanceKlass");
Array<Method*>* methods = ik()->methods();
int len = methods->length();
int initialized = super_vtable_len;
// Check each of this class's methods against super;
// if override, replace in copy of super vtable, otherwise append to end
for (int i = 0; i < len; i++) {
// update_inherited_vtable can stop for gc - ensure using handles
HandleMark hm(THREAD);
assert(methods->at(i)->is_method(), "must be a Method*");
methodHandle mh(THREAD, methods->at(i));
bool needs_new_entry = update_inherited_vtable(ik(), mh, super_vtable_len, -1, checkconstraints, CHECK);
if (needs_new_entry) {
put_method_at(mh(), initialized);
mh()->set_vtable_index(initialized); // set primary vtable index
initialized++;
}
}
// update vtable with default_methods
Array<Method*>* default_methods = ik()->default_methods();
if (default_methods != NULL) {
len = default_methods->length();
if (len > 0) {
Array<int>* def_vtable_indices = NULL;
if ((def_vtable_indices = ik()->default_vtable_indices()) == NULL) {
def_vtable_indices = ik()->create_new_default_vtable_indices(len, CHECK);
} else {
assert(def_vtable_indices->length() == len, "reinit vtable len?");
}
for (int i = 0; i < len; i++) {
HandleMark hm(THREAD);
assert(default_methods->at(i)->is_method(), "must be a Method*");
methodHandle mh(THREAD, default_methods->at(i));
bool needs_new_entry = update_inherited_vtable(ik(), mh, super_vtable_len, i, checkconstraints, CHECK);
// needs new entry
if (needs_new_entry) {
put_method_at(mh(), initialized);
def_vtable_indices->at_put(i, initialized); //set vtable index
initialized++;
}
}
}
}
// add miranda methods; it will also return the updated initialized
// Interfaces do not need interface methods in their vtables
// This includes miranda methods and during later processing, default methods
if (!ik()->is_interface()) {
initialized = fill_in_mirandas(initialized);
}
// In class hierarchies where the accessibility is not increasing (i.e., going from private ->
// package_private -> public/protected), the vtable might actually be smaller than our initial
// calculation.
assert(initialized <= _length, "vtable initialization failed");
for(;initialized < _length; initialized++) {
put_method_at(NULL, initialized);
}
NOT_PRODUCT(verify(tty, true));
}
}
// Update child's copy of super vtable for overrides
// OR return true if a new vtable entry is required.
// Only called for InstanceKlass's, i.e. not for arrays
// If that changed, could not use _klass as handle for klass
bool klassVtable::update_inherited_vtable(InstanceKlass* klass, methodHandle target_method,
int super_vtable_len, int default_index,
bool checkconstraints, TRAPS) {
// omitted
}
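The code above shows that the subclass's virtual method table starts out as a copy of the superclass's table, with the same number of entries. The subclass's methods are then traversed, and update_inherited_vtable decides whether each one overrides a superclass method. If it does, klassVtable::put_method_at(Method* m, int index) performs the override: the entry in the subclass's vtable that pointed to the superclass method is updated to point to the entry address of the subclass's method. If the method does not override a superclass method, klassVtable::put_method_at(Method* m, int index) appends a new pointer to the class's vtable that points to the method's entry address, adding a new virtual function address. One point deserves attention: for an overridden method, the index in the subclass's table is the same as the index in the superclass's table. This property is key and comes up again later.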
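The copy, replace, append behaviour described above can be modelled in a few lines of Java. This is only a toy model under simplifying assumptions: table entries are strings of the form Owner.method, methods are matched by name alone, and the entries inherited from java.lang.Object are ignored. VtableSketch, build and indexOf are hypothetical names, not HotSpot functions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class VtableSketch {
    /** Builds a class's table from its superclass's table and its own declared methods. */
    public static List<String> build(List<String> superTable, String owner, List<String> declared) {
        // Step 1: start as a copy of the superclass's table.
        List<String> table = new ArrayList<>(superTable);
        for (String name : declared) {
            int index = indexOf(table, name);
            String entry = owner + "." + name;
            if (index >= 0) {
                table.set(index, entry); // override: replace in place, index unchanged
            } else {
                table.add(entry);        // new virtual method: append at the end
            }
        }
        return table;
    }

    /** Returns the index of the entry whose method name matches, or -1. */
    public static int indexOf(List<String> table, String name) {
        for (int i = 0; i < table.size(); i++) {
            String entry = table.get(i);
            if (entry.substring(entry.indexOf('.') + 1).equals(name)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> parent = build(Collections.<String>emptyList(), "Parent3", Arrays.asList("sayHi"));
        List<String> son = build(parent, "Son3", Arrays.asList("sayHi", "sayHiTest"));
        System.out.println(parent); // prints [Parent3.sayHi]
        System.out.println(son);    // prints [Son3.sayHi, Son3.sayHiTest]
    }
}
```

In this model sayHi sits at index 0 in both tables, which is the shared-index property the text highlights.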
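How polymorphism is implemented (source analysis)
By the end of linking, the virtual method table has been initialized. Now look at how overriding and polymorphism are implemented, and trace the resolution flow along the way. Below is a simple example: the subclass Son3 overrides sayHi() and defines a member method that relies on polymorphism.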
public class Parent3 {
public void sayHi(){
System.out.println("hi,son");
}
}
class Son3 extends Parent3{
public void sayHi(){
System.out.println("hi,parent");
}
public void sayHiTest(){
Parent3 son3 = new Son3();
son3.sayHi();
}
}
Clearly, calling sayHiTest() prints "hi,parent". How does the JVM implement this polymorphism? First, the virtual method tables of the two classes can be drawn out.
Run javap -v Son3.class:
public void sayHi();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=2, locals=1, args_size=1
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #3 // String hi,parent
5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
LineNumberTable:
line 9: 0
line 10: 8
public void sayHiTest();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=2, locals=2, args_size=1
0: new #5 // class Son3
3: dup
4: invokespecial #6 // Method "<init>":()V
7: astore_1
8: aload_1
9: invokevirtual #7 // Method Parent3.sayHi:()V
12: return
LineNumberTable:
line 13: 0
line 14: 8
line 15: 12
}
As shown, son3.sayHi(); is compiled into
9: invokevirtual #7 // Method Parent3.sayHi:()V
The compiler used the invokevirtual instruction, and the symbolic reference it points to is Parent3.sayHi:()V
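Java bytecode has five invocation instructions.
invokestatic: calls static methods.
invokespecial: calls private instance methods and constructors, as well as superclass instance methods or constructors invoked through the super keyword, and default methods of implemented interfaces.
invokevirtual: calls non-private instance methods.
invokeinterface: calls interface methods.
invokedynamic: calls dynamic methods.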
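The small program below contains one call site for each of the five instructions, with comments naming the instruction javac is expected to emit. The class and method names are made up for illustration, and the exact instruction chosen can vary between javac versions, so it is worth confirming with javap -c on your own JDK.

```java
import java.util.function.Supplier;

public class InvokeKindsDemo {
    interface Greeter {
        String greet();
    }

    static class Base implements Greeter {
        public String greet() {
            return "base";
        }
    }

    static class Child extends Base {
        @Override
        public String greet() {
            return "child+" + super.greet(); // super call: invokespecial
        }
    }

    static String util() {
        return "static";
    }

    public static String run() {
        String a = util();                   // invokestatic
        Base b = new Child();                // new, then invokespecial on <init>
        String c = b.greet();                // invokevirtual on Base.greet
        Greeter g = b;
        String d = g.greet();                // invokeinterface on Greeter.greet
        Supplier<String> s = () -> "lambda"; // invokedynamic creates the lambda object
        String e = s.get();                  // invokeinterface on Supplier.get
        return String.join(",", a, c, d, e);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints static,child+base,child+base,lambda
    }
}
```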
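invokevirtual is the more complex case, so this article uses it as the example to analyze how the HotSpot JVM interpreter resolves a symbolic reference into direct reference information. As mentioned above, the resolution logic is handled by the LinkResolver class. Follow its source: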
void LinkResolver::resolve_virtual_call(CallInfo& result, Handle recv, KlassHandle receiver_klass, KlassHandle resolved_klass,
Symbol* method_name, Symbol* method_signature, KlassHandle current_klass,
bool check_access, bool check_null_and_abstract, TRAPS) {
methodHandle resolved_method;
linktime_resolve_virtual_method(resolved_method, resolved_klass, method_name, method_signature, current_klass, check_access, CHECK);
runtime_resolve_virtual_method(result, resolved_method, resolved_klass, recv, receiver_klass, check_null_and_abstract, CHECK);
}
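This logic resolves the symbolic reference used by an invokevirtual instruction. In this example, it turns Method Parent3.sayHi:()V into a direct reference.
It calls a link-time resolution method and then a run-time resolution method. Each is analyzed below.
Link-time resolution: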
void LinkResolver::linktime_resolve_virtual_method(methodHandle &resolved_method, KlassHandle resolved_klass,
Symbol* method_name, Symbol* method_signature,
KlassHandle current_klass, bool check_access, TRAPS) {
// normal method resolution
resolve_method(resolved_method, resolved_klass, method_name, method_signature, current_klass, check_access, true, CHECK);
assert(resolved_method->name() != vmSymbols::object_initializer_name(), "should have been checked in verifier");
assert(resolved_method->name() != vmSymbols::class_initializer_name (), "should have been checked in verifier");
// check if private interface method
if (resolved_klass->is_interface() && resolved_method->is_private()) {
// throw an exception
}
// check if not static
if (resolved_method->is_static()) {
// throw an exception
}
// omitted
}
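Setting the validation logic aside, focus on resolve_method (the listing below takes a LinkInfo parameter, so it appears to come from a different HotSpot version than the call above):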
methodHandle LinkResolver::resolve_method(const LinkInfo& link_info,
Bytecodes::Code code, TRAPS) {
Handle nested_exception;
KlassHandle resolved_klass = link_info.resolved_klass();
// 1. For invokevirtual, cannot call an interface method
...
// 2. check constant pool tag for called method - must be JVM_CONSTANT_Methodref
...
// 3. lookup method in resolved klass and its super klasses
methodHandle resolved_method = lookup_method_in_klasses(link_info, true, false, CHECK_NULL);
// 4. lookup method in all the interfaces implemented by the resolved klass
if (resolved_method.is_null() && !resolved_klass->is_array_klass()) { // not found in the class hierarchy
resolved_method = lookup_method_in_interfaces(link_info, CHECK_NULL);
if (resolved_method.is_null()) {
// JSR 292: see if this is an implicitly generated method MethodHandle.linkToVirtual(*...), etc
resolved_method = lookup_polymorphic_method(link_info, (Handle*)NULL, (Handle*)NULL, THREAD);
if (HAS_PENDING_EXCEPTION) {
nested_exception = Handle(THREAD, PENDING_EXCEPTION);
CLEAR_PENDING_EXCEPTION;
}
}
}
// 5. method lookup failed
...
// 6. access checks, access checking may be turned off when calling from within the VM.
...
return resolved_method;
}
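That is, the method matching the symbolic reference is looked up first in the class itself; if it is not found there, the superclasses are searched, and if it is still not found, the implemented interfaces are searched. (Note that "the class" here means the class referenced in the bytecode, since run-time resolution has not happened yet: Parent3 in this example, not Son3.) The lookup logic lives in InstanceKlass::find_method_index: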
int InstanceKlass::find_method_index(const Array<Method*>* methods,
const Symbol* name,
const Symbol* signature,
OverpassLookupMode overpass_mode,
StaticLookupMode static_mode,
PrivateLookupMode private_mode) {
const bool skipping_overpass = (overpass_mode == skip_overpass);
const bool skipping_static = (static_mode == skip_static);
const bool skipping_private = (private_mode == skip_private);
const int hit = binary_search(methods, name);
if (hit != -1) {
const Method* const m = methods->at(hit);
// Do linear search to find matching signature. First, quick check
// for common case, ignoring overpasses if requested.
if (method_matches(m, signature, skipping_overpass, skipping_static, skipping_private)) {
return hit;
}
// search downwards through overloaded methods
int i;
for (i = hit - 1; i >= 0; --i) {
const Method* const m = methods->at(i);
assert(m->is_method(), "must be method");
if (m->name() != name) {
break;
}
if (method_matches(m, signature, skipping_overpass, skipping_static, skipping_private)) {
return i;
}
}
// search upwards
for (i = hit + 1; i < methods->length(); ++i) {
const Method* const m = methods->at(i);
assert(m->is_method(), "must be method");
if (m->name() != name) {
break;
}
if (method_matches(m, signature, skipping_overpass, skipping_static, skipping_private)) {
return i;
}
}
// not found
#ifdef ASSERT
const int index = (skipping_overpass || skipping_static || skipping_private) ? -1 :
linear_search(methods, name, signature);
assert(-1 == index, "binary search should have found entry %d", index);
#endif
}
return -1;
}
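The logic is fairly clear: binary-search the class's method list by name, scan the neighbouring entries that share the name for a matching signature, and return the index of the matching method, from which the direct reference is obtained. But if the direct reference to the method has already been resolved, what is the run-time resolution method seen above for?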
public void sayHiTest(){
Parent3 son3 = new Son3();
son3.sayHi();
}
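In this example, link-time resolution converts the method symbol Method Parent3.sayHi:()V into a direct reference to that method. According to Java semantics, however, the method that actually runs should be the one Son3 overrides. Handling that is the job of run-time resolution.
Run-time resolution: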
void LinkResolver::runtime_resolve_virtual_method(CallInfo& result,
const methodHandle& resolved_method,
KlassHandle resolved_klass,
Handle recv,
KlassHandle recv_klass,
bool check_null_and_abstract,
TRAPS) {
// setup default return values
int vtable_index = Method::invalid_vtable_index;
methodHandle selected_method;
...
// do lookup based on receiver klass using the vtable index
if (resolved_method->method_holder()->is_interface()) { // default or miranda method
vtable_index = vtable_index_of_interface_method(resolved_klass,
resolved_method);
assert(vtable_index >= 0 , "we should have valid vtable index at this point");
selected_method = methodHandle(THREAD, recv_klass->method_at_vtable(vtable_index));
} else {
// at this point we are sure that resolved_method is virtual and not
// a default or miranda method; therefore, it must have a valid vtable index.
assert(!resolved_method->has_itable_index(), "");
vtable_index = resolved_method->vtable_index();
// We could get a negative vtable_index for final methods,
// because as an optimization they are they are never put in the vtable,
// unless they override an existing method.
// If we do get a negative, it means the resolved method is the the selected
// method, and it can never be changed by an override.
if (vtable_index == Method::nonvirtual_vtable_index) {
assert(resolved_method->can_be_statically_bound(), "cannot override this method");
selected_method = resolved_method;
} else {
selected_method = methodHandle(THREAD, recv_klass->method_at_vtable(vtable_index));
}
}
...
// setup result
result.set_virtual(resolved_klass, recv_klass, resolved_method, selected_method, vtable_index, CHECK);
}
Pay particular attention to this line:
selected_method = methodHandle(THREAD, recv_klass->method_at_vtable(vtable_index));
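The surrounding source shows that the JVM can obtain the actual type of the receiver when invokevirtual executes, which is Son3 in this example. In this line, recv_klass is the pointer to the InstanceKlass of class Son3, and calling its method_at_vtable method returns the direct reference stored at a given index in Son3's virtual method table. So how does the JVM know the method's index?
Link-time resolution has already produced the direct reference to the superclass method, from which the vtable index (vtable_index) is read. For an overridden method the index is the same in the superclass and the subclass, so method_at_vtable(vtable_index) yields the direct reference to the subclass's method directly. The flow is as follows: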
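The two steps can be mimicked with plain arrays. This is a simplified sketch, not HotSpot code: the tables are hard-coded strings, entries inherited from java.lang.Object are left out, and linktimeResolve and runtimeSelect are hypothetical names standing in for the link-time and run-time phases.

```java
public class VirtualDispatchSketch {
    // Hand-written stand-ins for the two classes' virtual method tables.
    static final String[] PARENT3_VTABLE = {"Parent3.sayHi"};
    static final String[] SON3_VTABLE = {"Son3.sayHi", "Son3.sayHiTest"};

    /** Link-time step: find the method in the statically referenced class and return its index. */
    public static int linktimeResolve(String[] resolvedClassTable, String methodName) {
        for (int i = 0; i < resolvedClassTable.length; i++) {
            if (resolvedClassTable[i].endsWith("." + methodName)) {
                return i;
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }

    /** Run-time step: read the same index from the receiver's actual class table. */
    public static String runtimeSelect(String[] receiverTable, int vtableIndex) {
        return receiverTable[vtableIndex];
    }

    public static void main(String[] args) {
        // The call site names Parent3.sayHi, but the receiver object is a Son3.
        int index = linktimeResolve(PARENT3_VTABLE, "sayHi");
        System.out.println(runtimeSelect(SON3_VTABLE, index)); // prints Son3.sayHi
    }
}
```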
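That completes the analysis of how the JVM broadly implements Java's overriding and polymorphism semantics. Next come a few other Java semantics: overloading and hiding.
Overloading semantics
Overloading is simpler than overriding, and it is implemented mainly at compile time. Consider the following example: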
public class Parent2 {
public void sayHi(String string){
System.out.println("hi,string");
}
public void sayHi(Object object){
System.out.println("hi,object");
}
public static void main(String[] args) {
new Parent2().sayHi("hi");
}
}
void sayHi(String string) and void sayHi(Object object) are overloads of each other. Here is how they differ in the bytecode:
public void sayHi(java.lang.String);
descriptor: (Ljava/lang/String;)V
flags: ACC_PUBLIC
...
public void sayHi(java.lang.Object);
descriptor: (Ljava/lang/Object;)V
flags: ACC_PUBLIC
...
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: new #6 // class Parent2
3: dup
4: invokespecial #7 // Method "<init>":()V
7: ldc #8 // String hi
9: invokevirtual #9 // Method sayHi:(Ljava/lang/String;)V
12: return
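The two have different descriptors. Because they are already told apart at compile time, we can say the Java virtual machine has no concept of overloading. For this reason, some articles call overloading static binding and overriding dynamic binding.
Continuing with this example, the argument passed in the call is "hi", which is both a String and an Object, yet the bytecode of main() settles directly on (Ljava/lang/String;)V. Why?
The Java compiler selects an overloaded method in three phases:
1. Select a method without considering auto-boxing or auto-unboxing of primitive types, and without variable-arity parameters.
2. If phase 1 finds no applicable method, select one allowing auto-boxing and unboxing but not variable-arity parameters.
3. If phase 2 finds no applicable method, select one allowing both auto-boxing and unboxing and variable-arity parameters.
If the Java compiler finds several applicable methods in the same phase, it picks the most specific one, and a key factor in deciding specificity is the inheritance relationship between the formal parameter types.
In this example both String and Object are applicable, so the Java compiler chooses the most specific method, (Ljava/lang/String;)V.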
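The sketch below exercises both rules: the static type of the argument picks between the String and Object overloads, and an int argument matches a long parameter in phase 1 before boxing or varargs are even considered. The class name and the pick overloads are made up for illustration.

```java
public class OverloadDemo {
    static String sayHi(String s) {
        return "string";
    }

    static String sayHi(Object o) {
        return "object";
    }

    static String pick(long x) {
        return "long";      // applicable in phase 1 (primitive widening only)
    }

    static String pick(Integer x) {
        return "Integer";   // would need boxing, so only applicable in phase 2
    }

    static String pick(int... xs) {
        return "varargs";   // only applicable in phase 3
    }

    public static String run() {
        Object o = "hi";    // same object, but the static type is Object
        return sayHi("hi") + "," + sayHi(o) + "," + pick(1);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints string,object,long
    }
}
```

The second call shows that the choice depends on the compile-time type of the argument, not on the run-time type of the object it refers to.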
Now for hiding:
Hiding semantics
public class Parent4 {
public static void sayHi(){
System.out.println("hi,son");
}
}
class Son4 extends Parent4{
public static void sayHi(){
System.out.println("hi,parent");
}
public static void main(String[] args) {
Parent4 son4 = new Son4();
son4.sayHi();
}
}
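This example prints "hi,son". It is essentially the same as the polymorphism scenario above, except that the methods here are static, so even though the actual instance is of type Son4, the superclass's method is the one called. How does the JVM handle this? Start with the bytecode: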
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=2, args_size=1
0: new #5 // class Son4
3: dup
4: invokespecial #6 // Method "<init>":()V
7: astore_1
8: aload_1
9: pop
10: invokestatic #7 // Method Parent4.sayHi:()V
13: return
The method call is compiled into:
10: invokestatic #7 // Method Parent4.sayHi:()V
Here is how the HotSpot source resolves it:
void LinkResolver::resolve_static_call(CallInfo& result, KlassHandle& resolved_klass, Symbol* method_name,
Symbol* method_signature, KlassHandle current_klass,
bool check_access, bool initialize_class, TRAPS) {
methodHandle resolved_method;
linktime_resolve_static_method(resolved_method, resolved_klass, method_name, method_signature, current_klass, check_access, CHECK);
resolved_klass = KlassHandle(THREAD, resolved_method->method_holder());
// Initialize klass (this should only happen if everything is ok)
if (initialize_class && resolved_klass->should_be_initialized()) {
resolved_klass->initialize(CHECK);
linktime_resolve_static_method(resolved_method, resolved_klass, method_name, method_signature, current_klass, check_access, CHECK);
}
// setup result
result.set_static(resolved_klass, resolved_method, CHECK);
}
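Compared with resolve_virtual_call above, resolution here consists only of link-time resolution (linktime_resolve_static_method); there is no run-time resolution. The link-time logic is essentially the same, and as the earlier analysis showed, link-time resolution works purely from the symbolic reference in the bytecode. So sayHi:()V in this example resolves to the direct reference of Parent4.sayHi:()V, and Parent4.sayHi() is what gets called. The observable effect is method hiding.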
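The contrast between a hidden static method and an overridden instance method shows up when both are called through the same reference. In this sketch (names are illustrative), the static call follows the declared type of the variable while the instance call follows the actual object.

```java
public class HidingDemo {
    static class Parent {
        static String who() {
            return "parent";
        }

        String me() {
            return "parent";
        }
    }

    static class Child extends Parent {
        static String who() {   // hides Parent.who, does not override it
            return "child";
        }

        @Override
        String me() {           // overrides Parent.me
            return "child";
        }
    }

    public static String run() {
        Parent p = new Child();
        // p.who() is bound to Parent.who at compile time (invokestatic);
        // p.me() is dispatched on the actual object (invokevirtual).
        return p.who() + "," + p.me();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints parent,child
    }
}
```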
Shadowing semantics
In the code below, init() defines a local variable a with the same name as the field a:
public class Parent5 {
public String a = "out";
public void init(){
String a = "in";
System.out.println(a);
System.out.println(this.a);
}
public static void main(String[] args){
new Parent5().init();
}
}
Here a and this.a have different values: the simple name a evaluates to "in". Start with the bytecode:
Constant pool (partial):
Constant pool:
#1 = Methodref #10.#22 // java/lang/Object."<init>":()V
#2 = String #23 // out
#3 = Fieldref #7.#24 // Parent5.a:Ljava/lang/String;
#4 = String #25 // in
#5 = Fieldref #26.#27 // java/lang/System.out:Ljava/io/PrintStream;
#6 = Methodref #28.#29 // java/io/PrintStream.println:(Ljava/lang/String;)V
#7 = Class #30 // Parent5
#8 = Methodref #7.#22 // Parent5."<init>":()V
#9 = Methodref #7.#31 // Parent5.init:()V
#10 = Class #32 // java/lang/Object
Bytecode for init():
public void init();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=2, locals=2, args_size=1
0: ldc #4 // String in
2: astore_1
3: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;
6: aload_1
7: invokevirtual #6 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
10: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;
13: aload_0
14: getfield #3 // Field a:Ljava/lang/String;
17: invokevirtual #6 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
20: return
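First, see how the bytecode describes the field a:
#3 = Fieldref #7.#24 // Parent5.a:Ljava/lang/String;
#7 = Class #30 // Parent5
#24 = NameAndType #11:#12 // a:Ljava/lang/String;
#11 = Utf8 a
#12 = Utf8 Ljava/lang/String;
The constant pool contains a symbolic reference #3 of type Fieldref, whose name and type are stored in constant #24. Put together, it reads Parent5.a:Ljava/lang/String;
Now see what the statement System.out.println(this.a); compiles to: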
14: getfield #3 // Field a:Ljava/lang/String;
17: invokevirtual #6 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
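That is, the value is first fetched through symbolic reference #3 in the constant pool (Field a:Ljava/lang/String;), and then the print method is called. This clearly refers to the field a.
Now see how the local variable a is represented in the bytecode: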
#4 = String #25 // in
#25 = Utf8 in
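These two constants, which hold the string value assigned to it, are all the constant pool has for the local variable a. Notice that there is no variable name. Now see how the statement System.out.println(a); reads the variable: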
0: ldc #4 // String in
2: astore_1
3: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;
6: aload_1
7: invokevirtual #6 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
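The ldc instruction pushes constant #4 from the constant pool, the string "in", onto the top of the operand stack. astore_1 stores the reference to "in" into slot 1 of the local variable table, aload_1 loads that reference from the local variable table back onto the operand stack, and then println is invoked.
The name of the local variable a never appears anywhere in this process. So although the field a and the local variable share a name and a type in the source, after compilation the JVM cannot confuse the two. If the local variable a were renamed to b, the compiled instructions would be exactly the same.
This is a typical case of shadowing, and its semantics are fully settled at compile time.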
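A compact way to see the difference is that the field's name is still visible at run time through reflection, because it is recorded in the class file, while the local variable is only a slot. The sketch below is illustrative; ShadowDemo and fieldName are made-up names.

```java
public class ShadowDemo {
    public String a = "out";

    /** Returns the local value and the field value, in that order. */
    public String init() {
        String a = "in";          // shadows the field inside this method
        return a + "," + this.a;  // simple name -> local slot, this.a -> getfield
    }

    /** The field name survives compilation and can be looked up by name. */
    public static String fieldName() {
        try {
            return ShadowDemo.class.getDeclaredField("a").getName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new ShadowDemo().init()); // prints in,out
        System.out.println(fieldName());             // prints a
    }
}
```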