Beating the Interviewer, Part 1: How synchronized Works Under the Hood (From the Java Object Header to JIT Compilation Optimizations)

Original article date: April 21, 2021

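1. Two useful but little-known tools
* 1.1 Bytecode viewer plugin (jclasslib Bytecode viewer)
* 1.2 Java object memory layout tool: JOL
2. How a Java object is laid out in memory
* 2.1 Theory
3. synchronized in detail
* 3.1 synchronized at the Java source and bytecode level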

To really understand synchronized, we have to start from the Java object header, and to observe the memory layout directly we need a couple of tools.

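1. Two useful but little-known tools

1.1 Bytecode viewer plugin (jclasslib Bytecode viewer)

The usual way to inspect the bytecode of a compiled class is cumbersome: compile the Java source into a class file, then run javap -verbose ***.class against it.

IntelliJ IDEA has a plugin for this, called jclasslib Bytecode viewer. Installing the plugin is left to the reader.

Its usage is straightforward, as the figure below shows:

The bytecode view groups the constant pool, interfaces, fields and so on into categories, with tooltips and cross-links between entries. Clicking a bytecode instruction jumps to the matching entry in Oracle's JVM instruction documentation. It is very convenient once you get used to it.

1.2 Java object memory layout tool: JOL

JOL stands for Java Object Layout, and the name already says what it does. It is a small tool provided by OpenJDK. It is used as follows:

Add the JOL Maven dependency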

<!-- https://mvnrepository.com/artifact/org.openjdk.jol/jol-core -->
<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.9</version>
</dependency>

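Then call it from a program

import org.openjdk.jol.info.ClassLayout;

public class JolDemo {
    public static void main(String[] args) {
        Object obj = new Object();
        // Print the in-memory layout of this instance
        String layout = ClassLayout.parseInstance(obj).toPrintable();
        System.out.println(layout);
    }
}

The output looks like this: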

java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           28 0f b3 1a (00101000 00001111 10110011 00011010) (447942440)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

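This output answers an advanced interview question: how many bytes does the obj created by Object obj = new Object() occupy in memory? Here it is 16 bytes: 12 bytes of header plus 4 bytes of alignment padding. You can also declare a class with boolean, Boolean, int, Integer, array and object-reference fields and print its layout to see how many bytes each field type really takes.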
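The 16-byte figure can be reproduced by hand. The sketch below is only an illustration of the arithmetic, assuming HotSpot's default 8-byte object alignment and the 12-byte header shown in the output above; the class and method names are hypothetical and not part of JOL.

```java
// Hypothetical helper that reproduces the arithmetic behind the JOL output above.
// Assumption: objects are aligned to 8 bytes (the HotSpot default).
class ObjectSizeSketch {

    /** Rounds size up to the next multiple of alignment (alignment must be a power of two). */
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & ~(alignment - 1);
    }

    public static void main(String[] args) {
        long header = 8 + 4; // 8-byte Mark Word + 4-byte compressed class pointer
        long instanceSize = alignUp(header, 8);
        System.out.println("new Object(): " + instanceSize + " bytes, padding "
                + (instanceSize - header) + " bytes");
    }
}
```

The padding reported by JOL as "loss due to the next object alignment" is the difference between the aligned size and the raw size.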

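Deeper usage is left for you to explore.

2. How a Java object is laid out in memory

This part gets its own section because synchronized locking relies on it.

2.1 Theory

In the HotSpot VM, an object's storage is divided into three areas: the object header, the instance data, and alignment padding.

The figure below shows the structure of an ordinary object instance and of an array instance. The array length is a header field that exists only for array objects.

2.2 Practice

Let's look at this with the JOL tool introduced in section 1.2.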

public static void main(String[] args) {
    // Add fields to User and look at its layout; the instance data section shows up here
    User obj1 = new User();
    String layout1 = ClassLayout.parseInstance(obj1).toPrintable();
    System.out.println(layout1);

    // An array of User: the layout now also shows the array length and the element data
    // Element data bytes = array length * 4 (4-byte compressed references, the JDK 8 default); here 5 * 4 = 20 bytes
    User[] obj2 = new User[5];
    String layout2 = ClassLayout.parseInstance(obj2).toPrintable();
    System.out.println(layout2);
}

The output is not pasted here to save space. Copy the program, run it, and play with the results.
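As a rough cross-check of what JOL prints for the array, the size can be estimated by hand. This is a sketch under stated assumptions: default settings on a 64-bit JDK 8 (compressed class pointers and compressed references), a 16-byte array header, and 8-byte alignment. The class name is hypothetical.

```java
// Hypothetical estimator for array sizes under the default JDK 8 flags.
// Assumed array header: 8-byte Mark Word + 4-byte class pointer + 4-byte length.
class ArraySizeSketch {

    static final int ARRAY_HEADER_BYTES = 8 + 4 + 4;

    /** Header plus element data, rounded up to a multiple of 8 bytes. */
    static long arraySize(int length, int elementBytes) {
        long raw = ARRAY_HEADER_BYTES + (long) length * elementBytes;
        return (raw + 7) & ~7L;
    }

    public static void main(String[] args) {
        // User[5]: 16-byte header + 5 * 4 bytes of references = 36, aligned up to 40
        System.out.println("User[5] ~ " + arraySize(5, 4) + " bytes");
    }
}
```

If the numbers JOL prints on your machine differ, the flags or JDK version differ from the assumptions above.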

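2.3 A first look at the Mark Word and the lock inflation process

The Mark Word in the object header stores the object's own runtime data: its hash code, lock state, GC marks and so on. When a lock is taken with synchronized, the whole locking process revolves around the Mark Word, which is why the memory layout was introduced first.

In the OpenJDK source (under your own download path, in \openjdk\hotspot\src\share\vm\oops) there is a C++ header file, markOop.hpp, that contains the following comment: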

// Bit-format of an object header (most significant first, big endian layout below):
//
//  32 bits:
//  --------
//             hash:25 ------------>| age:4    biased_lock:1 lock:2 (normal object)
//             JavaThread*:23 epoch:2 age:4    biased_lock:1 lock:2 (biased object)
//             size:32 ------------------------------------------>| (CMS free block)
//             PromotedObject*:29 ---------->| promo_bits:3 ----->| (CMS promoted object)
//
//  64 bits:
//  --------
//  unused:25 hash:31 -->| unused:1   age:4    biased_lock:1 lock:2 (normal object)
//  JavaThread*:54 epoch:2 unused:1   age:4    biased_lock:1 lock:2 (biased object)
//  PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
//  size:64 ----------------------------------------------------->| (CMS free block)
//
//  unused:25 hash:31 -->| cms_free:1 age:4    biased_lock:1 lock:2 (COOPs && normal object)
//  JavaThread*:54 epoch:2 cms_free:1 age:4    biased_lock:1 lock:2 (COOPs && biased object)
//  narrowOop:32 unused:24 cms_free:1 unused:4 promo_bits:3 ----->| (COOPs && CMS promoted object)
//  unused:21 size:35 -->| cms_free:1 unused:7 ------------------>| (COOPs && CMS free block)

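The Mark Word is 32 bits on a 32-bit JVM and 64 bits on a 64-bit JVM, but the lock-state information it stores is the same in both. Taking the simpler 32-bit layout as the example, the Mark Word is composed as shown in the figure below:

The 2-bit lock flag encodes the lock state, and the 1-bit biased-lock flag says whether the object is biased.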
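The three low bits can be decoded mechanically. The sketch below follows the bit layout in the markOop.hpp comment quoted above (biased_lock:1 lock:2 in the lowest bits); the class name and the state labels are hypothetical. When feeding it JOL output, it assumes a little-endian machine, where the first header byte printed is the least significant byte.

```java
// Hypothetical decoder for the lock bits of a Mark Word.
class MarkWordSketch {

    static String lockState(long markWord) {
        int lockBits = (int) (markWord & 0b11);   // lowest two bits: lock flag
        boolean biased = (markWord & 0b100) != 0; // third bit: biased-lock flag
        switch (lockBits) {
            case 0b00: return "lightweight-locked";
            case 0b10: return "heavyweight-locked";
            case 0b11: return "marked-for-gc";
            default:   return biased ? "biased" : "unlocked"; // lock flag 01
        }
    }

    public static void main(String[] args) {
        // 0x01 is the first header byte in the JOL output shown in section 1.2
        System.out.println(lockState(0x01L));
    }
}
```

Running the JOL demo from section 3.2 and passing the first header byte to lockState is a quick way to name the state you are looking at.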

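After an object is created and before any thread has tried to lock it, it is unlocked: the lock flag is 01 and the biased-lock flag is 0.
When a thread acquires the lock object for the first time, the lock flag stays 01 and the biased-lock flag is set to 1: the lock enters biased mode. At the same time a CAS stores the ID of the acquiring thread into the lock object's Mark Word. That thread now holds the biased lock and can enter directly next time.
Now thread B tries to acquire the lock and finds it biased, but the thread ID in the Mark Word is not its own. Thread B tries to take the lock with a CAS. This can succeed, because the previous owner never gives up a biased lock on its own initiative, so the Mark Word may still name a thread that has already left the synchronized block. If B succeeds, the thread ID in the Mark Word is set to B's ID. If B fails, the next step applies.
A failed attempt to take over the bias means the lock is contended. The bias is revoked first: the biased-lock flag is reset to 0 and the lock prepares to upgrade to a lightweight lock. The current thread allocates a Lock Record in its stack frame to hold a copy of the lock object's current Mark Word. It then uses a CAS to replace the Mark Word with a pointer to that Lock Record. If the CAS succeeds, the thread owns the lock and the lock flag becomes 00, which is lightweight-lock mode. If the CAS fails, the next step applies.
A failed CAS on a lightweight lock does not immediately inflate it to a heavyweight lock. The thread spins instead, retrying the acquisition. In JDK 1.6 spinning is on by default with 10 iterations, adjustable with -XX:PreBlockSpin. JDK 1.6 also improved on the fixed iteration count by introducing adaptive spinning, which makes the retry policy smarter.
Only if spinning still fails to get the lock is contention considered heavy enough that spending more CPU on CAS is no longer worthwhile. The lock then inflates to a heavyweight lock and the lock flag is set to 10. In this state every thread waiting for the lock must block. (A plug: for thread states, see my other article, 《脱掉Java线程状态的衣服》.)

Don't worry if these steps are not clear yet. Read on, then come back and go over them again.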

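2.4 Pointer compression (-XX:+UseCompressedClassPointers and -XX:+UseCompressedOops)

This brings up the concept of pointer compression and two JVM flags you may run into, -XX:+UseCompressedClassPointers and -XX:+UseCompressedOops. Here is a brief introduction, followed by experiments that make their meaning concrete.

**Pointer compression:** The JVM was originally 32-bit. With the rise of 64-bit systems it moved to 64 bits, since a 32-bit JVM can address only a limited amount of memory. A 64-bit VM brings a problem of its own, though: every object pointer doubles from 4 to 8 bytes, and overall memory use is commonly cited as growing to roughly 1.5 times that of a 32-bit VM. That is undesirable, so pointer compression was introduced in JDK 1.6.

**-XX:+UseCompressedClassPointers:** enables compression of class pointers (pointers to class metadata).

**-XX:+UseCompressedOops:** enables compression of ordinary object pointers. Oops is short for ordinary object pointers.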
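The idea behind compressed ordinary object pointers can be sketched in a few lines. Because objects are 8-byte aligned, the low 3 bits of every object address are zero, so a 32-bit value can hold the address shifted right by 3. The sketch below is a simplified model, not HotSpot's code: it assumes a shift of 3 and a single heap base, and the class and method names are hypothetical.

```java
// Hypothetical model of compressed-pointer encoding and decoding.
// Assumptions: 8-byte object alignment (shift of 3) and a fixed heap base address.
class CompressedOopsSketch {

    static final int SHIFT = 3; // log2 of the 8-byte alignment

    /** Packs a 64-bit address into 32 bits relative to the heap base. */
    static int encode(long address, long heapBase) {
        return (int) ((address - heapBase) >>> SHIFT);
    }

    /** Restores the 64-bit address; the mask treats the int as unsigned. */
    static long decode(int narrow, long heapBase) {
        return heapBase + ((narrow & 0xFFFF_FFFFL) << SHIFT);
    }

    /** 2^32 distinct values, each standing for an 8-byte step. */
    static long maxAddressableBytes() {
        return (1L << 32) << SHIFT;
    }

    public static void main(String[] args) {
        System.out.println("addressable: " + (maxAddressableBytes() >> 30) + " GB");
    }
}
```

Under these assumptions a 32-bit compressed pointer covers 32 GB of heap, which is why compression is usually discussed together with that heap-size limit.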

Both -XX:+UseCompressedClassPointers and -XX:+UseCompressedOops are enabled by default in JDK 1.8. You can check with the command java -XX:+PrintCommandLineFlags -version:

In +UseCompressedClassPointers and +UseCompressedOops, the + sign turns a flag on and a - sign turns it off. The examples below use - to turn the flags off. We edit the JVM options in IDEA and verify by experiment how switching these two flags affects the memory layout.

We run the small demo below with four different sets of VM options:

-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+PrintCommandLineFlags
-XX:-UseCompressedClassPointers -XX:+UseCompressedOops -XX:+PrintCommandLineFlags
-XX:+UseCompressedClassPointers -XX:-UseCompressedOops -XX:+PrintCommandLineFlags
-XX:-UseCompressedClassPointers -XX:-UseCompressedOops -XX:+PrintCommandLineFlags

import org.openjdk.jol.info.ClassLayout;

public class HelloJOL {

    private boolean flag1 = true;
    private Boolean flag2 = true;
    private int x = 0;
    private Integer y = 0;
    private String str = "";
    private int[] arrInt = new int[10];
    private String[] arrStr;

    public static void main(String[] args) {
        HelloJOL o = new HelloJOL();
        String layout = ClassLayout.parseInstance(o).toPrintable();
        System.out.println(layout);
    }

}

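Looking closely at the output leads to the following conclusions:

Experiment 1: -XX:+UseCompressedClassPointers -XX:+UseCompressedOops

The object header is 12 bytes: an 8-byte Mark Word plus a 4-byte class pointer.

Experiment 2: -XX:-UseCompressedClassPointers -XX:+UseCompressedOops

Only class pointer compression is turned off.

The object header is 16 bytes: an 8-byte Mark Word plus an 8-byte class pointer.

Note: on a 64-bit machine, UseCompressedClassPointers compresses the class pointer from 8 bytes to 4 bytes.

Experiment 3: -XX:+UseCompressedClassPointers -XX:-UseCompressedOops

Only ordinary object pointer compression is turned off.

The flags printed by -XX:+PrintCommandLineFlags show that, on this JDK 8 setup, UseCompressedClassPointers is then switched off by the JVM as well, even though you did not ask for it.

The object header is 16 bytes, because class pointer compression was switched off in cascade.

Primitive fields such as boolean and int keep their sizes of 1 and 4 bytes, but fields of reference types such as Boolean, Integer, String and arrays grow from 4 bytes to 8 bytes.

Experiment 4: -XX:-UseCompressedClassPointers -XX:-UseCompressedOops

Both class pointer compression and ordinary object pointer compression are turned off. The result is the same as in experiment 3.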
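The four experiments can be condensed into a small table-generating sketch. It encodes only what the experiments above report for JDK 8, including the cascade in which turning off UseCompressedOops also turns off class pointer compression; the class and method names are hypothetical, and other JDK versions may behave differently.

```java
// Hypothetical summary of the four experiments, as observed on JDK 8.
class HeaderSizeSketch {

    /** Class pointer compression is effective only when both flags are on (JDK 8 observation). */
    static boolean classPointersCompressed(boolean useCompressedClassPointers, boolean useCompressedOops) {
        return useCompressedClassPointers && useCompressedOops;
    }

    /** 8-byte Mark Word plus a 4-byte or 8-byte class pointer. */
    static int headerBytes(boolean useCompressedClassPointers, boolean useCompressedOops) {
        return 8 + (classPointersCompressed(useCompressedClassPointers, useCompressedOops) ? 4 : 8);
    }

    /** Size of a reference field such as Boolean, Integer, String or an array. */
    static int referenceBytes(boolean useCompressedOops) {
        return useCompressedOops ? 4 : 8;
    }

    public static void main(String[] args) {
        boolean[] onOff = {true, false};
        for (boolean oops : onOff) {
            for (boolean klass : onOff) {
                System.out.println("ClassPointers=" + klass + " Oops=" + oops
                        + " -> header " + headerBytes(klass, oops)
                        + " bytes, reference " + referenceBytes(oops) + " bytes");
            }
        }
    }
}
```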

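3. synchronized in detail

3.1 synchronized at the Java source and bytecode level

This article assumes a basic understanding of the synchronized keyword and how to use it, and does not cover the basics. If you are not familiar with them, study those first.

First, as we all know, synchronized can modify methods (static and non-static) as well as code blocks.

Non-static method: locks the current instance
Static method: locks the current class

Code block: locks the specified object, which can be either a class or an instance.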

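public static synchronized void methodA() {
    // static method: the lock on the current class must be acquired before execution
}

public synchronized void methodB() {
    // instance method: the lock on the current instance must be acquired before execution
}

Object lock =  new Object();
public void methodC() {
    synchronized (lock) {
        // synchronized block: the lock on the lock instance must be acquired before execution
    }
}

public void methodD() {
    synchronized (Object.class) {
        // synchronized block: the lock on the Object class must be acquired before execution
    }
}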

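Compile this code and look at the class file. The JVM implements synchronized differently for methods and for code blocks:

On a method: the ACC_SYNCHRONIZED flag is added to the method's access flags. It tells the JVM that this is a synchronized method and that the corresponding lock must be acquired before entering it.
On a code block: the method's Code attribute contains monitorenter and monitorexit instructions. monitorenter acquires the lock on entry and monitorexit releases it.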
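The method-level flag is also visible through reflection, without opening the class file. The sketch below uses java.lang.reflect.Modifier, whose SYNCHRONIZED constant has the value 0x0020; the class and method names are hypothetical. A method that only contains a synchronized block does not carry the flag.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical demo: a synchronized method carries the flag, a synchronized block does not.
class SyncFlagSketch {

    static synchronized void syncMethod() { }

    void syncBlock() {
        synchronized (this) { }
    }

    /** Returns true if the named no-argument method has the synchronized modifier. */
    static boolean hasSyncFlag(String methodName) {
        try {
            Method m = SyncFlagSketch.class.getDeclaredMethod(methodName);
            return Modifier.isSynchronized(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("syncMethod: " + hasSyncFlag("syncMethod"));
        System.out.println("syncBlock:  " + hasSyncFlag("syncBlock"));
    }
}
```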

3.2 synchronized at the JVM level `key point`

 public static void main(String[] args) {

        Object lock = new Object();

        System.out.println("before locking **********************");
        String layout0 = ClassLayout.parseInstance(lock).toPrintable();
        System.out.println(layout0);

        System.out.println("*********** while locked ***********");
        synchronized (lock) {
            // try -XX:BiasedLockingStartupDelay=0 to remove the biased-locking startup delay
            String layout1 = ClassLayout.parseInstance(lock).toPrintable();
            System.out.println(layout1);
        }

        System.out.println("********************* after release *");
        String layout2 = ClassLayout.parseInstance(lock).toPrintable();
        System.out.println(layout2);
    }

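Use the JOL tool from section 1.2 to see how the object header changes before locking, while the lock is held, and after it is released. The result differs between JDK versions, but you will certainly see the object's Mark Word change.

As described above, a synchronized block produces monitorenter and monitorexit instructions. So how does the JVM implement locking with these two instructions? Let's follow, step by step, how monitorenter and monitorexit are implemented in the OpenJDK source.

I use openjdk8. A Baidu cloud drive download link is attached: https://pan.baidu.com/s/1ZFQLurrriyUzyS78_SwcXw password: aeqm

3.2.1 monitorenter and monitorexit in the JDK source

monitorenter and monitorexit are defined in interpreterRuntime.cpp, under hotspot/src/share/vm/interpreter in the OpenJDK root: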

// The interpreter's synchronization code is factored out so that it can
// be shared by method invocation and synchronized blocks.
//%note synchronization_3

//%note monitor_1  monitorenter: acquires the monitor lock
IRT_ENTRY_NO_ASYNC(void, InterpreterRuntime::monitorenter(JavaThread* thread, BasicObjectLock* elem))
#ifdef ASSERT
  thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
  if (PrintBiasedLockingStatistics) { // count slow-path entries for the biased-locking statistics
    Atomic::inc(BiasedLocking::slow_path_entry_count_addr());
  }
  Handle h_obj(thread, elem->obj());
  assert(Universe::heap()->is_in_reserved_or_null(h_obj()),
         "must be NULL or an object");
  if (UseBiasedLocking) { // biased locking is enabled
    // Retry fast entry if bias is revoked to avoid unnecessary inflation
    ObjectSynchronizer::fast_enter(h_obj, elem->lock(), true, CHECK);
  } else {
    // biased locking is disabled: slow_enter takes the lightweight/heavyweight path
    ObjectSynchronizer::slow_enter(h_obj, elem->lock(), CHECK);
  }
  assert(Universe::heap()->is_in_reserved_or_null(elem->obj()),
         "must be NULL or an object");
#ifdef ASSERT
  thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
IRT_END


//%note monitor_1  monitorexit: releases the monitor lock
IRT_ENTRY_NO_ASYNC(void, InterpreterRuntime::monitorexit(JavaThread* thread, BasicObjectLock* elem))
#ifdef ASSERT
  thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
  Handle h_obj(thread, elem->obj());
  assert(Universe::heap()->is_in_reserved_or_null(h_obj()),
         "must be NULL or an object");
  if (elem == NULL || h_obj()->is_unlocked()) {
    THROW(vmSymbols::java_lang_IllegalMonitorStateException());
  }
  ObjectSynchronizer::slow_exit(h_obj(), elem->lock(), thread);
  // Free entry. This must be done here, since a pending exception might be installed on
  // exit. If it is not cleared, the exception handling code will try to unlock the monitor again.
  elem->set_obj(NULL);
#ifdef ASSERT
  thread->last_frame().interpreter_frame_verify_monitor(elem);
#endif
IRT_END

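3.2.2 fast_enter and slow_enter in the JDK source

fast_enter and slow_enter are defined in synchronizer.cpp, at hotspot/src/share/vm/runtime/synchronizer.cpp under the OpenJDK root. Reading the code carefully together with the description of lock inflation in section 2.3 gives a much clearer picture of locking, inflation and unlocking. Section 2.3 is worth reading again and again.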

// -----------------------------------------------------------------------------
//  Fast Monitor Enter/Exit
// This the fast monitor enter. The interpreter and compiler use
// some assembly copies of this code. Make sure update those code
// if the following function is changed. The implementation is
// extremely sensitive to race condition. Be careful.

void ObjectSynchronizer::fast_enter(Handle obj, BasicLock* lock, bool attempt_rebias, TRAPS) {
 if (UseBiasedLocking) { // checks the biased-locking flag once more
    if (!SafepointSynchronize::is_at_safepoint()) { // taken when not at a safepoint
      // biased-lock acquisition: revoke_and_rebias
      BiasedLocking::Condition cond = BiasedLocking::revoke_and_rebias(obj, attempt_rebias, THREAD);
      if (cond == BiasedLocking::BIAS_REVOKED_AND_REBIASED) {
        return;
      }
    } else {
      assert(!attempt_rebias, "can not rebias toward VM thread");
      BiasedLocking::revoke_at_safepoint(obj);
    }
    assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
 }
 // the fast (biased) path did not acquire the lock: fall back to slow_enter
 slow_enter (obj, lock, THREAD) ;
}

void ObjectSynchronizer::fast_exit(oop object, BasicLock* lock, TRAPS) {
  // The assert below shows that a biased lock never reaches this unlock path.
  assert(!object->mark()->has_bias_pattern(), "should not see bias pattern here");
  // The displaced header is the copy of the lock object's Mark Word saved while taking a lightweight lock; HotSpot calls this copy "displaced". See page 482 of 《深入理解Java虚拟机》, 3rd edition.
  // if displaced header is null, the previous enter is recursive enter, no-op
  markOop dhw = lock->displaced_header();
  markOop mark ;
  if (dhw == NULL) {
     // Recursive stack-lock.
     // Diagnostics -- Could be: stack-locked, inflating, inflated.
     mark = object->mark() ;
     assert (!mark->is_neutral(), "invariant") ;
     if (mark->has_locker() && mark != markOopDesc::INFLATING()) {
        assert(THREAD->is_lock_owned((address)mark->locker()), "invariant") ;
     }
     if (mark->has_monitor()) {
        ObjectMonitor * m = mark->monitor() ;
        assert(((oop)(m->object()))->mark() == mark, "invariant") ;
        assert(m->is_entered(THREAD), "invariant") ;
     }
     return ;
  }

  mark = object->mark() ; // the lock object's Mark Word

  // Lightweight-lock release: unlock with a CAS (cmpxchg_ptr below is the CAS operation).
  // If the object is stack-locked by the current thread, try to
  // swing the displaced header from the box back to the mark.
  if (mark == (markOop) lock) {
     assert (dhw->is_neutral(), "invariant") ;
     if ((markOop) Atomic::cmpxchg_ptr (dhw, object->mark_addr(), mark) == mark) {
        TEVENT (fast_exit: release stacklock) ;
        return;
     }
  }

  ObjectSynchronizer::inflate(THREAD, object)->exit (true, THREAD) ;
}

// -----------------------------------------------------------------------------
// Interpreter/Compiler Slow Case
// This routine is used to handle interpreter/compiler slow case
// We don't need to use fast path here, because it must have been
// failed in the interpreter/compiler code.
void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {
  markOop mark = obj->mark();
  assert(!mark->has_bias_pattern(), "should not see bias pattern here");

  if (mark->is_neutral()) {
    // Try the lightweight (stack) lock first.
    // Anticipate successful CAS -- the ST of the displaced mark must
    // be visible <= the ST performed by the CAS.
    lock->set_displaced_header(mark);
    if (mark == (markOop) Atomic::cmpxchg_ptr(lock, obj()->mark_addr(), mark)) {
      TEVENT (slow_enter: release stacklock) ;
      return ;
    }
    // Fall through to inflate() ... (the CAS above failed, so continue to inflation below)
  } else
  if (mark->has_locker() && THREAD->is_lock_owned((address)mark->locker())) { // the current thread already holds the lock
    assert(lock != mark->locker(), "must not re-lock the same lock");
    assert(lock != (BasicLock*)obj->mark(), "don't relock with same BasicLock");
    lock->set_displaced_header(NULL);
    return;
  }

#if 0
  // The following optimization isn't particularly useful.
  if (mark->has_monitor() && mark->monitor()->is_entered(THREAD)) {
    lock->set_displaced_header (NULL) ;
    return ;
  }
#endif

  // In a heavyweight-locked Mark Word (32-bit layout) the lock flag is 10 and the other 30 bits hold the pointer to the monitor.
  // The object header will never be displaced to this lock,
  // so it does not matter what the value is, except that it
  // must be non-zero to avoid looking like a re-entrant lock,
  // and must not look locked either.
  lock->set_displaced_header(markOopDesc::unused_mark());
  ObjectSynchronizer::inflate(THREAD, obj())->enter(THREAD);
}

// This routine is used to handle interpreter/compiler slow case
// We don't need to use fast path here, because it must have
// failed in the interpreter/compiler code. Simply use the heavy
// weight monitor should be ok, unless someone find otherwise.
void ObjectSynchronizer::slow_exit(oop object, BasicLock* lock, TRAPS) {
  fast_exit (object, lock, THREAD) ;
}
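The stack-lock path of slow_enter and fast_exit can be modeled in plain Java to make the control flow easier to follow. This is a toy model, not HotSpot's implementation: an AtomicReference stands in for the Mark Word, a LockRecord object stands in for the on-stack BasicLock, the owner check uses a thread field instead of a stack-address test, and inflation is reduced to returning false. All names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of the lightweight (stack) lock protocol shown above.
class ThinLockSketch {

    static final Object NEUTRAL = new Object(); // stands in for an unlocked Mark Word

    /** Stands in for the BasicLock / Lock Record in the locking thread's frame. */
    static final class LockRecord {
        final Thread owner = Thread.currentThread();
        Object displacedHeader; // null marks a recursive enter, as in slow_enter
    }

    final AtomicReference<Object> mark = new AtomicReference<>(NEUTRAL);

    /** Returns true if locked on the thin path, false where the real code would inflate. */
    boolean enter(LockRecord lock) {
        Object m = mark.get();
        if (m == NEUTRAL) {
            lock.displacedHeader = m;             // save the header before the CAS
            if (mark.compareAndSet(m, lock)) {
                return true;                      // the mark now points at the record
            }
        } else if (m instanceof LockRecord && ((LockRecord) m).owner == Thread.currentThread()) {
            lock.displacedHeader = null;          // recursive enter
            return true;
        }
        return false;                             // contended: inflate()->enter() in HotSpot
    }

    void exit(LockRecord lock) {
        if (lock.displacedHeader == null) {
            return;                               // recursive exit is a no-op
        }
        // Swing the displaced header back; the real code inflates if this CAS fails.
        mark.compareAndSet(lock, lock.displacedHeader);
    }

    public static void main(String[] args) {
        ThinLockSketch obj = new ThinLockSketch();
        LockRecord outer = new LockRecord();
        LockRecord inner = new LockRecord();
        System.out.println("outer enter: " + obj.enter(outer));
        System.out.println("inner enter: " + obj.enter(inner));
        obj.exit(inner);
        obj.exit(outer);
        System.out.println("unlocked again: " + (obj.mark.get() == NEUTRAL));
    }
}
```

The null displaced header is the detail to watch: it is how both the model and the real code tell a recursive exit from the outermost one.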

3.2.3 The inflate method in the JDK source

This method is also in synchronizer.cpp. It is not adjacent to the code above and is fairly long, so it is shown separately.

// Note that we could encounter some performance loss through false-sharing as
// multiple locks occupy the same $ line.  Padding might be appropriate.
// ("$ line" here means cache line.)


ObjectMonitor * ATTR ObjectSynchronizer::inflate (Thread * Self, oop object) {
  // Inflate mutates the heap ...
  // Relaxing assertion for bug 6320749.
  assert (Universe::verify_in_progress() ||
          !SafepointSynchronize::is_at_safepoint(), "invariant") ;

  for (;;) {
      const markOop mark = object->mark() ;
      assert (!mark->has_bias_pattern(), "invariant") ;

      // The mark can be in one of the following states:
      // *  Inflated     - just return
      // *  Stack-locked - coerce it to inflated (stack-locked means lightweight-locked)
      // *  INFLATING    - busy wait for conversion to complete
      // *  Neutral      - aggressively inflate the object.
      // *  BIASED       - Illegal.  We should never see this

      // CASE: inflated
      if (mark->has_monitor()) {
          ObjectMonitor * inf = mark->monitor() ;
          assert (inf->header()->is_neutral(), "invariant");
          assert (inf->object() == object, "invariant") ;
          assert (ObjectSynchronizer::verify_objmon_isinpool(inf), "monitor is invalid");
          return inf ;
      }

      // CASE: inflation in progress - inflating over a stack-lock.
      // Some other thread is converting from stack-locked to inflated.
      // Only that thread can complete inflation -- other threads must wait.
      // The INFLATING value is transient.
      // Currently, we spin/yield/park and poll the markword, waiting for inflation to finish.
      // We could always eliminate polling by parking the thread on some auxiliary list.
      if (mark == markOopDesc::INFLATING()) {
         TEVENT (Inflate: spin while INFLATING) ;
         ReadStableMark(object) ;
         continue ;
      }

      // CASE: stack-locked (a lightweight lock that must be inflated to a heavyweight lock)
      // Could be stack-locked either by this thread or by some other thread.
      //
      // Note that we allocate the objectmonitor speculatively, _before_ attempting
      // to install INFLATING into the mark word.  We originally installed INFLATING,
      // allocated the objectmonitor, and then finally STed the address of the
      // objectmonitor into the mark.  This was correct, but artificially lengthened
      // the interval in which INFLATED appeared in the mark, thus increasing
      // the odds of inflation contention.
      // In short: the monitor is now allocated speculatively, before INFLATING is installed in the mark word.
      // The older order was: install INFLATING -> allocate the monitor -> store the monitor address into the mark word.
      // That was correct, but it kept the transient state in the mark word longer than necessary and so raised the odds of inflation contention.
      //
      // We now use per-thread private objectmonitor free lists.
      // These list are reprovisioned from the global free list outside the
      // critical INFLATING...ST interval.  A thread can transfer
      // multiple objectmonitors en-mass from the global free list to its local free list.
      // This reduces coherency traffic and lock contention on the global free list.
      // Using such local free lists, it doesn't matter if the omAlloc() call appears
      // before or after the CAS(INFLATING) operation.
      // See the comments in omAlloc().

      if (mark->has_locker()) {
          ObjectMonitor * m = omAlloc (Self) ;
          // Optimistically prepare the objectmonitor - anticipate successful CAS
          // We do this before the CAS in order to minimize the length of time
          // in which INFLATING appears in the mark.
          m->Recycle();
          m->_Responsible  = NULL ;
          m->OwnerIsThread = 0 ;
          m->_recursions   = 0 ;
          m->_SpinDuration = ObjectMonitor::Knob_SpinLimit ;   // Consider: maintain by type/class

          markOop cmp = (markOop) Atomic::cmpxchg_ptr (markOopDesc::INFLATING(), object->mark_addr(), mark) ;
          if (cmp != mark) {
             omRelease (Self, m, true) ;
             continue ;       // Interference -- just retry
          }

          // We've successfully installed INFLATING (0) into the mark-word.
          // This is the only case where 0 will appear in a mark-work.
          // Only the singular thread that successfully swings the mark-word
          // to 0 can perform (or more precisely, complete) inflation.
          //
          // Why do we CAS a 0 into the mark-word instead of just CASing the
          // mark-word from the stack-locked value directly to the new inflated state?
          // Consider what happens when a thread unlocks a stack-locked object.
          // It attempts to use CAS to swing the displaced header value from the
          // on-stack basiclock back into the object header.  Recall also that the
          // header value (hashcode, etc) can reside in (a) the object header, or
          // (b) a displaced header associated with the stack-lock, or (c) a displaced
          // header in an objectMonitor.  The inflate() routine must copy the header
          // value from the basiclock on the owner's stack to the objectMonitor, all
          // the while preserving the hashCode stability invariants.  If the owner
          // decides to release the lock while the value is 0, the unlock will fail
          // and control will eventually pass from slow_exit() to inflate.  The owner
          // will then spin, waiting for the 0 value to disappear.   Put another way,
          // the 0 causes the owner to stall if the owner happens to try to
          // drop the lock (restoring the header from the basiclock to the object)
          // while inflation is in-progress.  This protocol avoids races that might
          // would otherwise permit hashCode values to change or "flicker" for an object.
          // Critically, while object->mark is 0 mark->displaced_mark_helper() is stable.
          // 0 serves as a "BUSY" inflate-in-progress indicator.


          // fetch the displaced mark from the owner's stack.
          // The owner can't die or unwind past the lock while our INFLATING
          // object is in the mark.  Furthermore the owner can't complete
          // an unlock on the object, either.
          markOop dmw = mark->displaced_mark_helper() ;
          assert (dmw->is_neutral(), "invariant") ;

          // Setup monitor fields to proper values -- prepare the monitor
          m->set_header(dmw) ;

          // Optimization: if the mark->locker stack address is associated
          // with this thread we could simply set m->_owner = Self and
          // m->OwnerIsThread = 1. Note that a thread can inflate an object
          // that it has stack-locked -- as might happen in wait() -- directly
          // with CAS.  That is, we can avoid the xchg-NULL .... ST idiom.
          m->set_owner(mark->locker());
          m->set_object(object);
          // TODO-FIXME: assert BasicLock->dhw != 0.

          // Must preserve store ordering. The monitor state must
          // be stable at the time of publishing the monitor address.
          guarantee (object->mark() == markOopDesc::INFLATING(), "invariant") ;
          object->release_set_mark(markOopDesc::encode(m));

          // Hopefully the performance counters are allocated on distinct cache lines
          // to avoid false sharing on MP systems ...
          if (ObjectMonitor::_sync_Inflations != NULL) ObjectMonitor::_sync_Inflations->inc() ;
          TEVENT(Inflate: overwrite stacklock) ;
          if (TraceMonitorInflation) {
            if (object->is_instance()) {
              ResourceMark rm;
              tty->print_cr("Inflating object " INTPTR_FORMAT " , mark " INTPTR_FORMAT " , type %s",
                (void *) object, (intptr_t) object->mark(),
                object->klass()->external_name());
            }
          }
          return m ;
      }

      // CASE: neutral
      // TODO-FIXME: for entry we currently inflate and then try to CAS _owner.
      // If we know we're inflating for entry it's better to inflate by swinging a
      // pre-locked objectMonitor pointer into the object header.   A successful
      // CAS inflates the object *and* confers ownership to the inflating thread.
      // In the current implementation we use a 2-step mechanism where we CAS()
      // to inflate and then CAS() again to try to swing _owner from NULL to Self.
      // An inflateTry() method that we could call from fast_enter() and slow_enter()
      // would be useful.

      assert (mark->is_neutral(), "invariant");
      ObjectMonitor * m = omAlloc (Self) ;
      // prepare m for installation - set monitor to initial state
      m->Recycle();
      m->set_header(mark);
      m->set_owner(NULL);
      m->set_object(object);
      m->OwnerIsThread = 1 ;
      m->_recursions   = 0 ;
      m->_Responsible  = NULL ;
      m->_SpinDuration = ObjectMonitor::Knob_SpinLimit ;       // consider: keep metastats by type/class

      if (Atomic::cmpxchg_ptr (markOopDesc::encode(m), object->mark_addr(), mark) != mark) {
          m->set_object (NULL) ;
          m->set_owner  (NULL) ;
          m->OwnerIsThread = 0 ;
          m->Recycle() ;
          omRelease (Self, m, true) ;
          m = NULL ;
          continue ;
          // interference - the markword changed - just retry.
          // The state-transitions are one-way, so there's no chance of
          // live-lock -- "Inflated" is an absorbing state.
      }

      // Hopefully the performance counters are allocated on distinct
      // cache lines to avoid false sharing on MP systems ...
      if (ObjectMonitor::_sync_Inflations != NULL) ObjectMonitor::_sync_Inflations->inc() ;
      TEVENT(Inflate: overwrite neutral) ;
      if (TraceMonitorInflation) {
        if (object->is_instance()) {
          ResourceMark rm;
          tty->print_cr("Inflating object " INTPTR_FORMAT " , mark " INTPTR_FORMAT " , type %s",
            (void *) object, (intptr_t) object->mark(),
            object->klass()->external_name());
        }
      }
      return m ;
  }
}
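The four cases handled by inflate can be condensed into a toy Java model. It is a sketch of the protocol only, under loud assumptions: an AtomicReference stands in for the Mark Word, sentinel objects stand in for the neutral and INFLATING values, and the Monitor class holds just a header and an owner. None of these names exist in HotSpot.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of ObjectSynchronizer::inflate: inflated, INFLATING, stack-locked, neutral.
class InflateSketch {

    static final Object NEUTRAL = new Object();   // unlocked Mark Word
    static final Object INFLATING = new Object(); // transient "busy" marker (0 in HotSpot)

    static final class StackLock {
        final Thread owner;
        final Object displacedHeader;
        StackLock(Thread owner, Object displacedHeader) {
            this.owner = owner;
            this.displacedHeader = displacedHeader;
        }
    }

    static final class Monitor {
        Object header;
        Thread owner;
    }

    final AtomicReference<Object> mark = new AtomicReference<>(NEUTRAL);

    Monitor inflate() {
        for (;;) {
            Object m = mark.get();
            if (m instanceof Monitor) {
                return (Monitor) m;               // CASE: inflated, just return
            }
            if (m == INFLATING) {
                Thread.yield();                   // CASE: another thread is inflating, wait
                continue;
            }
            Monitor mon = new Monitor();          // allocate before touching the mark
            if (m instanceof StackLock) {         // CASE: stack-locked
                if (!mark.compareAndSet(m, INFLATING)) {
                    continue;                     // interference, retry
                }
                StackLock s = (StackLock) m;
                mon.header = s.displacedHeader;   // move the saved header into the monitor
                mon.owner = s.owner;              // the stack-lock owner keeps ownership
                mark.set(mon);                    // publish the fully prepared monitor
                return mon;
            }
            mon.header = m;                       // CASE: neutral
            if (mark.compareAndSet(m, mon)) {
                return mon;                       // inflated with no owner yet
            }
        }
    }

    public static void main(String[] args) {
        InflateSketch obj = new InflateSketch();
        obj.mark.set(new StackLock(Thread.currentThread(), NEUTRAL));
        Monitor mon = obj.inflate();
        System.out.println("owner preserved: " + (mon.owner == Thread.currentThread()));
    }
}
```

The INFLATING step exists only on the stack-locked path: it stops the owner from restoring the displaced header while the header is being copied into the monitor.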

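3.3 The lock upgrade process

Once again it is time to go back over section 2.3.

The upgrade path can be summarized as: unlocked -> biased lock -> lightweight lock (spinning, then adaptive spinning) -> heavyweight lock. Upgrades go in this direction only and there is no step-by-step downgrade, although HotSpot may deflate an idle heavyweight monitor at a safepoint.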

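A freshly created object is unlocked.
When thread A acquires the lock for the first time, the lock enters biased mode, and it is reentrant. Under some strict conditions, another thread B that comes for the lock can take over the bias with a CAS and replace the thread ID information in the Mark Word.
If taking over the bias fails, the lock is upgraded to a lightweight lock (also called a stack lock, and often loosely called a spin lock), again using CAS. If the first CAS to acquire or compete for the lightweight lock fails, the thread spins, retrying N times. Spinning started with a fixed count and was later optimized into smarter adaptive spinning.
If the lock still cannot be acquired after spinning, contention is heavy and CAS spinning costs too much CPU, so the lock inflates directly to a heavyweight lock.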
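The "spin a few times, then block" policy can be illustrated with a small user-level lock. This is a sketch of the idea only and not how HotSpot implements monitors: it uses a fixed spin count instead of adaptive spinning, it is not reentrant, and the class and method names are hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical spin-then-block lock: CAS retries first, parking only under real contention.
class SpinThenBlockLock {

    private static final int SPIN_LIMIT = 10; // fixed count, for illustration only

    private final AtomicBoolean held = new AtomicBoolean(false);
    private final Queue<Thread> waiters = new ConcurrentLinkedQueue<>();

    void lock() {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (held.compareAndSet(false, true)) {
                return;                        // acquired while spinning
            }
        }
        Thread me = Thread.currentThread();
        waiters.add(me);                       // register before the final checks
        while (!held.compareAndSet(false, true)) {
            LockSupport.park(this);            // block until an unlock wakes us
        }
        waiters.remove(me);
    }

    void unlock() {
        held.set(false);
        Thread next = waiters.peek();
        if (next != null) {
            LockSupport.unpark(next);          // wake the longest-waiting thread
        }
    }

    /** Increments a plain counter from several threads under the lock and returns the total. */
    static int demo(int threads, int perThread) {
        SpinThenBlockLock lock = new SpinThenBlockLock();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 10_000));
    }
}
```

A waiter registers itself before its last CAS attempts, so an unlock that happens in between still finds it in the queue and the wake-up is not lost.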

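A very useful summary: a heavyweight lock asks the operating system for resources, suspends the waiting threads, and puts them in the lock's wait queue, where they block until the operating system schedules them again. Biased and lightweight locks are not handed over to the operating system. They stay in user mode and keep consuming CPU, acquiring the lock by lock-free CAS competition. CAS in Java goes through the compareAndSwap methods of the Unsafe class, which call into C++ code in the JVM and, on x86, end up as the assembly instruction below. The lock prefix makes the compare-and-exchange atomic across cores; on modern processors this is typically done by locking the affected cache line rather than the whole bus, since locking the bus would stall everything else.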

The lock cmpxchg instruction
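At the Java level the same primitive is reachable through the atomic classes. The sketch below shows the usual CAS retry loop with AtomicInteger.compareAndSet; the class and method names are hypothetical, and whether a given call compiles down to lock cmpxchg depends on the platform and the JIT.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo of a CAS retry loop: read, compute, try to swap, retry on interference.
class CasLoopSketch {

    static int incrementWithCas(AtomicInteger value) {
        for (;;) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;                  // nobody changed the value in between
            }
            // another thread won the race: loop and try again with the fresh value
        }
    }

    /** Runs the CAS loop from several threads and returns the final counter value. */
    static int demo(int threads, int perThread) {
        AtomicInteger value = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    incrementWithCas(value);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return value.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 10_000));
    }
}
```

No thread ever blocks here, which is the sense in which biased and lightweight locks stay in user mode.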

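3.4 Lock elision

A passage on lock elision from 《深入理解Java虚拟机》, 3rd edition:

Lock elision means that the JIT compiler, at run time, removes locks from code that requests synchronization but for which it detects that no contention on shared data is possible. The decision rests mainly on escape analysis: if the compiler determines that none of the heap data used by a piece of code can escape and be accessed by other threads, it can treat that data as if it lived on the stack, that is, as thread-private, and synchronization is simply unnecessary.

Take the following code, for example: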

     public static String concatString(String str1, String str2, String str3) {
        StringBuffer sb = new StringBuffer();
        sb.append(str1).append(str2).append(str3);
        return sb.toString();
    }

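As everyone knows, StringBuffer is a thread-safe string builder: its methods are declared synchronized, so each call has to acquire a lock, and the lock object is the StringBuffer instance itself. In the code above the lock object is sb. Escape analysis finds that sb is confined to the concatString method and is never used or called from outside it. No other thread can ever reach it, so no synchronization problem can arise. Under interpretation the locks are still taken. After JIT compilation by the server compiler (escape analysis is a JIT optimization), this code runs with all the synchronization removed.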
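What "escaping" means can be shown by contrast. In the sketch below, the first method keeps the buffer local, so it is a candidate for lock elision; the second publishes the buffer through a static field, so other threads could reach it and the locks have to stay. The class, method and field names are hypothetical, and whether elision actually happens is up to the JIT.

```java
// Hypothetical contrast between a non-escaping and an escaping StringBuffer.
class LockElisionSketch {

    static StringBuffer leaked; // exists only to make the buffer escape

    /** sb never leaves this method: a candidate for lock elision. */
    static String noEscape(String a, String b, String c) {
        StringBuffer sb = new StringBuffer();
        sb.append(a).append(b).append(c);
        return sb.toString();
    }

    /** sb is stored in a static field first: other threads can now see it. */
    static String escapes(String a, String b, String c) {
        StringBuffer sb = new StringBuffer();
        leaked = sb;
        sb.append(a).append(b).append(c);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(noEscape("a", "b", "c"));
        System.out.println(escapes("a", "b", "c"));
    }
}
```

Both methods return the same string; the difference is only in what the compiler is allowed to assume about sb.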

3.5 Lock coarsening

Show Code:

    public static String testLockCoarsening(String str) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 100; i++) {
            sb.append(str);
        }
        return sb.toString();
    }

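In the code above, append has to acquire the lock. Without optimization, 100 calls in a loop mean 100 lock acquisitions and 100 releases, which is wasteful. When the JVM detects such a run of operations all locking the same object, it widens (coarsens) the synchronized region to enclose the whole sequence, for example outside the loop body, so that the sequence needs to lock only once.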
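The effect of coarsening can be written out by hand to see what the optimized code is roughly equivalent to. In the sketch below the second method takes the buffer's lock once around the loop; the append calls inside then re-enter a lock the thread already holds. This is a source-level illustration with hypothetical names, not the code the JIT generates, and whether the JIT coarsens a particular loop is a heuristic decision.

```java
// Hypothetical source-level picture of lock coarsening.
class LockCoarseningSketch {

    /** Each append acquires and releases sb's lock: 100 lock/unlock pairs. */
    static String perCallLocking(String str) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 100; i++) {
            sb.append(str);
        }
        return sb.toString();
    }

    /** One outer acquisition; the appends inside are reentrant on the same lock. */
    static String coarsenedByHand(String str) {
        StringBuffer sb = new StringBuffer();
        synchronized (sb) {
            for (int i = 0; i < 100; i++) {
                sb.append(str);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(perCallLocking("ab").equals(coarsenedByHand("ab")));
    }
}
```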

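3.6 Lock optimization by the JIT compiler `further reading`

Mainstream Java virtual machines, such as the HotSpot VM most of us use, combine an interpreter with compilers. A Java program starts out being interpreted. When the VM finds that a method or code block runs very frequently, it marks that code as hot code, has the JIT compiler compile it to native machine code, and optimizes it as far as it can to improve execution efficiency.

The interpreter and the compilers complement each other: when a program needs to start up or run quickly, the interpreter steps in first and saves compilation time; once the program is running, the compilers gradually take over, compiling more and more code to native code for better efficiency.

More code: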

public class Demo {
    static volatile int i = 0;
    static volatile int j = 0;

    public static void n() {
        i++;
    }

    public static synchronized void m() {
        j++;
    }

    public static void main(String[] args) {
        // k is the loop counter; the static fields i and j are the counters being incremented
        for (int k = 0; k < 1_000_000; k++) {
            m();
            n();
        }
        System.out.println(i);
        System.out.println(j);
    }
}

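Run main with the following JVM flags, which unlock diagnostic options and print the assembly: -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly. Alternatively, use -server -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+PrintAssembly -XX:+LogCompilation -XX:LogFile=TestSynchronizedAssembly.log to write a log file, and inspect the assembly with a tool such as jitwatch.

You will see entries for the m and n methods labelled C1 Compile Level 1 (C1 compiler) and C2 Compile Level 1 (C2 compiler). The listings contain lock cmpxchg ..... instructions. In other words, the m and n methods, executed one million times, became hot code and were compiled by both compilers, and the relatively expensive synchronized acquire and release were compiled down to a CAS with the lock prefix, which is the more sensible form of synchronization here.

Note: not every synchronized is compiled to lock cmpxchg ...; different code is optimized differently. Never conclude that lock cmpxchg ... is the underlying implementation of synchronized. The code above is just one example.

If **-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly** does not work for you, the hsdis library is missing. Search for setup instructions or see section 11.2.4 of 《深入理解Java虚拟机》, 3rd edition. For downloading and installing hsdis and jitwatch, see: https://www.xuebuyuan.com/3192700.html

If you are interested in what the compilers do and how they work, search online or see chapters 10 and 11 of 《深入理解Java虚拟机》, 3rd edition.

My own grasp of these internals is still at the "paper tiger" stage. If anything here is misunderstood or misstated, corrections and discussion are welcome.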
