Introduction

In the previous article we saw that HashMap is not thread-safe, and recommended ConcurrentHashMap, which is thread-safe and performs well. In Java 8 this widely used utility class received a major overhaul, with many changes compared to the Java 7 version; even so, the Segment design from Java 7 is still well worth studying. Interviewers therefore often ask: what do the structures of ConcurrentHashMap look like in Java 7 and in Java 8, and what do they have in common and how do they differ? So today we compare the two versions and walk through their characteristics.

ConcurrentHashMap in Java 7

Let's first look at the structure of ConcurrentHashMap in Java 7.

Internally, the map is partitioned into Segments. Segment extends ReentrantLock, so each Segment can be thought of as a lock, and the Segments lock independently of one another. Compared with Hashtable, which locks the entire object on every operation, this greatly improves concurrency: there are many independent locks rather than a single lock for the whole object.

Each Segment's underlying data structure is similar to HashMap's: an array of buckets with chained (linked-list) collision handling. By default there are 16 Segments (indices 0-15), so at most 16 threads can update concurrently, provided their operations land in different Segments. The default of 16 can be changed at construction time, but once the map is initialized the number of Segments can never grow.

ConcurrentHashMap in Java 8

In Java 8, ConcurrentHashMap was almost completely rewritten: the code grew from a little over 1,000 lines in Java 7 to more than 6,000 lines, which makes the source considerably harder to read. To make it easier to follow, we first explain the overall design and then dive into the details.

A bin (slot) in the Java 8 ConcurrentHashMap can be in one of three states. The first and simplest is an empty position: no element has been stored there yet. The second is a chained structure very similar to HashMap's: the first node fills the slot, and later nodes that hash to the same bin are appended to a linked list. The third is a red-black tree, a structure the Java 7 ConcurrentHashMap did not have and one we may rarely have encountered before.

When a chain in the second case grows beyond a threshold (8 by default) and the table also meets a capacity requirement (a minimum table length, MIN_TREEIFY_CAPACITY = 64), ConcurrentHashMap converts that linked list into a red-black tree to further improve lookup performance. The introduction of red-black trees is thus one of the important changes in Java 8; since the red-black tree is not an everyday data structure, let's briefly review its properties.

A red-black tree is a binary search tree in which every node carries a color attribute, red or black. It is essentially a balancing strategy applied to the binary search tree (BST): a self-balancing BST with fast lookups, which rebalances itself automatically and so prevents the extreme imbalance that would hurt search performance.

Because of this self-balancing property (the left and right subtrees have nearly equal height), its lookups behave like binary search, with O(log n) time complexity. A linked list is different: in the worst case you may have to traverse the entire list to find the target element, which is O(n), far worse than the tree's O(log n); and the more nodes there are, the more pronounced the O(log n) advantage becomes.

Some other properties of red-black trees: every node is either red or black, and the root is always black; red nodes are never adjacent, i.e., neither the parent nor a child of a red node may be red; and every path from a node down to its leaves contains the same number of black nodes.

It is precisely these rules that guarantee the tree's high lookup efficiency, and they explain why the Java 8 ConcurrentHashMap introduced red-black trees: to avoid the extreme case where a collision chain grows very long and queries become very slow. Because the tree is self-balancing, even in that extreme case lookups stay at O(log n).

Analyzing the key Java 8 source code

We have now covered the main structures of ConcurrentHashMap in Java 7 and Java 8; next we dig into the source. Since Java 7 is obsolete, we focus on the Java 8 source; readers interested in the Java 7 implementation can explore it on their own.

The class comment

As usual, let's first look at the class's own documentation comment:

A hash table supporting full concurrency of retrievals and high expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.

Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.) For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time. Bear in mind that the results of aggregate status methods including size, isEmpty, and containsValue are typically useful only when a map is not undergoing concurrent updates in other threads. Otherwise the results of these methods reflect transient states that may be adequate for monitoring or estimation purposes, but not for program control.

The table is dynamically expanded when there are too many collisions (i.e., keys that have distinct hash codes but fall into the same slot modulo the table size), with the expected average effect of maintaining roughly two bins per mapping (corresponding to a 0.75 load factor threshold for resizing). There may be much variance around this average as mappings are added and removed, but overall, this maintains a commonly accepted time/space tradeoff for hash tables. However, resizing this or any other kind of hash table may be a relatively slow operation. When possible, it is a good idea to provide a size estimate as an optional initialCapacity constructor argument. An additional optional loadFactor constructor argument provides a further means of customizing initial table capacity by specifying the table density to be used in calculating the amount of space to allocate for the given number of elements. Also, for compatibility with previous versions of this class, constructors may optionally specify an expected concurrencyLevel as an additional hint for internal sizing. Note that using many keys with exactly the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.

A Set projection of a ConcurrentHashMap may be created (using newKeySet() or newKeySet(int)), or viewed (using keySet(Object)) when only keys are of interest, and the mapped values are (perhaps transiently) not used or all take the same mapping value. A ConcurrentHashMap can be used as a scalable frequency map (a form of histogram or multiset) by using java.util.concurrent.atomic.LongAdder values and initializing via computeIfAbsent. For example, to add a count to a ConcurrentHashMap<String,LongAdder> freqs, you can use freqs.computeIfAbsent(k -> new LongAdder()).increment(); This class and its views and iterators implement all of the optional methods of the Map and Iterator interfaces.
Like Hashtable but unlike HashMap, this class does not allow null to be used as a key or value.

ConcurrentHashMaps support a set of sequential and parallel bulk operations that, unlike most Stream methods, are designed to be safely, and often sensibly, applied even with maps that are being concurrently updated by other threads; for example, when computing a snapshot summary of the values in a shared registry. There are three kinds of operation, each with four forms, accepting functions with Keys, Values, Entries, and (Key, Value) arguments and/or return values. Because the elements of a ConcurrentHashMap are not ordered in any particular way, and may be processed in different orders in different parallel executions, the correctness of supplied functions should not depend on any ordering, or on any other objects or values that may transiently change while computation is in progress; and except for forEach actions, should ideally be side-effect-free. Bulk operations on Map.Entry objects do not support method setValue.

forEach: Perform a given action on each element. A variant form applies a given transformation on each element before performing the action.

search: Return the first available non-null result of applying a given function on each element; skipping further search when a result is found.

reduce: Accumulate each element. The supplied reduction function cannot rely on ordering (more formally, it should be both associative and commutative). There are five variants: Plain reductions. (There is not a form of this method for (key, value) function arguments since there is no corresponding return type.) Mapped reductions that accumulate the results of a given function applied to each element. Reductions to scalar doubles, longs, and ints, using a given basis value.

These bulk operations accept a parallelismThreshold argument. Methods proceed sequentially if the current map size is estimated to be less than the given threshold. Using a value of Long.MAX_VALUE suppresses all parallelism. Using a value of 1 results in maximal parallelism by partitioning into enough subtasks to fully utilize the ForkJoinPool.commonPool() that is used for all parallel computations. Normally, you would initially choose one of these extreme values, and then measure performance of using in-between values that trade off overhead versus throughput.

The concurrency properties of bulk operations follow from those of ConcurrentHashMap: Any non-null result returned from get(key) and related access methods bears a happens-before relation with the associated insertion or update. The result of any bulk operation reflects the composition of these per-element relations (but is not necessarily atomic with respect to the map as a whole unless it is somehow known to be quiescent). Conversely, because keys and values in the map are never null, null serves as a reliable atomic indicator of the current lack of any result. To maintain this property, null serves as an implicit basis for all non-scalar reduction operations. For the double, long, and int versions, the basis should be one that, when combined with any other value, returns that other value (more formally, it should be the identity element for the reduction). Most common reductions have these properties; for example, computing a sum with basis 0 or a minimum with basis MAX_VALUE. Search and transformation functions provided as arguments should similarly return null to indicate the lack of any result (in which case it is not used). In the case of mapped reductions, this also enables transformations to serve as filters, returning null (or, in the case of primitive specializations, the identity basis) if the element should not be combined. You can create compound transformations and filterings by composing them yourself under this "null means there is nothing there now" rule before using them in search or reduce operations.
Methods accepting and/or returning Entry arguments maintain key-value associations. They may be useful for example when finding the key for the greatest value. Note that plain Entry arguments can be supplied using new AbstractMap.SimpleEntry(k,v). Bulk operations may complete abruptly, throwing an exception encountered in the application of a supplied function. Bear in mind when handling such exceptions that other concurrently executing functions could also have thrown exceptions, or would have done so if the first exception had not occurred. Speedups for parallel compared to sequential forms are common but not guaranteed. Parallel operations involving brief functions on small maps may execute more slowly than sequential forms if the underlying work to parallelize the computation is more expensive than the computation itself. Similarly, parallelization may not lead to much actual parallelism if all processors are busy performing unrelated tasks. All arguments to all task methods must be non-null. This class is a member of the Java Collections Framework.
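The scalable frequency-map idiom suggested in the comment above is easy to try out. The following is a small self-contained demo of our own (the word list and thread count are arbitrary): several threads count the same words concurrently, and no increment is ever lost.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class FreqMapDemo {
    // Build a frequency map from several threads at once, as the Javadoc suggests.
    static ConcurrentHashMap<String, LongAdder> count(String[] words) {
        ConcurrentHashMap<String, LongAdder> freqs = new ConcurrentHashMap<>();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (String w : words)
                    // computeIfAbsent atomically installs one LongAdder per key;
                    // increment() then scales well under write contention
                    freqs.computeIfAbsent(w, k -> new LongAdder()).increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return freqs;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, LongAdder> freqs = count(new String[]{"a", "b", "a"});
        // 4 threads each saw "a" twice and "b" once
        System.out.println(freqs.get("a").sum() + " " + freqs.get("b").sum()); // 8 4
    }
}
```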
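The bulk operations with a parallelismThreshold can likewise be sketched in a few lines (our own example; passing Long.MAX_VALUE keeps everything sequential, while 1 would request maximal parallelism on the common pool):

```java
import java.util.concurrent.ConcurrentHashMap;

public class BulkOpsDemo {
    static long sumValues(ConcurrentHashMap<String, Long> map) {
        // reduce: the reducer must be associative and commutative (Long::sum is)
        Long sum = map.reduceValues(Long.MAX_VALUE, Long::sum);
        return sum == null ? 0L : sum; // null is the "nothing there" result for reductions
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Long> m = new ConcurrentHashMap<>();
        m.put("a", 1L); m.put("b", 2L); m.put("c", 3L);
        System.out.println(sumValues(m)); // prints 6
        // search: returns the first non-null result; ordering is unspecified
        String hit = m.search(Long.MAX_VALUE, (k, v) -> v >= 3 ? k : null);
        System.out.println(hit); // "c" is the only value >= 3 here
    }
}
```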
Besides this Javadoc, there is one more comment inside the class's source that is well worth reading:

Overview: The primary design goal of this hash table is to maintain concurrent readability (typically method get(), but also iterators and related methods) while minimizing update contention. Secondary goals are to keep space consumption about the same or better than java.util.HashMap, and to support high initial insertion rates on an empty table by many threads. This map usually acts as a binned (bucketed) hash table. Each key-value mapping is held in a Node. Most nodes are instances of the basic Node class with hash, key, value, and next fields. However, various subclasses exist: TreeNodes are arranged in balanced trees, not lists. TreeBins hold the roots of sets of TreeNodes. ForwardingNodes are placed at the heads of bins during resizing. ReservationNodes are used as placeholders while establishing values in computeIfAbsent and related methods.
The types TreeBin, ForwardingNode, and ReservationNode do not hold normal user keys, values, or hashes, and are readily distinguishable during search etc because they have negative hash fields and null key and value fields. (These special nodes are either uncommon or transient, so the impact of carrying around some unused fields is insignificant.)   The table is lazily initialized to a power-of-two size upon the first insertion.  Each bin in the table normally contains a list of Nodes (most often, the list has only zero or one Node). Table accesses require volatile/atomic reads, writes, and CASes.  Because there is no other way to arrange this without adding further indirections, we use intrinsics (sun.misc.Unsafe) operations.   We use the top (sign) bit of Node hash fields for control purposes -- it is available anyway because of addressing constraints.  Nodes with negative hash fields are specially handled or ignored in map methods.   Insertion (via put or its variants) of the first node in an empty bin is performed by just CASing it to the bin.  This is by far the most common case for put operations under most key/hash distributions.  Other update operations (insert, delete, and replace) require locks.  We do not want to waste the space required to associate a distinct lock object with each bin, so instead use the first node of a bin list itself as a lock. Locking support for these locks relies on builtin synchronized monitors.   Using the first node of a list as a lock does not by itself suffice though: When a node is locked, any update must first validate that it is still the first node after locking it, and retry if not. Because new nodes are always appended to lists, once a node is first in a bin, it remains first until deleted or the bin becomes invalidated (upon resizing).   
The main disadvantage of per-bin locks is that other update operations on other nodes in a bin list protected by the same lock can stall, for example when user equals() or mapping functions take a long time. However, statistically, under random hash codes, this is not a common problem. Ideally, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average, given the resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:

0:    0.60653066
1:    0.30326533
2:    0.07581633
3:    0.01263606
4:    0.00157952
5:    0.00015795
6:    0.00001316
7:    0.00000094
8:    0.00000006
more: less than 1 in ten million

Lock contention probability for two threads accessing distinct elements is roughly 1 / (8 * #elements) under random hashes.

Actual hash code distributions encountered in practice sometimes deviate significantly from uniform randomness. This includes the case when N > (1<<30), so some keys MUST collide. Similarly for dumb or hostile usages in which multiple keys are designed to have identical hash codes or ones that differ only in masked-out high bits. So we use a secondary strategy that applies when the number of nodes in a bin exceeds a threshold. These TreeBins use a balanced tree to hold nodes (a specialized form of red-black trees), bounding search time to O(log N). Each search step in a TreeBin is at least twice as slow as in a regular list, but given that N cannot exceed (1<<64) (before running out of addresses) this bounds search steps, lock hold times, etc, to reasonable constants (roughly 100 nodes inspected per operation worst case) so long as keys are Comparable (which is very common -- String, Long, etc).
TreeBin nodes (TreeNodes) also maintain the same "next" traversal pointers as regular nodes, so can be traversed in iterators in the same way.

The table is resized when occupancy exceeds a percentage threshold (nominally, 0.75, but see below). Any thread noticing an overfull bin may assist in resizing after the initiating thread allocates and sets up the replacement array. However, rather than stalling, these other threads may proceed with insertions etc. The use of TreeBins shields us from the worst case effects of overfilling while resizes are in progress. Resizing proceeds by transferring bins, one by one, from the table to the next table. However, threads claim small blocks of indices to transfer (via field transferIndex) before doing so, reducing contention. A generation stamp in field sizeCtl ensures that resizings do not overlap. Because we are using power-of-two expansion, the elements from each bin must either stay at same index, or move with a power of two offset. We eliminate unnecessary node creation by catching cases where old nodes can be reused because their next fields won't change. On average, only about one-sixth of them need cloning when a table doubles. The nodes they replace will be garbage collectable as soon as they are no longer referenced by any reader thread that may be in the midst of concurrently traversing table. Upon transfer, the old table bin contains only a special forwarding node (with hash field "MOVED") that contains the next table as its key. On encountering a forwarding node, access and update operations restart, using the new table.

Each bin transfer requires its bin lock, which can stall waiting for locks while resizing. However, because other threads can join in and help resize rather than contend for locks, average aggregate waits become shorter as resizing progresses. The transfer operation must also ensure that all accessible bins in both the old and new table are usable by any traversal.
This is arranged in part by proceeding from the last bin (table.length - 1) up towards the first.  Upon seeing a forwarding node, traversals (see class Traverser) arrange to move to the new table without revisiting nodes.  To ensure that no intervening nodes are skipped even when moved out of order, a stack (see class TableStack) is created on first encounter of a forwarding node during a traversal, to maintain its place if later processing the current table. The need for these save/restore mechanics is relatively rare, but when one forwarding node is encountered, typically many more will be. So Traversers use a simple caching scheme to avoid creating so many new TableStack nodes. (Thanks to Peter Levart for suggesting use of a stack here.)   The traversal scheme also applies to partial traversals of ranges of bins (via an alternate Traverser constructor) to support partitioned aggregate operations.  Also, read-only operations give up if ever forwarded to a null table, which provides support for shutdown-style clearing, which is also not currently implemented.   Lazy table initialization minimizes footprint until first use, and also avoids resizings when the first operation is from a putAll, constructor with map argument, or deserialization. These cases attempt to override the initial capacity settings, but harmlessly fail to take effect in cases of races. The element count is maintained using a specialization of LongAdder. We need to incorporate a specialization rather than just use a LongAdder in order to access implicit contention-sensing that leads to creation of multiple CounterCells.  The counter mechanics avoid contention on updates but can encounter cache thrashing if read too frequently during concurrent access. To avoid reading so often, resizing under contention is attempted only upon adding to a bin already holding two or more nodes. 
Under uniform hash distributions, the probability of this occurring at threshold is around 13%, meaning that only about 1 in 8 puts check threshold (and after resizing, many fewer do so).   TreeBins use a special form of comparison for search and related operations (which is the main reason we cannot use existing collections such as TreeMaps). TreeBins contain Comparable elements, but may contain others, as well as elements that are Comparable but not necessarily Comparable for the same T, so we cannot invoke compareTo among them. To handle this, the tree is ordered primarily by hash value, then by Comparable.compareTo order if applicable.  On lookup at a node, if elements are not comparable or compare as 0 then both left and right children may need to be searched in the case of tied hash values. (This corresponds to the full list search that would be necessary if all elements were non-Comparable and had tied hashes.) On insertion, to keep a total ordering (or as close as is required here) across rebalancings, we compare classes and identityHashCodes as tie-breakers. The red-black balancing code is updated from pre-jdk-collections (http://gee.cs.oswego.edu/dl/classes/collections/RBCell.java) based in turn on Cormen, Leiserson, and Rivest Introduction to Algorithms (CLR).   TreeBins also require an additional locking mechanism.  While list traversal is always possible by readers even during updates, tree traversal is not, mainly because of tree-rotations that may change the root node and/or its linkages.  TreeBins include a simple read-write lock mechanism parasitic on the main bin-synchronization strategy: Structural adjustments associated with an insertion or removal are already bin-locked (and so cannot conflict with other writers) but must wait for ongoing readers to finish. Since there can be only one such waiter, we use a simple scheme using a single waiter field to block writers.  However, readers need never block.  
If the root lock is held, they proceed along the slow traversal path (via next-pointers) until the lock becomes available or the list is exhausted, whichever comes first. These cases are not fast, but maximize aggregate expected throughput.

Maintaining API and serialization compatibility with previous versions of this class introduces several oddities. Mainly: We leave untouched but unused constructor arguments referring to concurrencyLevel. We accept a loadFactor constructor argument, but apply it only to initial table capacity (which is the only time that we can guarantee to honor it.) We also declare an unused "Segment" class that is instantiated in minimal form only when serializing.

Also, solely for compatibility with previous versions of this class, it extends AbstractMap, even though all of its methods are overridden, so it is just useless baggage. This file is organized to make things a little easier to follow while reading than they might otherwise: First the main static declarations and utilities, then fields, then main public methods (with a few factorings of multiple public methods into internal ones), then sizing methods, trees, traversers, and bulk operations.
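The scheme described in the overview — CAS the first node into an empty bin, otherwise synchronize on the head node and re-validate it after locking — can be modeled with a deliberately simplified toy table. This is our own sketch built on AtomicReferenceArray; the real implementation uses Unsafe directly and also handles resizing, tree bins, and counters, none of which appear here:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class BinCasSketch {
    static final class Node {
        final int hash; final Object key; volatile Object val; volatile Node next;
        Node(int hash, Object key, Object val, Node next) {
            this.hash = hash; this.key = key; this.val = val; this.next = next;
        }
    }

    final AtomicReferenceArray<Node> table = new AtomicReferenceArray<>(16);

    void put(Object key, Object val) {
        int h = key.hashCode() & 0x7fffffff;  // keep the sign bit free for control uses
        int i = (table.length() - 1) & h;     // power-of-two indexing, as in the real map
        for (;;) {
            Node f = table.get(i);
            if (f == null) {
                // empty bin: a single CAS publishes the node -- the common fast path
                if (table.compareAndSet(i, null, new Node(h, key, val, null))) return;
            } else {
                // non-empty bin: lock the head, then re-validate it is still the head
                synchronized (f) {
                    if (table.get(i) == f) {
                        for (Node e = f;; e = e.next) {
                            if (e.hash == h && e.key.equals(key)) { e.val = val; return; }
                            if (e.next == null) { e.next = new Node(h, key, val, null); return; }
                        }
                    } // head changed under us: fall through and retry
                }
            }
        }
    }

    Object get(Object key) {
        int h = key.hashCode() & 0x7fffffff;
        for (Node e = table.get((table.length() - 1) & h); e != null; e = e.next)
            if (e.hash == h && e.key.equals(key)) return e.val;
        return null;
    }

    public static void main(String[] args) {
        BinCasSketch m = new BinCasSketch();
        m.put("k", "v1");
        m.put("k", "v2");               // same key: overwritten under the head lock
        System.out.println(m.get("k")); // prints v2
    }
}
```

Note how the re-validation (`table.get(i) == f`) mirrors the real code: because new nodes are always appended, a node that is first in a bin stays first until it is deleted or the bin is invalidated by a resize.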
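The Poisson figures quoted in that comment are easy to re-derive. This small check of ours computes exp(-0.5) * 0.5^k / k! for k = 0..8 and reproduces the table above:

```java
public class PoissonCheck {
    // P(k) = exp(-0.5) * 0.5^k / k!  -- expected bin-size distribution at parameter 0.5
    static double p(int k) {
        double fact = 1;
        for (int i = 2; i <= k; i++) fact *= i; // k!
        return Math.exp(-0.5) * Math.pow(0.5, k) / fact;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++)
            System.out.printf("%d: %.8f%n", k, p(k));
    }
}
```

In particular p(8) is about 6e-8, which is why a bin of 8 nodes under random hashes is treated as a strong hint of pathological (or adversarial) key hashing and triggers treeification.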
In addition, solely for compatibility with earlier versions of this class, it extends AbstractMap, even though all of its methods are overridden, so that inheritance is just useless baggage. The file is organized to make it a little easier to follow while reading: first the main static declarations and utilities, then fields, then the main public methods (with several public methods factored into internal ones), then sizing methods, trees, traversers, and bulk operations.

The Node class

Let's start with the most basic internal storage structure, Node, which represents one key-value node:

```java
/**
 * A key-value node in ConcurrentHashMap. Implements Map.Entry, stores the
 * key, value, and hash, and links to the next node in a bin's list. It is
 * the basic building block of the map; subclasses such as TreeNode override
 * some of its behavior.
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;          // hash of the key, fixed once computed
    final K key;             // the key is immutable
    volatile V val;          // volatile so readers always see the latest value
    volatile Node<K,V> next; // volatile link to the next node in the bin

    Node(int hash, K key, V val, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.val = val;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return val; }
    // Entry hash code is the XOR of the key's and value's hash codes
    public final int hashCode()    { return key.hashCode() ^ val.hashCode(); }
    // String form is "key=value"
    public final String toString() { return key + "=" + val; }

    /**
     * The Entry view is read-only to callers: values are only updated
     * internally by the map, so setValue always throws.
     */
    public final V setValue(V value) {
        throw new UnsupportedOperationException();
    }

    // Two entries are equal when both their keys and their values are equal.
    public final boolean equals(Object o) {
        Object k, v, u; Map.Entry<?,?> e;
        return ((o instanceof Map.Entry) &&
                (k = (e = (Map.Entry<?,?>)o).getKey()) != null &&
                (v = e.getValue()) != null &&
                (k == key || k.equals(key)) &&
                (v == (u = val) || v.equals(u)));
    }

    /**
     * Virtualized support for map.get(); overridden in subclasses.
     * Walks the list from this node looking for a node with the given
     * hash and key.
     */
    Node<K,V> find(int h, Object k) {
        Node<K,V> e = this;
        if (k != null) {
            do {
                K ek;
                if (e.hash == h &&
                    ((ek = e.key) == k || (ek != null && k.equals(ek))))
                    return e;
            } while ((e = e.next) != null);
        }
        return null;
    }
}
```

As you can see, each Node holds a key-value pair. The value is declared volatile to guarantee visibility across threads, and the node carries a next pointer so that nodes can be chained into a linked list.

Now let's look at the two most important, most central methods.

Source analysis of the put method

We previously walked through HashMap's put method and used it to show that HashMap is not thread-safe, so let's now look at ConcurrentHashMap's put:

```java
/**
 * Maps the specified key to the specified value. Neither the key nor the
 * value may be null. Returns the previous value associated with the key,
 * or null if there was no mapping for it.
 */
public V put(K key, V value) {
    return putVal(key, value, false);
}
```

As with HashMap, the real work happens in putVal, so let's continue there:

```java
/**
 * Core implementation of put and putIfAbsent. Inserts the key-value pair;
 * if the key already exists, onlyIfAbsent decides whether the old value
 * is kept (true) or replaced (false). Returns the old value, or null if
 * the key was absent. Throws NullPointerException on a null key or value.
 */
final V putVal(K key, V value, boolean onlyIfAbsent) {
    // ConcurrentHashMap allows neither null keys nor null values
    if (key == null || value == null) throw new NullPointerException();
    // spread the key's hashCode to reduce collisions
    int hash = spread(key.hashCode());
    // number of nodes in the bin, used later for the treeify check
    int binCount = 0;
    // loop until the insertion succeeds
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        // lazily initialize the table on first use
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        // empty bin: try to install the new node with a lock-free CAS
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                   // CAS succeeded, done
        }
        // head node's hash is MOVED: a resize is in progress, so help it
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        // the bin already holds at least one node
        else {
            V oldVal = null;
            // lock only this bin's head node
            synchronized (f) {
                // re-check that the head has not changed under us
                if (tabAt(tab, i) == f) {
                    // hash >= 0 means an ordinary linked list
                    if (fh >= 0) {
                        binCount = 1;
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            // key already present: optionally replace the value
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            // reached the tail without finding the key:
                            // the key was absent, so append a new node
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
                    // a TreeBin head means the bin is a red-black tree
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        // delegate the insertion to TreeBin.putTreeVal
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                              value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            if (binCount != 0) {
                // convert the list to a red-black tree once it reaches
                // the default TREEIFY_THRESHOLD of 8
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    // update the element count and possibly trigger a resize
    addCount(1L, binCount);
    return null;
}
```

With this walkthrough we now have a detailed picture of putVal: the method handles each case in turn depending on what the target bin looks like, i.e. an uninitialized table, an empty bin, a resize in progress, a linked list, or a red-black tree.

Source analysis of the get method

The get method is comparatively simple; again we annotate the source:

```java
/**
 * Returns the value to which the specified key is mapped, or null if this
 * map contains no mapping for the key. More formally, if this map contains
 * a mapping from a key k to a value v such that key.equals(k), this method
 * returns v; otherwise it returns null. There can be at most one such
 * mapping. Throws NullPointerException if the key is null.
 */
public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    // spread the hash, same as in putVal
    int h = spread(key.hashCode());
    // if the table or the target bin is empty, the key cannot be present
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        // check whether the head node is the one we need
        if ((eh = e.hash) == h) {
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;   // head node matches, return its value
        }
        // hash < 0 means a tree bin or a forwarding node during a resize;
        // use that node's own find method
        else if (eh < 0)
            return (p = e.find(h, key)) != null ? p.val : null;
        // otherwise the bin is a linked list: walk it
        while ((e = e.next) != null) {
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    // no matching node found
    return null;
}
```

To summarize the get flow:

1. Compute the spread hash and locate the corresponding bin; if the table is empty or that bin is null, return null immediately.
2. If the bin's head node is exactly the node we need, return its value directly.
3. If the head node's hash is negative, the bin is a red-black tree or the table is being resized; delegate to the node's find method.
4. Otherwise the bin is a linked list, so walk it looking for the key.

Comparing Java 7 and Java 8

1. Data structure and locking

| Dimension | Java 7 | Java 8 |
| --- | --- | --- |
| Data structure | Segment array + HashEntry linked lists | Node array + linked list / red-black tree |
| Lock granularity | Per-Segment (segmented locking) | Per-bin (synchronized + CAS) |
| Thread-safety mechanism | ReentrantLock locks the whole Segment | lock-free CAS insertion + synchronized on the bin's head node |

Core difference:

Java 7 relies on Segment-based locking, where Segment extends ReentrantLock.

Java 8 drops the Segment design entirely and guarantees thread safety with Node + CAS + synchronized.

2. Concurrency improvements

In Java 7 each Segment locks independently, so the maximum number of concurrent writers equals the number of Segments, 16 by default. In Java 8 the lock granularity is much finer: in the ideal case every element of the table array (that is, the array length) can be written concurrently, so the achievable concurrency is considerably higher.

Hash-collision handling:

Java 7 resolves collisions with chaining only, so in the worst case a lookup degrades to O(n) over the whole list.

Java 8 also starts with chaining, but converts a list to a red-black tree once its length reaches the treeify threshold of 8, improving lookups to O(log n).

Resizing:

Java 7: each Segment resizes independently, which can fragment memory.

Java 8: resizing is cooperative across threads; the thread that triggers the resize migrates bins to the new table, and other threads help migrate neighboring bins.

3. Algorithm upgrades

Hash function:

Java 7 uses a Wang/Jenkins-style hash, which is comparatively prone to collisions.

Java 8 introduces the spread() computation, which mixes the high bits into the low bits to reduce the collision probability.

Counting elements:

Java 7 sums the count field of every Segment, which can be inaccurate under contention.

Java 8 uses LongAdder-style striped counting (baseCount + CounterCell[]) to keep the count accurate under high concurrency.

4. Memory and API

| Dimension | JDK 1.7 | JDK 1.8 |
| --- | --- | --- |
| Memory footprint | higher (fixed per-Segment overhead) | lower (Node array grows on demand, bins treeified only when needed) |
| API surface | basic methods (put/get/remove) | adds functional APIs such as compute and forEach |
| Iterator consistency | weakly consistent (may miss concurrent updates) | weakly consistent, with improved traversal logic |

5. Lookup complexity

Traversing a bucket's linked list in Java 7 costs O(n), where n is the list length. In Java 8, once a bin has been converted to a red-black tree, the traversal drops to O(log n), where n is the number of tree nodes.
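The bin-level locking described above means that writers touching different bins almost never contend. As a minimal sketch (the class and method names here are invented for illustration, not part of the JDK), the following program has several threads insert disjoint key ranges into a single ConcurrentHashMap; with a plain HashMap the same code could lose updates or corrupt the table:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {

    // Spawns `threads` writer threads; each inserts `perThread` distinct keys.
    // Because the key ranges are disjoint, the final size is deterministic.
    static Map<Integer, Integer> fill(int threads, int perThread) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // empty bin: lock-free CAS; occupied bin: synchronized head node
                    map.put(base + i, base + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = fill(8, 10_000);
        // All 80,000 distinct inserts survive; with a plain HashMap the
        // count would often come up short, or the writers could corrupt
        // the internal table.
        System.out.println(map.size()); // 80000
    }
}
```

Note that size() here is exact only because it is read after every writer has joined; while writers are still running, the LongAdder-style counter described in section 3 makes size() a moment-in-time estimate.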