6 HAT Implications
With an understanding of which semantics are HAT-compliant, in this section, we analyze the implications of these results for existing systems and briefly study HAT systems on public cloud infrastructure. Specifically:
- We revisit traditional database concurrency control with a focus on coordination costs and on high availability.
- We examine the properties required by an OLTP application based on the TPC-C benchmark.
- We perform a brief experimental evaluation of HAT versus non-HAT properties on public cloud infrastructure.
6.1 HA and Existing Algorithms
While we have shown that many database isolation levels are achievable as HATs, many traditional concurrency control mechanisms do not provide high availability—even for HAT-compliant isolation levels. Existing mechanisms often presume (or are adapted from) single-server non-partitioned deployments or otherwise focus on serializability as a primary use case. In this section, we briefly discuss design decisions and algorithmic details that preclude high availability.
Serializability To establish a serial order on transactions, algorithms for achieving serializability of general-purpose read-write transactions in a distributed setting [14, 28] require at least one RTT before committing. As an example, traditional two-phase locking for a transaction of length T may require T lock operations and will require at least one lock and one unlock operation. In a distributed environment, each of these lock operations requires coordination, either with other database servers or with a lock service. If this coordination mechanism is unavailable, transactions cannot safely commit. Similarly, optimistic concurrency control requires coordinating via a validation step, while deterministic transaction scheduling [56] requires contacting a scheduler. Serializability under multi-version concurrency control requires checking for update conflicts. All told, the reliance on a globally agreed total order necessitates a minimum of one round-trip to a designated master or coordination service for each of these classic algorithms. As we saw in Section 2, the latency of this round trip is determined by the deployment environment; we will further demonstrate this in Section 6.3.
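To make this coordination cost concrete, consider the following minimal Java sketch (our illustration, not a production design) of a two-phase locking client in which every lock acquisition is a round trip to a remote lock service. The `LockService` interface and the `lockRemote`/`unlockRemote` methods are hypothetical names; the point is that if the coordination service is unreachable, the transaction can neither make progress nor safely commit.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical lock service; each call is a network round trip to a
// lock server or designated master, i.e., a coordination point.
interface LockService {
    boolean lockRemote(String key) throws IOException;
    void unlockRemote(String key) throws IOException;
}

class TwoPhaseLockingTxn {
    private final LockService locks;
    private final List<String> held = new ArrayList<>();

    TwoPhaseLockingTxn(LockService locks) { this.locks = locks; }

    // Growing phase: one coordination round trip per lock. A transaction
    // of length T pays at least T such round trips.
    void readOrWrite(String key) throws IOException {
        if (!locks.lockRemote(key)) {
            // Partitioned from the lock service: cannot proceed safely.
            throw new IOException("lock service unreachable");
        }
        held.add(key);
        // ... perform the read or write against the locked item ...
    }

    // Shrinking phase: releasing locks also requires the coordination
    // service; under a partition, the transaction blocks rather than
    // degrading gracefully.
    void commit() throws IOException {
        for (String key : held) {
            locks.unlockRemote(key);
        }
        held.clear();
    }
}
```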
Non-serializability Most existing distributed implementations of weak isolation are not highly available. Lock-based mechanisms such as those in Gray’s original proposal [38] do not degrade gracefully in the presence of partial failures. (Note, however, that lock-based protocols do offer the benefit of recency guarantees.) While multi-versioned storage systems allow for a variety of transactional guarantees, few offer traditional weak isolation (e.g., non-“tentative update” schemes) in this context. Chan and Gray’s read-only transactions have item-cut isolation with causal consistency and MAV (session PL-2L [2]) but are unavailable in the presence of coordinator failure and assume serializable update transactions [20]; this is similar to the read-only and write-only transactions more recently proposed in Eiger [48]. Brantner’s S3 database [15] and Bayou [60] can both provide variants of session PL-2L with high availability, but neither provides this HAT functionality without substantial modification. Accordingly, it is possible to implement many guarantees weaker than serializability—including HAT semantics—and still not achieve high availability. We view high availability as a core design consideration in future concurrency control designs.
6.2 Application Requirements
Thus far, we have largely ignored the question of when HAT semantics are useful (or otherwise are too weak). As we showed in Section 5, the main cost of high availability and low latency comes in the inability to prevent Lost Update and Write Skew and to provide recency bounds. To better understand the impact of HAT compliance in an application context, we consider a concrete application: the TPC-C benchmark. In brief, we find that four of five transactions can be executed via HATs, while the fifth requires unavailability.
TPC-C consists of five transactions, capturing the operation of a wholesale warehouse, including sales, payments, and deliveries. Two transactions—Order-Status and Stock-Level—are read-only and can be executed safely with HATs. Clients may read stale data, but this does not violate TPC-C requirements, and clients will read their own writes if they are sticky-available. Another transaction type, Payment, updates running balances for warehouses, districts, and customer records and provides an audit trail. The transaction is monotonic—increment- and append-only—so all balance increase operations commute, and MAV allows the maintenance of foreign-key integrity constraints (e.g., via UPDATE/DELETE CASCADE).
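To see why the increment-only updates in Payment commute, consider this toy Java sketch (illustrative only; `MonotonicBalance` is not part of TPC-C or our prototype). Two replicas that apply the same set of balance increments in different orders converge to the same value without any coordination.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MonotonicBalance {
    private long cents = 0;

    // The only permitted update is an increment, so updates commute.
    void increment(long deltaCents) { cents += deltaCents; }

    long value() { return cents; }

    public static void main(String[] args) {
        List<Long> deltas = Arrays.asList(500L, 1200L, 42L);

        MonotonicBalance replicaA = new MonotonicBalance();
        deltas.forEach(replicaA::increment);        // one arrival order

        List<Long> reversed = new ArrayList<>(deltas);
        Collections.reverse(reversed);
        MonotonicBalance replicaB = new MonotonicBalance();
        reversed.forEach(replicaB::increment);      // another arrival order

        // Both replicas converge to 1742 with no coordination.
        assert replicaA.value() == replicaB.value();
    }
}
```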
New-Order and Delivery. While three out of five transactions are easily achievable with HATs, the remaining two transactions are not as simple. The New-Order transaction places an order for a variable quantity of data items, updating warehouse stock as needed. It selects a sales district, assigns the order an ID number, adjusts the remaining warehouse stock, and writes a placeholder entry for the pending order. The Delivery transaction represents the fulfillment of a New-Order: it deletes the order from the pending list, updates the order’s carrier ID and delivery time, and updates the customer’s balance.
IDs and decrements. The New-Order transaction presents two challenges: ID assignment and stock maintenance. First, each New-Order transaction requires a unique ID number for the order. We can create a unique number by, say, concatenating the client ID and a timestamp. However, the TPC-C specification requires order numbers to be sequentially assigned within a district, which requires preventing Lost Update. Accordingly, HATs cannot provide compliant TPC-C execution but can maintain uniqueness constraints. Second, the New-Order transaction decrements inventory counts: what if the count becomes negative? Fortunately, New-Order restocks each item’s inventory count (increments by 91) if it would become negative as the result of placing an order. This means that, even in the presence of concurrent New-Orders, an item’s stock will never fall below zero. This is TPC-C compliant, but a HAT system might end up with more stock than in an unavailable implementation with synchronous coordination.
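The two workarounds above can be sketched as follows (a hedged illustration; the names and the restock rule follow the description in the text, not the full TPC-C specification). The ID scheme yields unique but non-sequential order numbers, and the stock adjustment never goes below zero even under concurrent New-Orders.

```java
class NewOrderHelpers {
    // Unique order ID without coordination: concatenate the client ID and
    // a local timestamp. Unique, but not sequentially assigned within a
    // district, so not fully TPC-C compliant.
    static String uniqueOrderId(String clientId) {
        return clientId + "-" + System.currentTimeMillis();
    }

    // Decrement stock for an order; restock by 91 if the count would
    // become negative, mirroring the behavior described above. Stock
    // therefore never drops below zero, though a HAT execution may end
    // up with more stock than a coordinated one.
    static int adjustStock(int currentStock, int orderQuantity) {
        int remaining = currentStock - orderQuantity;
        if (remaining < 0) {
            remaining += 91;
        }
        return remaining;
    }
}
```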
TPC-C Non-monotonicity. The Delivery transaction is challenging due to non-monotonicity. Each Delivery deletes a pending order from the New-Order table and should be idempotent in order to avoid billing a customer twice; this implies a need to prevent Lost Update. This issue can be avoided by moving the non-monotonicity to the real world—the carrier that picks up the package for an order can ensure that no other carrier will do so—but HATs alone cannot provide a correct execution. However, according to distributed transaction architects [40], these compensatory actions are relatively common in real-world business processes.
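As an illustration of the hazard (our example, not the paper's protocol), the sketch below makes Delivery exactly-once on a single node by relying on an atomic remove. Across asynchronously replicated HAT replicas there is no such single atomic point, so two sites could each observe the pending order and both credit the customer, which is exactly why the text appeals to real-world compensation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DeliveryExample {
    // Pending orders: order ID -> amount owed to the customer.
    static final Map<String, Integer> pendingOrders = new ConcurrentHashMap<>();

    // Single-node idempotence: remove() is atomic, so only one caller
    // "wins" and credits the customer. Under asynchronous replication,
    // two replicas can each perform this remove locally and double-bill,
    // a Lost Update that HATs cannot prevent.
    static boolean deliverOnce(String orderId) {
        Integer amount = pendingOrders.remove(orderId);
        if (amount == null) {
            return false;   // already delivered
        }
        creditCustomer(orderId, amount);
        return true;
    }

    static void creditCustomer(String orderId, int amount) {
        // ... update the customer's balance ...
    }
}
```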
Integrity Constraints. Throughout execution, TPC-C also requires the maintenance of several integrity constraints. For example, Consistency Condition 1 (3.3.2.1) requires that each warehouse’s sales count must reflect the sum of its subordinate sales districts. This integrity constraint spans two tables but, given the ability to update rows in both tables atomically via MAV, can be easily maintained. Consistency Conditions 4 through 12 (3.3.2.4-12) can similarly be satisfied by applying updates atomically across tables. Consistency Conditions 2 and 3 (3.3.2.2-3) concern order ID assignment and are problematic. Finally, while TPC-C is not subject to multi-key anomalies, we note that many TPC-E isolation tests (e.g., simultaneously modifying a product description and its thumbnail) are also achievable using HATs.
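To sketch how MAV maintains Consistency Condition 1, assume a hypothetical batch-write API, `mavAtomicWrite`, standing in for the MAV algorithm of Section 5.1.2: the warehouse and district year-to-date totals are updated with all-or-nothing visibility, so no reader observes one write without the other. Key names and the interface are illustrative, not a real API.

```java
import java.util.HashMap;
import java.util.Map;

class PaymentIntegrity {
    // Hypothetical MAV interface: all writes in the batch become visible
    // atomically (readers see all of them or none of them).
    interface MavStore {
        void mavAtomicWrite(Map<String, Long> writes);
    }

    static void applyPayment(MavStore store, long warehouseYtd,
                             long districtYtd, long amount) {
        Map<String, Long> batch = new HashMap<>();
        batch.put("warehouse:1:ytd", warehouseYtd + amount);
        batch.put("district:1:3:ytd", districtYtd + amount);
        // From any reader's perspective, the warehouse total always
        // reflects the sum of its districts (Consistency Condition 1).
        store.mavAtomicWrite(batch);
    }
}
```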
Summary. Many—but not all—TPC-C transactions are well served by HATs. The two problematic transactions, New-Order and Delivery, rely on non-monotonic state update. The former can be modified to ensure ID uniqueness but not sequential ID ordering, while the latter is inherently non-monotonic, requiring external compensation or stronger consistency protocols. Based on these experiences and discussions with practitioners, we believe that HAT guarantees can provide useful semantics for a large class of application functionality, while a (possibly small) subset of operations will require stronger, unavailable properties.
6.3 Experimental Costs
To demonstrate the performance implications of HAT guarantees in a real-world environment, we implemented a HAT database prototype. We verify that, as Section 2.2’s measurements suggested, “strongly consistent” algorithms incur substantial latency penalties over WAN (10 to 100 times higher than their HAT counterparts), while HAT-compliant algorithms scale linearly. Our goal is not a complete performance analysis of HAT semantics but instead a proof-of-concept demonstration of a small subset of the HAT space on real-world infrastructure.
Implementation. Our prototype database is a partially replicated (hash-based partitioned) key-value store backed by LevelDB and implemented in Java using Apache Thrift. It currently supports eventual consistency (hereafter, eventual; last-writer-wins RU with standard all-to-all anti-entropy between replicas) and the efficient HAT MAV algorithm as sketched in Section 5.1.2 (hereafter, MAV). We support non-HAT operation whereby all operations for a given key are routed to a randomly designated master replica (guaranteeing single-key linearizability, as in Gilbert and Lynch’s CAP Theorem proof [35] and in PNUTS [23]’s “read latest” operation; hereafter, master), as well as distributed two-phase locking. Servers are durable: they synchronously write to LevelDB before responding to client requests, while new writes in MAV are synchronously flushed to a disk-resident write-ahead log.
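As a sketch of the eventual configuration’s merge rule (illustrative; not the prototype’s actual code), a last-writer-wins register reconciled by anti-entropy simply keeps the highest-timestamped version. Because the merge is commutative, associative, and idempotent, replicas converge regardless of the order in which anti-entropy delivers updates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LwwReplica {
    static final class Versioned {
        final long timestamp;   // e.g., client clock with a client-ID tiebreak
        final byte[] value;
        Versioned(long timestamp, byte[] value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final Map<String, Versioned> store = new ConcurrentHashMap<>();

    // Local client write.
    void putLocal(String key, long timestamp, byte[] value) {
        merge(key, new Versioned(timestamp, value));
    }

    // Applied to batches shipped from peers during all-to-all anti-entropy.
    // Keeping the highest timestamp makes the merge commutative and
    // idempotent, so delivery order does not matter.
    void merge(String key, Versioned incoming) {
        store.merge(key, incoming,
            (current, inc) -> inc.timestamp > current.timestamp ? inc : current);
    }
}
```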
Configuration. We deploy the database in clusters—disjoint sets of database servers that each contain a single, fully replicated copy of the data—typically across datacenters, and stick all clients within a datacenter to their respective cluster (trivially providing read-your-writes and monotonic reads guarantees). By default, we deploy 5 Amazon EC2 m1.xlarge instances as servers in each cluster. For our workload, we link our client library to the YCSB benchmark [24], which is well suited to LevelDB’s key-value schema, grouping every eight YCSB operations from the default workload (50% reads, 50% writes) to form a transaction. We increase the number of keys in the workload from the default 1,000 to 100,000 with uniform random key access, keeping the default value size of 1KB, and running YCSB for 180 seconds per configuration.
Geo-replication. We first deploy the database prototype across an increasing number of datacenters. Figure 3A shows that, when operating two clusters within a single datacenter, mastering each data item results in approximately half the throughput and double the latency of eventual. This is because HAT models are able to utilize replicas in both clusters instead of always contacting the (single) master. RC—essentially eventual with buffering—is almost identical to eventual, while MAV—which incurs two writes for every client-side write (i.e., new writes are sent to the WAL then subsequently moved into LevelDB once stable)—achieves 75% of the throughput. Latency increases linearly with the number of YCSB clients due to contention within LevelDB.
In contrast, when the two clusters are deployed across the continental United States (Figure 3B), the average latency of master increases to 300ms (a 278–4257% latency increase; average 37ms latency per operation). For the same number of YCSB client threads, master has substantially lower throughput than the HAT configurations. Increasing the number of YCSB clients does increase the throughput of master, but our Thrift-based server-side connection processing did not gracefully handle more than several thousand concurrent connections. In contrast, across two datacenters, the performance of eventual, RC, and MAV is near identical to a single-datacenter deployment.