Boost C++ Libraries Home Libraries People FAQ More

PrevUpHomeNext

Implementation Rationale 實現原理

The intent of this library is to implement the unordered containers in the draft standard, so the interface was fixed. But there are still some implementation decisions to make. The priorities are conformance to the standard and portability.
本庫的目的是實現標準草案中的無序容器,所以庫的接口是固定的。不過還是做一些實現上的決定。首先是與標準的一致性和可移植性。

The wikipedia article on hash tables has a good summary of the implementation issues for hash tables in general.
wikipedia article on hash tables 上有關於通常的散列表實現問題的一個很好的總結。

Data Structure 數據結構

By specifying an interface for accessing the buckets of the container the standard pretty much requires that the hash table uses chained addressing.
通過指定一個接口用於訪問容器中的桶,標準幾乎就是要求散列表使用鏈接法。

It would be conceivable to write a hash table that uses another method. For example, it could use open addressing, and use the lookup chain to act as a bucket but there are a some serious problems with this:
可以想像編寫使用其它方法的散列表。例如,它可以使用開放法,並使用查找鏈來作為桶,不過這樣會存在一些嚴重的問題:

So chained addressing is used.
所以我們使用了鏈接法。

For containers with unique keys I store the buckets in a single-linked list. There are other possible data structures (such as a double-linked list) that allow for some operations to be faster (such as erasing and iteration) but the possible gain seems small compared to the extra memory needed. The most commonly used operations (insertion and lookup) would not be improved at all.
對於唯一鍵的容器,我將桶保存在一個單鏈表中。有一些其它的數據結構(如雙鏈表)可以使得某些操作更快(如刪除和迭代),不過可能的收益似乎比所需的額外內存開銷要小。畢竟最常用的操作(插入和查找)並沒有得到提升。

But for containers with equivalent keys a single-linked list can degrade badly when a large number of elements with equivalent keys are inserted. I think it's reasonable to assume that users who choose to use unordered_multiset or unordered_multimap do so because they are likely to insert elements with equivalent keys. So I have used an alternative data structure that doesn't degrade, at the expense of an extra pointer per node.
但對於非唯一鍵的容器,使用單鏈表會在插入大量相同鍵值的元素時性能嚴重退化。我認為,假設選擇使用 unordered_multisetunordered_multimap 的用戶將很可能會插入相等鍵值的元素,這是很合理的。所以我對此種容器使用了另一種性能不會退化的數據結構,其代價是每個節點多用一個指針。

This works by adding storing a circular linked list for each group of equivalent nodes in reverse order. This allows quick navigation to the end of a group (since the first element points to the last) and can be quickly updated when elements are inserted or erased. The main disadvantage of this approach is some hairy code for erasing elements.
這是通過為每組相等的節點在相反順序上增加一個循環的鏈表來實現的。這樣可以快速地定位到該組節點的末尾(因為第一個元素指向最後一個),並且在插入或刪除元素時也可以快速更新。這種方法的主要缺點是刪除元素的代碼有點不太好看。

Number of Buckets 桶的數量

There are two popular methods for choosing the number of buckets in a hash table. One is to have a prime number of buckets, another is to use a power of 2.
有兩種常見的方法來選擇在散列表中的桶的數量。一種是讓桶的數量為素數,另一種是使用2的冪數。

Using a prime number of buckets, and choosing a bucket by using the modulus of the hash function's result will usually give a good result. The downside is that the required modulus operation is fairly expensive.
使用素數個桶,並且通過對散列函數的結果取模來選擇桶通常有不錯的結果。缺點是模操作的代價有點高。

Using a power of 2 allows for much quicker selection of the bucket to use, but at the expense of loosing the upper bits of the hash value. For some specially designed hash functions it is possible to do this and still get a good result but as the containers can take arbitrary hash functions this can't be relied on.
使用2的冪則可以更快地選擇要用的桶,但是它的代價是損失了散列值的高位信息。對於一些特殊設計的散列函數,這樣是可以的,並且結果也還不錯,但是考慮到容器可能使用任意的散列函數,所以不能依賴於它。

To avoid this a transformation could be applied to the hash function, for an example see Thomas Wang's article on integer hash functions. Unfortunately, a transformation like Wang's requires knowledge of the number of bits in the hash value, so it isn't portable enough. This leaves more expensive methods, such as Knuth's Multiplicative Method (mentioned in Wang's article). These don't tend to work as well as taking the modulus of a prime, and the extra computation required might negate efficiency advantage of power of 2 hash tables.
為了避免這個問題,可以對散列函數進行一個轉換,有關例子請見 Thomas Wang 關於整數散列函數的論文。 不幸,像 Wang 所給出的轉換方法要求知道散列值中的二進制位數,所以它的可移植性不夠。還有一些更昂貴的方法,如 Knuth 的相乘法(在 Wang 的論文中提及)。它們和對素數取模一樣不太好用,而且所需的額外計算可能還會消除2的冪的散列表所帶來的性能優勢。

So, this implementation uses a prime number for the hash table size.
所以,我們實現使用素數作為散列表的大小。

Equality operators 相等操作符

operator== and operator!= are not included in the standard, but I've added them as I think they could be useful and can be efficiently implemented. They are specified differently to the standard associative containers, comparing keys using the equality predicate rather than operator==. This is inconsistent with the other containers but it is probably closer to user's expectations.
operator==operator!= 並沒有包含在標準中,不過我把它們加了進來,因為我認為它們可能是有用的,也可以高效地實現。它們與標準關聯式容器是非常不同的,對鍵值的比較是使用等同性謂詞而不是 operator== 的。這一點與其它容器是不一致的,但是它可能更接近於用戶的期待。

Active Issues and Proposals 活躍的問題與建議

Removing unused allocator functions 刪除無用的分配器函數

In N2257, removing unused allocator functions, Matt Austern suggests removing the construct, destroy and address member functions - all of which Boost.Unordered calls. Changing this will simplify the implementation, as well as make supporting emplace easier, but means that the containers won't support allocators which require these methods to be called. Detlef Vollmann opposed this change in N2339.
在 N2257, 刪除無用的分配器函數 一文中,Matt Austern 建議刪除 construct, destroyaddress 成員函數 - 它們都是 Boost.Unordered 調用的。這個修改將簡化實現,並可以更容易地支持 emplace,但是將意味著容器不支持分配器,因為分配器要調用這些方法。Detlef Vollmann 在 N2339 中反對了這個修改。

Swapping containers with unequal allocators 交換具有不同分配器的容器

It isn't clear how to swap containers when their allocators aren't equal. This is Issue 431: Swapping containers with unequal allocators.
如果交換具有不同分配器的容器,這一點還不清楚。這正是 Issue 431: 交換具有不同分配器的容器

Howard Hinnant wrote about this in N1599 and suggested swapping both the allocators and the containers' contents. But the committee have now decided that swap should do a fast swap if the allocator is Swappable and a slow swap using copy construction otherwise. To make this distinction requires concepts.
Howard Hinnant 在 N1599 中寫到了這個問題,並建議同時交換分配器和容器中的內容。但是委員會當前的決定是,如果分配器是可交換的則 swap 應進行快速的交換,否則使用複製構造來進行慢速的交換。要實現這一區分需要用到概念。

In N2387, Omnibus Allocator Fix-up Proposals, Pablo Halpern suggests that there are actually two distinct allocator models, "Moves with Value" and "Scoped" which behave differently:
N2387, 多項分配器修正的建議書 中,Pablo Halpern 建議區分兩種分配器模型,"Moves with Value" 和 "Scoped",它們的行為有以下差異:

When allocators are allowed to have state, it is necessary to have a model for determining from where an object obtains its allocator. We』ve identified two such models: the 「Moves with Value」 allocator model and the 「Scoped」 allocator model.
如果分配器允許具有狀態,則需要有一個模型來判斷一個對像從何處得到它的分配器。我們已經確定兩種此類模型:「Moves with Value」 分配器模型和 「Scoped」 分配器模型。

In the 「Moves with Value」 allocator model, the copy constructor of an allocator-aware class will copy both the value and the allocator from its argument. This is the model specified in the C++03 standard. With this model, inserting an object into a container usually causes the new container item to copy the allocator from the object that was inserted. This model can be useful in special circumstances, e.g., if the items within a container use an allocator that is specially tuned to the item』s type.
在 「Moves with Value」 分配器模型中,一個帶分配器的類的複製構造函數應從它的參數中同時複製對象的值和分配器。這是在 C++03 標準中指定的模型。使用此模型,插入一個對像到容器中通常會導致容器中的新元素從被插入對像中複製分配器。這種模型基特定環境下有用,例如,如果容器中的 元素使用了一個為該元素類型特定調整的分配器。

In the 「Scoped」 allocator model, the allocator used to construct an object is determined by the context of that object, much like a storage class. With this model, inserting an object into a container causes the new container item to use the same allocator as the container. To avoid allocators being used in the wrong context, the allocator is never copied during copy or move construction. Thus, it is possible using this model to use allocators based on short-lived resources without fear that an object will transfer its allocator to a copy that might outlive the (shared) allocator resource. This model is reasonably safe and generally useful on a large scale. There was strong support in the 2005 Tremblant meeting for pursuing an allocator model that propagates allocators from container to contained objects.
在 「Scoped」 分配器模型中,用於構造一個對象的分配器是由對象的上下文來決定的,很可能是一個存儲類。使用此模型,手稿一個對像到容器中,將導致容器中的新元素使用與 容器相同的分配器。為避免分配器被用於錯誤的上下文中,分配器在複製或移動構造時從不進行複製。因此,可以將此模型用於在短期資源上的分配器,而不需要擔 心一個對像會將它的分配器傳給一個可能比(共享)分配器資源長命的拷貝。這種模型相當安全,通常用於較大的規模。在 2005 Tremblant 會議上,有人大力支持推行一種分配器模型,即由容器將分配器傳播給所包含的對象。

With these models the choice becomes clearer:
對於這些模型,選擇變得更清晰了:

I introduced the 「Moves with Value」 allocator model and the 「Scoped」 allocator model. In the former case, the allocator is copied when the container is copy-constructed. In the latter case it is not. Swapping the allocators is the right thing to do if the containers conform to the 「Moves with Value」 allocator model and absolutely the wrong thing to do if the containers conform to the 「Scoped」 allocator model. With the two allocator models well-defined, the desired behavior becomes clear.
我已經介紹了 「Moves with Value」 分配器模型玫 「Scoped」 分配器模型。在前一種情況下,當容器被複製構造時,分配器會被複製。而在後一種情況下則不會。如果容器符合 「Moves with Value」 分配器模型,則正確的方式是交換分配器,但如果容器符合 「Scoped」 分配器模型,這樣做就絕對是錯誤的。兩種分配器模型具有良好的定義,想得到的行為也變得清楚了。

The proposal is that allocators are swapped if the allocator follows the "Moves with Value" model and the allocator is swappable. Otherwise a slow swap is used. Since containers currently only support the "Moves with Value" model this is consistent with the committee's current recommendation (although it suggests using a trait to detect if the allocator is swappable rather than a concept).
建議是,如果分配器符合 "Moves with Value" 模型且分配器是可交換的,則應交換分配器。否則使用慢交換。由於當前的容器只支持 "Moves with Value" 模型,這就和委員會當前的推薦是一致的(雖然它建議使用 trait 而不是概念來檢查分配器是否可交換)。

Since there is currently neither have a swappable trait or concept for allocators this implementation always performs a slow swap.
由於當前對於分配器還沒有一個'可交換' trait 或概念,所以這個實現總是執行慢交換。

Are insert and erase stable for unordered_multiset and unordered_multimap? 

對於 unordered_multiset 和 unordered_multimap 來說,插入和刪除是穩定的嗎?

It is not specified if unordered_multiset and unordered_multimap preserve the order of elements with equivalent keys (i.e. if they're stable under insert and erase). This is issue 581. The current proposal is that insert, erase and rehash are stable - so they are here. (Update: during the release of this version, this requirement was added to the lastest working draft).
沒有規定 unordered_multisetunordered_multimap 是否要保持相等鍵值元素的順序(即在 inserterase 時它們是否穩定)。這是 issue 581。當前的建議是,插入、刪除和重散列都應是穩定的 - 所以它們就是這樣的。(更新:在這個版本發佈期間,這一要求被加入到 最後工作草案)。

const_local_iterator cbegin, cend missing from TR1 

TR1中缺少 const_local_iterator cbegin, cend

Issue 691 is that cbegin and cend are missing for local iterators. The current resolution is that they'll be added, so I've added them.
Issue 691 是,對於局部迭代器缺少 cbegincend。當前的決定是要加入它們,所以我已經加了。


PrevUpHomeNext