

The Data Structure

The containers are made up of a number of 'buckets', each of which can contain any number of elements. For example, the following diagram shows an unordered_set with 7 buckets containing 5 elements, A, B, C, D and E (this is just for illustration, containers will typically have more buckets).

To decide which bucket to place an element in, the container applies the hash function, Hash, to the element's key (for unordered_set and unordered_multiset the key is the whole element, but it is referred to as the key so that the same terminology can be used for sets and maps). This returns a value of type std::size_t. std::size_t has a much greater range of values than the number of buckets, so the container applies another transformation to that value to choose a bucket to place the element in.
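The two steps described above can be observed from outside the container. The following is a minimal sketch, assuming std::unordered_set as a stand-in for the Boost container (it exposes the same interface); the Placement struct and the locate function are hypothetical names introduced only for this illustration. How the hash value is reduced to a bucket index is left to the implementation, so the sketch only checks that the index is in range.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper type: the raw hash value and the bucket it maps to.
struct Placement {
    std::size_t hash_value;  // result of applying Hash to the key
    std::size_t bucket;      // index chosen by the container's second transformation
};

// Reports where `key` would be placed in `set`.
// Assumes the container has at least one bucket.
Placement locate(const std::unordered_set<std::string>& set,
                 const std::string& key) {
    assert(set.bucket_count() > 0);
    Placement p;
    p.hash_value = set.hash_function()(key);  // any std::size_t value
    p.bucket = set.bucket(key);               // always below bucket_count()
    assert(p.bucket < set.bucket_count());
    return p;
}
```

Calling locate twice with the same key on an unchanged container gives the same bucket, which is what makes lookup by key possible.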

Retrieving the elements for a given key is simple. The same process is applied to the key to find the correct bucket. Then the key is compared with the elements in the bucket to find any elements that match (using the equality predicate Pred). If the hash function has worked well the elements will be evenly distributed amongst the buckets so only a small number of elements will need to be examined.
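The lookup procedure can be written out by hand with the bucket interface. This is only a sketch of the idea, assuming std::unordered_set in place of the Boost container; contains_via_bucket is a hypothetical name, and a real container's find does this work internally (and probably more efficiently).

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Looks for `key` by examining only the bucket it hashes to.
bool contains_via_bucket(const std::unordered_set<std::string>& set,
                         const std::string& key) {
    if (set.bucket_count() == 0) {
        return false;  // no buckets, so nothing can be stored
    }
    const std::size_t b = set.bucket(key);  // same process as on insertion
    const auto equal = set.key_eq();        // the equality predicate, Pred
    for (auto it = set.begin(b); it != set.end(b); ++it) {
        if (equal(*it, key)) {
            return true;  // found a matching element in this bucket
        }
    }
    return false;
}
```

Only the elements of one bucket are compared, so the cost depends on how evenly the hash function spreads the keys.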

There is more information on hash functions and equality predicates in the next section.

You can see in the diagram that A & D have been placed in the same bucket. When looking for elements in this bucket up to 2 comparisons are made, making the search slower. This is known as a collision. To keep things fast we try to keep collisions to a minimum.
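To see the effect of collisions in an extreme form, the sketch below uses a deliberately bad hash function that returns the same value for every key, so every element lands in a single bucket. ConstantHash and largest_bucket are hypothetical names for this illustration, and std::unordered_set is assumed as a stand-in for the Boost container.

```cpp
#include <cstddef>
#include <unordered_set>

// A deliberately poor hash function: every key collides.
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};

// Returns the number of elements in the fullest bucket of `set`.
template <class Set>
std::size_t largest_bucket(const Set& set) {
    std::size_t largest = 0;
    for (std::size_t b = 0; b < set.bucket_count(); ++b) {
        if (set.bucket_size(b) > largest) {
            largest = set.bucket_size(b);
        }
    }
    return largest;
}
```

With ConstantHash, a lookup may have to compare against every element in the container; with a well-behaved hash function the fullest bucket would be expected to hold only a few elements.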

Table 24.1. Methods for Accessing Buckets

Method

Description

size_type bucket_count() const

The number of buckets.


size_type max_bucket_count() const

An upper bound on the number of buckets.


size_type bucket_size(size_type n) const

The number of elements in bucket n.


size_type bucket(key_type const& k) const

Returns the index of the bucket which would contain k.


local_iterator begin(size_type n);
local_iterator end(size_type n);
const_local_iterator begin(size_type n) const;
const_local_iterator end(size_type n) const;
const_local_iterator cbegin(size_type n) const;
const_local_iterator cend(size_type n) const;

Return begin and end iterators for bucket n.
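As an illustration of the bucket access methods, the following sketch visits every bucket with the local iterators and checks that the per-bucket counts add up. It assumes std::unordered_set (or any container with the same interface) in place of the Boost container; count_through_buckets is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Counts the elements of `set` one bucket at a time.
template <class Set>
std::size_t count_through_buckets(const Set& set) {
    std::size_t total = 0;
    for (std::size_t b = 0; b < set.bucket_count(); ++b) {
        std::size_t in_bucket = 0;
        // cbegin(b)/cend(b) delimit the elements stored in bucket b.
        for (auto it = set.cbegin(b); it != set.cend(b); ++it) {
            ++in_bucket;
        }
        assert(in_bucket == set.bucket_size(b));
        total += in_bucket;
    }
    return total;
}
```

Every element lives in exactly one bucket, so the total should equal size().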

Controlling the number of buckets

As more elements are added to an unordered associative container, the number of elements in the buckets will increase causing performance to degrade. To combat this the containers increase the bucket count as elements are inserted. You can also tell the container to change the bucket count (if required) by calling rehash.
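The automatic growth can be observed by recording bucket_count() while inserting. This is a rough sketch, assuming std::unordered_set as a stand-in for the Boost container; bucket_count_history is a hypothetical name, and the actual sequence of bucket counts depends on the implementation.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Inserts 0..n-1 and records each distinct bucket_count() seen along the way.
std::vector<std::size_t> bucket_count_history(int n) {
    std::unordered_set<int> set;
    std::vector<std::size_t> history;
    history.push_back(set.bucket_count());  // bucket count before any insertion
    for (int i = 0; i < n; ++i) {
        set.insert(i);
        if (set.bucket_count() != history.back()) {
            history.push_back(set.bucket_count());  // the container rehashed
        }
    }
    return history;
}
```

For a reasonably large n the history typically contains several entries, each larger than the last, showing that the container rehashes a handful of times rather than on every insertion.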

The standard leaves a lot of freedom to the implementer to decide how the number of buckets is chosen, but it does make some requirements based on the container's 'load factor', the average number of elements per bucket. Containers also have a 'maximum load factor' which they should try to keep the load factor below.

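You can't control the bucket count directly, but there are two ways to influence it:

- Specify the minimum number of buckets when constructing a container or when calling rehash.
- Suggest a maximum load factor by calling max_load_factor.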

max_load_factor doesn't let you set the maximum load factor yourself; it just lets you give a hint. Even then, the draft standard doesn't actually require the container to pay much attention to this value. The only time the load factor is required to be less than the maximum is following a call to rehash. But most implementations will try to keep the load factor below the maximum load factor, and set the maximum load factor to be the same as or close to the hint, unless your hint is unreasonably small or large.

Table 24.2. Methods for Controlling Bucket Size

Method

Description

float load_factor() const

The average number of elements per bucket.


float max_load_factor() const

Returns the current maximum load factor.


float max_load_factor(float z)

Changes the container's maximum load factor, using z as a hint.


void rehash(size_type n)

Changes the number of buckets so that there are at least n buckets, and so that the load factor is less than the maximum load factor.
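The methods in this table can be combined as in the sketch below, which assumes std::unordered_set as a stand-in for the Boost container. make_tuned_set and computed_load_factor are hypothetical helpers; the return value of max_load_factor(z) is not used, and the code does not rely on the hint being honoured exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Builds an empty set with a maximum load factor hint and a minimum bucket count.
std::unordered_set<int> make_tuned_set(std::size_t min_buckets,
                                       float max_load_hint) {
    std::unordered_set<int> set;
    set.max_load_factor(max_load_hint);  // a hint, not a hard setting
    set.rehash(min_buckets);             // at least min_buckets buckets afterwards
    assert(set.bucket_count() >= min_buckets);
    return set;
}

// Recomputes the load factor from its definition:
// the average number of elements per bucket.
float computed_load_factor(const std::unordered_set<int>& set) {
    if (set.bucket_count() == 0) {
        return 0.0f;
    }
    return static_cast<float>(set.size()) /
           static_cast<float>(set.bucket_count());
}
```

The value returned by computed_load_factor should agree with load_factor(), up to floating point rounding.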

Iterator Invalidation

It is not specified how member functions other than rehash affect the bucket count, although insert is only allowed to invalidate iterators when the insertion causes the load factor to be greater than or equal to the maximum load factor. For most implementations this means that insert will only change the number of buckets when this happens. While iterators can be invalidated by calls to insert and rehash, pointers and references to the container's elements are never invalidated.
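The difference between iterators and pointers can be sketched as follows, assuming std::unordered_set as a stand-in for the Boost container. pointer_survives_growth is a hypothetical name. The insertions are expected to change the bucket count at least once, but the sketch does not depend on that.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Takes a pointer to an element, inserts `extra` more elements, and reports
// whether the pointer still refers to the same element afterwards.
bool pointer_survives_growth(std::size_t extra) {
    std::unordered_set<std::string> set;
    const std::string* p = &*set.insert("first").first;
    for (std::size_t i = 0; i < extra; ++i) {
        set.insert("key" + std::to_string(i));  // may rehash the container
    }
    // An iterator saved before the loop might be invalid by now, so the
    // element is looked up again instead of reusing an old iterator.
    auto it = set.find("first");
    return it != set.end() && p == &*it && *p == "first";
}
```

Storing pointers or references to elements is therefore safe across insertions, whereas stored iterators are only safe if no rehash takes place.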

In a similar manner to using reserve for vectors, it can be a good idea to call rehash before inserting a large number of elements. This will get the expensive rehashing out of the way and let you store iterators, safe in the knowledge that they won't be invalidated. If you are inserting n elements into container x, you could first call:

x.rehash((x.size() + n) / x.max_load_factor() + 1);
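A fuller version of this call might look like the sketch below, which assumes std::unordered_set as a stand-in for the Boost container. insert_without_rehash is a hypothetical helper that pre-sizes the container and then checks that the bucket count stays fixed while n elements are inserted. The check reflects typical implementations; it is not a guarantee spelled out in this section.

```cpp
#include <cstddef>
#include <unordered_set>

// Pre-sizes `x` for `n` more elements, inserts 0..n-1, and returns true
// if bucket_count() did not change during the insertions.
bool insert_without_rehash(std::unordered_set<int>& x, std::size_t n) {
    // rehash takes a minimum number of buckets, not a number of elements,
    // so the expected size is divided by the maximum load factor.
    x.rehash(static_cast<std::size_t>((x.size() + n) / x.max_load_factor()) + 1);
    const std::size_t buckets = x.bucket_count();
    bool stable = true;
    for (std::size_t i = 0; i < n; ++i) {
        x.insert(static_cast<int>(i));
        stable = stable && (x.bucket_count() == buckets);
    }
    return stable;
}
```

If the function returns true, no rehash happened during the loop, so iterators stored while inserting stayed valid.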