Boost C++ Libraries Home Libraries People FAQ More

PrevUpHomeNext

Performance 性能

Back insertion and destruction 後端插入和析構
Reversing 反序
Sorting 排序
Write access 寫訪問
Conclusions 結論

Boost.Intrusive containers offer speed improvements compared to non-intrusive containers primarily because:
Boost.Intrusive 容器與非介入式容器相比,提高了速度,主要是因為:

This section will show performance tests comparing some operations on boost::intrusive::list and std::list:
這一節將展示在 boost::intrusive::liststd::list 之上的一些操作的性能測試比較:

Given an object of type T, boost::intrusive::list<T> can replace std::list<T> to avoid memory allocation overhead, or it can replace std::list<T*> when the user wants containers with polymorphic values or wants to share values between several containers. Because of this versatility, the performance tests will be executed for 6 different list types:
給定一個類型為 T 的對象,boost::intrusive::list<T> 可以替代 std::list<T>, 以避免內存分配的開銷,在用戶希望容器保存多態值或者想在多個容器間共享保存的值時,則可以替代 std::list<T*>。 因為這種多功能性,以下的性能測試將對6種不同的鏈表類型進行測試:

Both test_class and itest_class are templatized classes that can be configured with a boolean to increase the size of the object. This way, the tests can be executed with small and big objects. Here is the first part of the testing code, which shows the definitions of test_class and itest_class classes, and some other utilities:
test_classitest_class 都是模板化類,它們可以用一個布爾值來配置以增加對象的大小。這樣,測試可以分別對小對像和大對像執行。以下是測試代碼的開始部分,展示了 test_classitest_class 類的定義,以及一些其它工具:

//Iteration and element count defines 循環次數和元素數量定義
const int NumIter = 100;
const int NumElements = 50000;
using namespace boost::intrusive;
template<bool BigSize> struct filler { int dummy[10]; };
template <> struct filler<false> {};
template<bool BigSize> //The object for non-intrusive containers 用於非介入式容器的對象
struct test_class : private filler<BigSize>
{
int i_;
test_class() {}
test_class(int i) : i_(i) {}
friend bool operator <(const test_class &l, const test_class &r) { return l.i_ < r.i_; }
friend bool operator >(const test_class &l, const test_class &r) { return l.i_ > r.i_; }
};
template <bool BigSize, link_mode_type LinkMode>
struct itest_class //The object for intrusive containers 用於介入式容器的對象
: public list_base_hook<link_mode<LinkMode> >, public test_class<BigSize>
{
itest_class() {}
itest_class(int i) : test_class<BigSize>(i) {}
};
template<class FuncObj> //Adapts functors taking values to functors taking pointers 將接受值的仿函數適配為接受指針的仿函數
struct func_ptr_adaptor : public FuncObj
{
typedef typename FuncObj::first_argument_type* first_argument_type;
typedef typename FuncObj::second_argument_type* second_argument_type;
typedef typename FuncObj::result_type result_type;
result_type operator()(first_argument_type a, second_argument_type b) const
{ return FuncObj::operator()(*a, *b); }
};

As we can see, test_class is a very simple class holding an int. itest_class is just a class that has a base hook (list_base_hook) and also derives from test_class.
正如我們看到的,test_class 是一個非常簡單的類,它持有一個 intitest_class 則只是一個帶有基類鉤子(list_base_hook) 且派生自 test_class 的類。

func_ptr_adaptor is just a functor adaptor to convert function objects taking test_list objects to function objects taking pointers to them.
func_ptr_adaptor 是一個函數對像適配器,將一個接受 test_list 對象的函數對像轉換為接受對像指針的函數對象。

You can find the full test code code in the perf_list.cpp source file.
你可以在 perf_list.cpp 源文件中找到完整的測試代碼。

The first test will measure the benefits we can obtain with intrusive containers avoiding memory allocations and deallocations. All the objects to be inserted in intrusive containers are allocated in a single allocation call, whereas std::list will need to allocate memory for each object and deallocate it for every erasure (or container destruction).
第一個測試將測量由於介入式容器避免了內存分配和釋放而獲得的優勢。所有被插入到介入式容器中的對象都是在一次內存分配調用中分配的,而 std::list 則需要為每一個對像分配一次內存,並為每一次刪除(或是容器的析構)釋放一次內存。

Let's compare the code to be executed for each container type for different insertion tests:
我們來比較一下對於各種容器類型、對於各種不同的插入測試所要執行的代碼:

std::vector<typename ilist::value_type> objects(NumElements);
ilist l;
for(int i = 0; i < NumElements; ++i)
l.push_back(objects[i]);
//Elements are unlinked in ilist's destructor 在 ilist 的析構函數中,元素被斷鏈
//Elements are destroyed in vector's destructor 在 vector 的析構函數中,元素被銷毀

For intrusive containers, all the values are created in a vector and after that inserted in the intrusive list.
對於介入式容器,所有值在一個 vector 中創建,稍後被插入到介入式鏈表中。

stdlist l;
for(int i = 0; i < NumElements; ++i)
l.push_back(typename stdlist::value_type(i));
//Elements unlinked and destroyed in stdlist's destructor 在 stdlist 的析構函數中,元素被斷鏈並銷毀

For a standard list, elements are pushed back using push_back().
對於標準 list,使用 push_back() 將元素插入到後端。

std::vector<typename stdlist::value_type> objects(NumElements);
stdptrlist l;
for(int i = 0; i < NumElements; ++i)
l.push_back(&objects[i]);
//Pointers to elements unlinked and destroyed in stdptrlist's destructor
//
在 stdptrlist 的析構函數中,元素指針被斷鏈並銷毀
//Elements destroyed in vector's destructor 在 vector 的析構函數中,元素被銷毀

For a standard compact pointer list, elements are created in a vector and pushed back in the pointer list using push_back().
對於標準的緊湊指針鏈表,元素在一個 vector 中創建,並用 push_back() 插入到指針鏈表的後端。

stdlist objects; stdptrlist l;
for(int i = 0; i < NumElements; ++i){
objects.push_back(typename stdlist::value_type(i));
l.push_back(&objects.back());
}
//Pointers to elements unlinked and destroyed in stdptrlist's destructor
//在 stdptrlist 的析構函數中,元素指針被斷鏈並銷毀
//Elements unlinked and destroyed in stdlist's destructor 在 stdlist 的析構函數中,元素被斷鏈並銷毀

For a disperse pointer list, elements are created in a list and pushed back in the pointer list using push_back().
對於 鬆散指針鏈表, 元素在一個 list 中創建,並用 push_back() 插入到指針鏈表的後端。

These are the times in microseconds for each case, and the normalized time:
以下是在各種情況下測試得到的毫秒時間,以及規格化時間:

Table 10.2. Back insertion + destruction times for Visual C++ 7.1 / Windows XP
表 10.2. 後端插入 + 析構的時間,Visual C++ 7.1 / Windows XP

Container

容器

Time in us/iteration (small object / big object)

毫秒/次循環 (小對像/大對像)

Normalized time (small object / big object)

規格化時間 (小對像/大對像)

normal_link intrusive list

normal_link 介入式鏈表

5000 / 22500

1 / 1

safe_link intrusive list

safe_link 介入式鏈表

7812 / 32187

1.56 / 1.43

auto_unlink intrusive list

auto_unlink 介入式鏈表

10156 / 41562

2.03 / 1.84

Standard list

標準鏈表

76875 / 97500

5.37 / 4.33

Standard compact pointer list

標準的緊湊指針鏈表

76406 / 86718

15.28 / 3.85

Standard disperse pointer list

標準的鬆散指針鏈表

146562 / 175625

29.31 / 7.80


Table 10.3. Back insertion + destruction times for GCC 4.1.1 / MinGW over Windows XP
表 10.3
. 後端插入 + 析構的時間,GCC 4.1.1 / MinGW 於 Windows XP

Container

容器

Time in us/iteration (small object / big object)

毫秒/次循環 (小對像/大對像)

Normalized time (small object / big object)

規格化時間 (小對像/大對像)

normal_link intrusive list

normal_link 介入式鏈表

4375 / 22187

1 / 1

safe_link intrusive list

safe_link  介 入式鏈表

7812 / 32812

1.78 / 1.47

auto_unlink intrusive list

auto_unlink  介 入式鏈表

10468 / 42031

2.39 / 1.89

Standard list

標準鏈表

81250 / 98125

18.57 / 4.42

Standard compact pointer list

標準的緊湊指針鏈表

83750 / 94218

19.14 / 4.24

Standard disperse pointer list

標準的鬆散指針鏈表

155625 / 175625

35.57 / 7.91


Table 10.4. Back insertion + destruction times for GCC 4.1.2 / Linux Kernel 2.6.18 (OpenSuse 10.2)
表 10.4
. 後端插入 + 析構的時間,GCC 4.1.2 / Linux Kernel 2.6.18 (OpenSuse 10.2)

Container

容器

Time in us/iteration (small object / big object)

毫秒/次循環 (小對像/大對像)

Normalized time (small object / big object)

規格化時間 (小對像/大對像)

normal_link intrusive list

normal_link 介入式鏈表

4792 / 20495

1 / 1

safe_link intrusive list

safe_link  介 入式鏈表

7709 / 30803

1.60 / 1.5

auto_unlink intrusive list

auto_unlink  介 入式鏈表

10180 / 41183

2.12 / 2.0

Standard list

標準鏈表

17031 / 32586

3.55 / 1.58

Standard compact pointer list

標準的緊湊指針鏈表

27221 / 34823

5.68 / 1.69

Standard disperse pointer list

標準的鬆散指針鏈表

102272 / 60056

21.34 / 2.93


The results are logical: intrusive lists just need one allocation. The destruction time of the normal_link intrusive container is trivial (complexity: O(1)), whereas safe_link and auto_unlink intrusive containers need to put the hooks of erased values in the default state (complexity: O(NumElements)). That's why normal_link intrusive list shines in this test.
結果是符合邏輯的:介入式鏈表只需要一次內存分配。normal_link 介入式容器的析構時間是平凡的(複雜度:O(1)),而 safe_linkauto_unlink 介入式容器則需要將被移除值的鉤子設置為缺省狀態(複雜度:O(NumElements))。這正是 normal_link 介入式鏈表在這個測試中勝出的原因。

Non-intrusive containers need to make many more allocations and that's why they lag behind. The disperse pointer list needs to make NumElements*2 allocations, so the result is not surprising.
非介入式容器需要進行多次的內存分配,這正是它們落在後面的原因。disperse pointer list 需要進行 NumElements*2 次分配,所以結果並不意外。

The Linux test shows that standard containers perform very well against intrusive containers with big objects. Nearly the same GCC version in MinGW performs worse, so maybe a good memory allocator is the reason for these excellent results.
Linux 上的測試顯示,對於大的對象,標準容器和介入式容器一樣好。而在 MinGW 平台上的相近版本的 GCC 則較差,因此也許一個好的內存分配器是這一卓越結果的原因。

The next test measures the time needed to complete calls to the member function reverse(). Values (test_class and itest_class) and lists are created as explained in the previous section.
下一個測試測量完成對成員函數 reverse() 的調用所需的時間。值(test_classitest_class)和鏈表同樣是按上一節所說的那樣創建。

Note that for pointer lists, reverse does not need to access test_class values stored in another list or vector, since this function just needs to adjust internal pointers, so in theory all tested lists need to perform the same operations.
注意,對於指針鏈表,reverse 並不需要訪問保存在另一個 list 或 vector 中的 test_class,因為該函數只需調整內部的指針,所以理論上所有被測試的鏈表都只需完成相同的操作。

These are the results:
結果如下:

Table 10.5. Reverse times for Visual C++ 7.1 / Windows XP
表 10.5. 反序時間,Visual C++ 7.1 / Windows XP

Container
容器

Time in us/iteration (small object / big object)
毫秒/次循環 (小對像/大對像)

Normalized time (small object / big object)
規格化時間 (小對像/大對像)

normal_link intrusive list

2656 / 10625

1 / 1.83

safe_link intrusive list

2812 / 10937

1.05 / 1.89

auto_unlink intrusive list

2710 / 10781

1.02 / 1.86

Standard list

5781 / 14531

2.17 / 2.51

Standard compact pointer list

5781 / 5781

2.17 / 1

Standard disperse pointer list

10781 / 15781

4.05 / 2.72


Table 10.6. Reverse times for GCC 4.1.1 / MinGW over Windows XP
表 10.6. 反序時間,
GCC 4.1.1 / MinGW 於 Windows XP

Container
容器

Time in us/iteration (small object / big object)
毫秒/次循環 (小對像/大對像)

Normalized time (small object / big object)
規格化時間 (小對像/大對像)

normal_link intrusive list

2656 / 10781

1 / 2.22

safe_link intrusive list

2656 / 10781

1 / 2.22

auto_unlink intrusive list

2812 / 10781

1.02 / 2.22

Standard list

4843 / 12500

1.82 / 2.58

Standard compact pointer list

4843 / 4843

1.82 / 1

Standard disperse pointer list

9218 / 12968

3.47 / 2.67


Table 10.7. Reverse times for GCC 4.1.2 / Linux Kernel 2.6.18 (OpenSuse 10.2)
表 10.7. 反序時間,
GCC 4.1.2 / Linux Kernel 2.6.18 (OpenSuse 10.2)

Container
容器

Time in us/iteration (small object / big object)
毫秒/次循環 (小對像/大對像)

Normalized time (small object / big object)
規格化時間 (小對像/大對像)

normal_link intrusive list

2742 / 10847

1 / 3.41

safe_link intrusive list

2742 / 10847

1 / 3.41

auto_unlink intrusive list

2742 / 11027

1 / 3.47

Standard list

3184 / 10942

1.16 / 3.44

Standard compact pointer list

3207 / 3176

1.16 / 1

Standard disperse pointer list

5814 / 13381

2.12 / 4.21


For small objects the results show that the compact storage of values in intrusive containers improve locality and reversing is faster than with standard containers, whose values might be dispersed in memory because each value is independently allocated. Note that the dispersed pointer list (a list of pointers to values allocated in another list) suffers more because nodes of the pointer list might be more dispersed, since allocations from both lists are interleaved in the code:
對 於小對象,介入式容器中的值的緊湊存儲提高了局部性,其反序操作比標準容器快一些,標準容器的值可能會鬆散分佈在內存中,因為它的每一個值都是獨立分配 的。注意,鬆散指針鏈表(其值在另一個 list 中分配的指針鏈表)更差一些,因為該指針鏈表的節點可能更加分散,在代碼中的兩個鏈表的分配是交錯的:

//Object list (holding `test_class`) 對像鏈表(保存 `test_class`)
stdlist objects;
//Pointer list (holding `test_class` pointers) 指針鏈表(保存 `test_class` 指針)
stdptrlist l;
for(int i = 0; i < NumElements; ++i){
//Allocation from the object list 從對像鏈表分配
objects.push_back(stdlist::value_type(i));
//Allocation from the pointer list 從指針鏈表分配
l.push_back(&objects.back());
}

For big objects the compact pointer list wins because the reversal test doesn't need access to values stored in another container. Since all the allocations for nodes of this pointer list are likely to be close (since there is no other allocation in the process until the pointer list is created) locality is better than with intrusive containers. The dispersed pointer list, as with small values, has poor locality.
對於大對象,緊湊指針鏈表勝出,因為反序測試無需訪問保存在另一個容器中的值。由於這個指針鏈表的節點的所有分配可能非常靠近(因為在該指針鏈表創建完成之前沒有其它的分配),其局部性比介入式容器更好。和小對象的情況一樣,鬆散指針鏈表具有較差的局部性。

The next test measures the time needed to complete calls to the member function sort(Pred pred). Values (test_class and itest_class) and lists are created as explained in the first section. The values will be sorted in ascending and descenting order each iteration. For example, if l is a list:
下一個測試測量完成對成員函數 sort(Pred pred) 的調用所需的時間。值(test_classitest_class)和鏈表同樣是按第一節所說的那樣創建。在每次循環中,這些值將以升序和降序排序。例如,如果 l 為一個鏈表:

for(int i = 0; i < NumIter; ++i){
if(!(i % 2))
l.sort(std::greater<stdlist::value_type>());
else
l.sort(std::less<stdlist::value_type>());
}

For a pointer list, the function object will be adapted using func_ptr_adaptor:
對於指針鏈表,將使用 func_ptr_adaptor 對函數對像進行適配:

for(int i = 0; i < NumIter; ++i){
if(!(i % 2))
l.sort(func_ptr_adaptor<std::greater<stdlist::value_type> >());
else
l.sort(func_ptr_adaptor<std::less<stdlist::value_type> >());
}

Note that for pointer lists, sort will take a function object that will access test_class values stored in another list or vector, so pointer lists will suffer an extra indirection: they will need to access the test_class values stored in another container to compare two elements.
注意,對於指針鏈表,sort 將接受一個要訪問保存在另一個 list 或 vector 中的 test_class的函數對象,所以指針鏈表需要多一個間接層:它們需要訪問保存在另一個容器中的 test_class 值來比較兩個元素。

These are the results:
結果如下:

Table 10.8. Sort times for Visual C++ 7.1 / Windows XP
表 10.8. 排序時間,Visual C++ 7.1 / Windows XP

Container
容器

Time in us/iteration (small object / big object)
毫秒/次循環 (小對像/大對像)

Normalized time (small object / big object)
規格化時間 (小對像/大對像)

normal_link intrusive list

16093 / 38906

1 / 1

safe_link intrusive list

16093 / 39062

1 / 1

auto_unlink intrusive list

16093 / 38906

1 / 1

Standard list

32343 / 56406

2.0 / 1.44

Standard compact pointer list

33593 / 46093

2.08 / 1.18

Standard disperse pointer list

46875 / 68593

2.91 / 1.76


Table 10.9. Sort times for GCC 4.1.1 / MinGW over Windows XP
表 10.9. 排序時間,
GCC 4.1.1 / MinGW 於 Windows XP

Container
容器

Time in us/iteration (small object / big object)
毫秒/次循環 (小對像/大對像)

Normalized time (small object / big object)
規格化時間 (小對像/大對像)

normal_link intrusive list

15000 / 39218

1 / 1

safe_link intrusive list

15156 / 39531

1.01 / 1.01

auto_unlink intrusive list

15156 / 39531

1.01 / 1.01

Standard list

34218 / 56875

2.28 / 1.45

Standard compact pointer list

35468 / 49218

2.36 / 1.25

Standard disperse pointer list

47656 / 70156

3.17 / 1.78


Table 10.10. Sort times for GCC 4.1.2 / Linux Kernel 2.6.18 (OpenSuse 10.2)
表 10.10. 排序時間,
GCC 4.1.2 / Linux Kernel 2.6.18 (OpenSuse 10.2)

Container
容器

Time in us/iteration (small object / big object)
毫秒/次循環 (小對像/大對像)

Normalized time (small object / big object)
規格化時間 (小對像/大對像)

normal_link intrusive list

18003 / 40795

1 / 1

safe_link intrusive list

18003 / 41017

1 / 1

auto_unlink intrusive list

18230 / 40941

1.01 / 1

Standard list

26273 / 49643

1.45 / 1.21

Standard compact pointer list

28540 / 43172

1.58 / 1.05

Standard disperse pointer list

35077 / 57638

1.94 / 1.41


The results show that intrusive containers are faster than standard containers. We can see that the pointer list holding pointers to values stored in a vector is quite fast, so the extra indirection that is needed to access the value is minimized because all the values are tightly stored, improving caching. The disperse list, on the other hand, is slower because the indirection to access values stored in the object list is more expensive than accessing values stored in a vector.
結 果顯示,介入式容器快於標準容器。我們可以看到,持有保存在 vector 的值的指針的指針鏈表速度很快,訪問值時所需的額外間接性是最小的,因為所有的值被緊密保存,這提升了緩存的作用。另一方面,鬆散鏈表較慢,因為訪問保存 在 list 中的值的間接性要比訪問保存在 vector 中值耗費更多。

The next test measures the time needed to iterate through all the elements of a list, and increment the value of the internal i_ member:
下一個測試測量遍歷一個鏈表中所有元素所需的時間,並將內部的 i_ 成員的值加一:

stdlist::iterator it(l.begin()), end(l.end());
for(; it != end; ++it)
++(it->i_);

Values (test_class and itest_class) and lists are created as explained in the first section. Note that for pointer lists, the iteration will suffer an extra indirection: they will need to access the test_class values stored in another container:
值(test_classitest_class)和鏈表同樣是按第一節所說的那樣創建。注意,對於指針鏈表,遍歷操作需要多一個間接層:它們需要訪問保存在另一個容器中的 test_class 值:

stdptrlist::iterator it(l.begin()), end(l.end());
for(; it != end; ++it)
++((*it)->i_);

These are the results:
結果如下:

Table 10.11. Write access times for Visual C++ 7.1 / Windows XP
表 10.11. 寫訪問時間,Visual C++ 7.1 / Windows XP

Container
容器

Time in us/iteration (small object / big object)
毫秒/次循環 (小對像/大對像)

Normalized time (small object / big object)
規格化時間 (小對像/大對像)

normal_link intrusive list

2031 / 8125

1 / 1

safe_link intrusive list

2031 / 8281

1 / 1.01

auto_unlink intrusive list

2031 / 8281

1 / 1.01

Standard list

4218 / 10000

2.07 / 1.23

Standard compact pointer list

4062 / 8437

2.0 / 1.03

Standard disperse pointer list

8593 / 13125

4.23 / 1.61


Table 10.12. Write access times for GCC 4.1.1 / MinGW over Windows XP
表 10.12. 寫訪問時間,
GCC 4.1.1 / MinGW 於 Windows XP

Container
容器

Time in us/iteration (small object / big object)
毫秒/次循環 (小對像/大對像)

Normalized time (small object / big object)
規格化時間 (小對像/大對像)

normal_link intrusive list

2343 / 8281

1 / 1

safe_link intrusive list

2500 / 8281

1.06 / 1

auto_unlink intrusive list

2500 / 8281

1.06 / 1

Standard list

4218 / 10781

1.8 / 1.3

Standard compact pointer list

3906 / 8281

1.66 / 1

Standard disperse pointer list

8281 / 13750

3.53 / 1.66


Table 10.13. Write access times for GCC 4.1.2 / Linux Kernel 2.6.18 (OpenSuse 10.2)
表 10.13. 寫訪問時間,
GCC 4.1.2 / Linux Kernel 2.6.18 (OpenSuse 10.2)

Container
容器

Time in us/iteration (small object / big object)
毫秒/次循環 (小對像/大對像)

Normalized time (small object / big object)
規格化時間 (小對像/大對像)

normal_link intrusive list

2286 / 8468

1 / 1.1

safe_link intrusive list

2381 / 8412

1.04 / 1.09

auto_unlink intrusive list

2301 / 8437

1.01 / 1.1

Standard list

3044 / 9061

1.33 / 1.18

Standard compact pointer list

2755 / 7660

1.20 / 1

Standard disperse pointer list

6118 / 12453

2.67 / 1.62


As with the read access test, the results show that intrusive containers outperform all other containers if the values are tightly packed in a vector. The disperse list is again the slowest.
和讀訪問測試一樣,結果顯示如果元素值是緊密壓縮在一個 vector 中的話,介入式容器要比所有其它容器都好。鬆散鏈表又一次成為最慢的。

Intrusive containers can offer performance benefits that can not be achieved with equivalent non-intrusive containers. Memory locality improvements are noticeable when the objects to be inserted are small. Minimizing memory allocation/deallocation calls is also an important factor and intrusive containers make this simple if the user allocates all the objects to be inserted in intrusive containers in containers like std::vector or std::deque.
介入式容器可以提供對等的非介入式容器不能實現的性能優勢。當插入的對象較小時,內存局部性的提升是顯著的。內存分配和釋放調用的次數最小化也是一個重要的因素,如果用戶將要插入到介入式容器中的所有對象分配到一個象 std::vectorstd::deque 的容器中,那麼介入式容器就很簡單了。


PrevUpHomeNext