Is boost::pool_allocator not as fast as you'd think?



제페 2016. 1. 23. 08:09

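Let's look at some simple code!

One version allocates each shared_ptr through boost::pool_allocator (via std::allocate_shared); the other allocates each shared_ptr with the default new operator (via std::make_shared).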

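#include <cstdlib>   // std::system
#include <future>
#include <memory>    // std::shared_ptr, std::make_shared, std::allocate_shared
#include <chrono>
#include <iostream>
#include <boost/pool/pool_alloc.hpp>
#include <boost/chrono/stopwatches/stopwatch.hpp>
 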
using namespace std;
 
struct obj
{
  int a;
  int b;
  int c;
};
 
using pool_allocator_type = boost::pool_allocator<obj>;
size_t test_loop_count = 100000;
size_t test_func_count = 1000;
pool_allocator_type alloc;
 
void make_shared_test()
{
  std::shared_ptr<obj> pobj;
 
  for (size_t i = 0; i != test_loop_count; ++i)
  {
    pobj = std::make_shared<obj>();
  }
}
 
void allocate_shared_test()
{
  std::shared_ptr<obj> pobj;
 
  for (size_t i = 0; i != test_loop_count; ++i)
  {
    pobj = std::allocate_shared<obj>(alloc);
  }
}
 
int main()
{
 
  auto shared_test_result = std::async([]() 
  {
    boost::chrono::stopwatch<> watch;
    watch.restart();
    for (size_t i = 0; i != test_func_count; ++i)
    {
      make_shared_test();
    }
 
    auto count = watch.elapsed().count();
 
    cout << "shared_test_result: " << count << endl;
  });
 
  auto allocate_shared_result = std::async([]() 
  {
    boost::chrono::stopwatch<> watch;
    watch.restart();
    for (size_t i = 0; i != test_func_count; ++i)
    {
      allocate_shared_test();
    }
 
    auto count = watch.elapsed().count();
 
    cout << "allocate_shared_result: " << count << endl;
  });
 
  shared_test_result.wait();
  allocate_shared_result.wait();
 
  std::system("pause");
 
  return 0;
}
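The listing above runs both loops at the same time through std::async and times them with a Boost stopwatch. As a rough alternative, here is a minimal sketch that times one loop at a time using only the standard library. `time_ms` and `make_shared_loop` are hypothetical names introduced for this illustration; they are not part of the original benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>

struct obj
{
  int a;
  int b;
  int c;
};

// Hypothetical helper: calls f() `repeats` times and returns the
// elapsed wall-clock time in milliseconds.
template <class F>
double time_ms(F&& f, std::size_t repeats)
{
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i != repeats; ++i)
  {
    f();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical stand-in for make_shared_test(): performs `count`
// allocations and returns how many assignments were made.
inline std::size_t make_shared_loop(std::size_t count)
{
  std::shared_ptr<obj> pobj;
  std::size_t done = 0;
  for (std::size_t i = 0; i != count; ++i)
  {
    pobj = std::make_shared<obj>();
    assert(pobj != nullptr);
    ++done;
  }
  return done;
}
```

A call such as `time_ms([] { make_shared_loop(100000); }, 1000)` would mirror the loop counts used in the listing, and running the two variants one after the other keeps them from competing for CPU time during the measurement.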

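You might think, "Surely allocating from a pool is faster?" But..

Result

The gap is so ridiculously large that it is easy to see just how slow pool_allocator is.

How can it be this slow when it is using a pool? What is the reason?

It turns out that pool_allocator synchronizes internally, to guard against being accessed from multiple threads.

The cost of that synchronization is what produces such a large gap.

Since the gap comes from synchronization, the fix is to replace the mutex it uses with null_mutex. Keep in mind that this removes the locking altogether, so it is only safe when the pool is not shared between threads.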

using pool_allocator_type = boost::pool_allocator<obj, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex>;
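To make the role of the mutex parameter more concrete, here is a toy sketch of a fixed-size free-list pool that takes its lock type as a policy. This is not Boost's implementation; `tiny_pool`, `no_lock`, `counting_mutex`, and `lock_acquisitions` are all hypothetical names made up for this illustration. It only shows the general idea: with a real mutex, every allocate and deallocate pays for a lock, while a no-op lock type makes that cost disappear.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <new>
#include <vector>

// Hypothetical no-op lock, playing the role null_mutex plays above.
struct no_lock
{
  void lock() {}
  void unlock() {}
};

// Hypothetical wrapper around std::mutex that counts acquisitions.
struct counting_mutex
{
  std::mutex m;
  std::size_t acquisitions = 0;
  void lock() { m.lock(); ++acquisitions; }
  void unlock() { m.unlock(); }
};

// Toy fixed-size pool: free blocks form a singly linked list whose
// links are stored inside the free blocks themselves.
template <std::size_t BlockSize, class Mutex>
class tiny_pool
{
  static_assert(BlockSize >= sizeof(void*), "a block must be able to hold a link");

public:
  explicit tiny_pool(std::size_t blocks) : storage_(blocks * BlockSize)
  {
    for (std::size_t i = 0; i != blocks; ++i)
    {
      push(storage_.data() + i * BlockSize);
    }
  }

  void* allocate()
  {
    std::lock_guard<Mutex> guard(mutex_);  // taken on every allocation
    if (free_ == nullptr) throw std::bad_alloc();
    void* p = free_;
    std::memcpy(&free_, p, sizeof(free_));  // pop: read the next link
    return p;
  }

  void deallocate(void* p)
  {
    assert(p != nullptr);
    std::lock_guard<Mutex> guard(mutex_);  // taken on every deallocation
    push(p);
  }

  Mutex& mutex() { return mutex_; }

private:
  void push(void* p)
  {
    std::memcpy(p, &free_, sizeof(free_));  // store the old head in the block
    free_ = p;
  }

  std::vector<unsigned char> storage_;
  void* free_ = nullptr;
  Mutex mutex_;
};

// Runs n allocate/deallocate round trips on a locked pool and
// reports how many times the lock was taken.
inline std::size_t lock_acquisitions(std::size_t n)
{
  tiny_pool<16, counting_mutex> pool(4);
  for (std::size_t i = 0; i != n; ++i)
  {
    pool.deallocate(pool.allocate());
  }
  return pool.mutex().acquisitions;
}
```

In this sketch each round trip takes the lock twice, so a loop of 100000 allocations locks 200000 times. With `tiny_pool<16, no_lock>` the `lock_guard` calls compile down to empty functions, which is the same trade the article makes: no locking cost, but also no protection if two threads touch the pool at once.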

Now that we have switched to null_mutex, let's check the performance again.

Result

We can see that allocate_shared using the pool has become nearly 2x faster.