Fork me on GitHub

Concurrent lock-free data structure

Skeleton of the Linked List

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
template <typename T>
class SList {
private:
typedef struct Node {
T data;
Node *next;
Node(T& data) : value(data), next(NULL) { }
} Node;
pthread_mutex_t _lock;
Node *head, *tail;
public:
void push_back(T& value);
void insert_after(T& previous, T& value); // insert data after previous
void remove(const T& value);
bool find(const T& value); // return true on success
SList( );
~SList( );
};

The implementation of push_back

1
2
3
4
5
6
7
8
9
10
11
void SList<T>::push_back(T& data) {
pthread_mutex_lock(&_lock);
if (head == NULL) {
head = new Node(data);
tail = head;
}else {
tail->next = new Node(data);
tail = tail->next;
}
pthread_mutex_unlock(&_lock);
}

Drawback

Let’s consider the situation of batch insertion, this implementation will do harm to our efficiency becaue of the cost on lock(which requires the contet switch)

Yet a better way to insert item to linked list

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
void SList<T>::push_back(T* data, int count) {
Node *begin = new Node(data[0]);
Node *temp = beging;
for (int i = 1; i < count; i++) {
temp->next = new Node(data[i]);
temp = temp->next;
}

pthread_mutex_lock(&_lock);
if(head == NULL) {
head = begin;
tail = head;
}else {
tail->next = begin;
tail = temp;
}
pthread_mutex_unlock(&_lock);
}

Optimization for Searching Item

Considering when some process are doing iteration task, while the others is inserting or deleting.Some we need to classify these two situations, change or not change.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
template <typename T>
class SList {
private:
typedef struct Node {
T data;
Node *next;
Node(T& data) : value(data), next(NULL) { }
} Node;
pthread_rwlock_t _rwlock; // Not pthread_mutex_t any more!
Node *head, *tail;
public:
SList( ) : head(NULL), tail(NULL) {
pthread_rwlock_init(&_rwlock, NULL);
}
~SList( ) {
pthread_rwlock_destroy(&_rwlock);
// … now cleanup nodes
}
}

Search Code

1
2
3
4
5
6
7
8
9
10
11
12
13
bool SList<T>::find(const T& value){
pthread_rwlock_rdlock (&_rwlock);
Node* temp = head;
while (temp) {
if (temp->value == data) {
status = true;
break;
}
temp = temp->next;

pthread_rwlock_unlock(&_rwlock);
return status;
}

Concurrent Insertion

To support current insertion, we need add the lock for each Node, so the new framework should be:

For the linked list, we assign a read-write lock. For each node in the linked list, we assign a mutex node.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
template <typename T>
class SList {
private:
typedef struct Node {
// … same as before
} Node;
pthread_rwlock_t _rwlock; // Not pthread_mutex_t any more!
Node *head, *tail;
public:
// … other API remain as-is
SList( ) : head(NULL), tail(NULL) {
pthread_rwlock_init(&_rwlock, NULL);
}
~SList( ) {
pthread_rwlock_destroy(&_rwlock);
// … now cleanup nodes
}
};

Add the read lock for searching, while add the mutex lock for the modified node.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
void SList<T>::insert_after(T& previous,T& value) {
pthread_rwlock_rdlock(&_rwlock);
Node* temp = head;
while (temp) {
if (temp->value == previous) {
break;
}
temp = temp->next;
}
Node* newNode = new Node(value);

pthread_mutex_lock(&temp->lock);
newNode->next = temp->next;
temp->next = newNode;
pthread_mutex_unlock(&temp->lock);

pthread_rwlock_unlock(&_rwlock);
return status;
}

Is lock necessary?

First let us introduce the CMPXCHG instruction on Intel chips, it is an atomic instruction but composed of serveral simple function. If we translate its function into C language:

1
2
3
4
5
6
7
int compare_and_swap ( int *memory_location, int expected_value, int new_value) 
{
int old_value = *memory_location;
if (old_value == expected_value)
*memory_location = new_value;
return old_value;
}

This is short for compare and swap instruction. When you reach this, you must be curious about what the hell this is doing for?The answer is to design a lock free data structure.

Design a lock-free concurrent data structure

First let’s walk through the layout of to-be-designed data structure.

1
2
3
4
5
6
7
8
9
10
11
12
13
template <typename T> 
class Stack {
typedef struct Node {
T data;
Node* next;
Node(const T& d) : data(d), next(0) { }
} Node;
Node *top;
public:
Stack( ) : top(0) { }
void push(const T& data);
T pop( ) throw (…);
};

The implementation of push should use the CAS instruction. Let us paraphrase the __sync_bool_compare_and_swap(&top, n->next, n), which means that in each time we move the n to the next node, we need to check whether n->next == top, true means no other thread change the value of top, we can let the top point to the current top(n). Otherwise, we should redo the n->next =top to update the new top.

1
2
3
4
5
6
7
8
9
void Stack<T>::push(const T& data){
Node *n = new Node(data);
while(1) {
n->next = top;
if (__sync_bool_compare_and_swap(&top, n->next, n)) { // CAS
break;
}
}
}

Here we need to pay attention that the execution of __sync_bool_compare_and_swap is atomic for CPU, this is the prerequisite of our design.

For the same reason, we implement the pop function. In case of the system crush when we do the &top since top is NULL, we need to add the non_NULL check through the code.s

1
2
3
4
5
6
7
8
9
10
11
T Stack<T>::pop( ) 
{
while (1) {
if (top == NULL)
throw std::string(“Cannot pop from empty stack”);
Node* next = top->next;
if (top && __sync_bool_compare_and_swap(&top, top, next)) { // CAS
return top->data;
}
}
}