Execution with 4 cores is slower than with 2 cores

Dmitriy Vyukov
August 28, 2008 7:09 PM PDT
#2
svetlana_m:

for(i=nStart; i<nEnd; i++)
{
    k=A[i].key-1;
    next = new(stack_elem);
    next->number=A[i].number;
    next->prev=bstacks[k];
    bstacks[k]=next;
}

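Also watch out for false sharing.
In this example, false sharing can occur on the variables bstacks1 and bstacks2: if both sit in the same cache line, every write by one thread invalidates that line in the other core's cache, even though the threads never touch each other's variable.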
If you have something like this:
stack_elem* bstacks1;
stack_elem* bstacks2;

Then separate the variables this way:
size_t const arch_cache_line_size = 64;
stack_elem* bstacks1;
char pad [arch_cache_line_size];
stack_elem* bstacks2;
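
For what it's worth, the same separation can be sketched with alignas in C++11 or later, which avoids relying on the compiler keeping the manual pad between the two pointers. This is only a sketch under assumptions: the struct name padded_heads is hypothetical, the stack_elem layout is guessed from the quoted loop, and 64 bytes is the cache line size assumed above.

```cpp
#include <cstddef>

// Assumed layout, inferred from the quoted loop (number and prev fields).
struct stack_elem {
    int number;
    stack_elem* prev;
};

// Assumed cache line size; check the target CPU before relying on it.
constexpr std::size_t arch_cache_line_size = 64;

// Hypothetical wrapper: each pointer starts on its own cache line,
// so a write to one does not invalidate the line holding the other.
struct padded_heads {
    alignas(arch_cache_line_size) stack_elem* bstacks1 = nullptr;
    alignas(arch_cache_line_size) stack_elem* bstacks2 = nullptr;
};

// Compile-time check that the two members cannot share a line.
static_assert(sizeof(padded_heads) == 2 * arch_cache_line_size,
              "each head should occupy a full cache line");
```

The static_assert makes the build fail if the layout ever stops matching the intended one, which a plain char pad array cannot do.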

Or better, allocate the bstacks1/bstacks2 variables directly on the stacks of the threads that use them.
This can have a great impact on performance and scalability!
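
A minimal sketch of the "keep it on the thread's stack" variant, assuming each thread builds one stack and only publishes the result at the end. The functions build_stack and build_two_stacks are hypothetical names, std::thread stands in for whatever threading API the original program uses, and the input is simplified to a plain array of numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Assumed layout, inferred from the quoted loop.
struct stack_elem {
    int number;
    stack_elem* prev;
};

// Hypothetical worker: pushes numbers[begin, end) onto a stack whose head
// lives in this thread's own stack frame, so the hot loop never writes
// to memory that another thread's variable might share a line with.
void build_stack(const int* numbers, std::size_t begin, std::size_t end,
                 stack_elem** result)
{
    assert(begin <= end);
    stack_elem* head = nullptr;          // thread-local by construction
    for (std::size_t i = begin; i < end; ++i) {
        stack_elem* next = new stack_elem;
        next->number = numbers[i];
        next->prev = head;
        head = next;
    }
    *result = head;                      // single write to the shared variable
}

// Hypothetical driver: splits the input in two halves, one per thread.
// Freeing the nodes is left to the caller and omitted here.
void build_two_stacks(const int* numbers, std::size_t n,
                      stack_elem*& bstacks1, stack_elem*& bstacks2)
{
    std::thread t1(build_stack, numbers, std::size_t(0), n / 2, &bstacks1);
    std::thread t2(build_stack, numbers, n / 2, n, &bstacks2);
    t1.join();
    t2.join();
}
```

With this shape, bstacks1 and bstacks2 may still be adjacent in memory, but each is written only once per thread, so any remaining false sharing is negligible compared to updating them on every iteration.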
