<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3988506956146082063</id><updated>2011-04-21T22:12:36.120+02:00</updated><title type='text'>Momchil Velikov</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://chill-yo.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3988506956146082063/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://chill-yo.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Momchil Velikov</name><uri>http://www.blogger.com/profile/08422473430047499566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>1</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3988506956146082063.post-531921338151913515</id><published>2007-02-20T11:25:00.000+02:00</published><updated>2007-02-20T12:31:24.022+02:00</updated><title type='text'>Quantifying cache bounces</title><content type='html'>Not many people seem to realize just how expensive memory sharing and cache bounces are in parallel systems.  The following code is intended as a simple demonstration/benchmark of the bounce costs.&lt;br /&gt;&lt;br /&gt; The main thread spawns NTHREADS worker threads.
Each worker thread proceeds to independently increment a distinct memory location, NLOOPS times.  In the first variant, the memory locations are in different cache lines.  In the second variant (obtained by compiling with -DBOUNCE), the memory locations are adjacent and in the same cache line.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;#include &amp;lt;pthread.h&amp;gt;&lt;br /&gt;&lt;br /&gt;#define NTHREADS 4&lt;br /&gt;#define NLOOPS 200000000&lt;br /&gt;&lt;br /&gt;static void *&lt;br /&gt;worker (void *_cnt)&lt;br /&gt;{&lt;br /&gt;  int i;&lt;br /&gt;  volatile int *cnt = _cnt;&lt;br /&gt;&lt;br /&gt;  *cnt = 0;&lt;br /&gt;&lt;br /&gt;  for (i = 0; i &lt; NLOOPS; ++i)&lt;br /&gt;    (*cnt)++;&lt;br /&gt;&lt;br /&gt;  return 0;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;/* One row per thread: 128 ints per row puts cnt [i][0] for each i in a different cache line.  */&lt;br /&gt;static int cnt [NTHREADS][128];&lt;br /&gt;&lt;br /&gt;int&lt;br /&gt;main ()&lt;br /&gt;{&lt;br /&gt;  int i, sum;&lt;br /&gt;  pthread_t t [NTHREADS];&lt;br /&gt;&lt;br /&gt;  for (i = 0; i &lt; NTHREADS; ++i)&lt;br /&gt;    {&lt;br /&gt;#ifdef BOUNCE&lt;br /&gt;        pthread_create (&amp;t [i], 0, worker, &amp;cnt [0][i]);&lt;br /&gt;#else&lt;br /&gt;        pthread_create (&amp;t [i], 0, worker, &amp;cnt [i][0]);&lt;br /&gt;#endif&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;  sum = 0;&lt;br /&gt;  for (i = 0; i &lt; NTHREADS; ++i)&lt;br /&gt;    {&lt;br /&gt;      pthread_join (t [i], 0);&lt;br /&gt;#ifdef BOUNCE&lt;br /&gt;      sum += cnt [0][i];&lt;br /&gt;#else&lt;br /&gt;      sum += cnt [i][0];&lt;br /&gt;#endif&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;  printf ("%d\n", sum);&lt;br /&gt;  return 0;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I ran the benchmark on GNU/Linux (kernel 2.6.15) on a Sun Fire X4200 with two dual-core 2.2 GHz Opterons (four cores in total) and obtained the following results:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;$ gcc -O3 cache-test.c -lpthread&lt;br /&gt;$ time ./a.out&lt;br /&gt;800000000&lt;br /&gt;&lt;br /&gt;real    0m0.648s&lt;br /&gt;user    0m2.396s&lt;br /&gt;sys     0m0.000s&lt;br /&gt;$ time ./a.out&lt;br /&gt;800000000&lt;br /&gt;&lt;br /&gt;real    0m0.615s&lt;br /&gt;user    0m2.384s&lt;br /&gt;sys     0m0.004s&lt;br /&gt;$ time ./a.out&lt;br /&gt;800000000&lt;br /&gt;&lt;br /&gt;real    0m0.622s&lt;br /&gt;user    0m2.436s&lt;br /&gt;sys     0m0.004s&lt;br /&gt;$ gcc -O3 -DBOUNCE cache-test.c -lpthread&lt;br /&gt;$ time ./a.out&lt;br /&gt;800000000&lt;br /&gt;&lt;br /&gt;real    0m9.991s&lt;br /&gt;user    0m29.374s&lt;br /&gt;sys     0m0.008s&lt;br /&gt;$ time ./a.out&lt;br /&gt;800000000&lt;br /&gt;&lt;br /&gt;real    0m9.957s&lt;br /&gt;user    0m29.310s&lt;br /&gt;sys     0m0.016s&lt;br /&gt;$ time ./a.out&lt;br /&gt;800000000&lt;br /&gt;&lt;br /&gt;real    0m9.974s&lt;br /&gt;user    0m29.330s&lt;br /&gt;sys     0m0.012s&lt;br /&gt;$&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Taking the minimum of each measurement gives 9.957/0.615, or about 16.2; in other words, the cache bounces make the program roughly &lt;span style="font-weight: bold;"&gt;16 times slower&lt;/span&gt;.</content><link rel='replies' type='application/atom+xml' href='http://chill-yo.blogspot.com/feeds/531921338151913515/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3988506956146082063&amp;postID=531921338151913515' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3988506956146082063/posts/default/531921338151913515'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3988506956146082063/posts/default/531921338151913515'/><link rel='alternate' type='text/html' href='http://chill-yo.blogspot.com/2007/02/quantifying-cache-bounces.html' title='Quantifying cache bounces'/><author><name>Momchil Velikov</name><uri>http://www.blogger.com/profile/08422473430047499566</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry></feed>
