Main Memory Latency

For perspective, here are the main memory access latencies we've measured:

Kernel-level, fixed-TLB benchmark:

| Processor           | MHz  | Architecture | Ticks/Access | NSec/access |
|---------------------|------|--------------|--------------|-------------|
| Xeon Dual-Core 3060 | 2400 | Core 2       | 185          | 77          |

  • Determined using a modified HiStar kernel, which remaps a 2MB data page as uncached and performs 100e6 4-byte-aligned accesses to it. The figure reported is the average cost per access. There is one initial TLB miss, and no other activity in the system.
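
The NSec/access column follows from Ticks/Access and the clock rate. A minimal sketch of that conversion, assuming the tick counter advances at the nominal core clock listed in the MHz column (the helper name `ticks_to_ns` is made up for illustration):

```c
#include <assert.h>

/*
 * Convert a per-access cost in ticks to nanoseconds.
 * Assumption: one tick equals one cycle at the nominal clock rate,
 * so ns = ticks / (MHz * 1e6) * 1e9 = ticks * 1000 / MHz.
 * ticks_to_ns is a hypothetical helper, not part of the benchmark.
 */
static double ticks_to_ns(double ticks, double mhz)
{
	assert(mhz > 0.0);
	return ticks * 1000.0 / mhz;
}
```

For the Xeon row, `ticks_to_ns(185, 2400)` gives roughly 77.1, which rounds to the 77 ns shown in the table.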

User-level benchmark (includes TLB misses, except for 1GB Phenom case):

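| Processor             | MHz  | Architecture | Page Size | Ticks/Access | NSec/access |
|-----------------------|------|--------------|-----------|--------------|-------------|
| Phenom 9850 Quad-Core | 2500 | K10          | 4k        | 529          | 212         |
| Phenom 9850 Quad-Core | 2500 | K10          | 1g        | 244          | 98          |
| Xeon Dual-Core 3060   | 2400 | Core 2       | 4k        | 262          | 109         |
| Xeon Dual-Core 3060   | 2400 | Core 2       | 4m        | 193          | 80          |
| Core i7-920           | 2667 | Nehalem      | 4k        | 99           | 37          |
| Core i7-920           | 2667 | Nehalem      | 2m        | 63           | 24          |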

  • Determined in userspace by the following:
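    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    
    /* getbuf(), rdtsc() and maxmem are defined elsewhere. */
    
    int main(void) {
    	uint32_t *buf = getbuf();
    	const int loops = 100 * 1000 * 1000;
    	uint64_t b;
    	uint64_t blah = 0;	// accumulator, so the loops aren't compiled away
    	int i;
    
    	// Baseline: cost of generating a random index, without the access.
    	b = rdtsc();
    	for (i = 0; i < loops; i++)
    		blah += random() % (maxmem / sizeof(buf[0]));
    	uint64_t random_ticks = rdtsc() - b;
    
    	printf("%" PRIu64 " ticks for random-mod (%" PRIu64 " each)\n",
    	    random_ticks, random_ticks / loops);
    
    	// Same loop, but each index is used to read a word from the buffer.
    	b = rdtsc();
    	for (i = 0; i < loops; i++)
    		blah += buf[random() % (maxmem / sizeof(buf[0]))];
    	uint64_t access_ticks = rdtsc() - b;
    
    	printf("%" PRIu64 " total ticks (%" PRIu64 " each)\n", access_ticks,
    	    access_ticks / loops);
    	// Subtract the baseline to isolate the memory access cost.
    	printf("%" PRIu64 " ticks not including random-mod (%" PRIu64 " each)\n",
    	    access_ticks - random_ticks, (access_ticks - random_ticks) / loops);
    
    	return blah;	// use the accumulator so it stays live
    }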
    
  • Where getbuf() returns a 1GB region of virtual address space (maxmem = 1 * 1024 * 1024 * 1024).
  • Note that Phenom and Nehalem have about 23MB of L1 and L2 data TLB coverage. The Xeon's coverage is likely similar, if not smaller.
  • All chips have < 10MB of cache, so > 99% of the 1GB data set cannot be cache-resident at any one time.
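
The listing above relies on getbuf(), rdtsc() and maxmem, which are not shown on this page. The following is one possible way to supply them on x86 Linux with GCC or Clang, offered as a sketch rather than the code actually used. It maps ordinary anonymous memory (typically 4k pages) and does not cover the large-page cases in the table. `touch_pages` is an extra hypothetical helper, not something the original listing calls.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Size of the test region: 1GB, as stated in the notes above. */
static const size_t maxmem = (size_t)1 * 1024 * 1024 * 1024;

/*
 * Assumed implementation of getbuf(): reserve maxmem bytes of
 * anonymous, zero-filled virtual address space.
 */
static uint32_t *getbuf(void)
{
	void *p = mmap(NULL, maxmem, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	assert(p != MAP_FAILED);
	return p;
}

/*
 * Hypothetical helper: write one word per 4k page so each page gets its
 * own physical frame. Untouched anonymous pages may all be backed by a
 * single shared zero page, which would make reads hit in cache.
 */
static void touch_pages(uint32_t *buf, size_t bytes)
{
	const size_t step = 4096 / sizeof(buf[0]);	/* assumes 4k pages */
	size_t i;

	for (i = 0; i < bytes / sizeof(buf[0]); i += step)
		buf[i] = 1;
}

/* Assumed implementation of rdtsc(): read the x86 time-stamp counter. */
static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}
```

With these definitions, calling `touch_pages(buf, maxmem)` once before the timed loops would fault in the whole region, at the cost of needing 1GB of physical memory.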