Your laptop spends all day lying to you. If you want your code to run well on server hardware, best write it on server hardware.
This post highlights some differences between laptop and server hardware, and explains why I switched to a multi-socket Xeon machine as my everyday workstation.
Laptops often have faster CPU clocks than servers
My laptop’s i7 CPU has a base frequency of 2.8GHz, turbo boosted (more on that below) up to 3.9GHz. Meanwhile, the Xeons I get on Google Cloud have a base rate of 2.3GHz. What gives?
The reasoning is this: consumer workloads serve one person doing a few things at a time. Server workloads serve many users at once.
Playing a video game at home, you want high clock frequency for the main game loop.
Hanging out in the old data center, rendering web pages for a thousand people simultaneously? A different story. When buying server hardware, it often makes sense to give up clock frequency to “fit” more cores into the same heat and power budget.
There are plenty of exceptions. As you move to consumer machines with higher core counts, clock frequencies generally drop. And the Xeon range is wide: you can get CPUs about as fast as you like.
But in general:
Your laptop may tell you a single-threaded workload outperforms a parallel workload when the opposite is true.
and
Your laptop may tell you your performance bottleneck is not CPU when, in production, it is.
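To make the trade-off concrete, here is a back-of-the-envelope model in Python. It is a sketch under loud assumptions: the job is CPU-bound and splits perfectly across cores, IPC, memory bandwidth and turbo behaviour are ignored, and the core counts (4 for the laptop, 32 for a hypothetical dual-socket server) are illustrative, not the specs of my machines. The clocks are the 3.9GHz and 2.3GHz figures above.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    cores: int
    ghz: float  # assumed sustained clock while running this job

    def runtime_s(self, gigacycles: float, threads: int) -> float:
        """Idealized runtime: work splits evenly over the cores actually in use."""
        busy_cores = min(threads, self.cores)
        return gigacycles / (busy_cores * self.ghz)


# Hypothetical machines; the core counts are illustrative assumptions.
laptop = Machine("laptop", cores=4, ghz=3.9)
server = Machine("server", cores=32, ghz=2.3)

work = 100.0  # gigacycles of CPU-bound work

for threads in (1, 32):
    for m in (laptop, server):
        print(f"{m.name:>6}, {threads:>2} thread(s): {m.runtime_s(work, threads):6.2f} s")
```

With these made-up numbers the laptop wins the single-threaded run (about 25.6 s versus 43.5 s), while the server wins the 32-thread run (about 1.4 s versus 6.4 s). Tune only on the laptop and you would optimize in the wrong direction.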
Laptops have uniform memory, servers do not
Most consumer machines have a single socket with RAM DIMMs located around it. Accessing any part of RAM has, roughly, uniform latency.
Servers usually have multiple CPU sockets. Each RAM DIMM has a “home” socket, the specific socket it is wired to. Accessing RAM that is “local” to the socket your code is running on is fast. Accessing RAM attached to another socket is slower, because every request has to cross the interconnect between the sockets.
Memory access is non-uniform: this is NUMA, or Non-Uniform Memory Access.
By default on Linux, calling mmap does not decide which socket’s DIMMs back your memory: each page is placed on the node local to the CPU that first touches it. See set_mempolicy(2).
Your laptop will tell you a program that allocates and initializes memory in one thread and uses it in another is fast.
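If you want to see where your pages actually landed, Linux kernels built with NUMA support expose per-mapping page counts for each node in /proc/&lt;pid&gt;/numa_maps, as fields like N0=256. The Python sketch below sums those fields; on a kernel without that file the live helper returns None. The 64 MiB buffer and the 4 KiB page size in the demo are assumptions for illustration.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Optional

# numa_maps lines report resident pages per node as fields like "N0=256".
NODE_FIELD = re.compile(r"\bN(\d+)=(\d+)\b")


def pages_per_node(numa_maps_text: str) -> Counter:
    """Sum the N<node>=<pages> fields across every mapping in numa_maps text."""
    totals: Counter = Counter()
    for node, pages in NODE_FIELD.findall(numa_maps_text):
        totals[int(node)] += int(pages)
    return totals


def my_pages_per_node() -> Optional[Counter]:
    """Per-node totals for this process, or None if the kernel has no numa_maps."""
    try:
        return pages_per_node(Path("/proc/self/numa_maps").read_text())
    except OSError:
        return None


if __name__ == "__main__":
    before = my_pages_per_node()
    buf = bytearray(64 * 1024 * 1024)
    # Touch every page (first-touch placement) by writing one byte per 4 KiB,
    # the assumed page size.
    for i in range(0, len(buf), 4096):
        buf[i] = 1
    after = my_pages_per_node()
    if before is None or after is None:
        print("no /proc/self/numa_maps on this kernel")
    else:
        print("pages added per node:", dict(after - before))
```

On a two-socket box, try running the script under numactl --cpunodebind=0 and then --cpunodebind=1: the new pages should follow the CPUs doing the touching. On a laptop you will only ever see node 0.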
Laptops have really fast disks
A good laptop today comes with a big honkin’ NVMe SSD plugged right into its little brain. NVMe is screaming fast, exactly what you want for working with large files like photos and video, or for playing games.
Servers have, on average, different needs. A Rails app that takes a network request and makes a database call may barely touch disk. A file server often caches files in buffer pools, serving them from RAM rather than disk.
The default disk you get on GCP, a regional persistent disk, tops out around 240MiB/s. That is roughly an order of magnitude slower than the NVMe drive in your laptop.
Your laptop will make you think your logging framework isn’t your primary bottleneck
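A quick way to see why logging can become the bottleneck is to divide disk bandwidth by the bytes logged per request. This Python sketch uses the 240MiB/s figure above. The 16 KiB-per-request log volume and the 2,400 MiB/s laptop NVMe figure (picked to match the order-of-magnitude gap) are assumptions, and the model ignores page-cache buffering, fsync behaviour and anything else competing for the disk.

```python
MIB = 1024 * 1024


def max_requests_per_s(disk_mib_per_s: float, log_bytes_per_request: int) -> float:
    """Request rate at which log writes alone would saturate the disk."""
    return disk_mib_per_s * MIB / log_bytes_per_request


LOG_BYTES = 16 * 1024  # assumed: verbose request logging, 16 KiB per request

for disk, mib_s in (("GCP persistent disk", 240), ("laptop NVMe (assumed)", 2400)):
    rate = max_requests_per_s(mib_s, LOG_BYTES)
    print(f"{disk}: ~{rate:,.0f} requests/s before logging saturates it")
```

Under these assumptions the ceiling is 15,360 requests per second on the persistent disk versus 153,600 on the laptop drive, so a load test that looks comfortable locally can be disk-bound in production.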
Finally, a note on Turbo Boost
Xeon CPUs have Turbo Boost as well, so this isn’t strictly a difference between consumer and server hardware. But it is a difference between consumer and server workloads.
On a server under parallel load, Turbo Boost may not kick in. It’s designed to speed up low-concurrency workloads, like your game rendering loop, by spending spare power and thermal headroom. If all the cores on the server are busy, the chip is already running nice and hot with no headroom left: no Turbo Boost for you.
Hence your single-threaded benchmark gets that sweet 5GHz warp speed, and in production? Enjoy the 2.4GHz park stroll.
Your laptop test may see twice the clock speed that you’ll see in the real world, thanks to Turbo Boost
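If a benchmark is CPU-bound and its runtime scales inversely with clock speed (a simplifying assumption, since memory and I/O stalls do not scale this way), you can roughly translate a turbo-boosted laptop number into a sustained-clock production estimate. A minimal Python sketch using the 5GHz and 2.4GHz figures above, with a hypothetical 10 ms request handler:

```python
def production_estimate_s(laptop_s: float, laptop_ghz: float, prod_ghz: float) -> float:
    """Scale a CPU-bound runtime by the clock ratio (assumes runtime proportional to 1/clock)."""
    return laptop_s * laptop_ghz / prod_ghz


# A hypothetical 10 ms request handler measured at 5 GHz turbo on a laptop.
estimate = production_estimate_s(0.010, laptop_ghz=5.0, prod_ghz=2.4)
print(f"~{estimate * 1000:.1f} ms per request at 2.4 GHz")
```

That works out to roughly 20.8 ms, about 2.1 times the laptop figure, before any other production effects are counted.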
Making the swap
If your main development environment is terminal-based, a popular option is to simply run a shell on a cloud VM.
As for me, I picked up a dual-Xeon ThinkStation P720, which I now use as my main development machine. Every day I learn a bit more about the hardware I’m writing software for.
Compilation speeds are through the roof.
My office is notably warmer.