How to perform a test with 40 000 devices?

The long-expected release of the new official product brand, Dwarfguard 0.8.0, has just entered its performance testing period. The plan is to thoroughly test that it can handle 40 000 devices.

Well, this may disappoint a few of you, but we do not have 40 000 devices handy, let alone 40 000 industrial cellular routers hoarded on a heap somewhere. So how are we able to test this?

Every device has an agent installed on it. This agent communicates with the Dwarfguard server periodically, exchanging information. To test the server, we need to perform this communication 40 000 times per device interval, obviously with 40 000 different datasets. While part of the data constituting the device state (like CPU usage, IP traffic statistics, temperature, etc.) can be static, since the processing on the server consumes the same resources whether the transferred number is 10 or 50, part of the data is actually unique to each device. Examples are the Device ID, Device Token or Machine ID.
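
To make this concrete, here is a minimal sketch of how such per-device datasets could be generated. This is purely illustrative; the field names (device_id, device_token, machine_id) are assumptions, not our Emulator's actual data format:

```python
import uuid

# Static part of the device state: the server does the same amount of
# work regardless of the concrete values, so one template is enough.
STATIC_STATE = {
    "cpu_usage_pct": 17,
    "ip_rx_bytes": 123456,
    "ip_tx_bytes": 654321,
    "temperature_c": 42,
}

def make_device_dataset(index: int) -> dict:
    """Build one emulated device dataset: shared static state plus
    the fields that must be unique per device."""
    return {
        **STATIC_STATE,
        "device_id": f"emu-{index:05d}",   # hypothetical field names
        "device_token": uuid.uuid4().hex,
        "machine_id": uuid.uuid4().hex,
    }

datasets = [make_device_dataset(i) for i in range(40_000)]
```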

While not quite as straightforward as saving a dataset once and sending it over 40 000 times per the given time period, it is a doable task. We actually went one step further and made our Emulator able to do more: exchange some of the data on the fly, run in parallel threads, compute throughput, summarize results and so on.
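
A minimal sketch of the parallel-threads idea, assuming a hypothetical push_state() call standing in for the real SSL exchange with the server:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def push_state(dataset: dict) -> None:
    """Placeholder for the real SSL push to the Dwarfguard server."""
    time.sleep(0.001)  # stand-in for the network + server round trip

def run_benchmark(datasets: list[dict], threads: int) -> float:
    """Push every dataset once using the given number of worker
    threads and return the achieved throughput in pushes/second."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(push_state, datasets))
    elapsed = time.monotonic() - start
    return len(datasets) / elapsed

# e.g.: throughput = run_benchmark(datasets, threads=8)
```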

But in one of the more demanding benchmark tests during the perftest of MAMAS 0.7.0, we hit the ceiling of the tool. So we needed one more step - but more on that in a short while.

For the 0.x versions, we perform two types of performance testing:

  1. Stability testing
    • To prove that the server on a given HW sizing (e.g. a VM container with 1 GiB RAM) can safely run a number of devices over a time period. We also measure the resource consumption, showing the load on the container/machine.
  2. Benchmark testing
    • To measure the throughput per second for a defined set of containers/machines. This comes in handy when you need to determine what can be handled under your custom settings. E.g. when you shorten the data send period from 260 seconds to 130 seconds, the number of connections per second doubles on average. Re-computing the required throughput and comparing it to the benchmark results gives you a clue about what size of container/machine is needed to handle your use case (a worked example follows right after this list).
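
To make the re-computation concrete, here is the arithmetic as a tiny sketch; the 260 s and 130 s periods come from the example above, and the 40 000 device count is taken from our test plan:

```python
def required_pushes_per_second(devices: int, send_period_s: float) -> float:
    """Each device pushes once per period, so the server must absorb
    devices / period pushes every second."""
    return devices / send_period_s

# 40 000 devices at a 260 s send period:
print(required_pushes_per_second(40_000, 260))  # ~153.8 pushes/second
# Halving the period to 130 s doubles the requirement:
print(required_pushes_per_second(40_000, 130))  # ~307.7 pushes/second
```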

While stability testing for MAMAS 0.7.0 went fine, the benchmark testing hit the limit of the traffic generator running our Emulator. While the HW (read: CPU) was strong enough to support the 8-SSL-threads benchmark, the 16-SSL-threads benchmark generated a higher load than we liked, and the bench-32-threads run actually showed a decline in throughput due to the excessive load on the traffic generator machine.

You can actually see that indication in the performance testing protocol. The throughput comparison for the C4 test container (8 CPU threads, 4 GiB RAM):

  • bench-4-SSL: 189.71 pushes/second
  • bench-8-SSL: 288.66 pushes/second
  • bench-16-SSL: 297.30 pushes/second
  • bench-32-SSL: 294.05 pushes/second
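
A quick back-of-the-envelope reading of those numbers shows the saturation: doubling the SSL thread count should roughly double the throughput while the generator still scales, and stops doing so once the generator itself becomes the bottleneck:

```python
results = {4: 189.71, 8: 288.66, 16: 297.30, 32: 294.05}  # pushes/second

# Scaling factor when doubling the SSL thread count: a value near 2.0
# would mean linear scaling; near 1.0 means the generator is saturated.
for a, b in [(4, 8), (8, 16), (16, 32)]:
    print(f"{a} -> {b} threads: x{results[b] / results[a]:.2f}")
# 4 -> 8 threads: x1.52
# 8 -> 16 threads: x1.03
# 16 -> 32 threads: x0.99
```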

So, how did we improve for testing Dwarfguard 0.8.0?

Well, we went parallel!

Now our test scripts use multiple traffic generation machines running the Emulator SW, so we hope to bring you some real throughput numbers for the more demanding tests.
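
In spirit (the host names and CLI flags below are purely hypothetical, not our actual scripts), the orchestration looks something like this:

```python
import subprocess

# Hypothetical generator hosts and emulator invocation; the real
# scripts and the real Emulator command line differ.
GENERATORS = ["gen1.example.com", "gen2.example.com", "gen3.example.com"]
DEVICES_TOTAL = 40_000

def start_emulators(threads: int) -> list[subprocess.Popen]:
    """Split the emulated device population across several traffic
    generator machines so no single CPU becomes the bottleneck."""
    per_host = DEVICES_TOTAL // len(GENERATORS)  # remainder ignored for brevity
    procs = []
    for i, host in enumerate(GENERATORS):
        cmd = ["ssh", host,
               f"emulator --devices {per_host} --offset {i * per_host} "
               f"--ssl-threads {threads}"]
        procs.append(subprocess.Popen(cmd))
    return procs

# for p in start_emulators(threads=16):
#     p.wait()
```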

Also, for all of you pitying our Emulator for being so poor that 16 busy threads cannot saturate just 2 worker threads on the server, despair no more! Hopefully we will find the ceiling of the lonely 2 server worker threads, letting us test the throughput for the use case where the server admin boosts the processing threads of the Dwarfguard server to 4! Or even 8! Yeah, it is kind of bragging that our product performance is so good it can handle 30 000 devices in 2 threads... but you know what? We can't help it...