When the RPi 4 was announced, I was disappointed by the lack of NVMe/PCIe connectivity. However, one thing I found quite interesting was the massively improved USB performance and the introduction of USB 3.0 ports, fed by a direct PCIe interface to the SoC. It was enough to make me order one as early as possible to run some tests. I haven’t seen much out there that looks at the Raspberry Pi as a “storage node”, but Tom’s Hardware did have an interesting set of results using a USB->NVMe adapter. Most of the metrics don’t apply to my use case, but there were some synthetic IOzone numbers showing a huge increase in performance. I wanted to go a bit further, looking at latency as well as sequential and random performance.
Thinking about building a Ceph cluster of small single-board computers, I had assumed that NVMe (M.2 PCIe) storage was a must. It seems quite silly to burden a small computer with the extra translation layers a SATA SSD brings along: CPU to USB, USB to SATA, and SATA to flash.
However, I’m a firm believer that good enough is good enough. Right now an SBC with NVMe would be something like a ROCK Pi 4, and while it looks great, it doesn’t have the community and volume shipments of the Raspberry Pi.
With what I had on hand already, here’s the test setup I put together:
- An RPi 3B+ in an Argon One case
- An RPi 4B sitting on a cardboard box (no cooling), because I don’t have the new Argon One case for it yet
- A StarTech USB3->SATA3 adapter I had lying around
- A 240GB Intel SSD 730 with many years of use in a server
- Buster-based Raspbian (latest) with fio 3.12
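I haven’t listed the exact fio parameters here; as a sketch, a 4K random-read job against the USB-attached SSD might look something like this (the device path, runtime, and queue depth are illustrative assumptions, not the actual invocation):

```ini
; hypothetical fio job file, not the exact settings used for these tests
[global]
ioengine=libaio
direct=1
filename=/dev/sda        ; the SSD behind the USB->SATA adapter
runtime=60
time_based=1

[randread-4k]
rw=randread
bs=4k
iodepth=16
```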
In short, the performance is awesome! Here’s some graphs.
Raspberry Pi 3B+
First, the sad stuff. The Raspberry Pi 3B+’s USB 2.0 interface appears to limit performance quite a bit. USB 2.0 should be capable of 480Mbps, but for some reason I see only about half that.
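For reference, here’s the line-rate math; the 80% efficiency factor is my assumption for typical USB bulk-transfer protocol overhead, not a measured number:

```python
# USB 2.0 ceiling, back-of-envelope. The 0.8 efficiency factor is an
# assumed typical figure for mass-storage protocol overhead.
USB2_SIGNALING_MBPS = 480                    # line rate in megabits/s
theoretical_mb_s = USB2_SIGNALING_MBPS / 8   # 60 MB/s on a perfect bus
practical_mb_s = theoretical_mb_s * 0.8      # ~48 MB/s realistic ceiling

print(f"theoretical: {theoretical_mb_s:.0f} MB/s, "
      f"practical: {practical_mb_s:.0f} MB/s")
```

Seeing only about half the signaling rate means we’re well below even the realistic ceiling, which points at the adapter or the Pi’s USB stack rather than the bus itself.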
Maximum random read/write IOPS are similarly depressing. I noticed the CPU was about 70% idle during these tests, with quite a lot of time spent in iowait. So I suspect most of the latency is in handling and processing the smaller transaction sizes for the random IOs headed to the USB->SATA controller.
When evaluating a piece of hardware for storage use cases, it can be interesting to measure how latency and IOs per second are related. By increasing the offered load and measuring maximum IOPS and latency at the same time, we can learn a few things about the limits of the system. In this case, it shows that no matter how many simultaneous operations are sent to the storage, they all get in line and wait their turn. This tells you the bottleneck is a single queue processing transactions headed to the storage.
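That single-queue behavior is just Little’s Law in action: outstanding IOs = IOPS × average latency. A toy sketch of what it predicts (the 0.5 ms service time is an invented number for illustration, not a measurement from these tests):

```python
# Little's Law: queue_depth = IOPS * avg_latency, so IOPS = QD / latency.
# If the device serves one IO at a time, extra queue depth only adds
# waiting: latency grows linearly and IOPS stays flat.
def iops(queue_depth, latency_s):
    return queue_depth / latency_s

SERVICE_TIME_S = 0.0005   # hypothetical 0.5 ms per IO, served serially
for qd in (1, 4, 16):
    lat = qd * SERVICE_TIME_S       # each IO waits behind the others
    print(f"QD={qd:2d}: latency={lat * 1000:.1f} ms, "
          f"IOPS={iops(qd, lat):.0f}")
```

A flat IOPS line with linearly growing latency is exactly the signature of a single serialized queue, which is what the 3B+ graphs show.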
Ok, enough of the sadness. Let’s see if the new Raspberry Pi 4B can be a viable fast SBC-based storage node. Note that I’ve kept the scale in the bar graphs the same, so you can visualize how much of an improvement there is.
Raspberry Pi 4B
First off, sequential throughput. Keep in mind this is the same SSD connected via the same USB adapter, now plugged into one of the USB 3.0 ports on the Raspberry Pi 4B. CPU idle percentage was higher here in all tests, over 80%. Nearly 10x better storage throughput!
Next we have random read & write IOPS. Again we see almost a 10x increase, which is fantastic. Keep in mind that the network port (1Gb) will only be able to carry at most about 16k 8K IOs per second, and as a ‘storage node’ every transaction will be coming through the network to the disk.
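That ~16k figure follows from payload-only arithmetic; real numbers would be somewhat lower once TCP/IP and Ceph messenger overhead are counted:

```python
# Ceiling on 8K IOs over gigabit Ethernet, counting payload bits only
# (protocol overhead would reduce this further).
link_bps = 1_000_000_000          # 1Gb/s line rate
io_bits = 8 * 1024 * 8            # 8 KiB payload per IO, in bits
max_iops = link_bps / io_bits
print(f"max 8K IOPS over 1GbE: {max_iops:.0f}")   # just over 15k
```

So once the disk can sustain IOPS in that range, the gigabit NIC, not the USB->SATA path, becomes the bottleneck for a storage node.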
Finally, here’s the latency vs. IOPS graph. As we increase the queue depth sent to the USB->SATA bridge, we do see performance improvements up to a point. The graph above shows the best performance at the “knee” of the latency/IOPS curve. I’ll admit these latency numbers aren’t impressive compared to any low-end PC with a PCIe M.2 device, but I believe they’re still quite usable for a storage node in a super-cheap distributed flash cluster.
Overall, I think this shows that the Raspberry Pi 4B has some good potential as a small storage node. Used as a flash-based storage node, you could imagine a small stack of these providing a high-performance (compared to HDD), feature-rich storage cluster.
Instead of flash, if you consider attaching a SATA HDD to the USB->SATA adapter, there’s no question that this platform has more than enough horsepower to keep any modern nearline HDD busy. And when you consider that the RPi is now available with 4GB of RAM, which happens to be the default memory requirement for a single Ceph OSD, it should be able to cache a decent amount of IO for the disk.
Personally, I want to cobble together 3 or 4 of these Raspberry Pis in some decent (actively cooled) cases, and attach each to the cheapest $/TB SATA SSD I can find. I’d love to see what a small cluster of these could do as central storage for the various VMs and containers of my other projects. As a bonus, Ceph Luminous seems to be standard and operational in the latest Raspbian. Stay tuned if I manage to pull the trigger on this project!
Quick thoughts on a parts list:
- Single power supply: https://www.amazon.com/gp/product/B0773K737F/ref=ox_sc_act_title_1?smid=ATVPDKIKX0DER&psc=1
- USB->SATA Enclosure for an SSD: https://www.amazon.com/gp/product/B00OJ3UJ2S/ref=ox_sc_act_title_2?smid=ATVPDKIKX0DER&psc=1
- Active cooled case for the RPi4: https://www.amazon.com/gp/product/B07VX3HQGJ/ref=ox_sc_act_title_3?smid=A34CQKEVNF2MJX&psc=1
- Raspberry Pi 4: https://www.amazon.com/Vilros-Raspberry-USB-C-Adapters-Quickstart/dp/B07TVVJZQT/ref=sr_1_3?crid=1V1H67RYINJXI&keywords=raspberry+pi+4+4gb&qid=1567111351&s=gateway&sprefix=raspbe%2Caps%2C192&sr=8-3
So $25 for the power supply plus $100 per Raspberry Pi node, and then just add your disks. I’ve seen plenty of $100/TB SATA SSDs as of late, so a 4-node setup could be had for $825 or so. That’d be 3TB (EC 3+1) of usable distributed flash storage.
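The arithmetic behind that estimate, using the rough prices quoted above:

```python
# 4-node cluster cost and capacity, using the rough prices above.
power_supply = 25     # shared multi-port USB power supply
node_kit = 100        # one RPi 4 kit (board, power, case bits)
ssd = 100             # ~1TB SATA SSD at roughly $100/TB
nodes = 4

total_cost = power_supply + nodes * (node_kit + ssd)   # $825
usable_tb = nodes * 1 * 3 / (3 + 1)   # EC 3+1 keeps 3/4 of raw capacity
print(f"${total_cost} for {usable_tb:.0f}TB usable")
```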