Storage Spaces Performance Woes


So after being quite happy with the SSD performance of Storage Spaces, things took a turn for the worse. I noticed it by accident, but now it seems persistent. After a reboot, Storage Spaces (SS) performance randomly drops to a very specific level, i.e. it’s the same low values every time it happens. The only workaround is to keep rebooting until performance is restored.

Pretty bad. It almost feels like PCI-E sometimes fails to negotiate 3.0 mode and falls back to 2.0. I really don’t know. I checked for an updated BIOS but there’s none. I’ve also tried to enforce PCI-E 3.0 via the BIOS, but to no avail. Then I updated the LSI BIOS, firmware and drivers; still no improvement. It may also be an SS bug, but I just can’t figure this out. Has anyone faced a similar issue?
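For what it’s worth, the PCI-E hunch is at least plausible on paper. Here is a rough sketch of the theoretical one-direction bandwidth of a link at each generation. The signalling rates and encodings are the standard ones (5 GT/s with 8b/10b for 2.0, 8 GT/s with 128b/130b for 3.0); the x8 lane width is only an assumption for illustration, since the actual link width of the HBAs isn’t stated here.

```python
# Theoretical one-direction PCIe link bandwidth, ignoring protocol overhead.
# Each entry: (transfer rate in transfers/s, payload bits, total bits per symbol group)
PCIE_GENERATIONS = {
    "2.0": (5.0e9, 8, 10),      # 5 GT/s, 8b/10b encoding
    "3.0": (8.0e9, 128, 130),   # 8 GT/s, 128b/130b encoding
}


def link_bandwidth_gbs(generation: str, lanes: int) -> float:
    """Return the theoretical bandwidth in GB/s (10^9 bytes/s) for one direction."""
    rate, payload_bits, total_bits = PCIE_GENERATIONS[generation]
    bytes_per_second_per_lane = rate * payload_bits / total_bits / 8
    return bytes_per_second_per_lane * lanes / 1e9


LANES = 8  # assumption: an x8 slot, not confirmed for this hardware

gen2 = link_bandwidth_gbs("2.0", LANES)
gen3 = link_bandwidth_gbs("3.0", LANES)
print(f"PCIe 2.0 x{LANES}: {gen2:.2f} GB/s")
print(f"PCIe 3.0 x{LANES}: {gen3:.2f} GB/s")
print(f"ratio 3.0 / 2.0: {gen3 / gen2:.2f}")
```

A fallback from 3.0 to 2.0 would cut the link ceiling roughly in half, which is why the idea comes to mind. It only shows the ceiling though, not whether the link is actually the bottleneck.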

Update: I’ve dug deeper but to no avail.

I tried to find a pattern, but there’s none. It goes like this: fast, slow, slow, fast, slow, slow, slow, fast, fast, fast, slow, slow, slow, fast, slow, slow, slow, fast, slow, slow, slow, fast, slow, fast. Figure that out!
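To put numbers on that sequence, a quick sketch like the following counts the outcomes and the lengths of consecutive runs. The list is copied from the boots listed above; everything else (variable names, the choice of run lengths as the thing to look at) is just one way to slice it.

```python
from collections import Counter
from itertools import groupby

# Boot outcomes in the order observed above.
boots = (
    "fast slow slow fast slow slow slow fast fast fast slow slow "
    "slow fast slow slow slow fast slow slow slow fast slow fast"
).split()

# How often each outcome occurred.
counts = Counter(boots)

# Consecutive runs, e.g. ("slow", 3) means three slow boots in a row.
runs = [(state, len(list(group))) for state, group in groupby(boots)]

print(f"total boots: {len(boots)}")
print(f"fast: {counts['fast']}, slow: {counts['slow']}")
print("runs:", runs)
```

That gives 9 fast boots against 15 slow ones out of 24, so slow is the more common outcome, but the run lengths don’t settle into anything periodic.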

I also tried to find a change in configuration. I generated a full AIDA system report both for a slow and a fast boot and compared the results. Of course I didn’t find any meaningful difference. I also checked out the logs in the Event Viewer, pretty much all of them, but nope, nothing.
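If you want to repeat that kind of comparison over many boots, it can be scripted. Below is a minimal sketch using Python’s difflib on two reports exported as plain text. The sample report lines and the list of “volatile” keywords to ignore are made up for illustration; they are not the real AIDA report format.

```python
import difflib
import re

# Assumption: lines containing these words change on every boot and are noise.
VOLATILE = re.compile(r"(uptime|temperature|date|time)", re.IGNORECASE)


def stable_lines(report_text: str) -> list[str]:
    """Drop blank lines and lines that are expected to differ between boots."""
    return [
        line.strip()
        for line in report_text.splitlines()
        if line.strip() and not VOLATILE.search(line)
    ]


def report_diff(fast_text: str, slow_text: str) -> list[str]:
    """Return only the added/removed lines between a fast-boot and a slow-boot report."""
    diff = difflib.unified_diff(
        stable_lines(fast_text),
        stable_lines(slow_text),
        fromfile="fast",
        tofile="slow",
        lineterm="",
        n=0,  # no context lines, only the changes
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Hypothetical placeholder reports, not real output.
fast_sample = "Uptime: 120 s\nSetting X: enabled\nDriver: 1.2.3\n"
slow_sample = "Uptime: 95 s\nSetting X: disabled\nDriver: 1.2.3\n"

for line in report_diff(fast_sample, slow_sample):
    print(line)
```

An empty result means the reports agree on everything except the filtered noise, which is what the manual comparison showed here.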

Then I thought that maybe one of the HBAs fails to initialize properly, so I created two VDs, one for the SSDs connected to each HBA, and benchmarked them separately. Guess what: each provided roughly half the performance of a “fast” boot, which is almost exactly the performance of the “slow” full array. So when I get a “slow” boot and 2.5M writes with the full 24 SSD array, then split the array and test a 12 SSD array without rebooting, it’s also about 2.5M. In other words, each half still delivers its full share on its own, but the combined array doesn’t scale beyond a single half. That means it’s not the SSDs or the HBAs that are failing but Storage Spaces itself.

At this point I can’t think of anything other than a bug in SS. Unfortunately I don’t have two months or so to cut through a horde of outsourced helpdesk agents to get to an actual engineer, then another month to track it down. So I’ll probably just leave it as it is and keep rebooting when the shit hits the fan…