Imagine the scenario: You have a sturdy, long-running VP2510 chassis providing quality phone service for thousands of subscriber lines. Then one day you get an alarm about a hardware failure – one of your processor cards (CPU cards) has failed.
Of course this is annoying, but it’s no big deal, because:
- the Metaswitch chassis is redundant, and so the other processor is happily providing service without any interruption
- you have a spare card on the shelf, just in case this happened. Good job – you can pat yourself on the back.
But wait a minute, I have an important question: where did you get that spare card from?
The answer to this question can make the difference between a smooth restoration of service and a messy outage.
Wait… what? Why?
SP4000, SP4100 and SP4200: a CPU history
Before we get to that, let me be very specific about the situation, to avoid unnecessary concern.
In this article we’re discussing the CPU / Processor cards that are used in the VP3510, MG3510, VP2510 and MG2510 Metaswitch chassis. If you’re using ATCA (CH6010, CH6050) you can skip the rest of this article and move on with your life.
Now there were three generations of CPU cards on these chassis.
- SP4000 card with RSP4000 RTM: With this model the hard disk is on the front card (the SP4000). The last date Metaswitch provided any kind of support on these cards was 1 September 2016, and the last software release that supported these cards was V8.3.11.
- SP4100 card with RSP4000 RTM: This is a direct replacement of the SP4000, again with the hard disk on the front card, and the last date of support (and last date for hardware replacement) will be 30 November 2019. The last software release that supports these cards is V9.3.20.
- SP4200 card with RSP4200 RTM: The last generation of these cards moved the hard disk to the rear card (hence the new RSP4200 RTM), which meant that anyone upgrading from an SP4100 to an SP4200 also had to switch out their RTMs. These cards will continue to be supported until the chassis itself reaches end-of-support on 30 November 2023.
It’s worth understanding the differences between these generations, because they have different support dates and because you can’t directly replace an SP4100 with an SP4200 without also replacing the RTM. That’s not a problem in itself, but it is important to understand if you’re relying on shelf spares.
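If you track your shelf spares in a script or spreadsheet, the front card / RTM pairing for the three generations described above is easy to encode. The snippet below is only a rough sketch: the dictionary and the function name `rtm_swap_needed` are my own invention, not anything from Metaswitch tooling, and it only compares RTM models. It says nothing about software release support.

```python
# Front card model -> rear transition module (RTM) it pairs with,
# as described for the three CPU generations in this article.
REQUIRED_RTM = {
    "SP4000": "RSP4000",
    "SP4100": "RSP4000",
    "SP4200": "RSP4200",
}


def rtm_swap_needed(failed_card: str, spare_card: str) -> bool:
    """Return True if swapping failed_card for spare_card also needs a different RTM.

    Hypothetical helper for illustration only; raises KeyError for unknown models.
    """
    return REQUIRED_RTM[failed_card] != REQUIRED_RTM[spare_card]


if __name__ == "__main__":
    # Replacing an SP4100 with an SP4200 means the RTM has to change too.
    print(rtm_swap_needed("SP4100", "SP4200"))
    # An SP4100 standing in for an SP4000 keeps the same RSP4000 RTM.
    print(rtm_swap_needed("SP4000", "SP4100"))
```

A lookup table keeps the rule in one place, so if you only stock SP4200 spares you can see at a glance that you also need RSP4200 RTMs on the shelf.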
A spare card is not the same as a production card
So let’s get back to our scenario – why does the origin of the card potentially spell danger?
It all comes down to how the card was prepared or “bootstrapped” in Metaswitch terminology.
If you have a genuine spare card, that was originally designated as a spare card on the day you bought it, then it will be configured explicitly as a spare – so when you first insert that card into a chassis, it will be ready to go through the “recovery bootstrap” process to restore full redundancy to the chassis.
But what if the SP4200 card you intended to use as a replacement was previously used in a different chassis? Maybe you bought it used from another local telco, or maybe you took it out of an old switch you have since retired?
Let’s take a moment to imagine the life of this poor, unloved processor card. Let’s call him Pete the Processor.
- Once upon a time, Pete was active and important – he had an identity, an IP address, a bunch of subscriber configuration and translations – and he was busy doing his job processing calls, helping people to communicate and generally fulfilling his life’s purpose.
- Then one day, the rough hands of a CO tech pull Pete unexpectedly out of his home – his VP3510 chassis – and all of that stops. He goes to sleep.
- Pete is placed in a box on a shelf, and forgotten.
- Years pass.
- Finally, one auspicious day, Pete is needed. A Processor card has failed in a production chassis, and so Pete is removed from his box ready for action.
- Pete is inserted into a new chassis that’s already processing calls.
- As soon as Pete powers up he remembers everything he was doing before – he remembers his IP address, he remembers his configuration, he remembers the calls he was trying to process…
- And so he tries to pick up where he left off – with that same IP, same configuration, same partnered CPU – and then things get weird.
- Because there’s another CPU in this chassis (let’s call her Cecilia CPU) – with a different IP address and different configuration, possibly even different software – and Cecilia has entirely different plans about what this VP3510 should be doing right now.
- What happens next?
To be honest, I have no idea – there are no rules for what happens next in this scenario. Perhaps Pete and Cecilia communicate across the backplane – trying to figure out which of them is in charge. Perhaps they both try to program media for calls on the PB cards. Perhaps one of them tries to copy its configuration from the other (just like after a regular CPU reboot).
This is the double-headed monster scenario. Two CPUs, both trying to take the lead, both trying to control the system – and it’s a monstrous mess.
The best case is that Pete doesn’t work at all, and Cecilia keeps on running the chassis. The worst case is some kind of nasty outage and possibly even some data corruption.
Bottom line: don’t do this. Use a real spare.
What do I need to remember from this incredibly confusing anthropomorphic fable?
You didn’t like my story, eh? In that case I’ll keep the conclusions very straightforward.
- You can only use designated spare CPU cards as replacements. You cannot use CPUs you retrieved from other chassis. If you have other SP4x00 cards and you’d like to get them converted into spares, try talking to your Metaswitch account manager about it.
- SP4200s are different from the earlier breeds of CPUs. If you want to replace a failed SP4100 with an SP4200 then that’s possible, but you need to replace the RTM as well (so make sure you have a spare of both the SP4200 and the RSP4200).
- All of these cards are either EOL or soon-to-be EOL. So make sure you know which version of these cards you’re using, when they reach end-of-hardware support, and what your plan is if you have a card failure after that.
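If you like checklists better than fables, the first and last points can be turned into a quick sanity check to run against your spares inventory before anyone walks to the chassis with a card in hand. This is a minimal sketch under my own assumptions: `ShelfCard`, its `designated_spare` flag and `replacement_warnings` are hypothetical names for illustration, and the dates are the support dates quoted earlier in this article.

```python
from dataclasses import dataclass
from datetime import date

# Last date of support per card generation, as quoted earlier in this article.
END_OF_SUPPORT = {
    "SP4000": date(2016, 9, 1),
    "SP4100": date(2019, 11, 30),
    "SP4200": date(2023, 11, 30),
}


@dataclass
class ShelfCard:
    model: str
    # Hypothetical inventory flag: True only if the card was supplied
    # (or converted) as a spare, False if it was pulled from another chassis.
    designated_spare: bool


def replacement_warnings(card: ShelfCard, today: date) -> list[str]:
    """Return the reasons to stop and think before inserting this card."""
    warnings = []
    if not card.designated_spare:
        warnings.append("not a designated spare: do not insert into a live chassis")
    end = END_OF_SUPPORT[card.model]
    if today > end:
        warnings.append(f"{card.model} is past end of support ({end.isoformat()})")
    return warnings


if __name__ == "__main__":
    # Pete: an SP4200 pulled from a retired switch, checked on an example date.
    pete = ShelfCard(model="SP4200", designated_spare=False)
    for warning in replacement_warnings(pete, date(2019, 1, 1)):
        print(warning)
```

An empty list means neither of these two checks objects. It does not replace the RTM check or your vendor’s replacement procedure.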
If you’re still running this generation of Metaswitch hardware and you’re considering an upgrade, make sure you check out my hardware guide and upgrade decision tree.
If you’re running a VP3510 or VP2510 and have no plans to upgrade then I’m interested to hear from you. I sometimes hear from telcos who have old hardware they no longer need, so if you’re short on spares I may be able to help you out.