I'm currently evaluating the Victron Quattro 48/15000/200/100-100. That decodes to 48 VDC nominal, a 15 kVA inverter, a 200 A charger, and dual 100 A AC feed-through. I believe it has the exact same programmability as all of the Quattros and perhaps the Multipluses as well. Which is to say, less every day, because they seem to be removing features from the software year over year.
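For anyone who wants to compare several units side by side, here is a rough Python sketch of that decoding. It assumes the rating string always has the four slash-separated fields read off above; `decode_quattro` and the dictionary keys are names made up for illustration, and other models may lay their fields out differently.

```python
def decode_quattro(model: str) -> dict:
    """Split a rating string such as '48/15000/200/100-100' into its fields.

    Assumed field order (taken from the reading in the post, not from a spec):
    battery volts / inverter VA / charger amps / AC feed-through amps per input.
    """
    volts, va, charge_amps, feeds = model.split("/")
    return {
        "battery_volts_nominal": int(volts),
        "inverter_kva": int(va) / 1000,
        "charger_amps": int(charge_amps),
        "ac_feed_through_amps": [int(a) for a in feeds.split("-")],
    }


spec = decode_quattro("48/15000/200/100-100")
print(spec)
```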
I am not wedded to this device, but I like the general modularity and flexibility of the Victron line.
I think I read this discussion too, somewhere. I'll have to go look it up and see if there's more to learn from his experience. I don't find it hard to imagine that a cell might do something like that at some point, but I'm also expecting that it will happen gradually and that I'll be able to observe the side effects of it with careful monitoring. It's possible that I'm wrong on one or both of those expectations.
Like you, I'm less concerned about catastrophic shorts. Were one to occur, though, the parallel-then-serial topology makes that pretty exothermic without cell-level fusing.
We just don't hear about many of these, so I have to conclude that they're uncommon and/or benign. I'm inclined not to chase this small risk too hard; there are plenty of other things I could do to improve my expected lifespan that would be easier.
And have you seen a complete failure? Could you also comment on the general type of work you've needed to do and at what frequency?
I'm surprised this is so challenging. Maybe I'm being too blasé about this SOC thing, but if I'm never planning on running from 0 to 100 (or anywhere close to either of those boundaries), then I feel like I have a ton of room for slippage, and on top of that, the real packs I have seen data for show very little of it. Heck, it seems like I'll barely even get these cells out of their linear regime.
The problem I see with this is that the problem cell in five years' time might be a good cell at the moment. I know of one case, from a knowledgeable and credible poster on the Australian Energy Matters forum, of a CALB cell losing around 25% of capacity after about three years of use. It was his BMS that picked this up. I have not heard of any cases of cells developing internal short circuits without being abused beforehand.

With the two systems I am responsible for, which are organised with cells in parallel and then in series, I have found that I only have to turn the system power off for a few minutes if I need to do any work on the battery. Even with the complete failure of one cell, the battery would still work, but with a reduced capacity.
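To put rough numbers on "reduced capacity", here is a small Python sketch. It assumes usable capacity is set by the weakest parallel group in the series string, and that a failed cell has gone open circuit or been disconnected (a shorted cell is a different situation). The 16 groups of four 100 Ah cells are invented figures for illustration only, and `pack_capacity_ah` is a hypothetical helper.

```python
def pack_capacity_ah(groups):
    """Estimate pack capacity for cells wired in parallel, then in series.

    groups: one list per series position, each holding the capacities (Ah)
    of the cells paralleled at that position. A dead cell is entered as 0.
    The same current flows through every series position, so the group with
    the least total capacity empties first and limits the whole pack.
    """
    return min(sum(cells) for cells in groups)


# Illustrative layout: 16 series positions, 4 cells of 100 Ah in parallel at each.
healthy = [[100.0] * 4 for _ in range(16)]

# Same pack with one cell in one group failed completely.
degraded = [list(group) for group in healthy]
degraded[5][2] = 0.0

print(pack_capacity_ah(healthy))   # 400.0
print(pack_capacity_ah(degraded))  # 300.0
```

Under the same assumptions, a single cell that had faded by 25% (75 Ah instead of 100 Ah) would bring the pack down to 375 Ah, which is the kind of change a BMS or SOC meter could be expected to notice over time.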
The main reason I don't like bottom balancing for off-grid systems that are in constant use is that the only way you know if the battery has stayed in balance is to empty it. This might be fine for an EV sitting in a garage for the majority of its life, but it is not very practical for systems in use all the time. If you top balance, you know every time you charge the battery whether it has stayed in balance. By charging the battery to nearly 100% you can also easily and accurately reset your SOC meter. I have found my coulomb (current) counting SOC meter is the most useful way of measuring the state of my battery, and it allows me to plan my energy usage.
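For readers unfamiliar with coulomb counting, the idea can be sketched in a few lines of Python. This is a bare-bones illustration, not how any particular commercial meter works: it ignores charge efficiency, temperature and sensor offset, and the class name, the 400 Ah capacity and the currents are all assumptions chosen for the example.

```python
class CoulombCounter:
    """Minimal coulomb-counting SOC estimate with a reset at full charge."""

    def __init__(self, capacity_ah, soc=1.0):
        self.capacity_ah = capacity_ah
        self.charge_ah = capacity_ah * soc

    def update(self, current_a, dt_s):
        """Integrate current over dt_s seconds (positive = charging)."""
        self.charge_ah += current_a * dt_s / 3600.0
        # Clamp so accumulated measurement error cannot push SOC outside 0..1.
        self.charge_ah = min(max(self.charge_ah, 0.0), self.capacity_ah)

    def reset_full(self):
        """Call when the charger reports the battery is full; clears drift."""
        self.charge_ah = self.capacity_ah

    @property
    def soc(self):
        return self.charge_ah / self.capacity_ah


meter = CoulombCounter(capacity_ah=400.0)
meter.update(current_a=-50.0, dt_s=2 * 3600)  # 50 A load for two hours
print(meter.soc)                              # 0.75
meter.reset_full()                            # battery charged back to full
print(meter.soc)                              # 1.0
```

The reset step is the reason top balancing pairs well with this kind of meter: any integration error built up since the last full charge is wiped out each time the battery reaches the top.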