1. Did you forget to force your “unreachable” states to transition to an initial (reset) state? Clock glitches, power surges, radiation, strong electromagnetic interference, etc. will occasionally cause your system to jump to a state that is not defined. When this happens, your design should reset itself rather than crash or generate illegal outputs.
2. Do you have internal registers that you cannot access or test? If you can set a register, you must have some way of reading it back from outside the chip. Inaccessible or stale registers can cause unexplained system behavior that cannot be debugged, and then only a full system reset can return the system to a sane state.
3. Is there any chip in the system that controls your chip? That other chip may be buggy. You should be able to disable or override every one of your external control lines, so that you can isolate the source of a problem.
4. Are there too few decoupling capacitors on your board? The analog world is cruel and very unusual. Voltage spikes, current surges, crosstalk, etc. can all corrupt the integrity of digital signals. Trying to save a few cents on decoupling capacitors can cause headaches and significant financial costs later.
5. Did you test your system only in the lab, not in the real world? A product needs to run for months in the field before it has been exposed to the full range of known and unknown issues. Simulation and simple lab testing won’t catch all of the weirdness of the real world; only a real-world stress test pushes the system that far.
6. Did you fail to test the corner cases and boundary conditions adequately? Every corner case is as important as the main case. Even if some weird event happens only once every six months, failing to handle it correctly can still make your system unreliable, unusable and, of course, unsellable.
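
The first item on this list can be sketched as a small software model. This is only an illustration under stated assumptions: the state names, the 2-bit encoding, and the `next_state` function are hypothetical and not taken from any particular design. In an HDL design the same idea would typically appear as the default branch of the state machine's case statement.

```python
from enum import IntEnum


class State(IntEnum):
    """Hypothetical 2-bit state encoding; the value 3 is deliberately unused."""
    IDLE = 0
    BUSY = 1
    DONE = 2


# Assumed reset state for this sketch.
RESET_STATE = State.IDLE

# Legal transitions, keyed by (current state, start input).
TRANSITIONS = {
    (State.IDLE, False): State.IDLE,
    (State.IDLE, True): State.BUSY,
    (State.BUSY, False): State.DONE,
    (State.BUSY, True): State.DONE,
    (State.DONE, False): State.IDLE,
    (State.DONE, True): State.IDLE,
}


def next_state(raw_state: int, start: bool) -> State:
    """Return the next state for a raw state-register value.

    A value with no defined meaning (here: 3) is sent to the reset state
    instead of raising an error or producing an undefined result.
    """
    try:
        current = State(raw_state)
    except ValueError:
        # "Unreachable" encoding, e.g. after a glitch: recover by resetting.
        return RESET_STATE
    return TRANSITIONS[(current, start)]


# Usage: a glitch has left the register holding the unused encoding 3.
recovered = next_state(3, start=False)
print(recovered.name)  # IDLE
```

The point of the design choice is that every possible register value has a defined successor, so a corrupted state costs one cycle of recovery rather than a hang.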