[Aida-compute-users] Resolved: AIDA DGX-2 Service stop tomorrow Tuesday 2020-09-01 17:00

Joel Hedlund joel.hedlund at liu.se
Wed Sep 2 12:29:32 CEST 2020


We are now back in production, and your VMs have been brought up again.

The service stop took longer than expected, but I believe we are back in better shape.

The delay was caused by the systems not fully recovering into the intended production configuration automatically. We have taken steps to ensure that this does not happen again.

Your VMs have been brought up on a new network with better security. You should not notice any difference, apart from possibly a slight improvement in network performance.

Since nvidia-vm does not offer suspend/resume, your VMs were instead brought down and back up again using nvidia-vm shutdown/start. This seems to have done the job just fine judging from what I've seen with my test VMs.

I believe I managed to not break anything on your end, but please let me know if I did anyway!

/Joel Hedlund
AIDA DGX-2 Service owner

On 2020-08-31 09:23, Joel Hedlund wrote:
> Hi!
> All running DGX-2 VMs will be suspended tomorrow Tuesday 2020-09-01 at 17:00.
> This will allow for an orderly shutdown of all systems, in preparation for an early morning test of the emergency power scheduled for the day after, where Linköping University Hospital will power down all facilities including elevators and supercomputers. We will bring your VMs up as soon as possible after the test.
> The expectation is that your VMs will continue to work after suspend/resume; however, we advise you to back up any precious data ahead of time (to /proj or elsewhere).
> We acknowledge that this late notice is far from ideal, and are possibly as surprised as you are.
> Cheers!
> /Joel Hedlund
> AIDA DGX-2 Service owner
