This causes the Jupyter kernel to bind to TCP port 29500 and *listen* on the loopback interface (because torch.distributed.is_available() returns True), as if it were the MPI master. It then starts making connections to itself, as you can see below.
If you interrupt or otherwise re-run the notebook from the same
Jupyter kernel, this TCP port will already be in use, and the call to
_setup_dist_from_mpi()
will retry n_attempts times and then give up.
A simple fix for this behavior is
to “Restart runtime” or “Restart and run all”
from Colab’s Runtime menu, which will free up the port so that
the setup_dist_from_mpi() call can succeed again, as it did the first time.
The same condition will recur, however, if you attempt
to re-execute the cell in the same Jupyter runtime.
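
To confirm that a stale listener is what blocks the re-run, you can probe the port before calling the setup function. The following is a minimal sketch using only the Python standard library; `port_is_free` is a hypothetical helper name, not part of the library discussed here, and the port number 29500 is taken from the description above.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to (host, port).

    Hypothetical helper for illustration only. It tries to bind a
    throwaway socket; a failure means something already holds the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another socket holds
            # the port, possibly one owned by this same kernel.
            return False
    return True


if __name__ == "__main__":
    # 29500 is the port mentioned in the text above.
    print("port 29500 free:", port_is_free(29500))
```

If this reports that the port is not free in a kernel that has already run the setup cell once, the listener most likely belongs to that same kernel, and restarting the runtime as described above should release it.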