How to handle SSH keys for workloads on the data mover nodes?

We have the following use case: We want to transfer data to/from another HPC centre to which we can initiate SSH connections from the NESH frontends and from the nodes belonging to the data queue. For long-running transfers via rsync, we'd like to be able to resume connections if, e.g., the other side hangs up. We'd also like to be able to run parallel transfers with rsync, which first transfer the directory structure in one rsync call and then parallelise over the now independent transfers of individual files and links. Password-based authentication is not feasible, because the password would have to be entered every time a new connection is established.
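For illustration, here is a minimal sketch of that two-phase pattern, assuming a GNU userland (find/xargs); the host name, paths, and degree of parallelism are placeholders, not our actual setup:

```bash
#!/bin/bash
# Hypothetical sketch of a two-phase parallel rsync transfer.
# REMOTE, SRC, DST and the parallelism (-P 4) are placeholders.
REMOTE=user@other-hpc-centre
SRC=/path/to/local/data/
DST=/path/to/remote/data/

# Phase 1: replicate the directory tree only (no file contents yet).
rsync -a -f"+ */" -f"- *" "${SRC}" "${REMOTE}:${DST}"

# Phase 2: transfer files and links in parallel, one rsync per
# top-level subdirectory; xargs -P controls how many run at once.
find "${SRC}" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
  | xargs -P 4 -I{} rsync -a --partial "${SRC}{}/" "${REMOTE}:${DST}{}/"
```

Each phase-2 rsync is independent, so a dropped connection only forces a retry of that one subdirectory rather than the whole transfer.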

Neglecting security entirely, we could get away with just storing private SSH keys without passphrases, but that is not acceptable. So our current solution is to start an SSH agent on the host that also handles the transfer and let the job running rsync discover this agent, which is possible by finding the agent's socket somewhere under /tmp or ${TMPDIR}. (I'll add an example job script in the next comment.)
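A rough sketch of how the discovery could look from inside the job, assuming the agent was started by the same user on the same node and its socket follows the default ssh-agent naming (ssh-XXXXXXXXXX/agent.<pid>) under ${TMPDIR:-/tmp}; the search pattern is an assumption about the local setup, not a guarantee:

```bash
#!/bin/bash
# Hypothetical agent-discovery snippet for the rsync job.
for sock in "${TMPDIR:-/tmp}"/ssh-*/agent.*; do
    [ -S "${sock}" ] || continue          # skip if the glob did not match a socket
    export SSH_AUTH_SOCK="${sock}"
    # Accept this socket only if the agent answers and holds at least one key.
    if ssh-add -l >/dev/null 2>&1; then
        echo "Using agent socket ${SSH_AUTH_SOCK}"
        break
    fi
    unset SSH_AUTH_SOCK
done

if [ -z "${SSH_AUTH_SOCK:-}" ]; then
    echo "No usable SSH agent found; aborting." >&2
    exit 1
fi
```

With SSH_AUTH_SOCK exported, the subsequent rsync/ssh calls in the job can authenticate without ever seeing the passphrase.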

Does anyone see an elegant way to make it easy to interactively populate an SSH agent for a job that otherwise runs detached from the user?

(cc: @szrzs212 @szrzs282 @smomw091 @smomw235)