Monday, September 8, 2014

AWS: Recovering keypairs (Linux)

As you know, keypairs are used to connect to AWS instances. During the launch process, you select the keypair associated with each instance. Every keypair has two parts: the private key (the PEM file you download from the AWS console when the keypair is created) and the public key. The public key is stored in the authorized_keys file of the login username.

If the authorized_keys file is modified, or there are ownership/permission problems with this file or with the .ssh directory where it is stored, the keypair will be refused and you will get a Permission denied (publickey) error message:

When this occurs, the main problem is that you won't be able to log in to your instance. An easy way to resolve this issue could be:
  • Stop the instance
  • Create an image from the faulty instance
  • Launch a replacement instance from this new AMI. During the launch process, make sure to select a known keypair (or create a new one)
An alternative procedure to recover unhealthy instances is to use a separate (healthy) instance to repair the keypair. In our example we have an instance named WIMKP with a keypair named testkey associated:

Unfortunately, something happened and we're not able to log in using the testkey.pem file. To repair it, follow these steps:
  • Launch a work instance. We'll use it to perform all the required operations. It is only needed during the procedure, so you can terminate it when finished. Make sure you launch the work instance in the same availability zone as the unhealthy instance, because an EBS volume can only be attached to an instance in its own availability zone.

  • From the AWS web console, in the EC2 service, Instances section, stop the unhealthy instance:

  • Go to the Volumes section and search for the root volume associated with the unhealthy instance. Give it an appropriate name so it is easy to recognize (in my example, I used ROOT WIMKP). Also, don't forget to copy the device name of the root volume of the unhealthy instance (in my example: /dev/xvda). We'll need this information later:

  • Right click on the root volume of the unhealthy instance and select Detach Volume. Wait until the volume becomes available:

  • Right click on the root volume of the unhealthy instance and select Attach Volume. Select the work instance and attach the volume as a secondary volume of this instance (by default, it'll be attached as the /dev/sdf device). Wait until it is attached:
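
The detach and attach steps above are done in the web console. As a minimal sketch, the same move could be scripted with the AWS CLI, assuming it is installed and configured with credentials (the post itself only uses the console). The helper names run and move_volume, and the IDs in the usage lines, are hypothetical. By default the sketch only prints the commands. Set DRY_RUN=0 to actually execute them.

```shell
#!/usr/bin/env bash
# run CMD...: print the command (default) or execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# move_volume VOLUME_ID TARGET_INSTANCE_ID DEVICE
# Detaches the volume from wherever it is attached, waits until it is
# available, attaches it to the target instance and waits until it is in use.
move_volume() {
  local volume_id=$1 instance_id=$2 device=$3
  run aws ec2 detach-volume --volume-id "$volume_id"
  run aws ec2 wait volume-available --volume-ids "$volume_id"
  run aws ec2 attach-volume --volume-id "$volume_id" \
      --instance-id "$instance_id" --device "$device"
  run aws ec2 wait volume-in-use --volume-ids "$volume_id"
}

# Usage with placeholder IDs (commented out so the sketch has no side effects):
#   move_volume vol-EXAMPLE i-WORKINSTANCE /dev/sdf     # to the work instance
#   move_volume vol-EXAMPLE i-FAULTYINSTANCE /dev/xvda  # back, at the end
```

The same helper covers both directions of the procedure: first towards the work instance, and later back to the stopped faulty instance using the device name copied earlier.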

  • Copy the keypair file to the work instance and log in to the work instance:

  • As you can see in the previous screenshot, the dmesg output shows how the root volume of the unhealthy instance has been recognized. In my example, the device was named internally /dev/xvdf1. If you get an unknown partition table message, the secondary volume is identified as /dev/xvdf. Take this into account and adapt the following mount command to your scenario:
  1. sudo mkdir /disk
  2. sudo mount /dev/xvdf1 /disk
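
The choice between /dev/xvdf1 and /dev/xvdf can be sketched as a small helper. This is only an illustration, not part of the original procedure: the function name pick_partition is hypothetical, and it assumes that the first partition, when present, is the one holding the root filesystem.

```shell
#!/usr/bin/env bash
# pick_partition DEVICE
# Prints DEVICE1 if a first partition exists, otherwise DEVICE itself
# (the "unknown partition table" case described above).
pick_partition() {
  local device=$1
  if [ -e "${device}1" ]; then
    echo "${device}1"
  else
    echo "$device"
  fi
}

# Usage on the work instance (commented out so the sketch has no side effects):
#   sudo mkdir -p /disk
#   sudo mount "$(pick_partition /dev/xvdf)" /disk
```
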
  • Now the root volume of the unhealthy instance is mounted on the /disk directory, so we can review its content and repair it if required. Because ec2-user is the username required to connect to the WIMKP instance, I'll check the files and directories associated with it. Feel free to adapt the following checks to your needs. For example, Ubuntu instances use ubuntu as the default login username, so with Ubuntu instances you'll need to review the home directory of ubuntu instead of ec2-user:

  • By default:
  1. the /home directory should be owned by root with 755 permissions
  2. the ec2-user home directory should be owned by ec2-user with 700 permissions
  3. the .ssh directory inside the ec2-user home directory should be owned by ec2-user with 700 permissions
  4. the authorized_keys file inside the .ssh directory should be owned by ec2-user with 600 permissions
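
The four expectations above can be checked in one pass. The following is a minimal sketch under stated assumptions: the function names check_mode and check_ssh_tree are hypothetical, and it relies on GNU stat (the -c option), which Linux instances normally provide. Owners are printed as numeric UID:GID on purpose, because names shown on the work instance come from the work instance's own passwd file, not from the mounted volume.

```shell
#!/usr/bin/env bash
# check_mode PATH EXPECTED_MODE
# Prints "ok" or "bad" together with the octal mode and numeric owner found.
check_mode() {
  local path=$1 expected=$2 mode owner
  mode=$(stat -c '%a' "$path") || return 1
  owner=$(stat -c '%u:%g' "$path") || return 1
  if [ "$mode" = "$expected" ]; then
    echo "ok   $path mode=$mode owner=$owner"
  else
    echo "bad  $path mode=$mode (expected $expected) owner=$owner"
    return 1
  fi
}

# check_ssh_tree ROOT USER
# Checks the modes listed above under ROOT (for example /disk).
# Returns non-zero if any of them differs.
check_ssh_tree() {
  local root=$1 user=$2 rc=0
  check_mode "$root/home" 755 || rc=1
  check_mode "$root/home/$user" 700 || rc=1
  check_mode "$root/home/$user/.ssh" 700 || rc=1
  check_mode "$root/home/$user/.ssh/authorized_keys" 600 || rc=1
  return "$rc"
}

# Usage from a root shell on the work instance (commented out):
#   check_ssh_tree /disk ec2-user
```

The sketch only reports modes and owners. It does not change anything on the mounted volume.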

  • If ownership or permissions are not correct, repair them (in my example, the ownership of .ssh and authorized_keys is incorrect):

  • The commands (again, make sure you understand the concept and adapt them to your specific scenario):
  1. sudo chmod 755 /disk/home
  2. sudo chmod 700 /disk/home/ec2-user
  3. sudo chmod 700 /disk/home/ec2-user/.ssh
  4. sudo chmod 600 /disk/home/ec2-user/.ssh/authorized_keys

  • Finally, don't forget ownership. To find the correct UID and GID numbers for the login username, inspect the passwd file of the mounted volume with the following command (don't forget to replace ec2-user with your login username):
  1. sudo grep '^ec2-user:' /disk/etc/passwd
  • In my example, 500:500 is the UID:GID associated with ec2-user, so I need to run the following command to repair ownership:
  1. sudo chown -R 500:500 /disk/home/ec2-user
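
The UID:GID lookup can also be scripted so the numbers do not have to be copied by hand. This is a minimal sketch. The function name passwd_ids is hypothetical, and it simply reads fields 3 and 4 of the matching passwd entry.

```shell
#!/usr/bin/env bash
# passwd_ids PASSWD_FILE USER
# Prints "UID:GID" for USER as recorded in PASSWD_FILE.
# Returns non-zero if USER has no entry.
passwd_ids() {
  local passwd_file=$1 user=$2
  awk -F: -v u="$user" '
    $1 == u { print $3 ":" $4; found = 1; exit }
    END     { exit !found }
  ' "$passwd_file"
}

# Usage on the work instance (commented out so the sketch has no side effects):
#   sudo chown -R "$(passwd_ids /disk/etc/passwd ec2-user)" /disk/home/ec2-user
```

Numeric IDs are used rather than the username because chown resolves names against the work instance's own passwd file, where ec2-user may map to different numbers than on the mounted volume.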

To verify that the keypair is correct, compare the authorized_keys file with the public key derived from your private key. To check it, run the following commands (don't forget to replace testkey.pem with the filename of your private key and ec2-user with your login username):

  1. chmod 600 testkey.pem
  2. ssh-keygen -y -f testkey.pem
  3. sudo cat /disk/home/ec2-user/.ssh/authorized_keys
If the keypair is correct, steps 2 and 3 should show the same key string (the authorized_keys entry may also end with a comment such as the keypair name). Example of correct output:
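
The comparison between steps 2 and 3 can be scripted as well. The following is a minimal sketch, with a hypothetical function name key_is_authorized. It compares only the key type and the base64 key body, so a trailing comment does not matter. It assumes plain entries: lines that start with options (such as command="...") are not handled.

```shell
#!/usr/bin/env bash
# key_is_authorized PUBLIC_KEY_LINE AUTHORIZED_KEYS_FILE
# Succeeds if some line of the file has the same key type (field 1) and
# key body (field 2) as PUBLIC_KEY_LINE. Comments after the key are ignored.
key_is_authorized() {
  local pub=$1 file=$2 type body
  read -r type body _ <<<"$pub"
  awk -v t="$type" -v b="$body" '
    $1 == t && $2 == b { found = 1; exit }
    END                { exit !found }
  ' "$file"
}

# Usage from a root shell on the work instance (commented out):
#   key_is_authorized "$(ssh-keygen -y -f testkey.pem)" \
#       /disk/home/ec2-user/.ssh/authorized_keys && echo "keypair matches"
```
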

If not, you need to replace the public key stored in authorized_keys. To do so, run the following commands:
  1. ssh-keygen -y -f testkey.pem | sudo tee /disk/home/ec2-user/.ssh/authorized_keys
  2. sudo chmod 600 /disk/home/ec2-user/.ssh/authorized_keys
  3. sudo chown -R 500:500 /disk/home/ec2-user
In the previous commands make sure (as always) to replace testkey.pem with your keypair file, ec2-user with your login username and 500:500 with the UID:GID associated with your login username. Example:

Done. Now we can unmount /disk and return the root volume to the faulty instance:
  1. sudo umount /disk
  • In the AWS web console, EC2 service, Volumes section, detach the root volume of the faulty instance from the work instance. Wait until it becomes available.

  • Attach the root volume of the faulty instance back to the faulty instance. Don't forget to enter the device name you copied previously (in my example: /dev/xvda) so that the volume is attached as the root volume. Wait until it is attached.

Finally, in the AWS EC2 web console, Instances section, select the faulty instance, right click on it and select Start. Wait until it has started. If everything was done correctly, you should now be able to log in using your existing keypair.

Bonus track

If you want existing users to be able to run sudo commands without a password, log in to your instance and add the following line to the /etc/sudoers file, replacing username with the username you want to grant sudo permissions:
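
As a sketch, assuming the usual sudoers syntax for passwordless sudo (the exact form of the line is an assumption here, and the function name sudoers_nopasswd_line is hypothetical), the entry for a given username could be generated like this:

```shell
#!/usr/bin/env bash
# sudoers_nopasswd_line USERNAME
# Prints a sudoers entry that lets USERNAME run any command through sudo
# without being asked for a password (assumed form of the entry).
sudoers_nopasswd_line() {
  printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$1"
}

# Usage (commented out): print the line, then paste it into /etc/sudoers
# while editing the file with visudo, which checks the syntax before saving.
#   sudoers_nopasswd_line username
#   sudo visudo
```

Editing through visudo rather than directly is a cautious choice: a syntax error in /etc/sudoers can lock every user out of sudo.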


Lastly, the following shell script could be useful if you need to create new users with different keypairs, repair ownership/permissions or reset existing keypairs. Just copy the shell script to your instance and use it. The script is designed to be run by a username with root permissions (or an existing username allowed to run sudo commands as root). Feel free to use it!

NOTE: The previous procedure won't work with Marketplace-based instances. These instances have signed devices, so you won't be able to perform the attach/detach actions. If you need to recover information from faulty Marketplace instances, contact the AWS Support team.