Project Pegasus – Flying in the Cloud with Automated AWS Deployment

Pegasus is released under Apache License v2.0 and enables anyone with an Amazon Web Services (AWS) account to quickly deploy a number of distributed technologies all from their laptop or personal computer. The installation is fairly basic and should not be used for production. The purpose of this project is to enable fast prototyping of various distributed data pipelines and also help others explore distributed technologies without the headache of installing them.
We want to continue improving this tool by adding more features and other installations, so send us your pull requests or suggestions!
Supported commands:

peg config – show the current configuration Pegasus is using
peg aws <options> – query AWS for information about vpcs, subnets, and security groups
peg validate <template-path> – check if proper fields are set in the instance template yaml file
peg up <template-path> – launch an AWS cluster using the instance template yaml file
peg fetch <cluster-name> – fetch the hostnames and Public DNS of nodes in the AWS cluster and store locally
peg describe <cluster-name> – show the type of instances, hostnames, and Public DNS of nodes in the AWS cluster
peg install <cluster-name> <technology> – install a technology on the cluster
peg service <cluster-name> <technology> <start|stop> – start and stop a service on the cluster
peg uninstall <cluster-name> <technology> – uninstall a specific technology from the cluster
peg ssh <cluster-name> <node-number> – SSH into a specific node in your AWS cluster
peg sshcmd-node <cluster-name> <node-number> "<cmd>" – run a bash command on a specific node in your AWS cluster
peg sshcmd-cluster <cluster-name> "<cmd>" – run a bash command on every node in your AWS cluster
peg scp <to-local|to-rem|from-local|from-rem> <cluster-name> <node-number> <local-path> <remote-path> – copy files or folders to and from a specific node in your AWS cluster
peg down <cluster-name> – terminate a cluster
peg retag <cluster-name> <new-cluster-name> – retag an existing cluster with a different name
peg start <cluster-name> – start an existing cluster with on-demand instances and put it into running mode
peg stop <cluster-name> – stop an existing cluster with on-demand instances and put it into stopped mode
peg port-forward <cluster-name> <node-number> <local-port>:<remote-port> – port forward your local port to the remote cluster node’s port
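
As a rough illustration of how these commands might be chained, the sketch below wraps five of them in a single shell function. The function name peg_bring_up and the PEG variable are hypothetical and not part of Pegasus, and the order of the steps is an assumption based on the order of the sections in this guide. Setting PEG=echo prints each step instead of running it.

```shell
# Hypothetical helper, not part of Pegasus: chains documented peg subcommands.
# PEG defaults to the real CLI; set PEG=echo for a dry run that only prints.
PEG="${PEG:-peg}"

peg_bring_up() {
    # $1 = instance template path, $2 = cluster name, $3 = technology
    if [ "$#" -ne 3 ]; then
        echo "usage: peg_bring_up <template-path> <cluster-name> <technology>" >&2
        return 2
    fi
    # Stop at the first step that fails.
    "$PEG" validate "$1" &&
    "$PEG" up "$1" &&
    "$PEG" fetch "$2" &&
    "$PEG" install "$2" "$3" &&
    "$PEG" service "$2" "$3" start
}
```

In practice you may need to wait for the instances to finish booting between steps, so treat this as a starting point rather than a complete deployment script.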

Install Pegasus on your local machine
Query for AWS VPC information
Spin up your cluster on AWS
Fetching AWS cluster DNS and hostname information
Describe cluster information
Setting up a newly provisioned AWS cluster
Start installing!
Starting and stopping services
Uninstalling a technology
SSH into a node
Terminate a cluster
Retag a cluster
Starting and stopping on demand clusters
Port forwarding to a node
Deployment Pipelines

Install Pegasus on your local machine

This will allow you to programmatically interface with your AWS account. There are two methods to install Pegasus: using a pre-baked Docker image or manually installing it into your environment.

Prerequisites

AWS account
VPC with DNS Resolution enabled
Subnet in VPC
Security group accepting all inbound and outbound traffic (we recommend locking down ports according to the technologies you install)
AWS Access Key ID and AWS Secret Access Key

Manual

Clone the Pegasus project to your local computer and install awscli:

$ git clone https://github.com/InsightDataScience/pegasus.git
$ pip install awscli

Next, add the following to your ~/.bash_profile:

export AWS_ACCESS_KEY_ID=XXXX
export AWS_SECRET_ACCESS_KEY=XXXX
export AWS_DEFAULT_REGION=XX-XXXX-X
export REM_USER=ubuntu
export PEGASUS_HOME=<path-to-pegasus>
export PATH=$PEGASUS_HOME:$PATH

Source the .bash_profile when finished.

$ source ~/.bash_profile

Keep your keys in ~/.bash_profile and never push them to GitHub. AWS scans GitHub for exposed keys; if it finds yours, it will block your account and revoke all access.
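
After sourcing the profile, it can be useful to confirm that every variable is actually set before running peg. The function below is a minimal sketch of such a check. The name require_env is hypothetical and not part of Pegasus; it simply reports any variable that is unset or empty.

```shell
# Hypothetical pre-flight check, not part of Pegasus: prints the name of every
# listed variable that is unset or empty, and returns non-zero if any is.
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name (POSIX sh).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For the manual installation you could run it like this:

$ require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION REM_USER PEGASUS_HOME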

Docker (If Manual doesn’t work)

Add the following to your ~/.bash_profile.

export AWS_ACCESS_KEY_ID=XXXX
export AWS_SECRET_ACCESS_KEY=XXXX
export AWS_DEFAULT_REGION=XX-XXXX-X

Source the .bash_profile when finished.

$ source ~/.bash_profile

Execute the run_peg_docker.sh script

$ ./run_peg_docker.sh <pem-key-name> <path-to-folder-with-instance-template-files>

Every time the container is started fresh, you will need to enable the ssh-agent; otherwise you will not be able to SSH into your AWS nodes:

root@containerid$ eval `ssh-agent -s`

Verify installation

Once the Docker container is running or you have set up Pegasus manually, you can verify the current Pegasus configuration with peg config:

$ peg config
access key: XXXX
secret key: XXXX
    region: us-west-2
  SSH User: ubuntu

You can test your AWS-CLI access by querying for the available regions for your AWS account:

$ aws ec2 --output json describe-regions --query 'Regions[].RegionName'
[
    "eu-west-1",
    "ap-southeast-1",
    "ap-southeast-2",
    "eu-central-1",
    "ap-northeast-2",
    "ap-northeast-1",
    "us-east-1",
    "sa-east-1",
    "us-west-1",
    "us-west-2"
]
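
To tie the two checks together, the region reported by peg config can be compared against this list. The helper below is a minimal sketch: region_available is a hypothetical name, and it does a plain string match on the quoted region name rather than parsing the JSON.

```shell
# Hypothetical helper, not part of Pegasus or the AWS CLI: reads the JSON array
# printed by describe-regions on stdin and succeeds only if the region name
# given as $1 appears in it as a complete quoted element.
region_available() {
    grep -qF -- "\"$1\""
}
```

For example:

$ aws ec2 --output json describe-regions --query 'Regions[].RegionName' | region_available "$AWS_DEFAULT_REGION" && echo "region ok"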

Query for AWS VPC information

Note: You can find all of this information in the AWS console at aws.amazon.com. Here is how Pegasus can help you retrieve it from the command line.

The following queries can help you quickly determine which subnet-id and security-group-id to use in your instance deployments.

VPCs

Let’s say we want to deploy our instances in the VPC named my-vpc. We can view all VPCs in our region with peg aws vpcs:

$ peg aws vpcs
--------------------------------------
|            DescribeVpcs            |
+---------------+--------------------+
|    VPC_ID     |     VPC_NAME       |
+---------------+--------------------+
|  vpc-add2e6c3 |  default           |
|  vpc-c2a496a1 |  my-vpc            |

We can see that vpc-c2a496a1 is the VPC id that our subnet-id and security-group-id need to be associated with.

Subnets

To choose the specific subnet-id we will use in our deployment, we can view all subnets in our region with peg aws subnets:

$ peg aws subnets
------------------------------------------------------------------------------------------
|                                     DescribeSubnets                                    |
+------------+-------+------------------+-------------------------------+----------------+
|     AZ     |  IPS  |    SUBNET_ID     |          SUBNET_NAME          |    VPC_ID      |
+------------+-------+------------------+-------------------------------+----------------+
|us-west-2c  |  251  |  subnet-6ac0bd26 |  private-subnet-west-2c       |  vpc-c2a496a1  |
|us-west-2b  |  4089 |  subnet-9fe6e3df |  aws-us-west-2b               |  vpc-add2e6c3  |

We see here that the first subnet is associated with the same VPC id we identified previously, so subnet-6ac0bd26 is the subnet-id we will need to use in our instance deployment later on.
If there are too many subnets to search through, we can also filter them down to a specific VPC name with peg aws subnets <vpc-name>:

$ peg aws subnets my-vpc
-------------------------------------------------------------------------------------
|                              DescribeSubnets                                      |
+------------+-------+-------------------+-------------------------+----------------+
|     AZ     |  IPS  |     SUBNET_ID     |   SUBNET_NAME           |    VPC_ID      |
+------------+-------+-------------------+-------------------------+----------------+
|  us-west-2c|  251  |  subnet-6ac0bd26  |  private-subnet-west-2c |  vpc-c2a496a1  |
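
If you want to pick the subnet-id out of this table in a script rather than by eye, the sketch below filters the rows by VPC id with awk. The function name subnets_in_vpc is hypothetical, and the code assumes the five-column layout shown above (AZ, IPS, SUBNET_ID, SUBNET_NAME, VPC_ID); a different output format would need different field numbers.

```shell
# Hypothetical helper, not part of Pegasus: reads a `peg aws subnets` table on
# stdin and prints the SUBNET_ID of every row whose VPC_ID column equals $1.
# Assumes the five-column layout: AZ | IPS | SUBNET_ID | SUBNET_NAME | VPC_ID.
subnets_in_vpc() {
    awk -F'|' -v vpc="$1" '
        NF >= 7 {
            # Splitting on "|" puts the five columns in fields 2..6.
            # Strip surrounding spaces and tabs before comparing.
            id = $4; gsub(/^[ \t]+|[ \t]+$/, "", id)
            v  = $6; gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (v == vpc) print id
        }'
}
```

For example:

$ peg aws subnets | subnets_in_vpc vpc-c2a496a1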

Security groups

The last piece of network-related information we need for our instance deployment is the security-group-id. We can view all security groups in our region with peg aws security-groups:

$ peg aws security-groups
--------------------------------------------
|         ...