Project Pegasus – Flying in the Cloud with Automated AWS Deployment
Pegasus is released under the Apache License v2.0 and enables anyone with an Amazon Web Services (AWS) account to quickly deploy a number of distributed technologies, all from their laptop or personal computer. The installations are fairly basic and should not be used in production. The purpose of this project is to enable fast prototyping of various distributed data pipelines and to help others explore distributed technologies without the headache of installing them.
We want to continue improving this tool by adding more features and other installations, so send us your pull requests or suggestions!
Supported commands:
- peg config – show the current configuration Pegasus is using
- peg aws <options> – query AWS for information about VPCs, subnets, and security groups
- peg validate <template-path> – check that the required fields are set in the instance template YAML file
- peg up <template-path> – launch an AWS cluster using the instance template YAML file
- peg fetch <cluster-name> – fetch the hostnames and Public DNS of the nodes in the AWS cluster and store them locally
- peg describe <cluster-name> – show the instance types, hostnames, and Public DNS of the nodes in the AWS cluster
- peg install <cluster-name> <technology> – install a technology on the cluster
- peg service <cluster-name> <technology> <start|stop> – start or stop a service on the cluster
- peg uninstall <cluster-name> <technology> – uninstall a specific technology from the cluster
- peg ssh <cluster-name> <node-number> – SSH into a specific node in your AWS cluster
- peg sshcmd-node <cluster-name> <node-number> "<cmd>" – run a bash command on a specific node in your AWS cluster
- peg sshcmd-cluster <cluster-name> "<cmd>" – run a bash command on every node in your AWS cluster
- peg scp <to-local|to-rem|from-local|from-rem> <cluster-name> <node-number> <local-path> <remote-path> – copy files or folders to and from a specific node in your AWS cluster
- peg down <cluster-name> – terminate a cluster
- peg retag <cluster-name> <new-cluster-name> – retag an existing cluster with a different name
- peg start <cluster-name> – start an existing cluster of on-demand instances and put it into running mode
- peg stop <cluster-name> – stop an existing cluster of on-demand instances and put it into stopped mode
- peg port-forward <cluster-name> <node-number> <local-port>:<remote-port> – forward your local port to the remote cluster node’s port
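If you want to drive Pegasus from a script, a thin wrapper can catch usage mistakes before anything touches AWS. The sketch below is not part of Pegasus: `PEG_ARITY`, `build_peg_command`, and `run_peg` are hypothetical names, and the argument counts are transcribed from the list above rather than taken from the peg source, so treat them as an assumption. Nothing is executed unless `dry_run=False`, in which case `peg` must be on your PATH.

```python
import shlex
import subprocess

# Positional-argument counts transcribed from the "Supported commands" list.
# `peg aws <options>` is left out because its options vary.
PEG_ARITY = {
    "config": 0,
    "validate": 1,
    "up": 1,
    "fetch": 1,
    "describe": 1,
    "install": 2,
    "service": 3,
    "uninstall": 2,
    "ssh": 2,
    "sshcmd-node": 3,
    "sshcmd-cluster": 2,
    "scp": 5,
    "down": 1,
    "retag": 2,
    "start": 1,
    "stop": 1,
    "port-forward": 3,
}

SERVICE_ACTIONS = {"start", "stop"}
SCP_DIRECTIONS = {"to-local", "to-rem", "from-local", "from-rem"}


def build_peg_command(subcommand: str, *args: str) -> list[str]:
    """Return the argv for a peg call after checking it against the usage list."""
    if subcommand not in PEG_ARITY:
        raise ValueError(f"unknown or variable-arity subcommand: {subcommand!r}")
    expected = PEG_ARITY[subcommand]
    if len(args) != expected:
        raise ValueError(
            f"peg {subcommand} expects {expected} argument(s), got {len(args)}"
        )
    if subcommand == "service" and args[2] not in SERVICE_ACTIONS:
        raise ValueError("peg service action must be 'start' or 'stop'")
    if subcommand == "scp" and args[0] not in SCP_DIRECTIONS:
        raise ValueError(f"peg scp direction must be one of {sorted(SCP_DIRECTIONS)}")
    return ["peg", subcommand, *args]


def run_peg(subcommand: str, *args: str, dry_run: bool = True) -> str:
    """Validate a peg call and return it as a shell-quoted string.

    With dry_run=False the command is also executed (peg must be on PATH),
    and a non-zero exit status raises subprocess.CalledProcessError.
    """
    argv = build_peg_command(subcommand, *args)
    if not dry_run:
        subprocess.run(argv, check=True)
    return shlex.join(argv)
```

For example, with a hypothetical cluster named my-cluster, `run_peg("sshcmd-cluster", "my-cluster", "df -h")` returns `peg sshcmd-cluster my-cluster 'df -h'` without running anything.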
Table of Contents
- Install Pegasus on your local machine
- Query for AWS VPC information
- Spin up your cluster on AWS
- Fetching AWS cluster DNS and hostname information
- Describe cluster information
- Setting up a newly provisioned AWS cluster
- Start installing!
- Starting and stopping services
- Uninstalling a technology
- SSH into a node
- Terminate a cluster
- Retag a cluster
- Starting and stopping on demand clusters
- Port forwarding to a node
- Deployment Pipelines
Install Pegasus on your local machine
This will allow you to programmatically interface with your AWS account. There are two methods to install Pegasus: using a pre-baked Docker image or manually installing it into your environment.
Prerequisites
- AWS account
- VPC with DNS Resolution enabled
- Subnet in VPC
- Security group accepting all inbound and outbound traffic (we recommend locking down ports depending on the technologies you install)
- AWS Access Key ID and AWS Secret Access Key
Manual
Clone the Pegasus project to your local computer and install awscli:
$ git clone https://github.com/InsightDataScience/pegasus.git
$ pip install awscli
Next, add the following to your ~/.bash_profile:
export AWS_ACCESS_KEY_ID=XXXX
export AWS_SECRET_ACCESS_KEY=XXXX
export AWS_DEFAULT_REGION=XX-XXXX-X
export REM_USER=ubuntu
export PEGASUS_HOME=<path-to-pegasus>
export PATH=$PEGASUS_HOME:$PATH
Source the ~/.bash_profile when finished:
$ source ~/.bash_profile
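Before running any peg command, you may want to confirm that the exported variables actually reached your shell. The Python sketch below is a hypothetical helper, not part of Pegasus: it only checks that the five variables from the export list are non-empty and that PEGASUS_HOME appears on PATH. It does not verify that your AWS keys are valid.

```python
import os
from collections.abc import Mapping

# Variables exported in ~/.bash_profile during the manual setup.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "REM_USER",
    "PEGASUS_HOME",
)


def check_pegasus_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems with the Pegasus environment (empty if none)."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    home = env.get("PEGASUS_HOME")
    if home:
        # Normalize so that "/opt/pegasus/" and "/opt/pegasus" compare equal.
        path_dirs = {
            os.path.normpath(d) for d in env.get("PATH", "").split(os.pathsep) if d
        }
        if os.path.normpath(home) not in path_dirs:
            problems.append("PEGASUS_HOME is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_pegasus_env(os.environ):
        print("warning:", problem)
```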
It is essential that you keep your key information in your ~/.bash_profile and never push it to GitHub. AWS scans GitHub for keys that have been committed, and if it finds yours it will block your account and revoke all access.
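One way to guard against accidentally committing keys is to scan files before you push. The sketch below is a hypothetical pre-push check, not part of Pegasus. It relies on the documented format of AWS access key IDs (20 characters, beginning with AKIA for long-term keys or ASIA for temporary ones) and cannot detect secret access keys, so a dedicated tool such as AWS Labs' git-secrets is a more thorough option.

```python
import re
import sys
from pathlib import Path

# AWS access key IDs: a 4-letter prefix (AKIA for long-term keys, ASIA for
# temporary ones) followed by 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"(?<![A-Z0-9])(?:AKIA|ASIA)[A-Z0-9]{16}(?![A-Z0-9])")


def find_access_key_ids(text: str) -> list[str]:
    """Return every substring of text that looks like an AWS access key ID."""
    return ACCESS_KEY_ID.findall(text)


def main(paths: list[str]) -> int:
    """Scan the given files; return 1 if anything key-like was found, else 0."""
    found = False
    for path in paths:
        text = Path(path).read_text(errors="ignore")
        for key in find_access_key_ids(text):
            # Print only a prefix so the check itself does not leak the key.
            print(f"{path}: possible AWS access key ID {key[:8]}...")
            found = True
    return 1 if found else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```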
Docker (if the manual installation doesn’t work)
Add the following to your ~/.bash_profile:
export AWS_ACCESS_KEY_ID=XXXX
export AWS_SECRET_ACCESS_KEY=XXXX
export AWS_DEFAULT_REGION=XX-XXXX-X
Source the ~/.bash_profile when finished:
$ source ~/.bash_profile
Execute the run_peg_docker.sh script:
$ ./run_peg_docker.sh <pem-key-name> <path-to-folder-with-instance-template-files>
Every time the container is started fresh, you need to start the ssh-agent; otherwise you will not be able to SSH into your AWS nodes:
root@containerid$ eval `ssh-agent -s`
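The backticks run ssh-agent -s, which prints Bourne-shell assignments for SSH_AUTH_SOCK and SSH_AGENT_PID, and eval executes them so those variables are exported in the current shell. A freshly started container has no agent running and no such variables, which is why the step has to be repeated. The Python sketch below only illustrates that mechanism by parsing the output format OpenSSH's ssh-agent prints; `parse_ssh_agent_output` is a hypothetical name, and the socket path and PID in the sample are made up.

```python
import re

# `ssh-agent -s` prints statements such as
#   SSH_AUTH_SOCK=/tmp/ssh-abc123/agent.41; export SSH_AUTH_SOCK;
#   SSH_AGENT_PID=42; export SSH_AGENT_PID;
#   echo Agent pid 42;
# `eval` runs them; each NAME=value; export NAME; pair exports one variable.
ASSIGNMENT = re.compile(r"^(\w+)=([^;]*); export \1;", re.MULTILINE)


def parse_ssh_agent_output(output: str) -> dict[str, str]:
    """Return the variables that eval-ing the ssh-agent output would export."""
    return {name: value for name, value in ASSIGNMENT.findall(output)}


SAMPLE = (
    "SSH_AUTH_SOCK=/tmp/ssh-abc123/agent.41; export SSH_AUTH_SOCK;\n"
    "SSH_AGENT_PID=42; export SSH_AGENT_PID;\n"
    "echo Agent pid 42;\n"
)

if __name__ == "__main__":
    print(parse_ssh_agent_output(SAMPLE))
```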
Verify installation
Once the Docker container is running or you have set up Pegasus manually, you can verify the current configuration in Pegasus with peg config:
$ peg config
access key: XXXX
secret key: XXXX
region: us-west-2
SSH User: ubuntu
You can test your AWS-CLI access by querying for the available regions for your AWS account:
$ aws ec2 --output json describe-regions --query 'Regions[].RegionName'
[
    "eu-west-1",
    "ap-southeast-1",
    "ap-southeast-2",
    "eu-central-1",
    "ap-northeast-2",
    "ap-northeast-1",
    "us-east-1",
    "sa-east-1",
    "us-west-1",
    "us-west-2"
]
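To confirm that the region you exported in AWS_DEFAULT_REGION is one of the regions returned by this query, a small Python sketch can parse the same JSON list. `region_is_available` is a hypothetical helper; when run as a script it assumes the AWS CLI is installed and configured as described above.

```python
import json
import os
import subprocess


def region_is_available(describe_regions_json: str, region: str) -> bool:
    """True if region is in the JSON list printed by
    aws ec2 --output json describe-regions --query 'Regions[].RegionName'."""
    return region in json.loads(describe_regions_json)


if __name__ == "__main__":
    # Assumes the AWS CLI is installed and configured as described above.
    output = subprocess.run(
        ["aws", "ec2", "--output", "json", "describe-regions",
         "--query", "Regions[].RegionName"],
        capture_output=True, text=True, check=True,
    ).stdout
    region = os.environ.get("AWS_DEFAULT_REGION", "")
    status = "is" if region_is_available(output, region) else "is NOT"
    print(f"{region!r} {status} among the regions available to this account")
```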
Query for AWS VPC information
Note: You can find all of this information in the AWS Management Console at aws.amazon.com, but Pegasus can also retrieve it for you from the command line.
The following queries can help you quickly determine which subnet-id and security-group-id to use in your instance deployments.
VPCs
Let’s say we want to deploy our instances in the VPC named my-vpc. We can view all VPCs in our region with peg aws vpcs:
$ peg aws vpcs
--------------------------------------
|            DescribeVpcs            |
+---------------+--------------------+
|    VPC_ID     |     VPC_NAME       |
+---------------+--------------------+
|  vpc-add2e6c3 |  default           |
|  vpc-c2a496a1 |  my-vpc            |
We can see that vpc-c2a496a1 is the VPC ID that our subnet-id and security-group-id need to be associated with.
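If you prefer to look up the VPC ID programmatically, the JSON output of `aws ec2 describe-vpcs --output json` contains each VPC's `VpcId` and its tags. The sketch below (`vpc_ids_by_name` is a hypothetical helper, not a Pegasus function) returns every VPC whose `Name` tag matches; it returns a list because AWS does not require Name tags to be unique. The CLI can also do this filtering itself, for example with `aws ec2 describe-vpcs --filters Name=tag:Name,Values=my-vpc --query 'Vpcs[].VpcId'`.

```python
import json


def vpc_ids_by_name(describe_vpcs_json: str, name: str) -> list[str]:
    """Return the IDs of VPCs whose Name tag equals name.

    Expects the output of `aws ec2 describe-vpcs --output json`.
    """
    ids = []
    for vpc in json.loads(describe_vpcs_json)["Vpcs"]:
        # Untagged VPCs (often the default VPC) may have no "Tags" key at all.
        tags = {tag["Key"]: tag["Value"] for tag in vpc.get("Tags", [])}
        if tags.get("Name") == name:
            ids.append(vpc["VpcId"])
    return ids
```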
Subnets
To choose the specific subnet-id we will use in our deployment, we can view all subnets in our region with peg aws subnets:
$ peg aws subnets
------------------------------------------------------------------------------------------
|                                    DescribeSubnets                                     |
+------------+-------+------------------+-------------------------------+----------------+
|     AZ     |  IPS  |    SUBNET_ID     |          SUBNET_NAME          |     VPC_ID     |
+------------+-------+------------------+-------------------------------+----------------+
| us-west-2c |  251  | subnet-6ac0bd26  | private-subnet-west-2c        | vpc-c2a496a1   |
| us-west-2b | 4089  | subnet-9fe6e3df  | aws-us-west-2b                | vpc-add2e6c3   |
We see here that the first subnet is associated with the VPC ID we identified previously, so subnet-6ac0bd26 is the subnet-id we will use in our instance deployment later on.
If we have too many subnets to search through, we can filter them down to a specific VPC name with peg aws subnets <vpc-name>:
$ peg aws subnets my-vpc
-------------------------------------------------------------------------------------
|                                  DescribeSubnets                                  |
+------------+-------+-------------------+-------------------------+----------------+
|     AZ     |  IPS  |     SUBNET_ID     |       SUBNET_NAME       |     VPC_ID     |
+------------+-------+-------------------+-------------------------+----------------+
| us-west-2c |  251  | subnet-6ac0bd26   | private-subnet-west-2c  | vpc-c2a496a1   |
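A similar sketch can filter the JSON output of `aws ec2 describe-subnets --output json` by VPC ID. `Subnet` and `subnets_in_vpc` are hypothetical names, and sorting by free addresses is just one possible way to rank candidates; we assume the IPS column in the tables above corresponds to the `AvailableIpAddressCount` field. The CLI can also filter server-side, for example with `aws ec2 describe-subnets --filters Name=vpc-id,Values=vpc-c2a496a1`.

```python
import json
from typing import NamedTuple


class Subnet(NamedTuple):
    subnet_id: str
    availability_zone: str
    available_ips: int
    name: str


def subnets_in_vpc(describe_subnets_json: str, vpc_id: str) -> list[Subnet]:
    """Subnets from `aws ec2 describe-subnets --output json` that belong to
    vpc_id, sorted by available IP addresses (most first)."""
    matches = []
    for s in json.loads(describe_subnets_json)["Subnets"]:
        if s["VpcId"] != vpc_id:
            continue
        # The Name tag is optional, so fall back to an empty string.
        tags = {t["Key"]: t["Value"] for t in s.get("Tags", [])}
        matches.append(
            Subnet(
                s["SubnetId"],
                s["AvailabilityZone"],
                s["AvailableIpAddressCount"],
                tags.get("Name", ""),
            )
        )
    return sorted(matches, key=lambda sub: sub.available_ips, reverse=True)
```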
Security groups
The last piece of network-related information we need for our instance deployment is the security-group-id. We can view all security groups in our region with peg aws security-groups:
$ peg aws security-groups -------------------------------------------- | ...