Downloading from S3

From Leo's Notes
Last edited on 25 August 2023, at 00:05.

Objects stored in AWS S3, or in any storage service that exposes an S3-compatible API, can be downloaded over plain HTTP.

The HTTP Authorization header

Generating the Authorization header

To download from a protected S3 bucket, you need to pass an Authorization HTTP header. This header is generated from the AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY together with the current date and time. AWS calls this scheme Signature Version 2.

When creating the HTTP request, the Date and Authorization HTTP headers must be sent with the following contents:

  • Date contains the current timestamp in an RFC 2616 compliant format. Eg: date -u '+%a, %d %b %Y %H:%M:%S +0000'
  • Authorization contains 'AWS AWS_ACCESS_KEY_ID:Signature', where Signature is the base64-encoded RFC 2104 HMAC-SHA1, keyed with AWS_SECRET_ACCESS_KEY, of the following items joined together. An unused item (such as MD5 or Content-Type on a plain GET) is left empty, but its newline is kept:
    METHOD\n
    MD5\n
    Content-Type\n
    Date\n
    PATH

More on this from AWS's documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#ConstructingTheAuthenticationHeader
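
The steps above can be sketched as a few small shell functions. This is only an illustration under stated assumptions: the function names (http_date, build_string_to_sign, sign_string) are hypothetical helpers, and the access key, secret key and object path are placeholders rather than working values.

```shell
#!/usr/bin/env bash
# Sketch only: the function names below are hypothetical helpers.

# Current UTC time in the RFC 2616 style used for the Date header.
# LC_ALL=C keeps the day and month names in English.
http_date() {
    LC_ALL=C date -u '+%a, %d %b %Y %H:%M:%S +0000'
}

# Join the five items with newlines (no trailing newline after the path).
# Arguments: METHOD MD5 CONTENT_TYPE DATE PATH
build_string_to_sign() {
    printf '%s\n%s\n%s\n%s\n%s' "$1" "$2" "$3" "$4" "$5"
}

# HMAC-SHA1 the string with the secret key, then base64-encode the raw digest.
# Arguments: SECRET STRING_TO_SIGN
sign_string() {
    printf '%s' "$2" | openssl dgst -sha1 -binary -hmac "$1" | openssl base64
}

# Placeholder credentials and object path, for illustration only.
access_key="EXAMPLEKEYID"
secret_key="example-secret"
now="$(http_date)"
string_to_sign="$(build_string_to_sign GET "" "" "$now" "/mybucket/example.bin")"

echo "Date: $now"
echo "Authorization: AWS ${access_key}:$(sign_string "$secret_key" "$string_to_sign")"
```

Because an HMAC-SHA1 digest is always 20 bytes, the Signature part is always 28 base64 characters long, which makes a quick sanity check when debugging a rejected request.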

Scripting it with Bash

To download from a protected S3 bucket using Curl or Aria2, use the following Bash script. The Authorization header is built from the configurable variables at the top of the script and is then passed to either Curl or Aria2. The neat thing about Aria2 is that it can download a file in multiple parts over parallel connections, which speeds up retrieval.

# Configurables
Url="s3://mybucket/example.bin"
Server="s3-backend-url.example.com"
AWS_SECRET_ACCESS_KEY=""
AWS_ACCESS_KEY_ID=""

# Build the headers
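Filename=$(basename "$Url")
Path="${Url:4}"    # drop the leading "s3:/", leaving /mybucket/example.bin
Method="GET"
MD5Sum=""
# %d gives a zero-padded day as RFC 2616 requires; %e would pad with a space
Timestamp="$(date -u '+%a, %d %b %Y %H:%M:%S +0000')"
# The empty line between the MD5 and the date is the unused Content-Type item
printf -v StringToSign "%s\n%s\n\n%s\n%s" "$Method" "$MD5Sum" "$Timestamp" "$Path"
Signature=$(printf '%s' "$StringToSign" | openssl sha1 -binary -hmac "${AWS_SECRET_ACCESS_KEY}" | openssl base64)
Authorization="AWS ${AWS_ACCESS_KEY_ID}:${Signature}"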

# Option 1: use Curl to download
curl -f -o "$Filename" \
	-H "Date: $Timestamp" \
	-H "Authorization: $Authorization" \
	"http://$Server$Path"

# Option 2: use Aria2 to download with up to 8 connections
# (run only one of the two options, otherwise the file is fetched twice)
aria2c -x8 -o "$Filename" \
	--auto-file-renaming=false --allow-overwrite=true \
	--header="Date: $Timestamp" \
	--header="Authorization: $Authorization" \
	"http://$Server$Path"