CodeCommit: Mono Repo, Multiple Pipelines – part I: repackaging the repo

As an experiment, I have a CodeCommit repository that contains a combination of CloudFormation templates and some static web content, checked into two separate prefixes (folders): /Templates/ and /Website/.

What I am trying to do is, on any commit to the repo, work out whether the /Website/ prefix needs to be redeployed, or whether a change under /Templates/ should trigger a CloudFormation stack update.
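As a rough sketch of how that decision could be made (the function name and structure here are mine, not part of the pipeline yet), a Lambda fired by a CodeCommit trigger could diff the commit against its first parent and report which top-level folders were touched:

import boto3

codecommit = boto3.client('codecommit')

def changed_prefixes(repository_name, after_commit):
    # Return the set of top-level folders touched by a commit.
    commit = codecommit.get_commit(repositoryName=repository_name,
                                   commitId=after_commit)['commit']
    parents = commit.get('parents', [])
    kwargs = {'repositoryName': repository_name,
              'afterCommitSpecifier': after_commit}
    if parents:
        kwargs['beforeCommitSpecifier'] = parents[0]
    prefixes = set()
    # GetDifferences is paginated, so walk every page of the diff
    for page in codecommit.get_paginator('get_differences').paginate(**kwargs):
        for diff in page['differences']:
            blob = diff.get('afterBlob') or diff.get('beforeBlob')
            prefixes.add(blob['path'].split('/')[0])
    return prefixes

A result of {'Website'} would mean only the static content pipeline needs to run; {'Templates'} (or both) would mean a stack update is in order.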

Starting with the most basic piece, I want the web content to go through CodePipeline and unpack into an S3 bucket that a CloudFront distribution points at (with an Origin Access Identity already in place).
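For reference, the unpack itself is just the standard S3 deploy action; declared through boto3 it would look roughly like this (bucket and artifact names are placeholders), with Extract telling the action to unzip the artifact into the bucket rather than copy the zip object as-is:

deploy_stage = {
    'name': 'Deploy',
    'actions': [{
        'name': 'UnpackToS3',
        'actionTypeId': {'category': 'Deploy', 'owner': 'AWS',
                         'provider': 'S3', 'version': '1'},
        'inputArtifacts': [{'name': 'WebsiteOnly'}],  # produced by the repackage step below
        'configuration': {
            'BucketName': 'my-website-bucket',        # placeholder: the bucket behind CloudFront
            'Extract': 'true'                         # unzip the artifact instead of copying the zip
        },
        'runOrder': 1
    }]
}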

By default, the S3 deploy action unpacks the entire repo into S3, but I only want a particular subfolder, so I’ve implemented a “repackage” step as a Lambda function in the pipeline. It grabs the original artifact from the pipeline, unpacks it, and creates a new artifact containing just the folder /Website/ and below. It turned out to be around 50 lines of Python:

import json
import boto3
import os
import zipfile

def lambda_handler(event, context):
    job = event["CodePipeline.job"]
    job_data = job["data"]
    input_location = job_data["inputArtifacts"][0]["location"]
    output_location = job_data["outputArtifacts"][0]["location"]
    codepipeline = boto3.client('codepipeline')

    # The artifact store should always be S3; bail out (and tell the
    # pipeline the job failed) if it isn't.
    if input_location["type"] != "S3":
        codepipeline.put_job_failure_result(
            jobId=job["id"],
            failureDetails={'type': 'JobFailed', 'message': 'Input artifact is not on S3'})
        return { 'statusCode': 500, 'body': json.dumps('Not on S3') }

    # Use the short-lived credentials the pipeline hands the job to read
    # and write the artifact bucket.
    s3client = boto3.client('s3',
      aws_access_key_id=job_data["artifactCredentials"]["accessKeyId"],
      aws_secret_access_key=job_data["artifactCredentials"]["secretAccessKey"],
      aws_session_token=job_data["artifactCredentials"]["sessionToken"]
      )

    # Download the input artifact (a zip of the whole repo) to /tmp and unpack it.
    dl_filename = input_location["s3Location"]["objectKey"].split('/')[-1]
    with open("/tmp/" + dl_filename, 'wb') as data:
        s3client.download_fileobj(
            input_location["s3Location"]["bucketName"],
            input_location["s3Location"]["objectKey"],
            data)
    with zipfile.ZipFile("/tmp/" + dl_filename, 'r') as archive:
        archive.extractall('/tmp/')

    # Re-zip only the /Website/ folder as the output artifact.
    ul_filename = output_location["s3Location"]["objectKey"].split('/')[-1]
    os.chdir('/tmp/Website/')
    with zipfile.ZipFile("/tmp/" + ul_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk('.'):
            for file in files:
                zipf.write(os.path.join(root, file))

    # WARNING: the CodePipeline artifact bucket may have a default BucketPolicy
    # requiring an explicit KMS key. Remove that SSE requirement and turn on
    # default encryption for the bucket.
    s3client.upload_file(
        "/tmp/" + ul_filename,
        output_location["s3Location"]["bucketName"],
        output_location["s3Location"]["objectKey"],
        ExtraArgs={"ServerSideEncryption": "AES256"}
        )

    # Tell CodePipeline the job succeeded so the next stage can run.
    codepipeline.put_job_success_result(jobId=job["id"])
    return {
        'statusCode': 200,
        'body': json.dumps('Done repacking.')
    }

This runs reasonably quickly, and it means I am not unpacking the entire CodeCommit repo into the bucket behind my CloudFront distribution.
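For completeness, the Lambda sits in the pipeline as an Invoke action between the source stage and the S3 deploy stage; declared through boto3, the stage would look roughly like this (stage, action, function and artifact names are placeholders):

repackage_stage = {
    'name': 'Repackage',
    'actions': [{
        'name': 'ExtractWebsite',
        'actionTypeId': {'category': 'Invoke', 'owner': 'AWS',
                         'provider': 'Lambda', 'version': '1'},
        'inputArtifacts': [{'name': 'SourceOutput'}],   # the full repo zip from CodeCommit
        'outputArtifacts': [{'name': 'WebsiteOnly'}],   # handed to the S3 deploy action
        'configuration': {'FunctionName': 'repackage-website'},  # placeholder function name
        'runOrder': 1
    }]
}

The function's execution role needs codepipeline:PutJobSuccessResult and codepipeline:PutJobFailureResult; the artifact bucket itself is reached with the temporary artifactCredentials the job hands in.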