
Although AWS SageMaker is a great platform that organizes and greatly simplifies all data science-related activities, the first experience of deploying custom machine learning models can be cumbersome. Fortunately, we’ve been down this road many times, and we’ll guide you through your first deployment, avoiding all the potential pitfalls.

AWS SageMaker is a powerful tool for developing and deploying machine learning models. It makes it possible to scale models, develop them with a variety of frameworks, and integrate them with other AWS services such as model monitoring. However, its versatility and scalability come at a price: the platform requires extensive knowledge, which makes it hard for beginners to get started. Things get even more complex when you try to deploy custom models, especially those developed outside the platform.

This article is here to help you with the above. We are going to show how you can easily deploy custom ML models using Docker, FastAPI, and AWS SageMaker in 9 simple steps:

  1. Train the model wherever it suits you.
  2. Write the inference code and expose it as a REST API with FastAPI.
  3. Dockerize the inference code.
  4. Push the Docker image to Amazon Elastic Container Registry (ECR).
  5. Save the model artifacts to an S3 bucket.
  6. Create an Amazon SageMaker model that refers to the Docker image in ECR, using the AWS SDK for Python (Boto3).
  7. Create an Amazon SageMaker endpoint configuration that specifies the model and the resources to be used for inference.
  8. Deploy your model by creating an endpoint from that configuration.
  9. Use Boto3 to test your inference.

Train the model locally or in a cloud-based environment

For the sake of simplicity, we are going to use the pre-trained HuggingFace model, a fine-tuned BERT model that is ready to use for Named Entity Recognition. It recognizes four types of entities: Locations (LOC), Organisations (ORG), Persons (PER), and Miscellaneous (MISC). In reality, this could be any model you’d like to deploy.
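As a minimal sketch of this step (assuming the model in question is the dslim/bert-base-NER checkpoint from the Hugging Face Hub, which matches the description above; swap in whatever model you actually use), you could download it and save its artifacts into the ml/model/ directory used later in this article:

# save_model.py (hypothetical helper, not part of the deployed image)
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL_ID = "dslim/bert-base-NER"  # example checkpoint; replace with your own model
SAVE_PATH = "opt/ml/model"        # the path the inference code and the tar command expect

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)

# write the tokenizer and model files to disk so they can be packaged as model.tar.gz
tokenizer.save_pretrained(SAVE_PATH)
model.save_pretrained(SAVE_PATH)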

Dockerize the inference code

First of all, you need to create a specific directory and file structure (see below) within your working directory.

Fig. 1 Structure of Docker image
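If the figure does not render for you, this is roughly how the container filesystem ends up looking (inferred from the Dockerfile shown below; SageMaker extracts the model archive from S3 into /opt/ml/model at runtime):

/opt/
├── main.py
├── requirements.txt
├── serve            # run_server.sh, renamed by the COPY instruction
└── ml/
    └── model/       # model artifacts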

Directory opt/ will be your root. The main.py file should contain the inference code, which loads the model from a specific path (ml/model) and exposes a REST API for performing predictions. We are using FastAPI, a high-performance web framework for developing APIs with Python. Let's take a look at the script, especially the invocations and ping endpoints that are required by AWS SageMaker.

# main.py
import logging

from fastapi import FastAPI, Request
from transformers import pipeline

MODELS_PATH = "ml/model/"

app = FastAPI()
classifier = None


@app.get('/ping')
async def ping():
    return {"message": "ok"}


@app.on_event('startup')
def load_model():
    # load the model into memory once, when the service starts up
    global classifier
    classifier = pipeline("ner", model=MODELS_PATH)
    logging.info("Model loaded.")


@app.post('/invocations')
async def invocations(request: Request):
    json_payload = await request.json()

    inputs = [record["scope"] for record in json_payload]
    output = [{"prediction": classifier(text)} for text in inputs]
    return output

  • ping() is used by AWS SageMaker to verify that your container is up and the model works.
  • load_model() is an event handler that runs on service startup. This means the function is executed once, right when the application starts up. It is a great place to load the model into memory, and this is what we do here.
  • invocations() is a REST POST endpoint, and this is where your inference code goes, i.e. everything required to perform predictions with your model. It uses the await keyword to read the request body asynchronously, so the call does not block the event loop while waiting for the payload to arrive.
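Before building the image, it can be convenient to sanity-check the app locally. Here is a minimal sketch using FastAPI's TestClient (a hypothetical helper script, assuming the model artifacts are already available under ml/model/ and the requests package is installed):

# local_test.py
from fastapi.testclient import TestClient

from main import app

# entering the context manager fires the startup event, so the model gets loaded
with TestClient(app) as client:
    print(client.get("/ping").json())
    response = client.post(
        "/invocations",
        json=[{"document_id": "1", "scope": "Ron is Harry's best friend"}],
    )
    print(response.json())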

Then we have the Dockerfile. It executes several command-line instructions step by step and builds the container image from the indicated base image and requirements.

# Dockerfile
FROM python:3.8-slim

# copy the code and requirements
COPY main.py /opt/
COPY requirements.txt /opt/
COPY run_server.sh /opt/serve

WORKDIR /opt

# install required python stuff
RUN pip install -r requirements.txt

# make sure the serve script is executable (SageMaker starts the container with the "serve" command)
RUN chmod +x /opt/serve

# setup the executable path
ENV PATH="/opt/:${PATH}"

requirements.txt lists the packages that need to be installed while building the Docker image.

# requirements.txt
fastapi==0.89.1
uvicorn==0.20.0
transformers==4.25.1
torch  # the transformers NER pipeline needs a deep learning backend; PyTorch is assumed here

Finally, there is run_server.sh. It starts the machine learning service hosting your model and inference code. The host parameter specifies the IP address the server will listen on, and the port parameter specifies the port number. Don't modify them, as these values are required by SageMaker.

#!/bin/bash
# run_server.sh
uvicorn main:app --proxy-headers --host 0.0.0.0 --port 8080

Push the Docker image to Amazon Elastic Container Registry (ECR)

  • Make sure that you have the latest versions of the AWS CLI and Docker installed.
  • Create a repository in Amazon ECR and give it a name.

Fig. 2 Amazon ECR – your repository name

Go to the created repository and click the View push commands button to see the steps for pushing an image to your new repository. Make sure you are in the directory containing the Dockerfile and run the commands one by one.

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin XXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com
docker build -t your_repository_name .
docker tag your_repository_name:latest XXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/your_repository_name:latest
docker push XXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/your_repository_name:latest

If all goes well, you should be able to see your ECR image on AWS.
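If you prefer to verify this from code, here is a quick sketch with Boto3 (the repository name and region are the example values used throughout this article):

# list the image tags present in the ECR repository
import boto3

ecr = boto3.client('ecr', region_name='us-east-1')
images = ecr.describe_images(repositoryName='your_repository_name')
print([tag for detail in images['imageDetails'] for tag in detail.get('imageTags', [])])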

Save model artifacts to an S3 bucket

Models must be packaged as compressed tar files (*.tar.gz) and saved to an S3 bucket. Let's say your model's directory looks like this:

Fig. 3 Model's directory

To compress the model directory, run the following commands from your working directory:

cd opt/ml/
tar -czvf model.tar.gz model

Upload it to your desired S3 location.
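For example, here is a minimal Boto3 sketch (the bucket name and object key are just the placeholders used later in this article):

# upload the model archive to S3
import boto3

s3 = boto3.client('s3')
s3.upload_file('model.tar.gz', 'your-s-bucket-name', 'model.tar.gz')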

Fig. 4 Your S3 bucket name

Create an Amazon SageMaker model resource that refers to the Docker image in ECR

There are several options for deploying a model with SageMaker hosting services. You can programmatically deploy a model using an AWS SDK (for example, the SDK for Python (Boto3)), the SageMaker Python SDK, or the AWS CLI, or you can interactively create a model with the SageMaker console. This article presents the first of these options: the SDK for Python (Boto3). You can run the following commands locally.

Set the values for the execution role, image URI, model URL, and model name. Call sagemaker.create_model() to create the model resource.

# create_aws_model.py
import boto3

# Create a SageMaker session
sagemaker = boto3.Session().client('sagemaker')
iam = boto3.client('iam')

ROLE = iam.get_role(RoleName='AWS_ROLE_NAME')['Role']['Arn']
IMAGE = "XXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/your_repository_name:latest"
MODEL_URL = "s3://your-s-bucket-name/model.tar.gz"
MODEL_NAME = "model-name"


# Create a model
sagemaker.create_model(
    ModelName=MODEL_NAME,
    ExecutionRoleArn=ROLE,
    PrimaryContainer={
        'Image': IMAGE,
        'ModelDataUrl': MODEL_URL
    }
)

Create an Amazon SageMaker endpoint configuration

The next step is to create an Amazon SageMaker endpoint configuration that specifies the model and the resources to be used for inference.

# Create an endpoint configuration
endpoint_config_name = 'endpoint-config-name'
sagemaker.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': 'ml.t2.medium',
        'InitialInstanceCount': 1,
        'ModelName': MODEL_NAME,
        'VariantName': 'Variant-1'
    }]
)

Deploy your model

Finally, you are ready to deploy your model, i.e. create an endpoint that hosts it. Use the endpoint configuration specified above.

# Create an endpoint
endpoint_name = 'endpoint-name'
sagemaker.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)
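Endpoint creation takes several minutes. If you want your script to block until the endpoint is ready, a small sketch using the same Boto3 client and one of its built-in waiters looks like this:

# wait until the endpoint is in service, then print its status
waiter = sagemaker.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)
print(sagemaker.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus'])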

Use Boto3 to test your inference

Now for the final step. To make sure everything went well, create a test input and run it through the model. If you get an error, the AWS CloudWatch logs are a good place to debug.

import json
import boto3

# the SageMaker runtime client is used to invoke deployed endpoints
sagemaker_runtime = boto3.client("sagemaker-runtime")

test_input = [
    {
        "document_id": "1",
        "scope": "Ron is Harry's best friend",
    },
    {
        "document_id": "2",
        "scope": "Hermione was the best in her class",
    },
]



payload = json.dumps(test_input)

response = sagemaker_runtime.invoke_endpoint(EndpointName=endpoint_name,
                                             ContentType='application/json',
                                             Body=payload)

result = json.loads(response["Body"].read().decode())
print(result)
>> [{'prediction': [{'entity': 'I-PER', 'score': 0.9971058, 'index': 1, 'word': 'Ron', 'start': 0, 'end': 3}, {'entity': 'I-PER', 'score': 0.9923815, 'index': 3, 'word': 'Harry', 'start': 7, 'end': 12}]}, {'prediction': [{'entity': 'I-PER', 'score': 0.99259263, 'index': 1, 'word': 'Her', 'start': 0, 'end': 3}, {'entity': 'I-PER', 'score': 0.9645591, 'index': 2, 'word': '##mio', 'start': 3, 'end': 6}, {'entity': 'I-PER', 'score': 0.9782252, 'index': 3, 'word': '##ne', 'start': 6, 'end': 8}]}]

Summary

We have provided you with 9 simple steps for deploying custom models on AWS SageMaker using FastAPI. This approach allows for greater flexibility and makes the implementation a piece of cake.

Make sure that, after deployment, your model is operating correctly and efficiently. To quickly identify and address any issues in the production system, you might want to use the model monitoring tools integrated into the AWS ecosystem. AWS SageMaker integrates with CloudWatch Logs and CloudWatch Metrics, which can trigger alarms when certain thresholds are exceeded. By regularly updating your models and incorporating new data, you can improve their accuracy and ensure that they continue to provide value over time.
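As an illustration, here is a minimal sketch of such an alarm created with Boto3 (the alarm name, threshold, and endpoint/variant names are example values matching the placeholders used earlier):

# alarm on server-side errors returned by the endpoint
import boto3

cloudwatch = boto3.client('cloudwatch')
cloudwatch.put_metric_alarm(
    AlarmName='endpoint-name-5xx-errors',
    Namespace='AWS/SageMaker',
    MetricName='Invocation5XXErrors',
    Dimensions=[
        {'Name': 'EndpointName', 'Value': 'endpoint-name'},
        {'Name': 'VariantName', 'Value': 'Variant-1'},
    ],
    Statistic='Sum',
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator='GreaterThanOrEqualToThreshold',
)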

***
You can find the code here.

Author
Katarzyna Cepińska

She graduated in Neuroinformatics at the Faculty of Physics of UW and in Computer Science at the Polish-Japanese Academy of Information Technology. She is a Senior Data Scientist with over 6 years of experience in providing AI solutions for various industry needs. She has worked on ML/NLP/CV projects in banking, insurance, and construction areas. She enjoys both coding and research, as well as making a positive impact on others through her work. Outside of work, she enjoys spending time with her family and her dog.
