Deploy Models for Inference

This page was generated from content adapted from the AWS Developer Guide

Asynchronous inference

  • Note: If the endpoint configuration contains an asynchronous inference configuration (AsyncInferenceConfig) object, the endpoint can receive only asynchronous invocations.
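To make the note above concrete, here is a minimal sketch in Python of an endpoint configuration request that carries an AsyncInferenceConfig object. The field names follow the SageMaker CreateEndpointConfig request shape. The helper names, the variant name, the instance type, the concurrency value, and the S3 path are illustrative assumptions, not values from this page. The sketch only builds and inspects a dictionary and makes no AWS calls.

```python
def build_async_endpoint_config(config_name, model_name, s3_output_path,
                                instance_type="ml.m5.xlarge", max_concurrent=4):
    """Build a CreateEndpointConfig-style request (hypothetical helper).

    The instance type and concurrency defaults are placeholders chosen
    for illustration only.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # assumed variant name
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
            }
        ],
        # The presence of this object is what makes the endpoint async-only.
        "AsyncInferenceConfig": {
            "OutputConfig": {"S3OutputPath": s3_output_path},
            "ClientConfig": {
                "MaxConcurrentInvocationsPerInstance": max_concurrent
            },
        },
    }


def is_async_only(endpoint_config):
    """Return True if the configuration contains an AsyncInferenceConfig."""
    return "AsyncInferenceConfig" in endpoint_config


request = build_async_endpoint_config(
    "demo-config", "demo-model", "s3://example-bucket/async-output/"
)
print(is_async_only(request))
```

With boto3, a request like this could be passed as keyword arguments to the SageMaker client's create_endpoint_config call. An endpoint created from it would then be invoked through the runtime client's invoke_endpoint_async operation rather than invoke_endpoint.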

Batch Transform

Deployment guardrails

Container SSM access

  • Note: You cannot use SSM to connect to first-party (1P) algorithm containers or to containers of models obtained from SageMaker MarketPlace. However, you can connect to deep learning containers (DLCs) provided by AWS or to any custom container that you own. If you have enabled network isolation for a model container, which prevents it from making outbound network calls, you cannot start an SSM session for that container. Each SSM session can access only one container. To access another container, even one behind the same endpoint, start a new SSM session with the target ID of that endpoint.

Best practices to minimize interruptions during GPU driver upgrades

  • Important: The CUDA Compatibility Package is not backward compatible, so you must disable it if the driver version on the instance is greater than the CUDA Compatibility Package version.
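The rule above reduces to a version comparison, sketched below in Python. The function names are hypothetical, and the version strings in the usage lines are made-up examples. How the two versions are obtained inside a container (for instance, by querying the driver and inspecting the installed compatibility library) is left out and would depend on the image.

```python
def parse_version(version):
    """Convert a dotted version string such as '535.104.05' to a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def should_disable_cuda_compat(driver_version, compat_version):
    """Return True when the instance driver is newer than the CUDA
    Compatibility Package, in which case the package should be disabled."""
    return parse_version(driver_version) > parse_version(compat_version)


# Illustrative version strings only, not values from any real instance.
print(should_disable_cuda_compat("550.54.15", "535.104.05"))
print(should_disable_cuda_compat("470.57.02", "535.104.05"))
```

Comparing integer tuples rather than raw strings avoids misordering versions whose components differ in digit count.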