Deploy Models for Inference
This page is adapted from the AWS Developer Guide.
Asynchronous inference
Note The presence of an asynchronous inference configuration (AsyncInferenceConfig) object in the endpoint configuration implies that the endpoint can only receive asynchronous invocations.
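For illustration, a minimal boto3 sketch of an endpoint configuration that includes AsyncInferenceConfig; the config name, model name, instance type, and S3 output path are placeholders:

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Including AsyncInferenceConfig makes the resulting endpoint
# asynchronous-only; names and paths below are hypothetical.
sagemaker.create_endpoint_config(
    EndpointConfigName="my-async-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-model",          # assumed existing model
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
    AsyncInferenceConfig={
        "OutputConfig": {
            # Results are written here instead of being returned inline.
            "S3OutputPath": "s3://my-bucket/async-results/",  # hypothetical bucket
        }
    },
)
```

Requests against such an endpoint are then made with the SageMaker Runtime invoke_endpoint_async API rather than invoke_endpoint.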
Batch Transform
Note SageMaker processes each input file separately. It doesn't combine mini-batches from different input files, in order to comply with the MaxPayloadInMB limit (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html#SageMaker-CreateTransformJob-request-MaxPayloadInMB).
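For illustration, a minimal boto3 sketch of a transform job that sets MaxPayloadInMB explicitly; the job, model, and S3 names are placeholders:

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Each input file is split into mini-batches no larger than MaxPayloadInMB;
# records are never combined across files.
sagemaker.create_transform_job(
    TransformJobName="my-transform-job",
    ModelName="my-model",                      # assumed existing model
    MaxPayloadInMB=6,                          # per-request payload cap
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/input/",  # hypothetical input prefix
            }
        },
        "SplitType": "Line",                   # split each file into records
    },
    TransformOutput={"S3OutputPath": "s3://my-bucket/output/"},
    TransformResources={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
)
```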
Deployment guardrails
Note Deployment guardrails only apply to asynchronous inference and real-time inference endpoint types.
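As a sketch of what a guardrail deployment looks like, the following boto3 call updates an endpoint with a blue/green canary policy; the endpoint, config, and alarm names are hypothetical:

```python
import boto3

sagemaker = boto3.client("sagemaker")

# This sketch shifts 10% of capacity to the new fleet first, waits five
# minutes, then shifts the rest; a CloudWatch alarm triggers auto-rollback.
sagemaker.update_endpoint(
    EndpointName="my-endpoint",                    # assumed existing endpoint
    EndpointConfigName="my-new-endpoint-config",   # assumed new config
    DeploymentConfig={
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": 10},
                "WaitIntervalInSeconds": 300,
            },
            "TerminationWaitInSeconds": 300,
        },
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": "my-error-rate-alarm"}]  # hypothetical alarm
        },
    },
)
```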
Container SSM access
Note You cannot connect to 1P algorithm containers, or to containers of models obtained from SageMaker Marketplace, with SSM. However, you can connect to deep learning containers (DLCs) provided by AWS or to any custom container that you own. If you have enabled network isolation for a model container, preventing it from making outbound network calls, you cannot start an SSM session for that container. You can access only one container per SSM session. To access another container, even if it is behind the same endpoint, start a new SSM session with the target ID of that container.
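SSM access is opted into per production variant through the EnableSSMAccess flag, as in this boto3 sketch (names are placeholders); once enabled, a session is started from the AWS CLI with aws ssm start-session and the container's target ID:

```python
import boto3

sagemaker = boto3.client("sagemaker")

# SSM access is enabled per production variant in the endpoint config;
# the config and model names here are hypothetical.
sagemaker.create_endpoint_config(
    EndpointConfigName="my-ssm-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-model",      # assumed DLC-based or custom model
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "EnableSSMAccess": True,      # allows SSM sessions into the container
        }
    ],
)
```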
Best practices to minimize interruptions during GPU driver upgrades
Important The CUDA Compatibility Package is not backward compatible, so it must be disabled if the driver version on the instance is greater than the CUDA Compatibility Package version.
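The note implies a version check before enabling the package. A sketch in Python of what an entrypoint wrapper might do, assuming the package ships under /usr/local/cuda/compat and the package's target driver version is known; both assumptions are illustrative, not the documented mechanism:

```python
import os
import subprocess

def driver_version() -> tuple[int, ...]:
    """Read the installed NVIDIA driver version via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        text=True,
    )
    return tuple(int(p) for p in out.strip().splitlines()[0].split("."))

COMPAT_DIR = "/usr/local/cuda/compat"   # assumed compat package location
COMPAT_VERSION = (535, 54, 3)           # assumed package driver version

if driver_version() > COMPAT_VERSION:
    # Driver is newer than the compat package: drop the package from the
    # library path so the native driver libraries are used instead.
    paths = os.environ.get("LD_LIBRARY_PATH", "").split(":")
    os.environ["LD_LIBRARY_PATH"] = ":".join(
        p for p in paths if p and p != COMPAT_DIR
    )
```

Because changes to LD_LIBRARY_PATH only affect child processes, a wrapper like this would run before launching the inference server.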