AWS Sagemaker
LiteLLM supports calling Llama2 (and other custom endpoints) deployed on AWS SageMaker.
API KEYS
!pip install boto3

import os

# Set your AWS credentials and region as environment variables
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
Usage
import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

# model = "sagemaker/" + the name of your deployed SageMaker endpoint
response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0.2,
    max_tokens=80
)
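completion returns an OpenAI-style response object. A minimal sketch of reading the generated text, assuming the usual choices[0].message.content layout LiteLLM mirrors from OpenAI:

```python
# Inspect the whole response, or just the generated text
print(response)
print(response.choices[0].message.content)
```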
Passing credentials as parameters - Completion()
Pass AWS credentials directly as parameters to litellm.completion instead of setting environment variables.
import os
from litellm import completion

response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    aws_access_key_id="",
    aws_secret_access_key="",
    aws_region_name="",
)
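To keep secrets out of source code, you can also look the credentials up at call time instead of hardcoding them. A small sketch using os.getenv (the environment variable names are the same ones used above):

```python
import os
from litellm import completion

# Read credentials at runtime rather than hardcoding them
response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    aws_region_name=os.getenv("AWS_REGION_NAME"),
)
```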
Usage - Streaming
SageMaker currently does not support streaming, so LiteLLM fakes streaming by returning the response string in chunks.
import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

response = completion(
    model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    temperature=0.2,
    max_tokens=80,
    stream=True,
)

for chunk in response:
    print(chunk)
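Each chunk follows LiteLLM's OpenAI-compatible streaming format, where choices[0].delta may carry a piece of the text. A sketch of assembling the streamed pieces into one string (use this in place of the print loop above, since the stream can only be iterated once):

```python
# Collect the streamed text into a single string
full_text = ""
for chunk in response:
    piece = getattr(chunk.choices[0].delta, "content", None)
    if piece:
        full_text += piece
print(full_text)
```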
AWS Sagemaker Models
Here are examples of calling SageMaker models with LiteLLM:
| Model Name | Function Call | Required OS Variables |
|---|---|---|
| Llama2 7B | `completion(model='sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b', messages=messages)` | `os.environ['AWS_ACCESS_KEY_ID']`, `os.environ['AWS_SECRET_ACCESS_KEY']`, `os.environ['AWS_REGION_NAME']` |
| Custom LLM Endpoint | `completion(model='sagemaker/your-endpoint', messages=messages)` | `os.environ['AWS_ACCESS_KEY_ID']`, `os.environ['AWS_SECRET_ACCESS_KEY']`, `os.environ['AWS_REGION_NAME']` |
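For a custom endpoint, the pattern is the same: prefix the name of your deployed SageMaker endpoint with sagemaker/. The endpoint name below is a placeholder:

```python
import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""

# "your-endpoint" is a placeholder - replace it with your SageMaker endpoint name
response = completion(
    model="sagemaker/your-endpoint",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)
```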