This guide will walk you through the key features and steps involved in deploying your models, optimizing costs through undeployment, and removing endpoints when they are no longer needed. After creating a model endpoint, follow these steps to manage it:
Step 1: Accessing Model Endpoints
Step 2: Accessing a Running Model Endpoint
The Model Endpoint details page provides a comprehensive overview of your deployed model, including its configuration, resource usage, and performance metrics. The sections below explain how to interpret the information presented.
General Information
- Endpoint Name: The name of the endpoint through which your model is accessed for predictions.
- Status: Indicates the current state of the endpoint (e.g., Creating, InService, Updating, Failed).
- Creation Time: The timestamp when the endpoint was created.
- Endpoint URL: A unique address that allows access to a deployed model for predictions or interactions, forwarding requests to http://localhost:internal_port within the endpoint pod.
- Location: The cloud location of this endpoint.
- Connection: Provides instructions on how to access and interact with the Endpoint URL.
- Instance Type: The type of compute instance running your model (e.g., CPU, GPU, memory-optimized).
- Instance Count: The number of instances currently running to handle prediction requests.
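The Endpoint URL and Connection fields together describe how to call the deployed model. As a rough sketch (the URL, token scheme, payload schema, and helper name below are hypothetical; check the Connection section of your endpoint for the exact format), a prediction request might be built like this:

```python
import json
import urllib.request

# Placeholder values -- substitute your endpoint's actual URL and credentials.
ENDPOINT_URL = "https://example.com/v1/endpoints/my-endpoint/predict"
API_TOKEN = "YOUR_API_TOKEN"

def build_request(url: str, payload: dict, token: str) -> urllib.request.Request:
    """Build an authenticated JSON POST request for the endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

# The payload schema depends on the serving framework behind the endpoint.
req = build_request(ENDPOINT_URL, {"inputs": "Hello, model!"}, API_TOKEN)
# response = urllib.request.urlopen(req)  # uncomment to actually send the call
```

The send is left commented out so the snippet can be adapted safely; wire in the real URL and token from the Connection instructions before running it.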
Detailed Information
- Model name: The model deployed to this endpoint, previously registered in the Model Registry.
- Model framework: The serving framework used to deploy the model (e.g., Triton, NVIDIA NIM, vLLM, SGLang, …).
- Model source: The location where the model is stored or retrieved from for deployment.
- Model: The name of the deployed model.
Step 3: Deploy and Undeploy Model Endpoint
Deploy model endpoint
Once your model endpoint is created, it starts automatically. If it is currently stopped or has failed, locate your model endpoint in the table and click the "Deploy" button. This initializes the model endpoint with the settings you chose previously, or with a new resource configuration. Follow the steps below:
- Locate the model endpoint you want to deploy in the model list.
- Click the "Deploy" button and re-configure your endpoint (optional) based on your demands.
- Wait for the status to change to "Running."
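Waiting for the status change can be automated with a simple polling loop. This is a generic sketch, not the platform's SDK: `get_status` stands in for whatever status call your platform's API exposes, and the status strings mirror the ones shown in the console.

```python
import time

def wait_for_status(get_status, target="Running", timeout=600, interval=5):
    """Poll a status callable until the target status is reached.

    `get_status` is any zero-argument callable returning the endpoint's
    current status string (hypothetical -- wire it to your platform's API).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == target:
            return status
        if status == "Failed":
            raise RuntimeError("Endpoint deployment failed")
        time.sleep(interval)
    raise TimeoutError(f"Endpoint did not reach {target!r} within {timeout}s")

# Example usage (client and describe_endpoint are placeholders):
# wait_for_status(lambda: client.describe_endpoint("my-endpoint").status)
```

Failing fast on a "Failed" status avoids waiting out the full timeout when the deployment is not going to recover.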
Undeploy model endpoint
Undeploying an endpoint is a cost-effective way to pause predictions when they are not needed. You avoid paying for resources while keeping your model ready for redeployment when demand resumes.
If you need to pause your work or save on resources, simply select the model endpoint you wish to undeploy and click the "Undeploy" button. This will halt the instance and save its state until you deploy it again.
- Locate the model endpoint you want to undeploy in the list.
- Click the "Undeploy" button.
- Wait for the model endpoint status to change to "Undeployed."
Step 4: Delete an Endpoint
When an endpoint is no longer needed, you can delete it to free up resources. To delete a model endpoint, select the endpoint and click the "Delete" button. A confirmation dialog will appear to ensure you do not accidentally delete the wrong endpoint. Please note that once an endpoint is deleted, it cannot be recovered.
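Because deletion is irreversible, confirmation dialogs for destructive actions commonly require the user to retype the resource name before proceeding. A minimal sketch of that guard (the function and the exact dialog behavior are assumptions, not necessarily how this platform's dialog works):

```python
def confirm_delete(endpoint_name: str, typed_name: str) -> bool:
    """Allow deletion only when the user retypes the exact endpoint name.

    Requiring an exact, case-sensitive match (after trimming whitespace)
    makes it hard to delete the wrong endpoint by clicking through quickly.
    """
    return typed_name.strip() == endpoint_name
```

Scripted cleanups can apply the same idea: compare the name you intend to delete against the name returned by the API before issuing the delete call.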