Manage a model endpoint

This guide walks you through the key steps involved in deploying your models, optimizing costs through undeployment, and removing endpoints when they are no longer needed. After creating a model endpoint, follow these steps to manage it:

Step 1: Access Model Endpoints

  • Dashboard: From the Greennode AI Platform dashboard, locate the Model Inference section.
  • Endpoint List: You'll see a list of your existing model endpoints, including their names, creation dates, status (e.g., running, stopped), and configuration.

Step 2: Access a Running Model Endpoint

The Model Endpoint details page provides a comprehensive overview of your deployed model, including its configuration, resource usage, and performance metrics. The sections below explain how to interpret the information presented.

General Information
  • Endpoint Name: The name of the endpoint through which your model is accessed for predictions.
  • Status: Indicates the current state of the endpoint (e.g., Creating, InService, Updating, Failed).
  • Creation Time: The timestamp when the endpoint was created.
  • Endpoint URL: A unique address that allows access to a deployed model for predictions or interactions. Requests to this URL are forwarded to http://localhost:internal_port within the endpoint pod.
  • Location: The cloud location where this endpoint is hosted.
  • Connection: Provides instructions on how to access and interact with the Endpoint URL (see the sketch after this list).
  • Instance Type: The type of compute instance running your model (e.g., CPU, GPU, memory-optimized).
  • Instance Count: The number of instances currently running to handle prediction requests.
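
For illustration, here is a minimal sketch of sending a prediction request to the Endpoint URL with Python's requests library. The request path ("/predict"), payload schema, and authentication header are assumptions, not a documented contract; the Connection section of the console shows the exact format your serving framework expects.

    import requests

    # Placeholders: copy these values from the endpoint's detail page.
    ENDPOINT_URL = "https://<your-endpoint-url>"
    API_KEY = "<your-api-key>"  # only if your endpoint requires auth

    # The path and payload schema below are assumptions for illustration;
    # check the Connection section for your framework's actual format.
    payload = {"inputs": [[1.0, 2.0, 3.0]]}

    resp = requests.post(
        f"{ENDPOINT_URL}/predict",  # hypothetical path
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())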

Detail Information

  • Model name: The model deployed to this endpoint, previously registered in the Model Registry.
  • Model framework: The serving framework used to deploy the model (e.g., Triton, NVIDIA NIM, vLLM, SGLang). If the framework exposes an OpenAI-compatible API, see the sketch after this list.
  • Model source: The location where the model is stored or retrieved from for deployment.
  • Model: The name of the deployed model.
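
If the endpoint is served with vLLM (or another framework that exposes an OpenAI-compatible API), a chat completion request typically looks like the sketch below. The base URL is a placeholder, and the model name must match the "Model" field shown above.

    import requests

    ENDPOINT_URL = "https://<your-endpoint-url>"  # from General Information

    # OpenAI-compatible chat completion request (as exposed by vLLM).
    payload = {
        "model": "<deployed-model-name>",  # must match the "Model" field
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    }

    resp = requests.post(f"{ENDPOINT_URL}/v1/chat/completions",
                         json=payload, timeout=60)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])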

Step 3: Deploy and Undeploy a Model Endpoint

Deploy model endpoint
Once your model endpoint is created, it starts automatically. If it is currently stopped or has failed, locate it in the table and click the "Deploy" button. This initializes the model endpoint with the settings you chose or with a new resource configuration. Follow the steps below:
  1. Locate the model endpoint you want to deploy in the list.
  2. Click the "Deploy" button and, optionally, re-configure the endpoint based on your demands.
  3. Wait for the status to change to "Running."
Undeploy model endpoint
Undeploying an endpoint is a cost-effective way to pause predictions when they are not needed. You avoid paying for resources while keeping your model ready for redeployment when demand resumes.
If you need to pause your work or save on resources, simply select the model endpoint you wish to undeploy and click the "Undeploy" button. This halts the instance and saves its state until you deploy it again (a status-polling sketch follows the steps below).
  1. Locate the model endpoint you want to undeploy in the list.
  2. Click the "Undeploy" button.
  3. Wait for the model endpoint status to change to "Undeployed."
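
If you prefer to script these transitions rather than watch the console, the sketch below polls an endpoint's status until it reaches a terminal state. The management base URL and route here are hypothetical, not a documented GreenNode API; the console buttons described above are the supported workflow.

    import time
    import requests

    # Hypothetical values -- neither the base URL nor the route below is
    # documented; they stand in for whatever management API you have access to.
    MANAGEMENT_API = "https://<management-api>"
    ENDPOINT_ID = "<endpoint-id>"

    def wait_for_status(targets=("Running", "Undeployed", "Failed"),
                        timeout_s=600, interval_s=10):
        """Poll the (hypothetical) endpoint route until a terminal status."""
        deadline = time.time() + timeout_s
        while time.time() < deadline:
            resp = requests.get(f"{MANAGEMENT_API}/endpoints/{ENDPOINT_ID}",
                                timeout=10)
            resp.raise_for_status()
            status = resp.json()["status"]
            if status in targets:
                return status
            time.sleep(interval_s)
        raise TimeoutError("endpoint did not reach a terminal status in time")

    print(wait_for_status())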

Step 4: Delete an Endpoint

When an endpoint is no longer needed, you can delete it to free up resources. To delete a model endpoint, select the endpoint and click the "Delete" button. A confirmation dialog appears to ensure you do not accidentally delete the wrong endpoint. Note that once an endpoint is deleted, it cannot be recovered.
Notes
  • Undeploy vs. Delete:
    • Undeploy: Use this option when you want to temporarily stop predictions but retain the model and configuration for later use.
    • Delete: Use this option when you no longer need the endpoint and want to free up resources permanently.
  • Data Backup: Before deleting an endpoint, make sure you have backed up any important data or model artifacts.
  • Monitoring: Monitor your endpoint's performance metrics to ensure optimal resource utilization and cost management.

Related Articles
  • Manage a model tuning job
  • Deploy a model endpoint with custom container
  • Create an endpoint
  • Import a Model Registry
  • Local Storage Limits for Notebook, Model Training, and Online Prediction