Local Storage Limits for Notebook, Model Training, and Online Prediction

To ensure optimal performance and cost-efficiency, our platform includes a fixed amount of local storage with each compute instance you create. Exceeding this limit can disrupt your workflow and results. This guide explains how local storage limits work and how to manage your storage usage effectively.

Instance Types Review

Before starting, please review our available instance types below:

No.   Instance Flavor               vCPU   Memory (GB)   vRAM (GB)   Local NVMe Storage
1     g5-standard-16x250-1h100      16     250           80          3.75 TB
2     g5-standard-32x500-2h100      32     500           160         7.5 TB
3     g5-standard-64x1000-4h100     64     1000          320         15 TB
4     g5-standard-128x2000-8h100    128    2000          640         30 TB

Local Storage Limits

  • Notebook Instances: Each notebook instance comes with a fixed amount of local NVMe storage, which varies depending on the instance type you choose. This storage is intended for temporary files, code, and data used during your interactive analysis and experimentation.
  • Model Training Jobs: For model training jobs, the local NVMe storage limit is calculated per node. If your job uses multiple nodes, the total available storage is the per-node limit multiplied by the number of nodes (see the worked example after this list). This storage is used for your training data, model checkpoints, and other intermediate files.
  • Online Prediction Endpoints: For model endpoints, the local NVMe storage limit of the chosen instance type represents the total storage shared by all nodes in an online prediction endpoint. This storage is primarily used for caching model artifacts and handling incoming requests.
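
For example, a training job that runs on 4 nodes of the g5-standard-32x500-2h100 flavor has 4 x 7.5 TB = 30 TB of local NVMe storage in total, although each node can still only address its own 7.5 TB. Below is a minimal Python sketch of this calculation; the flavor-to-storage mapping comes from the table above, but the helper function itself is illustrative and not part of the platform SDK:

    # Per-node local NVMe storage (TB), taken from the instance table above.
    LOCAL_NVME_TB = {
        "g5-standard-16x250-1h100": 3.75,
        "g5-standard-32x500-2h100": 7.5,
        "g5-standard-64x1000-4h100": 15.0,
        "g5-standard-128x2000-8h100": 30.0,
    }

    def total_training_storage_tb(flavor: str, num_nodes: int) -> float:
        """Total local NVMe storage for a multi-node training job.

        Illustrative helper, not a platform API: the per-node limit is
        multiplied by the node count, but each node still sees only its
        own share.
        """
        return LOCAL_NVME_TB[flavor] * num_nodes

    print(total_training_storage_tb("g5-standard-32x500-2h100", 4))  # 30.0 TB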

Exceeding the Limit

If you exceed the local storage limit for your instance, you may experience the following:

  • Read/Write Errors: You might encounter errors when trying to read or write data to the local storage.
  • Job Failures: Your training jobs or prediction requests could fail due to insufficient storage space.
  • Performance Degradation: Your instance's performance may slow down significantly.

Managing Local Storage

To avoid exceeding the limit and ensure smooth operation, consider these strategies:

  • Choose the Right Instance Type: Select an instance type with sufficient local storage for your expected workload. Refer to our documentation for details on the storage capacity of each instance type.
  • Optimize Data Usage:
    • Stream data from external sources (e.g., S3 buckets) instead of storing it locally whenever possible.
    • Delete temporary files and intermediate results that are no longer needed.
    • Use compression techniques to reduce the size of your data.
  • Monitor Storage Usage: Regularly check the storage usage of your instances and jobs using the monitoring tools provided by our platform (a quick local check is sketched after this list).
  • Upgrade Instance: If you consistently need more local storage, consider upgrading to a larger instance type.
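
As a quick local check, you can also inspect free space on the instance's NVMe volume directly from a notebook cell or training script. Here is a minimal sketch using Python's standard library; the mount point /mnt/local-nvme is an assumption, so substitute the path where your instance actually mounts its local storage:

    import shutil

    # Assumed mount point for the instance's local NVMe volume;
    # replace with the actual path on your instance.
    MOUNT_POINT = "/mnt/local-nvme"

    usage = shutil.disk_usage(MOUNT_POINT)
    used_pct = usage.used / usage.total * 100
    print(f"total: {usage.total / 1e12:.2f} TB, "
          f"used: {usage.used / 1e12:.2f} TB ({used_pct:.1f}%), "
          f"free: {usage.free / 1e12:.2f} TB")

    # Warn early, before read/write errors or job failures occur.
    if used_pct > 90:
        print("WARNING: local storage above 90% - clean up temporary files.")
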
Important Note:

The local storage provided with each instance is ephemeral. This means that any data stored locally will be lost if the instance is stopped or terminated. For persistent storage, we recommend using cloud storage services like S3 buckets.
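
Because local storage is ephemeral, anything you want to keep (model checkpoints, processed datasets) should be copied to durable object storage before the instance stops. The sketch below uses boto3 to persist a checkpoint and to stream data without materializing it locally; the bucket name, object keys, and local paths are placeholders, and boto3 must be installed with AWS credentials configured:

    import boto3

    # Placeholder bucket and key names; replace with your own.
    BUCKET = "my-training-artifacts"
    s3 = boto3.client("s3")

    # Persist a local checkpoint before the instance is stopped or terminated.
    s3.upload_file("/mnt/local-nvme/checkpoints/model.pt", BUCKET,
                   "runs/exp-01/model.pt")

    # Conversely, stream an object straight into memory instead of
    # writing it to local NVMe first (helps keep local usage low).
    obj = s3.get_object(Bucket=BUCKET, Key="datasets/train.csv")
    for line in obj["Body"].iter_lines():
        pass  # process each record without writing it to local disk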

If you have any questions or need further assistance, please don't hesitate to contact our support team.


Related Articles

  • Training Mode
  • Import a model registry with custom container
  • Import a model registry with pre-built container
  • Import a Model Registry using Triton Server
  • Manage a notebook instance