GreenNode Batch Inference allows you to submit groups of requests that are processed asynchronously on GreenNode’s high-performance infrastructure. It’s designed for workloads that do not require immediate responses, providing up to 50% lower costs, higher throughput, and flexible completion windows compared to real-time APIs.
Batch requests are 50% cheaper than regular inference calls and have model-agnostic pricing, meaning costs remain the same regardless of the model variant used. In addition, batch inference does not consume tokens from your per-model rate limits. All jobs are processed asynchronously, with most completing within 24 hours under normal system load.
This approach is ideal for large-scale or deferred workloads, such as:
- Model evaluation and large-scale data analysis
- Dataset classification and tagging
- Offline summarization and report generation
- Synthetic data generation for fine-tuning or augmentation
- Content generation for marketing and automation pipelines
- Batch dataset transformation and preprocessing
Prepare a Batch File
To start using GreenNode Batch Inference, you need to prepare a JSON Lines (JSONL) file — where each line represents a single model request. Each request will be processed asynchronously through the /v1/chat/completions endpoint on GreenNode’s infrastructure.
Let’s use batch-requests.jsonl as an example. Each line in the file should follow the format below:
- {"custom_id": "request-1", "body": {"model": "GreenNode/GreenMind-Medium-14B-R1", "messages": [{"role": "user", "content": "Hello, world!"}], "max_tokens": 200}}
- {"custom_id": "request-2", "body": {"model": "GreenNode/GreenMind-Medium-14B-R1", "messages": [{"role": "user", "content": "Explain quantum computing"}], "max_tokens": 200}}
Field Descriptions
- custom_id — Unique identifier for tracking and matching each request with its inference results.
- body.model — The model ID used for inference. All requests in the same batch file should use the same model.
- body.messages — The chat input messages, following the same format as real-time chat completions.
- body.max_tokens — (Optional) Specifies the maximum number of tokens to generate for each completion.
Note: The Batch API currently supports only the /v1/chat/completions endpoint.
File Constraints
- Up to 50,000 requests per batch file
- Up to 100 MB total file size
- Each line must be a valid JSON object in JSONL format
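If you generate the batch file programmatically, a minimal Python sketch is shown below. The request bodies and the model ID are taken from the example above; the final checks simply mirror the limits just listed (50,000 requests and 100 MB per file).

# A minimal sketch of building batch-requests.jsonl with Python's standard
# library. The requests and model ID come from the example above; the
# assertions mirror the documented file constraints.
import json
import os

requests_to_run = [
    {"custom_id": "request-1",
     "body": {"model": "GreenNode/GreenMind-Medium-14B-R1",
              "messages": [{"role": "user", "content": "Hello, world!"}],
              "max_tokens": 200}},
    {"custom_id": "request-2",
     "body": {"model": "GreenNode/GreenMind-Medium-14B-R1",
              "messages": [{"role": "user", "content": "Explain quantum computing"}],
              "max_tokens": 200}},
]

with open("batch-requests.jsonl", "w", encoding="utf-8") as f:
    for req in requests_to_run:
        f.write(json.dumps(req) + "\n")   # one JSON object per line (JSONL)

# Sanity-check the file against the documented limits.
assert len(requests_to_run) <= 50_000, "too many requests for one batch file"
assert os.path.getsize("batch-requests.jsonl") <= 100 * 1024 * 1024, "file exceeds 100 MB"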
Upload and Manage Batch Jobs
Upload the Batch File
Once your batch file is ready, you can upload it and manage your batch inference jobs directly through the GreenNode API.
To upload your prepared JSONL file (batch-requests.jsonl) to GreenNode’s storage:
- Go to Swagger → Files.
- Use the Upload File endpoint to upload your batch file. After uploading, you'll receive a file ID, which you will use when creating a batch.
- You can also use Swagger → Files → Get File Result to:
  - Retrieve metadata about your uploaded file
  - Check file processing or upload status
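For illustration, here is a hedged Python sketch of the upload call. The base URL, the /v1/files route, the purpose field, and the bearer-token header are assumptions modeled on OpenAI-style file APIs, not confirmed GreenNode parameters; treat Swagger → Files as the authoritative contract for the Upload File endpoint.

# A hedged sketch of uploading the batch file over HTTP. The base URL,
# route (/v1/files), "purpose" field, and bearer-token auth are assumptions;
# consult Swagger -> Files for the real Upload File parameters.
import requests

API_BASE = "https://api.greennode.ai"      # hypothetical base URL
API_KEY = "YOUR_API_KEY"

with open("batch-requests.jsonl", "rb") as f:
    resp = requests.post(
        f"{API_BASE}/v1/files",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": ("batch-requests.jsonl", f)},
        data={"purpose": "batch"},
    )
resp.raise_for_status()
file_id = resp.json()["id"]                # keep this ID for batch creation
print("Uploaded file ID:", file_id)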
Create and Manage Batch Jobs
Once your file is uploaded, you can create, cancel, or delete batch jobs through Swagger → Batch:
- Create Batch: Use the uploaded file ID to create a new asynchronous batch job. The system will process all requests within 24 hours under normal conditions.
- Cancel Batch: Stop a batch job that is still in progress.
- Delete Batch: Permanently remove a batch and its associated metadata after processing.
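As a companion to the steps above, the following Python sketch creates a batch from an uploaded file ID and later deletes it. The /v1/batches route, the payload fields, and the response shape are assumptions in the style of OpenAI's Batch API; consult Swagger → Batch for the exact endpoints and parameters.

# A hedged sketch of creating and then deleting a batch job. The routes,
# payload fields, and response shape are assumptions; check Swagger -> Batch
# for the exact contract.
import requests

API_BASE = "https://api.greennode.ai"      # hypothetical base URL
API_KEY = "YOUR_API_KEY"
headers = {"Authorization": f"Bearer {API_KEY}"}
file_id = "file-abc123"                    # ID returned by the file upload step

# Create a batch from the uploaded file ID.
create = requests.post(
    f"{API_BASE}/v1/batches",
    headers=headers,
    json={
        "input_file_id": file_id,
        "endpoint": "/v1/chat/completions",   # the only supported endpoint
        "completion_window": "24h",
    },
)
create.raise_for_status()
batch_id = create.json()["id"]
print("Created batch:", batch_id)

# Later, once processing has finished, the batch can be removed.
requests.delete(f"{API_BASE}/v1/batches/{batch_id}", headers=headers).raise_for_status()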