Steps to Create a Supervised Tuning Job
To create a supervised tuning job, follow the steps below and supply the input parameters listed after them.
- Access the Tuning Job Creation Interface: Open the tuning job creation page on the provider's platform.
- Fill in the Input Parameters: Provide the required information for each parameter (see Input Parameters for Supervised Tuning below).
- Review and Submit: Carefully review your input parameters and submit the job.
- Monitor the Job: Track the progress of your tuning job through the platform's interface; a rough sketch of submitting and polling a job follows these steps.
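The exact submission mechanism depends on the provider, but as a rough illustration of the Review and Submit and Monitor steps above, the sketch below posts a job configuration to a hypothetical REST endpoint and polls its status. The base URL, request and response fields, and status values are all assumptions, not the provider's actual API.

```python
import time

import requests  # third-party HTTP client: pip install requests

# Hypothetical endpoint and credential; substitute your provider's actual values.
BASE_URL = "https://api.example-provider.com/v1/tuning-jobs"
API_KEY = "YOUR_API_KEY"

def submit_and_monitor(config: dict, poll_seconds: int = 60) -> dict:
    """Submit a tuning job configuration and poll it until it finishes."""
    headers = {"Authorization": f"Bearer {API_KEY}"}

    # Review and Submit: send the configuration to the (assumed) jobs endpoint.
    response = requests.post(BASE_URL, json=config, headers=headers)
    response.raise_for_status()
    job_id = response.json()["id"]  # assumed response field

    # Monitor the Job: poll until the job reaches a terminal state.
    while True:
        job = requests.get(f"{BASE_URL}/{job_id}", headers=headers).json()
        status = job.get("status")  # assumed field and status values
        print(f"job {job_id}: {status}")
        if status in ("succeeded", "failed", "cancelled"):
            return job
        time.sleep(poll_seconds)
```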
Input Parameters for Supervised Tuning
- Tuning Job Name: A unique name for your tuning job.
- Location: The geographic location where the tuning job will run (e.g., Bangkok, Thailand).
- Tuning Type: Specify "supervised_tuning" for this type of job.
- Base Model: Select a pre-trained base model from the provider's library (e.g., GPT-3, BERT).
- Learning Rate Multiplier: A multiplier applied to the optimizer's default learning rate; values below 1.0 lower it, values above 1.0 raise it.
- Adapter Size: Set the size of the adapter layers added to the base model.
- Instance Type: Choose the type of computing instance to use for training (e.g., n1-standard-4).
- Instance Count: Specify the number of instances to use for parallel training.
- Network Volume: Select the network volume where your training and validation datasets are stored.
- Input Tuning Dataset: Provide the path to the input training data within the network volume.
- Validate Input Data: Optionally upload a sample dataset to verify its format.
- Validation Tuning Dataset: Specify the path to the validation dataset within the network volume.
- Output Tuned Model: Provide the path where the tuned model will be saved.
- Number of Epochs: Set the number of times the model will iterate over the entire training dataset.
- Batch Size: Specify the number of training examples processed in each batch.
- Suffix: Add a suffix to the output model filename.
- Cutoff Tokens: Set the maximum number of tokens in the input sequence.
- Gradient Accumulation Step: Accumulate gradients over this many steps before updating the model; the effective batch size is Batch Size × this value.
- Number of Saved Checkpoints: Specify the number of checkpoints to save during training.
- Saved Interval Steps: Set the interval (in steps) at which to save checkpoints.
- Validation Interval Steps: Set the interval (in steps) at which to evaluate the model on the validation set.
- Weights & Biases (WandB) API Key: Provide your WandB API key to log and visualize the training process.
- WandB Project: Specify the WandB project name for organizing your experiments.
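Putting the parameters together, a complete supervised tuning job configuration might look like the sketch below. This is illustrative only: the key names mirror the parameter list above, but the actual schema, accepted values, and dataset formats are provider-specific, and every value shown is an assumption.

```python
# Illustrative supervised tuning configuration. Key names mirror the
# parameter list above; they are assumptions, not the provider's schema.
supervised_config = {
    "tuning_job_name": "my-supervised-job-001",
    "location": "bangkok-thailand",              # hypothetical region identifier
    "tuning_type": "supervised_tuning",
    "base_model": "gpt-3",                       # choose from the provider's library
    "learning_rate_multiplier": 1.0,
    "adapter_size": 8,
    "instance_type": "n1-standard-4",
    "instance_count": 2,
    "network_volume": "my-training-volume",
    "input_tuning_dataset": "/datasets/train.jsonl",       # path inside the network volume
    "validation_tuning_dataset": "/datasets/valid.jsonl",  # path inside the network volume
    "output_tuned_model": "/models/tuned/",
    "num_epochs": 3,
    "batch_size": 16,
    "suffix": "v1",
    "cutoff_tokens": 2048,
    "gradient_accumulation_step": 4,             # effective batch size: 16 * 4 = 64
    "num_saved_checkpoints": 3,
    "saved_interval_steps": 500,
    "validation_interval_steps": 250,
    "wandb_api_key": "YOUR_WANDB_API_KEY",       # placeholder credential
    "wandb_project": "tuning-experiments",
}
```

With the hypothetical client from the earlier sketch, this configuration could be submitted directly: `submit_and_monitor(supervised_config)`.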
Steps to Create an RLHF Tuning Job
To create an RLHF tuning job, you'll need to provide the following information, in addition to the parameters for supervised tuning:
- Access the Tuning Job Creation Interface: Open the tuning job creation page on the provider's platform, as for supervised tuning.
- Fill in the Input Parameters: Provide the required information for each parameter, including the RLHF-specific parameters listed below (illustrated in the sketch after these steps).
Additional Parameters for RLHF Tuning:
- Reward Learning Rate Multiplier: Adjust the learning rate for the reward model.
- Reward Training Dataset: Specify the path to the reward training dataset within the network volume.
- Reward Batch Size: Specify the batch size for reward model training.
- Reward Gradient Accumulation Step: Accumulate gradients over this many steps before updating the reward model.
- Review and Submit: Carefully review your input parameters and submit the job.
- Monitor the Job: Track the progress of your tuning job through the platform's interface.
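As a sketch, the RLHF-specific parameters can be expressed as an extension of the supervised_config dict from the earlier example. Again, the key names (and the tuning-type value for RLHF jobs) are illustrative assumptions rather than the provider's actual schema.

```python
# Illustrative RLHF configuration: all supervised parameters plus the four
# reward-model parameters. Key names are assumptions, not a provider schema.
rlhf_config = {
    **supervised_config,                        # reuse the supervised parameters above
    "tuning_type": "rlhf_tuning",               # assumed value for RLHF jobs
    "reward_learning_rate_multiplier": 0.5,
    "reward_training_dataset": "/datasets/reward_train.jsonl",  # path inside the network volume
    "reward_batch_size": 8,
    "reward_gradient_accumulation_step": 2,     # effective reward batch: 8 * 2 = 16
}
```

As before, this could be submitted with the hypothetical client: `submit_and_monitor(rlhf_config)`.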