Instruction Dataset
Training dataset of (instruction, [input], expected output) examples covering diverse task types. Quality, variety, and number of tasks directly affect the model's generalization ability.
A curated collection of (instruction, optional input, output) triples covering diverse task types. The breadth of task clusters and template diversity are key factors in how well the tuned model generalizes to unseen tasks. Common formats include the Alpaca format and conversation-style chat templates.