Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?


Including reasoning "chains of thought" (CoT) in a model's output considerably improves answer quality, but it also increases inference cost. Distillation transfers that reasoning ability from an expensive teacher model to a cheaper student, reducing the overall inference cost.

Each example in our dataset included:

  1. A human expert's chain of thought.
  2. The final answer.

    We expanded this dataset by adding:

    Synthetic R1 reasoning, i.e., the CoT generated by DeepSeek R1.
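
    To make the expansion concrete, here is a minimal sketch of how one record could gain the synthetic CoT. Field names and the `expand_with_r1_cot` helper are illustrative assumptions, not the study's actual code; the raw R1 completion is whatever your hosted endpoint returns.

```python
import re

def expand_with_r1_cot(example: dict, r1_completion: str) -> dict:
    """Attach DeepSeek R1's synthetic reasoning chain to one training record.

    Assumes `example` already carries the question, the human expert CoT,
    and the final answer (field names are illustrative).
    """
    # DeepSeek R1 typically wraps its reasoning in <think>...</think> tags.
    match = re.search(r"<think>(.*?)</think>", r1_completion, flags=re.DOTALL)
    synthetic_cot = match.group(1).strip() if match else r1_completion
    return {**example, "synthetic_r1_cot": synthetic_cot}
```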

    Then, we fine-tuned three versions of the model (using LoRA on Llama-3.1-8B-Instruct), each with a different training target:

    - Direct Answer Only: generate the final answer without showing any reasoning.
    - Human Expert CoT: generate the final answer along with a reasoning chain resembling the human expert's.
    - Synthetic R1 CoT: generate the final answer along with DeepSeek R1's synthetic reasoning chain.

    The table below summarizes average accuracy and reasoning length:

    - Note: The accuracy for the 5-shot baseline may differ from numbers reported elsewhere due to different evaluation setups. The key focus is on comparing relative performance across distillation approaches, not on beating other models.
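
    For illustration, here is a minimal sketch of how the three training targets could be built and how a LoRA adapter might be configured, assuming the Hugging Face peft library; field names, prompt layout, and hyperparameters are assumptions rather than the study's exact setup.

```python
from peft import LoraConfig

def build_target(example: dict, variant: str) -> str:
    """Text the student model is trained to generate for one example."""
    if variant == "direct_answer":
        return example["final_answer"]
    if variant == "human_cot":
        return f"{example['human_cot']}\n\nFinal answer: {example['final_answer']}"
    if variant == "synthetic_r1_cot":
        return f"{example['synthetic_r1_cot']}\n\nFinal answer: {example['final_answer']}"
    raise ValueError(f"unknown variant: {variant}")

# One adapter is trained per variant on the same Llama-3.1-8B-Instruct base;
# rank, alpha, and target modules here are illustrative choices.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```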

    In this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs at boosting performance, albeit at a higher inference cost due to their greater length.
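
    As a rough illustration of that trade-off, per-query cost grows roughly in proportion to the number of generated tokens; the token counts below are placeholders, not measurements from this study.

```python
def relative_cost(cot_tokens: int, answer_tokens: int) -> float:
    """Cost of emitting CoT + answer, relative to emitting the answer alone."""
    return (cot_tokens + answer_tokens) / answer_tokens

print(relative_cost(cot_tokens=600, answer_tokens=60))  # 11.0x the direct-answer cost
print(relative_cost(cot_tokens=150, answer_tokens=60))  # 3.5x the direct-answer cost
```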

    Fireworks AI Inference and Fine-Tuning Platform

    DeepSeek R1 is available on the Fireworks AI platform. An easy-to-use distillation interface will soon be part of FireOptimizer. If you need earlier access, please contact us to explore your options.
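
    As a hedged example, DeepSeek R1 on Fireworks can be queried through an OpenAI-compatible client; the base URL and model identifier below follow Fireworks' published conventions but should be verified against the current documentation.

```python
import os
from openai import OpenAI

# Fireworks exposes an OpenAI-compatible endpoint (assumed values below).
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-r1",  # assumed model id
    messages=[{"role": "user", "content": "Solve: what is 17 * 24?"}],
)
# The completion includes R1's <think>...</think> reasoning before the answer.
print(response.choices[0].message.content)
```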

    Conclusions

    By distilling reasoning-rich data, organizations can dramatically improve model performance without bearing the full cost of human-annotated datasets. DeepSeek R1's ability to produce long, high-quality reasoning chains makes it an effective teacher model, showing that, in some cases, the machine may simply out-teach the human.