  • LLaMA: Open and Efficient Foundation Language Models

    LLaMA: Open and Efficient Foundation Language Models Feb 2023 Hugo Touvr...

  • 2025-06-09

    Fault-Tolerant Training with Checkpoints: Training errors or machine failures can occur at any point during RLHF training, so enabling checkpointing is recommended to minimize losses. The API is documented in :ref:... (see the checkpointing sketch after this list)

  • Scaling Laws vs Model Architectures: How Does Inductive Bias Influence Scaling?

    Scaling Laws vs Model Architectures: How does Inductive Bias Influence S...

  • UL2: Unifying Language Learning Paradigms

    UL2: Unifying Language Learning Paradigms https://arxiv.org/abs/2205.051...

  • Transcending Scaling Laws with 0.1% Extra Compute

    Transcending Scaling Laws with 0.1% Extra Compute https://arxiv.org/abs/...

  • Emergent Abilities of Large Language Models

    Emergent Abilities of Large Language Models https://arxiv.org/abs/2206.0...

  • A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, and Toxicity

    A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age...

  • Scaling Laws for Autoregressive Generative Modeling

    Scaling Laws for Autoregressive Generative Modeling Oct 2020 https://arx...

  • Scaling Laws for Neural Language Models

    Scaling Laws for Neural Language Models Jan 2020 https://arxiv.org/abs/2...
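
A note on the checkpointing item (2025-06-09) above: here is a minimal sketch of fault-tolerant training, assuming a PyTorch-style trainer. The helper names (save_checkpoint, load_latest_checkpoint), the checkpoints/ directory, and the save interval are hypothetical illustrations, not the library's actual API (that lives behind the truncated :ref: link). The idea is simply to persist model and optimizer state periodically and resume from the newest checkpoint after a crash:

```python
import os
import torch

CKPT_DIR = "checkpoints"   # hypothetical location, not from the original docs
CKPT_EVERY = 100           # hypothetical save interval (optimization steps)

def save_checkpoint(step, model, optimizer):
    """Persist model/optimizer state so a crashed run can resume."""
    os.makedirs(CKPT_DIR, exist_ok=True)
    # Zero-padded step keeps lexicographic filename order == numeric order.
    path = os.path.join(CKPT_DIR, f"step_{step:08d}.pt")
    torch.save(
        {"step": step,
         "model": model.state_dict(),
         "optimizer": optimizer.state_dict()},
        path,
    )

def load_latest_checkpoint(model, optimizer):
    """Restore the newest checkpoint; return the step to resume from (0 if none)."""
    if not os.path.isdir(CKPT_DIR):
        return 0
    ckpts = sorted(f for f in os.listdir(CKPT_DIR) if f.endswith(".pt"))
    if not ckpts:
        return 0
    state = torch.load(os.path.join(CKPT_DIR, ckpts[-1]))
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"] + 1

# Sketch of use inside an RLHF training loop (model, optimizer, and
# total_steps are whatever the actual trainer defines):
#
#   start = load_latest_checkpoint(model, optimizer)
#   for step in range(start, total_steps):
#       ...  # one RLHF update (e.g. a PPO step)
#       if step % CKPT_EVERY == 0:
#           save_checkpoint(step, model, optimizer)
```

Zero-padding the step number in the filename keeps the directory's lexicographic order identical to numeric step order, so the newest checkpoint is always the last entry of sorted().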
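
Several of the entries above revolve around the same power-law framework ("Scaling Laws for Neural Language Models", "Scaling Laws for Autoregressive Generative Modeling", "Scaling Laws vs Model Architectures", "Transcending Scaling Laws with 0.1% Extra Compute"). For orientation, the canonical form from Kaplan et al. (Jan 2020) models test loss as a power law in each resource when the other two are not bottlenecks; the exponents below are that paper's approximate fits:

```latex
% L = test loss; N = non-embedding parameters; D = dataset size in tokens;
% C_min = minimum training compute. N_c, D_c, C_c are fitted constants.
L(N) = (N_c / N)^{\alpha_N},
    \quad \alpha_N \approx 0.076
L(D) = (D_c / D)^{\alpha_D},
    \quad \alpha_D \approx 0.095
L(C_{\mathrm{min}}) = (C_c / C_{\mathrm{min}})^{\alpha_C^{\mathrm{min}}},
    \quad \alpha_C^{\mathrm{min}} \approx 0.050
```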