For example, does a 13B parameter model at 2_K quantiation perform worse than a 7B parameter model at 8bit or 16bit?

  • @WanderOPA
    link
    fedilink
    English
    61 year ago

    That graph is great. Very easy to understand. Thank you!