Meta says Llama 3 beats most other models, including Gemini

The next generation of Meta’s large language model Llama, which is being released today to cloud providers like AWS and will soon reach model libraries like Hugging Face, performs better than most current AI models, the company said in a blog post.

Llama 3 currently comes in two sizes, with 8B and 70B parameters. (The B stands for billions; the parameter count is a rough measure of how complex a model is and how much it can learn from its training data.) It only offers text-based responses so far, but Meta says these are “a major leap” over the previous version. Llama 3 showed more diversity in answering prompts, had fewer false refusals where it declined to respond to questions, and could reason better. Meta also says Llama 3 understands more instructions and writes better code than before.

In the post, Meta claims both sizes of Llama 3 beat similarly sized models like Google’s Gemma and Gemini, Mistral 7B, and Anthropic’s Claude 3 in certain benchmarking tests. In the MMLU benchmark, which typically measures general knowledge, Llama 3 8B performed significantly better than both Gemma 7B and Mistral 7B, while Llama 3 70B slightly edged Gemini Pro 1.5.

(It is perhaps notable that Meta’s 2,700-word post does not mention GPT-4, OpenAI’s flagship model.)

It should also be noted that benchmark testing of AI models, though helpful in understanding just how powerful they are, is imperfect. The datasets used to benchmark models have sometimes been found to be part of a model’s training data, meaning the model may already know the answers to the questions evaluators will ask it.
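To make the contamination problem concrete, here is a minimal sketch of one common way to look for it: checking whether benchmark questions share long word sequences with training text. It is not Meta's method or any evaluator's actual tooling; the function names, the whitespace tokenization, and the 8-word window are all assumptions chosen for illustration.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contamination_rate(
    benchmark_items: list[str], training_docs: list[str], n: int = 8
) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the training docs.

    Hypothetical helper: n=8 is an illustrative window size, not a standard.
    """
    if not benchmark_items:
        return 0.0
    # Collect every n-gram that appears anywhere in the training text.
    train_grams: set[tuple[str, ...]] = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    # An item is flagged if any of its n-grams also appears in training text.
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & train_grams)
    return flagged / len(benchmark_items)
```

A benchmark with a high rate under a check like this would say less about what a model can reason out and more about what it has memorized. Exact-match checks of this kind miss paraphrased questions, so a low rate does not prove a benchmark is clean.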

Benchmark testing shows both sizes of Llama 3 outperforming similarly sized language models.

Meta says human evaluators also rated Llama 3 higher than other models, including OpenAI’s GPT-3.5. Meta says it created a new dataset for human evaluators to emulate real-world scenarios where Llama 3 might be used. This dataset included use cases like asking for advice, summarization, and creative writing. The company says the team that worked on the model did not have access to this new evaluation data, and it did not influence the model’s performance.

“This evaluation set contains 1,800 prompts that cover 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization,” Meta says in its blog post.

Llama 3 performed better than most models in human evaluations, says Meta.

Llama 3 is expected to get larger model sizes (which can understand longer strings of instructions and data) and to handle more multimodal requests like “Generate an image” or “Transcribe an audio file.” Meta says these larger versions, which are over 400B parameters and can ideally learn more complex patterns than the smaller versions of the model, are currently training, but initial performance testing shows these models can answer many of the questions posed by benchmarking.

Meta did not release a preview of these larger models, though, and did not compare them to other big models like GPT-4.
