First things first: we actually broke down the Llama-2 paper in the video above. In it, we turn seventy-eight pages of reading into fewer than fifteen minutes of watching. You can also check out this article that we published the day Llama-2 came out.

The link above should satisfy your appetite for under-the-hood knowledge of these shiny new generative AI models. This article, therefore, acts as a fun supplement to the video. First, we’ll briefly go over some key points from the Llama-2 paper for context. Then, we’ll share some fun facts about Llama that had to be cut from the video due to time and production constraints. Feel free to share these bite-sized fun facts at your next cocktail party, or while you wait for everyone to join the Zoom meeting. You will be the coolest kid in the office.

Llama-2, the TL;DR

Alright, the video above goes over the architecture of Llama-2, a comparison of Llama-2 and Llama-1, and finally a comparison of Llama-2 against non-Meta AI models. Let’s go over these subjects one by one.

Architecture

Llama-2 isn’t a single model, but rather a family of four models that differ mainly in scale. From smallest to largest, the Llama-2 models were trained at 7B, 13B, 34B, and 70B parameters (though the 34B variant was never publicly released). They share the same core architecture, from their activation function to their normalization method, with one notable exception: the two largest models use grouped-query attention to speed up inference.
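To get an intuition for what those parameter counts mean in practice, here’s a quick back-of-envelope sketch (our own illustration, not from the paper) estimating how much memory the raw fp16 weights of each size would occupy — ignoring activations, the KV cache, and any optimizer state:

```python
# Back-of-envelope memory footprint for the four Llama-2 sizes.
# Assumes fp16 weights at 2 bytes per parameter; real serving needs
# more memory (activations, KV cache, framework overhead).

PARAM_COUNTS = {"7B": 7e9, "13B": 13e9, "34B": 34e9, "70B": 70e9}

def fp16_weight_gib(num_params: float) -> float:
    """Approximate fp16 weight size in GiB (2 bytes per parameter)."""
    return num_params * 2 / 2**30

for name, n in PARAM_COUNTS.items():
    print(f"{name}: ~{fp16_weight_gib(n):.0f} GiB of fp16 weights")
```

Even the smallest model needs roughly 13 GiB just for its weights in fp16, which is why quantized variants are so popular for running Llama-2 on consumer hardware.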