https://www.llama.com/llama-downloads/
The models listed below are now available to you under the terms of the Llama community license agreement. By downloading a model, you are agreeing to the terms and conditions of the License, Acceptable Use Policy and Meta’s privacy policy.
Visit the Llama repository on GitHub, where instructions can be found in the Llama README.
Install the Llama CLI
In your preferred environment, run the command below:
pip install llama-stack
Use the -U option to update llama-stack if a previous version is already installed:
pip install llama-stack -U
Optionally (on Windows), add the directory containing the llama executable to your PATH environment variable.
Restart the terminal so the change takes effect.
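The PATH step above is described for Windows; on a Unix-like shell the equivalent can be sketched as follows. The `~/.local/bin` directory is an assumption about where pip places user-installed scripts; check your own install location.

```shell
# Sketch for Unix-like shells (the Windows steps above use the GUI instead).
# Assumption: pip installed the `llama` entry point under ~/.local/bin.
export PATH="$HOME/.local/bin:$PATH"
# `command -v` prints the resolved path if llama is now reachable:
command -v llama || echo "llama not found; check the install directory"
```

Adding the `export` line to your shell profile (e.g. `~/.bashrc`) makes the change persistent across terminal sessions.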
Find the models list
If you want older versions of models, run the command below to show all the available Llama models:
llama model list --show-all
Select a model
Select a desired model by running:
llama model download --source meta --model-id MODEL_ID
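As a sketch, the model ID passed to the download command matches the names shown in the available-models list further down this page. The specific model ID below is only an illustrative choice; the command is printed as a dry run rather than executed.

```shell
# Dry run: compose and print the download command instead of executing it.
# MODEL_ID is an example; pick yours from the output of `llama model list`.
MODEL_ID="Llama-3.2-3B-Instruct"
echo "llama model download --source meta --model-id $MODEL_ID"
```

Dropping the `echo` runs the actual download, which will then prompt for your custom URL.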
Specify custom URL
Llama 3.3: 70B
Llama 3.2: 1B & 3B
Llama 3.1: 405B & 8B
Llama 3.2: 11B & 90B
Llama 3
Llama Code
Llama 2
Llama Guard 2

When the script asks for your unique custom URL, please paste the URL below.
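Since the same custom URL is needed for each model you download (and it expires after a limited time), a minimal sketch for keeping a local copy; the file name and placeholder value are arbitrary choices for this example.

```shell
# Sketch: append each custom URL to a local file for reuse.
# Replace the placeholder with the URL from your download request.
CUSTOM_URL="PASTE_YOUR_UNIQUE_CUSTOM_URL_HERE"
printf '%s\n' "$CUSTOM_URL" >> llama_custom_urls.txt
```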
Please save copies of the unique custom URLs provided above; they will remain valid for 48 hours to download each model up to 5 times, and requests can be submitted multiple times. An email with the download instructions will also be sent to the email address you used to request the models.

Available models

The Llama 3.3 70B release includes instruct weights only. Instruct weights have been fine-tuned and aligned to follow instructions. They can be used as-is in chat applications or further fine-tuned and aligned for specific use cases.

With each model size, please find:
Pretrained weights: These are base weights that can be fine-tuned and domain-adapted with full flexibility.
Instruct weights: These weights are for the models that have been fine-tuned and aligned to follow instructions. They can be used as-is in chat applications or further fine-tuned and aligned for specific use cases.
Trust and safety models: a collection of specialized models tailored to specific development needs.

Available models for download include:

• Pretrained:
◦ Llama-3.2-1B
◦ Llama-3.2-3B
◦ Llama-3.2-11B-Vision
◦ Llama-3.2-90B-Vision
◦ Llama-3.1-405B-MP16
◦ Llama-3.1-405B-FP8
• Fine-tuned:
◦ Llama-3.3-70B-Instruct
◦ Llama-3.2-1B-Instruct
◦ Llama-3.2-1B-Instruct-QLORA_INT4_EO8
◦ Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8
◦ Llama-3.2-3B-Instruct
◦ Llama-3.2-3B-Instruct-QLORA_INT4_EO8
◦ Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8
◦ Llama-3.2-11B-Vision-Instruct
◦ Llama-3.2-90B-Vision-Instruct
◦ Llama-3.1-8B-Instruct
◦ Llama-3.1-405B-Instruct
◦ Llama-3.1-405B-Instruct-MP16
◦ Llama-3.1-405B-Instruct-FP8
• Trust and safety models:
◦ Llama-Guard-3-1B
◦ Llama-Guard-3-1B-INT4
◦ Llama-Guard-3-11B-Vision
◦ Llama-Guard-3-8B
◦ Llama-Guard-3-8B-INT8
◦ Llama-Guard-2-8B
◦ Llama-Guard-8B
◦ Prompt-Guard-86M

Note for 405B:
We are releasing multiple versions of the 405B model to accommodate its large size and facilitate multiple deployment options:
◦ MP16 (Model Parallel 16) is the full version of the BF16 weights. These weights can only be served on multiple nodes using pipelined parallel inference; at minimum, serving requires 2 nodes of 8 GPUs each.
◦ MP8 (Model Parallel 8) is also the full version of the BF16 weights, but it can be served on a single node with 8 GPUs using dynamic FP8 (Floating Point 8) quantization. We provide reference code for it. You can also download these weights and experiment with quantization techniques beyond what we provide.
◦ FP8 (Floating Point 8) is a quantized version of the weights. These weights can be served on a single node with 8 GPUs using static FP8 quantization. We have provided reference code for it as well.
• The 405B model requires significant storage and computational resources, occupying approximately 750 GB of disk storage and necessitating two nodes on MP16 for inference.

Recommended tools

Code Shield
A system-level safeguard, Code Shield adds support for inference-time filtering of insecure code produced by LLMs. This mitigates the risk of insecure code suggestions, prevents code interpreter abuse, and supports secure command execution. Now available on GitHub.

Cybersecurity Eval
The first and most comprehensive set of open-source cybersecurity safety evals for LLMs. These benchmarks are based on industry guidance and standards (e.g., CWE and MITRE ATT&CK) and were built in collaboration with our security subject matter experts. Now available on GitHub.

Helpful tips

Please read the instructions in the GitHub repo and our Llama documentation, and use the provided code examples to understand how best to interact with the models. In particular, for the fine-tuned models you must use the appropriate formatting and correct system/instruction tokens to get the best results.

You can find additional information about how to responsibly deploy Llama models in our Responsible Use Guide. Review our Documentation to start building.

If you need to report issues

If you or any Llama user becomes aware of any violation of our license or Acceptable Use Policy, or of any bug or issue with Llama that could lead to such a violation, please report it through one of the following means:
• Reporting issues with the model
• Giving feedback about potentially problematic output generated by the model
• Reporting bugs and security concerns
• Reporting violations of the Acceptable Use Policy: [email protected]
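The helpful tips above mention that fine-tuned models need correct system/instruction tokens. As an illustration, a sketch of the Llama 3 instruct prompt layout (token names per the Llama 3 model card; consult the Llama documentation for the authoritative template):

```shell
# Sketch of the Llama 3 instruct chat template the tips refer to.
# Header and end-of-turn tokens per the Llama 3 model card; the system
# and user messages are placeholder examples.
PROMPT='<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

'
printf '%s' "$PROMPT"
```

The prompt ends with an open assistant header so the model generates the assistant's reply next; omitting these tokens with an instruct model typically degrades output quality.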