This notebook introduces the essential usage of ipex-llm and walks you through building a very basic chat application on top of Open-Llama.
Install ipex-llm
If you haven't installed ipex-llm yet, install it as shown below. This one-line command installs the latest ipex-llm with all the dependencies required for common LLM application development.
!pip install --pre --upgrade ipex-llm[all]
Requirement already satisfied: ipex-llm[all] in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (2.2.0b20250123)
Requirement already satisfied: py-cpuinfo in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (9.0.0)
Requirement already satisfied: protobuf in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (5.29.3)
Requirement already satisfied: mpmath==1.3.0 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (1.3.0)
Requirement already satisfied: numpy==1.26.4 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (1.26.4)
Requirement already satisfied: transformers==4.37.0 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (4.37.0)
Requirement already satisfied: sentencepiece in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (0.2.0)
Requirement already satisfied: tokenizers==0.15.2 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (0.15.2)
Requirement already satisfied: accelerate==0.23.0 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (0.23.0)
Requirement already satisfied: tabulate in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (0.9.0)
Requirement already satisfied: setuptools in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (75.8.0)
Requirement already satisfied: intel-openmp in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (2025.0.4)
Requirement already satisfied: torch==2.1.2 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (2.1.2)
Requirement already satisfied: packaging>=20.0 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from accelerate==0.23.0->ipex-llm[all]) (24.2)
Requirement already satisfied: psutil in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from accelerate==0.23.0->ipex-llm[all]) (6.1.1)
Requirement already satisfied: pyyaml in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from accelerate==0.23.0->ipex-llm[all]) (6.0.2)
Requirement already satisfied: huggingface-hub in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from accelerate==0.23.0->ipex-llm[all]) (0.27.1)
Requirement already satisfied: filelock in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from torch==2.1.2->ipex-llm[all]) (3.17.0)
Requirement already satisfied: typing-extensions in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from torch==2.1.2->ipex-llm[all]) (4.12.2)
Requirement already satisfied: sympy in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from torch==2.1.2->ipex-llm[all]) (1.13.3)
Requirement already satisfied: networkx in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from torch==2.1.2->ipex-llm[all]) (3.2.1)
Requirement already satisfied: jinja2 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from torch==2.1.2->ipex-llm[all]) (3.1.5)
Requirement already satisfied: fsspec in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from torch==2.1.2->ipex-llm[all]) (2024.12.0)
Requirement already satisfied: regex!=2019.12.17 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from transformers==4.37.0->ipex-llm[all]) (2024.11.6)
Requirement already satisfied: requests in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from transformers==4.37.0->ipex-llm[all]) (2.32.3)
Requirement already satisfied: safetensors>=0.3.1 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from transformers==4.37.0->ipex-llm[all]) (0.5.2)
Requirement already satisfied: tqdm>=4.27 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from transformers==4.37.0->ipex-llm[all]) (4.67.1)
Requirement already satisfied: intel-cmplr-lib-ur==2025.0.4 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from intel-openmp->ipex-llm[all]) (2025.0.4)
Requirement already satisfied: umf==0.9.* in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from intel-cmplr-lib-ur==2025.0.4->intel-openmp->ipex-llm[all]) (0.9.1)
Requirement already satisfied: tcmlib==1.2 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from umf==0.9.*->intel-cmplr-lib-ur==2025.0.4->intel-openmp->ipex-llm[all]) (1.2.0)
Requirement already satisfied: colorama in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from tqdm>=4.27->transformers==4.37.0->ipex-llm[all]) (0.4.6)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from jinja2->torch==2.1.2->ipex-llm[all]) (3.0.2)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (2024.12.14)
Note
- On Linux, we recommend installing with pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu. Please refer to https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_cpu.html#quick-installation for more details.
Before using an LLM, you first need to load one. Here we take a relatively small LLM, open_llama_3b_v2, as an example.
Note
open_llama_3b_v2 is an open-source large language model based on the LLaMA architecture. You can find more information about this model on its homepage on Hugging Face.
In general, a single call to optimize_model is all you need to optimize any loaded PyTorch model, regardless of the library or API you are using. For more detailed usage of optimize_model, please refer to the API documentation.
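As a minimal sketch of that workflow (assuming the model is loaded with the standard Hugging Face Transformers API; the model id is simply the one used later in this notebook):

from transformers import AutoModelForCausalLM
from ipex_llm import optimize_model

# Load the model as usual with the official Transformers API
model = AutoModelForCausalLM.from_pretrained('openlm-research/open_llama_3b_v2')

# A single extra call applies ipex-llm low-bit optimization (INT4 by default)
model = optimize_model(model)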
Besides, many popular open-source PyTorch large language models can be loaded using the Hugging Face Transformers API (such as AutoModel, AutoModelForCausalLM, etc.). For such models, ipex-llm also provides a set of APIs to support them. We will now demonstrate how to use them.
In this example, we use ipex_llm.transformers.AutoModelForCausalLM to load the open_llama_3b_v2 model. This API mirrors the official transformers.AutoModelForCausalLM, with only a few additional parameters and methods related to low-bit optimization in the loading process.
To enable INT4 optimization, simply set load_in_4bit=True in from_pretrained. Additionally, we configure the parameters torch_dtype="auto" and low_cpu_mem_usage=True by default, as they may improve both performance and memory efficiency.
from ipex_llm.transformers import AutoModelForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True)
C:\Users\ExpertBook\miniforge3\envs\llm-tutorial\lib\site-packages\transformers\deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
C:\Users\ExpertBook\miniforge3\envs\llm-tutorial\lib\site-packages\huggingface_hub\file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
C:\Users\ExpertBook\miniforge3\envs\llm-tutorial\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
2025-01-26 18:23:58,853 - INFO - Converting the current model to sym_int4 format......
C:\Users\ExpertBook\miniforge3\envs\llm-tutorial\lib\site-packages\torch\nn\init.py:412: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
Note
- If you want to use precisions other than INT4 (e.g. NF4, INT5, INT8, etc.), or to learn more about the arguments, please refer to the API documentation for more information. A sketch of loading with a different precision follows this note.
- openlm-research/open_llama_3b_v2 is the model_id of the model open_llama_3b_v2 on Hugging Face. When you set the model_path parameter of from_pretrained to this model_id, from_pretrained will automatically download the model from Hugging Face, cache it locally (e.g. under ~/.cache/huggingface), and load it. Downloading the model through this API may take a long time. Alternatively, you can download the model yourself and set model_path to the local path of the downloaded model (see the second sketch after this note). For more information, refer to the from_pretrained documentation.
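For instance, a sketch of requesting NF4 precision instead of the INT4 default, via the load_in_low_bit argument of from_pretrained (the argument values shown reflect my understanding of the ipex-llm API; check the API documentation for the full list):

# Load with NF4 precision instead of the default INT4;
# other supported strings include e.g. "sym_int5" and "sym_int8"
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_low_bit="nf4")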
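And a sketch of loading from a local copy of the model; the directory below is a hypothetical download location, not a path required by ipex-llm:

# Hypothetical local directory containing a pre-downloaded copy of the model
local_model_path = './open_llama_3b_v2'
model = AutoModelForCausalLM.from_pretrained(local_model_path,
                                             load_in_4bit=True)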
Models loaded using the Hugging Face Transformers API, as in the previous section, are typically stored with either fp32 or fp16 precision. To save disk space and speed up the loading process, ipex-llm also provides the save_low_bit API for saving the model after low-bit optimization, and the load_low_bit API for loading the saved low-bit model.
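A minimal sketch of that save/load round trip (the save directory is an illustrative choice, not one mandated by ipex-llm):

from ipex_llm.transformers import AutoModelForCausalLM

# Save the INT4-optimized model from above; the directory name is illustrative
save_path = './open-llama-3b-v2-ipex-llm-int4'
model.save_low_bit(save_path)

# Later, reload it directly in low-bit form, with no repeated conversion needed
model = AutoModelForCausalLM.load_low_bit(save_path)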