Notebook 3: Basic Application Development On Open-Llama

This notebook introduces the essential usage of ipex-llm and walks you through building a very basic chat application upon Open-Llama.

3.1 Install ipex-llm

If you haven’t installed ipex-llm, install it as shown below. The one-line command will install the latest ipex-llm with all the dependencies for common LLM application development.

!pip install --pre --upgrade ipex-llm[all]
Requirement already satisfied: ipex-llm[all] in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (2.2.0b20250123)
Requirement already satisfied: py-cpuinfo in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (9.0.0)
Requirement already satisfied: protobuf in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (5.29.3)
Requirement already satisfied: mpmath==1.3.0 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (1.3.0)
Requirement already satisfied: numpy==1.26.4 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (1.26.4)
Requirement already satisfied: transformers==4.37.0 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (4.37.0)
Requirement already satisfied: sentencepiece in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (0.2.0)
Requirement already satisfied: tokenizers==0.15.2 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (0.15.2)
Requirement already satisfied: accelerate==0.23.0 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (0.23.0)
Requirement already satisfied: tabulate in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (0.9.0)
Requirement already satisfied: setuptools in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (75.8.0)
Requirement already satisfied: intel-openmp in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (2025.0.4)
Requirement already satisfied: torch==2.1.2 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from ipex-llm[all]) (2.1.2)
Requirement already satisfied: packaging>=20.0 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from accelerate==0.23.0->ipex-llm[all]) (24.2)
Requirement already satisfied: psutil in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from accelerate==0.23.0->ipex-llm[all]) (6.1.1)
Requirement already satisfied: pyyaml in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from accelerate==0.23.0->ipex-llm[all]) (6.0.2)
Requirement already satisfied: huggingface-hub in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from accelerate==0.23.0->ipex-llm[all]) (0.27.1)
Requirement already satisfied: filelock in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from torch==2.1.2->ipex-llm[all]) (3.17.0)
Requirement already satisfied: typing-extensions in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from torch==2.1.2->ipex-llm[all]) (4.12.2)
Requirement already satisfied: sympy in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from torch==2.1.2->ipex-llm[all]) (1.13.3)
Requirement already satisfied: networkx in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from torch==2.1.2->ipex-llm[all]) (3.2.1)
Requirement already satisfied: jinja2 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from torch==2.1.2->ipex-llm[all]) (3.1.5)
Requirement already satisfied: fsspec in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from torch==2.1.2->ipex-llm[all]) (2024.12.0)
Requirement already satisfied: regex!=2019.12.17 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from transformers==4.37.0->ipex-llm[all]) (2024.11.6)
Requirement already satisfied: requests in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from transformers==4.37.0->ipex-llm[all]) (2.32.3)
Requirement already satisfied: safetensors>=0.3.1 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from transformers==4.37.0->ipex-llm[all]) (0.5.2)
Requirement already satisfied: tqdm>=4.27 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from transformers==4.37.0->ipex-llm[all]) (4.67.1)
Requirement already satisfied: intel-cmplr-lib-ur==2025.0.4 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from intel-openmp->ipex-llm[all]) (2025.0.4)
Requirement already satisfied: umf==0.9.* in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from intel-cmplr-lib-ur==2025.0.4->intel-openmp->ipex-llm[all]) (0.9.1)
Requirement already satisfied: tcmlib==1.2 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from umf==0.9.*->intel-cmplr-lib-ur==2025.0.4->intel-openmp->ipex-llm[all]) (1.2.0)
Requirement already satisfied: colorama in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from tqdm>=4.27->transformers==4.37.0->ipex-llm[all]) (0.4.6)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from jinja2->torch==2.1.2->ipex-llm[all]) (3.0.2)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (2024.12.14)

3.2 Load a Pretrained Model

Before using an LLM, you need to load one first. Here we take a relatively small LLM, i.e. open_llama_3b_v2, as an example.

3.2.1 Load and Optimize Model

In general, a single call to optimize_model is enough to optimize any loaded PyTorch model, regardless of the library or API used to load it. For more detailed usage of optimize_model, please refer to the API documentation.

In addition, many popular open-source PyTorch large language models can be loaded using the Huggingface Transformers API (such as AutoModel, AutoModelForCausalLM, etc.). For such models, ipex-llm also provides a set of APIs to support them. We will now demonstrate how to use them.

In this example, we use ipex_llm.transformers.AutoModelForCausalLM to load the open_llama_3b_v2 model. This API mirrors the official transformers.AutoModelForCausalLM, adding only a few parameters and methods related to low-bit optimization during loading.

To enable INT4 optimization, simply set load_in_4bit=True in from_pretrained. Additionally, ipex-llm configures torch_dtype="auto" and low_cpu_mem_usage=True by default, as they may improve both performance and memory efficiency.

from ipex_llm.transformers import AutoModelForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True)
C:\Users\ExpertBook\miniforge3\envs\llm-tutorial\lib\site-packages\transformers\deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
C:\Users\ExpertBook\miniforge3\envs\llm-tutorial\lib\site-packages\huggingface_hub\file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
C:\Users\ExpertBook\miniforge3\envs\llm-tutorial\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
2025-01-26 18:23:58,853 - INFO - Converting the current model to sym_int4 format......
C:\Users\ExpertBook\miniforge3\envs\llm-tutorial\lib\site-packages\torch\nn\init.py:412: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")

3.2.2 Save & Load Optimized Model

Models loaded through the Huggingface Transformers API, as in the previous section, are typically stored with fp32 or fp16 precision. To reduce the model's on-disk size and speed up subsequent loading, ipex-llm also provides the save_low_bit API for saving the model after low-bit optimization, and the load_low_bit API for loading the saved low-bit model.