You can use IPEX-LLM to load any Hugging Face transformers model and accelerate it on your laptop. With IPEX-LLM, PyTorch models (in FP16/BF16/FP32) hosted on Hugging Face can be loaded and optimized automatically with low-bit quantization (supported precisions include INT4/NF4/INT5/INT8).
This notebook dives into the detailed usage of the IPEX-LLM transformers-style API. In Section 5.1.2, you will learn how to load a transformers model in different situations. Section 5.1.3 will walk you through the process of building a chatbot with the loaded model: you'll start from a simple form, and then add capabilities step by step, e.g. history management (for multi-turn chat) and streaming.
First of all, install IPEX-LLM in your prepared environment. For best practices of environment setup, refer to Chapter 2 in this tutorial.
!pip install --pre --upgrade ipex-llm[all]
Requirement already satisfied: ipex-llm[all] in c:\users\expertbook\miniforge3\envs\llm-tutorial\lib\site-packages (2.2.0b20250123)
Now let’s load the model. We’ll use meta-llama/Llama-2-7b-chat-hf as an example.
To download the meta-llama/Llama-2-7b-chat-hf model from Hugging Face, you will need access granted by Meta. Please follow the instructions provided here to request access to the model. Once access has been granted, download the model with your Hugging Face token:
from huggingface_hub import snapshot_download
model_path = snapshot_download(repo_id='meta-llama/Llama-2-7b-chat-hf',
                               token='hf_XXXX')  # replace with your own Hugging Face access token
Fetching 16 files: 0%| | 0/16 [00:00<?, ?it/s]
Note: by default, the model will be downloaded to the Hugging Face cache directory, HF_HOME (~/.cache/huggingface).
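If you would rather keep the weights elsewhere (e.g. on a larger drive), you can redirect the cache. A minimal sketch, assuming ./hf_cache is an example location of your choosing (the commented-out cache_dir parameter is part of the huggingface_hub API):

```python
import os

# Redirect the whole Hugging Face cache before any hub call is made.
# "./hf_cache" is an assumed example location, not a required path.
os.environ["HF_HOME"] = os.path.abspath("./hf_cache")

# Alternatively, pass cache_dir to snapshot_download for a single model.
# (Uncomment to actually download; requires network access and a valid token.)
# from huggingface_hub import snapshot_download
# model_path = snapshot_download(repo_id='meta-llama/Llama-2-7b-chat-hf',
#                                cache_dir='./hf_cache',
#                                token='hf_XXXX')  # your own token

print(os.environ["HF_HOME"])
```

Set the environment variable before importing huggingface_hub (or transformers), since the cache location is read at import time in some versions.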
One common use case is to load a Hugging Face transformers model in low precision, i.e. to quantize it implicitly while loading.
For Llama 2 (7B), you could simply import ipex_llm.transformers.AutoModelForCausalLM instead of transformers.AutoModelForCausalLM, and specify either load_in_4bit=True or the load_in_low_bit parameter accordingly in the from_pretrained function. Compared to the Hugging Face transformers API, only minor code changes are required.
For INT4 optimizations (with load_in_4bit=True):
from ipex_llm.transformers import AutoModelForCausalLM

model_in_4bit = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="meta-llama/Llama-2-7b-chat-hf",
    load_in_4bit=True)
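Once the model is loaded, the chatbot built in Section 5.1.3 will need to turn a multi-turn conversation history into a single prompt string. As a preview, here is a minimal sketch of the Llama 2 chat prompt format; the template follows Meta's published [INST]/<<SYS>> convention, and build_llama2_prompt and SYSTEM_PROMPT are illustrative names, not part of the IPEX-LLM API:

```python
# A minimal sketch of multi-turn prompt construction for Llama 2 chat models.
# The [INST]/<<SYS>> template follows Meta's documented chat format; the
# tokenizer's own chat template may handle special tokens slightly differently.

SYSTEM_PROMPT = "You are a helpful assistant."

def build_llama2_prompt(history, user_input, system_prompt=SYSTEM_PROMPT):
    """Build one prompt string from a list of (user, assistant) turn pairs."""
    prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    for user_turn, assistant_turn in history:
        prompt += f"{user_turn} [/INST] {assistant_turn} </s><s>[INST] "
    prompt += f"{user_input} [/INST]"
    return prompt

# First turn: no history yet.
print(build_llama2_prompt([], "What is IPEX-LLM?"))
```

The returned string can then be tokenized and passed to model_in_4bit.generate(); Section 5.1.3 builds on this idea to manage history across turns.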