Notebook 4.1: ChatGLM2-6B

4.1.1 Overview

This example shows how to run Chinese-language inference with ChatGLM2-6B on a low-cost PC (no discrete GPU required) using the IPEX-LLM APIs. ChatGLM2-6B is the second generation of ChatGLM-6B, the open-source bilingual (Chinese-English) chat model released by THUDM. The model is also available on the Huggingface model hub under the repo id THUDM/chatglm2-6b.

Before running inference, prepare your environment as described in Chapter 2.

4.1.2 Installation

First, install IPEX-LLM in your prepared environment. For environment setup best practices, refer to Chapter 2 of this tutorial.

!pip install --pre --upgrade ipex-llm[all]

Requirement already satisfied: ipex-llm[all] in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (2.2.0b20250123)
Requirement already satisfied: py-cpuinfo in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (9.0.0)
Requirement already satisfied: protobuf in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (5.29.3)
Requirement already satisfied: mpmath==1.3.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (1.3.0)
Requirement already satisfied: numpy==1.26.4 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (1.26.4)
Requirement already satisfied: transformers==4.37.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (4.37.0)
Requirement already satisfied: sentencepiece in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (0.2.0)
Requirement already satisfied: tokenizers==0.15.2 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (0.15.2)
Requirement already satisfied: accelerate==0.23.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (0.23.0)
Requirement already satisfied: tabulate in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (0.9.0)
Requirement already satisfied: setuptools in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (75.8.0)
Requirement already satisfied: intel-openmp in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (2025.0.4)
Requirement already satisfied: torch==2.1.2 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (2.1.2)
Requirement already satisfied: packaging>=20.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from accelerate==0.23.0->ipex-llm[all]) (24.2)
Requirement already satisfied: psutil in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from accelerate==0.23.0->ipex-llm[all]) (6.1.1)
Requirement already satisfied: pyyaml in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from accelerate==0.23.0->ipex-llm[all]) (6.0.2)
Requirement already satisfied: huggingface-hub in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from accelerate==0.23.0->ipex-llm[all]) (0.27.1)
Requirement already satisfied: filelock in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from torch==2.1.2->ipex-llm[all]) (3.17.0)
Requirement already satisfied: typing-extensions in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from torch==2.1.2->ipex-llm[all]) (4.12.2)
Requirement already satisfied: sympy in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from torch==2.1.2->ipex-llm[all]) (1.13.3)
Requirement already satisfied: networkx in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from torch==2.1.2->ipex-llm[all]) (3.2.1)
Requirement already satisfied: jinja2 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from torch==2.1.2->ipex-llm[all]) (3.1.5)
Requirement already satisfied: fsspec in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from torch==2.1.2->ipex-llm[all]) (2024.12.0)
Requirement already satisfied: regex!=2019.12.17 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from transformers==4.37.0->ipex-llm[all]) (2024.11.6)
Requirement already satisfied: requests in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from transformers==4.37.0->ipex-llm[all]) (2.32.3)
Requirement already satisfied: safetensors>=0.3.1 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from transformers==4.37.0->ipex-llm[all]) (0.5.2)
Requirement already satisfied: tqdm>=4.27 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from transformers==4.37.0->ipex-llm[all]) (4.67.1)
Requirement already satisfied: intel-cmplr-lib-ur==2025.0.4 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from intel-openmp->ipex-llm[all]) (2025.0.4)
Requirement already satisfied: umf==0.9.* in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from intel-cmplr-lib-ur==2025.0.4->intel-openmp->ipex-llm[all]) (0.9.1)
Requirement already satisfied: tcmlib==1.2 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from umf==0.9.*->intel-cmplr-lib-ur==2025.0.4->intel-openmp->ipex-llm[all]) (1.2.0)
Requirement already satisfied: colorama in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from tqdm>=4.27->transformers==4.37.0->ipex-llm[all]) (0.4.6)
Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from jinja2->torch==2.1.2->ipex-llm[all]) (3.0.2)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (2024.12.14)

The [all] option installs the additional packages that IPEX-LLM requires.
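To confirm that the installation worked, a quick check like the one below can help. It is only a sketch: the helper name installed_version is made up for this example, and the three package names are taken from the pip log above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names as they appear in the pip log above.
for name in ("ipex-llm", "torch", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If any line prints "not installed", rerun the pip command in the same environment that the notebook kernel uses.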

4.1.3 Load Model and Tokenizer

4.1.3.1 Load Model

Load the ChatGLM2 model with low-bit (INT4) optimization using the IPEX-LLM APIs. This converts the relevant layers of the model to INT4 format, which lowers resource usage.
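A back-of-the-envelope calculation shows why INT4 matters on a PC without a discrete GPU. The sketch below assumes roughly 6 billion parameters (inferred from the model name, not measured) and counts weight storage only. It ignores quantization scales, layers that are left unconverted, activations and the KV cache, so real memory usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 6e9  # assumption: "6B" taken from the model name

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"FP16 weights: ~{fp16_gb:.1f} GB")
print(f"INT4 weights: ~{int4_gb:.1f} GB")
```

Under these assumptions the weights shrink by about a factor of four, which is what brings the model within reach of ordinary laptop memory.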

Note

IPEX-LLM supports AutoModel, AutoModelForCausalLM, AutoModelForSpeechSeq2Seq and AutoModelForSeq2SeqLM. These Auto classes retrieve the relevant model automatically. In this case, AutoModel is all we need to load ChatGLM2-6B.

Note

You can set model_path to either a Huggingface repo id or a local model path.

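from ipex_llm.transformers import AutoModel

# Huggingface repo id; a local model directory also works
model_path = "THUDM/chatglm2-6b"
# load_in_4bit=True converts the relevant layers to INT4 while loading
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True)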

C:\\Users\\ExpertBook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages\\transformers\\deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
C:\\Users\\ExpertBook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages\\huggingface_hub\\file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(

Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]

C:\\Users\\ExpertBook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages\\torch\\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
2025-01-26 18:27:13,921 - INFO - Converting the current model to sym_int4 format......
C:\\Users\\ExpertBook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages\\torch\\nn\\init.py:412: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
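As the note on model_path says, the same argument accepts a repo id or a local path. The sketch below shows one way to tell the two apart before loading. The helpers is_local_model and load_int4_model are hypothetical names introduced here, and the directory check only approximates how the loader decides. The IPEX-LLM import sits inside the function so that the path check can run even where IPEX-LLM is not installed.

```python
import os


def is_local_model(model_path: str) -> bool:
    """Treat model_path as local if it names an existing directory."""
    return os.path.isdir(model_path)


def load_int4_model(model_path: str):
    """Load a model in INT4 with the same arguments used in the cell above."""
    # Imported lazily: requires ipex-llm to be installed.
    from ipex_llm.transformers import AutoModel

    return AutoModel.from_pretrained(model_path,
                                     load_in_4bit=True,
                                     trust_remote_code=True)


# The current directory stands in for a local model folder in this demo.
for candidate in ("THUDM/chatglm2-6b", os.getcwd()):
    source = "local directory" if is_local_model(candidate) else "Huggingface repo id"
    print(f"{candidate!r} -> {source}")
```

Pointing model_path at a local directory avoids downloading the checkpoint again, which is convenient on machines with a slow or restricted network connection.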

4.1.3.2 Load Tokenizer

A tokenizer is also needed for LLM inference. It encodes input text into tensors that are fed to the LLM, and decodes the LLM's output tensors back into text. You can load the tokenizer directly with the Huggingface transformers API; it works seamlessly with models loaded by IPEX-LLM.
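To make the encode and decode roles concrete, here is a toy character-level tokenizer. It is purely illustrative and is not the ChatGLM2 tokenizer: ToyTokenizer is a made-up class, it returns plain Python lists rather than tensors, and real tokenizers use learned subword vocabularies.

```python
class ToyTokenizer:
    """Character-level tokenizer that maps each known character to an integer id."""

    def __init__(self, corpus: str):
        chars = sorted(set(corpus))
        self.char_to_id = {ch: i for i, ch in enumerate(chars)}
        self.id_to_char = {i: ch for ch, i in self.char_to_id.items()}

    def encode(self, text: str) -> list:
        # Text -> ids: the form a model consumes.
        return [self.char_to_id[ch] for ch in text]

    def decode(self, ids: list) -> str:
        # Ids -> text: the form a user reads.
        return "".join(self.id_to_char[i] for i in ids)


toy = ToyTokenizer("hello")
ids = toy.encode("hello")
print(ids)
print(toy.decode(ids))
```

The round trip (text to ids and back) is the same contract that the real tokenizer fulfils around the model's generation step.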