Notebook 5.2 Speech Recognition

Speech recognition, also known as automatic speech recognition (ASR), is a technology that converts spoken language into written text or executes specific actions based on voice commands. It relies on machine learning models that analyze speech patterns, phonetics, and language structures to accurately transcribe and understand human speech.

Whisper, published by OpenAI, is a popular open-source model for both ASR and speech translation: it can transcribe speech in multiple languages and translate speech from those languages into English.
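Whisper chooses between transcription and translation through special tokens at the start of the decoder input: a start-of-transcript token, a language token, and a task token (`<|transcribe|>` or `<|translate|>`). The sketch below builds that prefix as a plain string to make the idea concrete. It is an illustration only: in practice the Hugging Face processor turns these tokens into IDs for you (for example via `WhisperProcessor.get_decoder_prompt_ids`), and the helper name `whisper_prompt_prefix` is hypothetical.

```python
# Hypothetical helper: builds Whisper's decoder prompt prefix as a string.
VALID_TASKS = {"transcribe", "translate"}


def whisper_prompt_prefix(language: str, task: str, timestamps: bool = False) -> str:
    """Return the special-token prefix telling Whisper which language and task to use.

    language: a Whisper language code such as "en" or "zh".
    task: "transcribe" (text in the spoken language) or "translate" (English text).
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Without this token, Whisper also predicts segment timestamps.
        tokens.append("<|notimestamps|>")
    return "".join(tokens)


print(whisper_prompt_prefix("zh", "transcribe"))  # Chinese speech -> Chinese text
print(whisper_prompt_prefix("zh", "translate"))   # Chinese speech -> English text
```

The same model weights serve both tasks; only this prefix changes.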

Because Whisper is built on a Transformer-based encoder-decoder architecture, it can be optimized effectively with IPEX-LLM INT4 optimizations. In this tutorial, we will guide you through building a speech recognition application on an IPEX-LLM-optimized Whisper model that can transcribe or translate audio files into text.
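To give a feel for what INT4 optimization means, the sketch below applies generic symmetric 4-bit quantization to groups of weights: each group keeps one floating-point scale plus 4-bit integer codes in the range [-8, 7]. This is a conceptual illustration under our own assumptions (group size 32, round-to-nearest); it is not IPEX-LLM's actual INT4 format or kernel.

```python
# Conceptual sketch of group-wise symmetric INT4 weight quantization.
# Assumption: group size 32 and round-to-nearest; IPEX-LLM's real format may differ.
from typing import List, Tuple

GROUP_SIZE = 32
QMIN, QMAX = -8, 7  # range of a signed 4-bit integer


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Map one group of floats to 4-bit codes that share a single scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    codes = [max(QMIN, min(QMAX, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the codes and the group scale."""
    return [c * scale for c in codes]


def quantize(weights: List[float], group_size: int = GROUP_SIZE) -> List[Tuple[List[int], float]]:
    """Quantize a flat list of weights group by group."""
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]


weights = [0.12, -0.50, 0.33, 0.07, -0.21, 0.45, -0.02, 0.29]
codes, scale = quantize(weights, group_size=8)[0]
approx = dequantize_group(codes, scale)
print(codes, round(scale, 4))
print(max(abs(a - w) for a, w in zip(approx, weights)))  # bounded by scale / 2
```

Storing 4-bit codes instead of 16-bit floats cuts weight storage roughly fourfold, at the cost of one scale per group and a small, bounded rounding error per weight.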

5.2.1 Install Packages

Follow the instructions in Chapter 2 to set up your environment if you haven’t already done so. Then install ipex-llm:

!pip install --pre --upgrade ipex-llm[all]
Requirement already satisfied: ipex-llm[all] in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (2.2.0b20250123)
Requirement already satisfied: py-cpuinfo in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (9.0.0)
Requirement already satisfied: protobuf in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (5.29.3)
Requirement already satisfied: mpmath==1.3.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (1.3.0)
Requirement already satisfied: numpy==1.26.4 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (1.26.4)
Requirement already satisfied: transformers==4.37.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (4.37.0)
Requirement already satisfied: sentencepiece in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (0.2.0)
Requirement already satisfied: tokenizers==0.15.2 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (0.15.2)
Requirement already satisfied: accelerate==0.23.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (0.23.0)
Requirement already satisfied: tabulate in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (0.9.0)
Requirement already satisfied: setuptools in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (75.8.0)
Requirement already satisfied: intel-openmp in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (2025.0.4)
Requirement already satisfied: torch==2.1.2 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from ipex-llm[all]) (2.1.2)
Requirement already satisfied: packaging>=20.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from accelerate==0.23.0->ipex-llm[all]) (24.2)
Requirement already satisfied: psutil in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from accelerate==0.23.0->ipex-llm[all]) (6.1.1)
Requirement already satisfied: pyyaml in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from accelerate==0.23.0->ipex-llm[all]) (6.0.2)
Requirement already satisfied: huggingface-hub in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from accelerate==0.23.0->ipex-llm[all]) (0.27.1)
Requirement already satisfied: filelock in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from torch==2.1.2->ipex-llm[all]) (3.17.0)
Requirement already satisfied: typing-extensions in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from torch==2.1.2->ipex-llm[all]) (4.12.2)
Requirement already satisfied: sympy in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from torch==2.1.2->ipex-llm[all]) (1.13.3)
Requirement already satisfied: networkx in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from torch==2.1.2->ipex-llm[all]) (3.2.1)
Requirement already satisfied: jinja2 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from torch==2.1.2->ipex-llm[all]) (3.1.5)
Requirement already satisfied: fsspec in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from torch==2.1.2->ipex-llm[all]) (2024.12.0)
Requirement already satisfied: regex!=2019.12.17 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from transformers==4.37.0->ipex-llm[all]) (2024.11.6)
Requirement already satisfied: requests in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from transformers==4.37.0->ipex-llm[all]) (2.32.3)
Requirement already satisfied: safetensors>=0.3.1 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from transformers==4.37.0->ipex-llm[all]) (0.5.2)
Requirement already satisfied: tqdm>=4.27 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from transformers==4.37.0->ipex-llm[all]) (4.67.1)
Requirement already satisfied: intel-cmplr-lib-ur==2025.0.4 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from intel-openmp->ipex-llm[all]) (2025.0.4)
Requirement already satisfied: umf==0.9.* in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from intel-cmplr-lib-ur==2025.0.4->intel-openmp->ipex-llm[all]) (0.9.1)
Requirement already satisfied: tcmlib==1.2 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from umf==0.9.*->intel-cmplr-lib-ur==2025.0.4->intel-openmp->ipex-llm[all]) (1.2.0)
Requirement already satisfied: colorama in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from tqdm>=4.27->transformers==4.37.0->ipex-llm[all]) (0.4.6)
Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from jinja2->torch==2.1.2->ipex-llm[all]) (3.0.2)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from requests->transformers==4.37.0->ipex-llm[all]) (2024.12.14)
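The pip output above shows the pinned dependency versions (for example transformers 4.37.0 and torch 2.1.2). If you later want to confirm which versions your environment actually has, one option is a small check with the standard-library `importlib.metadata` module, sketched below; the helper name `installed_versions` is our own and the package list is only an example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return {distribution name: installed version}, using None for missing packages."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for pkg, ver in installed_versions(["ipex-llm", "transformers", "torch"]).items():
    print(f"{pkg}: {ver or 'not installed'}")
```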

Because this application processes audio files, you will also need to install the librosa package for audio analysis:

!pip install -U librosa
Collecting librosa
  Downloading librosa-0.10.2.post1-py3-none-any.whl.metadata (8.6 kB)
Collecting audioread>=2.1.9 (from librosa)
  Downloading audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB)
Requirement already satisfied: numpy!=1.22.0,!=1.22.1,!=1.22.2,>=1.20.3 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from librosa) (1.26.4)
Collecting scipy>=1.2.0 (from librosa)
  Downloading scipy-1.13.1-cp39-cp39-win_amd64.whl.metadata (60 kB)
Collecting scikit-learn>=0.20.0 (from librosa)
  Downloading scikit_learn-1.6.1-cp39-cp39-win_amd64.whl.metadata (15 kB)
Collecting joblib>=0.14 (from librosa)
  Downloading joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Requirement already satisfied: decorator>=4.3.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from librosa) (5.1.1)
Collecting numba>=0.51.0 (from librosa)
  Downloading numba-0.60.0-cp39-cp39-win_amd64.whl.metadata (2.8 kB)
Collecting soundfile>=0.12.1 (from librosa)
  Downloading soundfile-0.13.1-py2.py3-none-win_amd64.whl.metadata (16 kB)
Collecting pooch>=1.1 (from librosa)
  Downloading pooch-1.8.2-py3-none-any.whl.metadata (10 kB)
Collecting soxr>=0.3.2 (from librosa)
  Downloading soxr-0.5.0.post1-cp39-cp39-win_amd64.whl.metadata (5.6 kB)
Requirement already satisfied: typing-extensions>=4.1.1 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from librosa) (4.12.2)
Collecting lazy-loader>=0.1 (from librosa)
  Downloading lazy_loader-0.4-py3-none-any.whl.metadata (7.6 kB)
Collecting msgpack>=1.0 (from librosa)
  Downloading msgpack-1.1.0-cp39-cp39-win_amd64.whl.metadata (8.6 kB)
Requirement already satisfied: packaging in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from lazy-loader>=0.1->librosa) (24.2)
Collecting llvmlite<0.44,>=0.43.0dev0 (from numba>=0.51.0->librosa)
  Downloading llvmlite-0.43.0-cp39-cp39-win_amd64.whl.metadata (4.9 kB)
Requirement already satisfied: platformdirs>=2.5.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from pooch>=1.1->librosa) (4.3.6)
Requirement already satisfied: requests>=2.19.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from pooch>=1.1->librosa) (2.32.3)
Collecting threadpoolctl>=3.1.0 (from scikit-learn>=0.20.0->librosa)
  Downloading threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: cffi>=1.0 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from soundfile>=0.12.1->librosa) (1.17.1)
Requirement already satisfied: pycparser in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from cffi>=1.0->soundfile>=0.12.1->librosa) (2.22)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from requests>=2.19.0->pooch>=1.1->librosa) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from requests>=2.19.0->pooch>=1.1->librosa) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from requests>=2.19.0->pooch>=1.1->librosa) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\expertbook\\miniforge3\\envs\\llm-tutorial\\lib\\site-packages (from requests>=2.19.0->pooch>=1.1->librosa) (2024.12.14)
Downloading librosa-0.10.2.post1-py3-none-any.whl (260 kB)
Downloading audioread-3.0.1-py3-none-any.whl (23 kB)
Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
Downloading lazy_loader-0.4-py3-none-any.whl (12 kB)
Downloading msgpack-1.1.0-cp39-cp39-win_amd64.whl (74 kB)
Downloading numba-0.60.0-cp39-cp39-win_amd64.whl (2.7 MB)
Downloading pooch-1.8.2-py3-none-any.whl (64 kB)
Downloading scikit_learn-1.6.1-cp39-cp39-win_amd64.whl (11.2 MB)
Downloading scipy-1.13.1-cp39-cp39-win_amd64.whl (46.2 MB)
Downloading soundfile-0.13.1-py2.py3-none-win_amd64.whl (1.0 MB)
Downloading soxr-0.5.0.post1-cp39-cp39-win_amd64.whl (167 kB)
Downloading llvmlite-0.43.0-cp39-cp39-win_amd64.whl (28.1 MB)
Downloading threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Installing collected packages: threadpoolctl, soxr, scipy, msgpack, llvmlite, lazy-loader, joblib, audioread, soundfile, scikit-learn, pooch, numba, librosa
Successfully installed audioread-3.0.1 joblib-1.4.2 lazy-loader-0.4 librosa-0.10.2.post1 llvmlite-0.43.0 msgpack-1.1.0 numba-0.60.0 pooch-1.8.2 scikit-learn-1.6.1 scipy-1.13.1 soundfile-0.13.1 soxr-0.5.0.post1 threadpoolctl-3.5.0
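In a pipeline like this one, librosa is typically used to decode the audio files and resample them, e.g. `librosa.load("audio_en.mp3", sr=16000)` returns the waveform and its sample rate. Whisper's feature extractor expects 16 kHz audio and works on 30-second windows, padding shorter clips with silence and truncating longer ones. The stdlib-only sketch below mirrors that padding/trimming step so the arithmetic is visible; the helper `pad_or_trim` is illustrative, since the Hugging Face processor performs this step for you.

```python
SAMPLE_RATE = 16_000                     # Hz expected by Whisper's feature extractor
CHUNK_SECONDS = 30                       # Whisper processes audio in 30-second windows
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per window
HOP_LENGTH = 160                         # samples between successive log-Mel frames


def pad_or_trim(samples, length=N_SAMPLES):
    """Zero-pad or truncate a mono waveform to exactly `length` samples."""
    samples = list(samples)
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


# A 5-second clip of silence at 16 kHz stands in for a decoded file here.
# With librosa installed you could use instead:
#   clip, sr = librosa.load("audio_en.mp3", sr=SAMPLE_RATE)
clip = [0.0] * (5 * SAMPLE_RATE)
window = pad_or_trim(clip)
print(len(window), len(window) // HOP_LENGTH)  # 480000 3000
```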

5.2.2 Download Audio Files

To begin, let’s prepare some audio files. For example, you can download one English sample from the multilingual audio dataset voxpopuli and one Chinese sample from the Chinese audio dataset AIShell. The English and Chinese audio files used here were selected at random; feel free to choose different audio files according to your preferences.

Here we rename the files to audio_en.mp3 and audio_zh.mp3 and place them in the current working directory. You can then play the downloaded audio to check it:

from IPython.display import Audio, display

# Download the example file from here: https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0/viewer/zh-TW
# Play both audio files to confirm that they were downloaded correctly
display(Audio("audio_en.mp3"))
display(Audio("audio_zh.mp3"))
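If a player fails to load, a common cause is that the files are not in the notebook’s working directory. A quick check such as the sketch below (the helper name `missing_audio_files` is hypothetical) lists which of the expected files cannot be found before you move on to transcription.

```python
from pathlib import Path
from typing import Iterable, List


def missing_audio_files(names: Iterable[str] = ("audio_en.mp3", "audio_zh.mp3"),
                        folder: str = ".") -> List[str]:
    """Return the expected audio files that are not present in `folder`."""
    base = Path(folder)
    return [name for name in names if not (base / name).is_file()]


missing = missing_audio_files()
if missing:
    print("Missing audio files:", ", ".join(missing))
else:
    print("All audio files found.")
```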

5.2.3 Load Pretrained Whisper Model

Now, let’s load a pretrained Whisper model, using whisper-medium as an example. OpenAI has released pretrained Whisper models in various sizes (including whisper-tiny, whisper-small, etc.), so you can choose the one that best fits your requirements.
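To help choose a size, the sketch below pairs Hugging Face checkpoint IDs with the approximate parameter counts OpenAI lists for each Whisper size, and estimates raw weight storage at 16 bits versus 4 bits per weight. These are back-of-the-envelope numbers under a stated assumption: they ignore quantization scales, layers that may stay in higher precision, and runtime memory such as activations and caches.

```python
# Approximate parameter counts (millions) published by OpenAI for each Whisper size.
WHISPER_PARAMS_M = {
    "openai/whisper-tiny": 39,
    "openai/whisper-base": 74,
    "openai/whisper-small": 244,
    "openai/whisper-medium": 769,
    "openai/whisper-large-v2": 1550,
}


def weight_size_mb(params_millions: float, bits_per_weight: int) -> float:
    """Raw weight storage in MB (10**6 bytes), ignoring scales and other overhead."""
    return params_millions * bits_per_weight / 8


for model_id, params in WHISPER_PARAMS_M.items():
    fp16 = weight_size_mb(params, 16)
    int4 = weight_size_mb(params, 4)
    print(f"{model_id:<26} {fp16:>7.1f} MB (FP16)  {int4:>6.1f} MB (INT4)")
```

For whisper-medium this gives roughly 1538 MB of FP16 weights versus about 385 MB at 4 bits per weight, which is why INT4 loading makes the larger sizes more practical on client hardware.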

With the one-line, transformers-style API in ipex-llm, you can load whisper-medium with INT4 optimizations (by specifying load_in_4bit=True) as follows. Note that the model class AutoModelForSpeechSeq2Seq is used for Whisper: