Project repository: https://github.com/NVIDIA/TensorRT-LLM.git
0. Debugging environment
0.1 Start the development environment
Debugging is done inside a Docker container, started from the docker-compose.yml below.
The image is nvidia/cuda:12.1.0-devel-ubuntu22.04; you can pull it and get an interactive shell with:
nvidia-docker run --entrypoint /bin/bash -it nvidia/cuda:12.1.0-devel-ubuntu22.04
version: '3'
services:
  whisper_debug:
    image: nvidia/cuda:12.1.0-devel-ubuntu22.04
    entrypoint: /entrypoint.sh
    #command:
    #  - "tritonserver"
    #  #- "--model-repository=/server/face"
    #  - "--model-repository=/server/image_ocr"
    volumes:
      - ./entrypoint.sh:/entrypoint.sh
      - ./data:/data
      - ./logs:/logs
      - /etc/localtime:/etc/localtime
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['7']
              capabilities: [gpu]
    shm_size: "48G"
    user: root
    network_mode: bridge
    restart: always
    ports:
      - "26130:8000"
      - "26131:8001"
      - "26132:8002"
    #environment:
    #  - CONTAINER_NAME=stark_face_server_1
Once the image has been downloaded, start the container.
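The compose file mounts an entrypoint.sh from the host, but its contents are not shown above. A minimal sketch (the script body is an assumption, not the original file) that simply keeps the container alive so you can docker exec into it for debugging:

```shell
#!/bin/bash
# Hypothetical entrypoint.sh: keep the container running for interactive
# debugging. Replace with real startup logic (e.g. launching a server).
set -e
exec sleep infinity
```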
0.2 Install dependencies
apt update && apt upgrade -y
apt install software-properties-common -y
add-apt-repository ppa:deadsnakes/ppa
apt-get -y install python3.10
apt-get -y install python3.10-venv
apt-get install -y python3.10-dev
python3.10 -m venv venv
. ./venv/bin/activate
pip install --upgrade pip
apt-get install -y openmpi-bin libopenmpi-dev
Note ⚠️: install python3-pip only if your setup needs it; the venv created above already provides pip.
0.3 Install tensorrt_llm
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
Verify the installation:
python3 -c "import tensorrt_llm"
1. Download the code
Repository: https://github.com/NVIDIA/TensorRT-LLM.git
git clone https://github.com/NVIDIA/TensorRT-LLM.git
2. Install
cd TensorRT-LLM
pip install . -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com --no-cache-dir --extra-index-url https://pypi.nvidia.com
3. Build the Whisper engine
Follow the steps in TensorRT-LLM/examples/whisper/README.md to download the required files and build the engine.
Audio must first be transcoded to mono 16000 Hz:
ffmpeg -i 111.mp3 -acodec pcm_s16le -ac 1 -ar 16000 out.wav
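After conversion, you can sanity-check the output with Python's standard wave module. The check_whisper_ready helper and the synthetic WAV written below are illustrative (not part of the TensorRT-LLM example); the synthetic file stands in for the ffmpeg output:

```python
import wave

def check_whisper_ready(path):
    """Return True if the WAV file is mono, 16-bit PCM, 16000 Hz."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1        # -ac 1  (mono)
                and w.getsampwidth() == 2     # pcm_s16le (2 bytes/sample)
                and w.getframerate() == 16000)  # -ar 16000

# Write a tiny synthetic WAV to demonstrate the check; "out.wav" here is
# a stand-in for the file produced by the ffmpeg command above.
with wave.open("out.wav", "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 16-bit PCM
    w.setframerate(16000)   # 16 kHz
    w.writeframes(b"\x00\x00" * 16000)  # one second of silence

print(check_whisper_ready("out.wav"))  # True
```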