Project repository: https://github.com/NVIDIA/TensorRT-LLM.git
0. Debugging environment
0.1 Start the development environment
Debugging is done inside a Docker container, started from the docker-compose.yml below.
The image is nvidia/cuda:12.1.0-devel-ubuntu22.04; you can pull it and get an interactive shell with:
nvidia-docker run --entrypoint /bin/bash -it nvidia/cuda:12.1.0-devel-ubuntu22.04
version: '3'
services:
  whisper_debug:
    image: nvidia/cuda:12.1.0-devel-ubuntu22.04
    entrypoint: /entrypoint.sh
    #command:
    #  - "tritonserver"
    #  #- "--model-repository=/server/face"
    #  - "--model-repository=/server/image_ocr"
    volumes:
      - ./entrypoint.sh:/entrypoint.sh
      - ./data:/data
      - ./logs:/logs
      - /etc/localtime:/etc/localtime
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['7']
              capabilities: [gpu]
    shm_size: "48G"
    user: root
    network_mode: bridge
    restart: always
    ports:
      - "26130:8000"
      - "26131:8001"
      - "26132:8002"
    #environment:
    #  - CONTAINER_NAME=stark_face_server_1
Once the image has been downloaded, start the container.
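The compose file mounts an entrypoint.sh from the host, but its contents are not shown above. A minimal sketch (the script body is an assumption, not the original file) that simply keeps the container alive so you can docker exec into it for debugging:

```shell
#!/bin/bash
# Hypothetical entrypoint.sh: keep the container running for interactive
# debugging. Replace with real startup logic (e.g. launching a server).
set -e
exec sleep infinity
```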
0.2 Install dependencies
apt update && apt upgrade -y
apt install software-properties-common -y
add-apt-repository ppa:deadsnakes/ppa
apt-get -y install python3.10
apt-get -y install python3.10-venv
apt-get install -y python3.10-dev
python3.10 -m venv venv
. ./venv/bin/activate
pip install --upgrade pip
apt-get install -y openmpi-bin libopenmpi-dev
Note ⚠️: install python3-pip only if your setup needs it; the venv created above already provides pip.
0.3 Install tensorrt_llm
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
Verify the installation:
python3 -c "import tensorrt_llm"
1. Download the code
Repository: https://github.com/NVIDIA/TensorRT-LLM.git
git clone https://github.com/NVIDIA/TensorRT-LLM.git
2. Install
cd TensorRT-LLM
pip install . -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com --no-cache-dir --extra-index-url https://pypi.nvidia.com
3. Build the Whisper engine
Follow the steps in TensorRT-LLM/examples/whisper/README.md to download the required files and build the engine.
Audio must first be transcoded to mono 16000 Hz:
ffmpeg -i 111.mp3 -acodec pcm_s16le -ac 1 -ar 16000 out.wav
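After conversion, you can sanity-check the output with Python's standard wave module. The check_whisper_ready helper and the synthetic WAV written below are illustrative (not part of the TensorRT-LLM example); the synthetic file stands in for the ffmpeg output:

```python
import wave

def check_whisper_ready(path):
    """Return True if the WAV file is mono, 16-bit PCM, 16000 Hz."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1        # -ac 1  (mono)
                and w.getsampwidth() == 2     # pcm_s16le (2 bytes/sample)
                and w.getframerate() == 16000)  # -ar 16000

# Write a tiny synthetic WAV to demonstrate the check; "out.wav" here is
# a stand-in for the file produced by the ffmpeg command above.
with wave.open("out.wav", "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 16-bit PCM
    w.setframerate(16000)   # 16 kHz
    w.writeframes(b"\x00\x00" * 16000)  # one second of silence

print(check_whisper_ready("out.wav"))  # True
```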