model_dir/model_a/config.pbtxt
Source: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md
Model Instances per GPU
By default, Triton creates a single execution instance of a model on each available GPU. The following configuration instead places two execution instances of the model on each system GPU.
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
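For context, instance_group is just one field of the model's config.pbtxt. A minimal sketch of a complete configuration for a hypothetical ONNX Runtime model is shown below; the tensor names, shapes, and max_batch_size are illustrative assumptions, not from the source.

name: "model_a"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 16 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]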
Specifying Instance Counts on Specific GPUs
The following configuration places one execution instance on GPU 0 and two execution instances on each of GPUs 1 and 2.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  },
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 1, 2 ]
  }
]
CPU Model Instances
The instance group setting is also used to place model execution on the CPU. The following configuration places two execution instances on the CPU.
instance_group [
  {
    count: 2
    kind: KIND_CPU
  }
]
If no count is specified for a KIND_CPU instance group, the default instance count is:
- 2 for selected backends (TensorFlow and ONNX Runtime).
- 1 for all other backends.
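For example, the following minimal sketch omits count entirely and relies on the backend default described above (two instances for a TensorFlow or ONNX Runtime model, one for any other backend):

instance_group [
  {
    kind: KIND_CPU
  }
]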
Host Policy
The instance group setting is associated with a host policy. The following configuration will associate all instances created by the instance group setting with host policy "policy_0
". By default the host policy will be set according to the device kind of the instance, for instance, KIND_CPU
is "cpu
", KIND_MODEL
is "model
", and KIND_GPU
is "gpu_<gpu_id>
".
instance_group [
  {
    count: 2
    kind: KIND_CPU
    host_policy: "policy_0"
  }
]
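As a further sketch (the policy names below are hypothetical), different instance groups within the same model can each name their own policy, for example to tie each GPU's instances to a per-device policy:

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
    host_policy: "policy_gpu0"
  },
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]
    host_policy: "policy_gpu1"
  }
]

The settings a named policy actually carries (such as NUMA node or CPU-core affinity) are supplied when launching the server, via tritonserver's --host-policy command-line option, as described in Triton's optimization documentation.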
Priority
The instance group can also carry a rate limiter configuration. The following configuration creates one execution instance on each of GPUs 0, 1, and 2; each instance requires four "R1" resources and two "R2" resources (where "R2" is a global resource) in order to execute, and is given a rate-limiter priority of 2.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1, 2 ]
    rate_limiter {
      resources [
        {
          name: "R1"
          count: 4
        },
        {
          name: "R2"
          global: True
          count: 2
        }
      ]
      priority: 2
    }
  }
]
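Note that these settings take effect only when the rate limiter is enabled at server startup (tritonserver's --rate-limit option); by default it is off and the rate_limiter fields are ignored. Priority acts as a weight across instances: an instance with priority 2 is given half as many scheduling chances as one with priority 1.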