model_dir/model_a/config.pbtxt
Source: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md
Model Instances per GPU
By default, Triton creates a single execution instance of a model on each available GPU. The following configuration instead places two execution instances of the model on each system GPU.
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
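For context, instance_group is just one field of the model's config.pbtxt. A minimal sketch of a complete configuration for a hypothetical ONNX Runtime model is shown below; the tensor names, shapes, and max_batch_size are illustrative assumptions, not from the source.

name: "model_a"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 16 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]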
Specifying Instance Counts on Specific GPUs
The following configuration places one execution instance on GPU 0 and two execution instances on each of GPUs 1 and 2.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  },
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 1, 2 ]
  }
]
CPU Model Instances
The instance group setting is also used to place model execution on the CPU. The following configuration places two execution instances on the CPU.
instance_group [
  {
    count: 2
    kind: KIND_CPU
  }
]
If no count is specified for a KIND_CPU instance group, the default instance count is:
- 2 for selected backends (TensorFlow and ONNX Runtime).
- 1 for all other backends.
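For example, the following minimal sketch omits count entirely and relies on the backend default described above (two instances for a TensorFlow or ONNX Runtime model, one for any other backend):

instance_group [
  {
    kind: KIND_CPU
  }
]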
Host Policy
The instance group setting is associated with a host policy. The following configuration will associate all instances created by the instance group setting with host policy "policy_0
". By default the host policy will be set according to the device kind of the instance, for instance, KIND_CPU
is "cpu
", KIND_MODEL
is "model
", and KIND_GPU
is "gpu_<gpu_id>
".
instance_group [
  {
    count: 2
    kind: KIND_CPU
    host_policy: "policy_0"
  }
]
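As a further sketch (the policy names below are hypothetical), different instance groups within the same model can each name their own policy, for example to tie each GPU's instances to a per-device policy:

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
    host_policy: "policy_gpu0"
  },
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]
    host_policy: "policy_gpu1"
  }
]

The settings a named policy actually carries (such as NUMA node or CPU-core affinity) are supplied when launching the server, via tritonserver's --host-policy command-line option, as described in Triton's optimization documentation.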
Priority
The instance group can also carry a rate limiter configuration. The following configuration creates one execution instance on each of GPUs 0, 1, and 2; each instance requires four "R1" resources and two "R2" resources (where "R2" is a global resource) in order to execute, and is given a rate-limiter priority of 2.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1, 2 ]
    rate_limiter {
      resources [
        {
          name: "R1"
          count: 4
        },
        {
          name: "R2"
          global: True
          count: 2
        }
      ]
      priority: 2
    }
  }
]
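Note that these settings take effect only when the rate limiter is enabled at server startup (tritonserver's --rate-limit option); by default it is off and the rate_limiter fields are ignored. Priority acts as a weight across instances: an instance with priority 2 is given half as many scheduling chances as one with priority 1.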