NVIDIA/TensorRT-LLM

TensorRT-LLM

33/100
Stars: 13,690
Forks: 2,398
Language: Python

Overview

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

Best for

  • Evaluating TensorRT-LLM for Python-based LLM inference workflows on NVIDIA GPUs.
  • Comparing it with other GitHub projects by star count (13,690) and current repository activity.

Pros

  • TensorRT-LLM has visible GitHub traction with 13,690 stars. Topics: blackwell, cuda, llm-serving.
  • The project provides an external homepage for deeper evaluation.

Cons

  • Production fit still depends on documentation depth, issue activity, and release cadence.
  • GitHub's automated detection reported no license, so the usage terms need manual review.

Production readiness

TensorRT-LLM should be validated with its README, release history, open issues, and integration requirements before production use.

License risk

GitHub did not report a license for this repository. Read any license file in the repository directly and complete a manual legal review before production use.
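The license and activity checks described above can be partly automated. The following is a minimal sketch, not part of TensorRT-LLM: it assumes you have already fetched and parsed the repository record from the GitHub REST API (the `license`, `pushed_at`, and `homepage` fields of the repository response). The function name `review_flags`, the 90-day threshold, and the sample values for `homepage` and `pushed_at` are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone


def review_flags(repo, now=None, stale_days=90):
    """Return review flags for a parsed GitHub repository record (a dict)."""
    now = now or datetime.now(timezone.utc)
    flags = []

    # GitHub sets "license" to null when it detects none, and reports
    # spdx_id "NOASSERTION" when a license file exists but is not recognized.
    license_info = repo.get("license")
    if license_info is None:
        flags.append("no license reported: manual legal review needed")
    elif license_info.get("spdx_id") == "NOASSERTION":
        flags.append("license file not recognized: read it manually")

    # "pushed_at" is an ISO 8601 UTC timestamp such as 2024-01-10T00:00:00Z.
    pushed_at = repo.get("pushed_at")
    if pushed_at:
        last_push = datetime.strptime(pushed_at, "%Y-%m-%dT%H:%M:%SZ")
        last_push = last_push.replace(tzinfo=timezone.utc)
        if (now - last_push).days > stale_days:
            flags.append(f"no pushes in over {stale_days} days")

    if not repo.get("homepage"):
        flags.append("no external homepage listed")

    return flags


# Sample record: star and fork counts come from this page; the homepage
# and pushed_at values are placeholders, not real data.
sample = {
    "full_name": "NVIDIA/TensorRT-LLM",
    "stargazers_count": 13690,
    "forks_count": 2398,
    "license": None,
    "homepage": "https://example.com/docs",
    "pushed_at": "2024-01-10T00:00:00Z",
}

flags = review_flags(sample, now=datetime(2024, 1, 15, tzinfo=timezone.utc))
print(flags)
```

A flag from this sketch only marks an item for follow-up. It does not replace reading the README, release history, and open issues.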

Install

git clone https://github.com/NVIDIA/TensorRT-LLM.git

Star trend

[Chart: star count, about 14k, from 05-16 to 05-21]