This repository uses Docker to package large language models and multimodal models optimized for Rockchip platforms. It exposes a unified calling interface compatible with the OpenAI API, making these models easy to integrate and use.
Supported hardware: reComputer RK3588 and reComputer RK3576.
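Since the container exposes an OpenAI-compatible endpoint, a standard chat-completions request should work against it. A minimal sketch is below; the URL, port, and model name are assumptions for illustration, not values documented by this repository — adjust them to match your running container.

```python
import json

# Assumed local endpoint; replace host/port with wherever your container listens.
API_URL = "http://localhost:8080/v1/chat/completions"

# OpenAI-style chat-completions payload; the model name is a placeholder.
payload = {
    "model": "your-model-name",  # placeholder, not a real model id from this repo
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}

body = json.dumps(payload)
print(body)

# To actually send the request (requires the container to be running):
# import requests
# resp = requests.post(API_URL, json=payload, timeout=60)
# print(resp.json()["choices"][0]["message"]["content"])
```

Because the payload follows the OpenAI schema, existing OpenAI client libraries should also work by pointing their base URL at the container.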
Note: A rough estimate of a model's inference speed covers two metrics: time to first token (TTFT) and time per output token (TPOT).

Note: Run `python test_inference_speed.py --help` to view the usage help.
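To make the two metrics concrete, here is a small sketch (not code from this repository) showing how TTFT and TPOT can be derived from per-token arrival timestamps collected during a streaming response:

```python
def ttft_tpot(token_times, start):
    """Compute (TTFT, TPOT) from per-token arrival timestamps.

    token_times: monotonically increasing timestamps, one per generated token.
    start: timestamp when the request was sent.
    """
    # TTFT: delay from sending the request to receiving the first token.
    ttft = token_times[0] - start
    # TPOT: average gap between consecutive tokens after the first.
    if len(token_times) > 1:
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0
    return ttft, tpot


# Example: request sent at t=1.0s, tokens arrive at 1.5, 1.6, 1.7, 1.8.
print(ttft_tpot([1.5, 1.6, 1.7, 1.8], 1.0))  # TTFT = 0.5s, TPOT = 0.1s
```

TTFT is dominated by prompt prefill, while TPOT reflects steady-state decode throughput, which is why a rough speed estimate needs both numbers.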
```shell
python -m venv .env && source .env/bin/activate
pip install requests
python llm_speed_test.py
```

Reference: rknn-llm