The fastest way to get this model running locally is via Docker.
Refer to the instructions below to proceed.
The loader auto-caches the model archive (several GBs included).
To guarantee smooth performance, the installation process auto-selects the best possible options for your PC.
The PaddleOCR-VL-1.6-GGUF is a stateâofâtheâart visionâlanguage model designed for highâaccuracy optical character recognition in multilingual documents. It leverages a transformerâbased encoderâdecoder architecture that jointly processes text and layout information, enabling robust recognition of curved and distorted scripts. The model supports over 100 languages and can handle a wide range of document types, from printed books to handwritten notes. Its quantized GGUF format ensures efficient inference on consumerâgrade hardware while maintaining competitive performance metrics. A builtâin language detection module automatically identifies the script, reducing preprocessing overhead. Users can integrate the model into existing pipelines via simple API calls, benefiting from its low memory footprint and fast loading times.
| Model Name | PaddleOCR-VL-1.6-GGUF |
| Architecture | Transformerâbased encoderâdecoder |
| Supported Languages | 100+ |
| Input Resolution | 1024×1024 pixels |
| Parameter Count | 1.6âŻB |
| Quantization | GGUF (Q4_K_M) |
| Hardware Requirements | CPU/GPU with â„4âŻGB VRAM |
| License | Apache 2.0 |
https://tabalu.es/category/plugins/