The Kunlun Chip GR800 32GB is a third-generation, high-performance cloud AI accelerator card from Baidu's Kunlun Chip Technology. It is built on the company's in-house XPU-P architecture and targets large-scale AI model training and inference.
| Technical Specifications | |
| --- | --- |
| Name | Kunlun Chip GR800 32GB |
| Benchmark Product | Comparable to A10 |
| FP16 Performance | 128 TFLOPS |
| INT8 Performance | 256 TOPS |
| Memory Type | HBM2E (High Bandwidth Memory, approx. 30% higher bandwidth than HBM2) |
| Fabrication Process | TSMC 7 nm process, 2.5D CoWoS advanced packaging |
| Memory Data Rate | 9.375 Gbps effective |
| Memory Bus Width | 1024-bit (1.2 TB/s bandwidth) |
| Host Interface | PCIe Gen4.0 x16 (no display output; focused on data center computing) |
| Form Factor | FHFL dual-slot |
| Weight | 1064 g |
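
The 1.2TB/s figure in the table follows from the 1024-bit bus width and the 9.375Gbps effective rate, taking both at face value. The short Python sketch below reproduces that arithmetic; the function name is illustrative and not part of any Kunlun Chip software, and the result is a theoretical peak rather than a measured throughput.

```python
def peak_memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits: number of data pins on the memory bus
    data_rate_gbps: effective transfer rate per pin, in Gbit/s
    Dividing by 8 converts bits to bytes.
    """
    return bus_width_bits * data_rate_gbps / 8


# Values taken from the specification table.
bandwidth = peak_memory_bandwidth_gbs(1024, 9.375)
print(f"{bandwidth:.0f} GB/s")  # 1200 GB/s, i.e. 1.2 TB/s
```

Sustained bandwidth in real workloads is normally lower than this peak, so it is best read as an upper bound.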

Security is integrated into every phase of the lifecycle, including a protected supply chain and factory-to-site integrity assurance. A silicon-based root of trust anchors end-to-end boot resilience.

Accelerate operations with autonomous collaboration. Simplify, automate, and centralize management with advanced enterprise console integration and iDRAC support.

Designed with recycled materials and energy-efficient hardware. The portfolio is optimized to help reduce carbon footprint and lower operating costs for modern data centers.
Designed for 24x7 enterprise data center operations, it features enterprise-grade components and energy-efficient hardware optimized for deployment at scale.
Supports Secure Boot backed by a hardware root of trust for an additional layer of security, and meets NEBS Level 3 requirements.
The passively cooled, energy-efficient, dual-slot design fits a wide range of OEM systems and integrates easily into existing data center infrastructure.

A1: It is primarily designed for high-performance AI acceleration workloads, including large-scale AI model training and inference in sectors such as finance, telecommunications, and the internet.
A2: The GR800 is comparable to the A10 in performance, delivering 128 TFLOPS of FP16 and 256 TOPS of INT8 computing power.
A3: It uses 32GB of HBM2E High Bandwidth Memory on a 1024-bit bus, providing 1.2TB/s of bandwidth. This matters for large-model inference, where token generation is typically limited by how fast weights can be read from memory rather than by raw compute.
A4: It uses the PCIe Gen4.0 x16 interface and is fully compatible with both x86 and ARM-based host architectures, ensuring easy integration into various data center environments.
A5: No, the GR800 is a dedicated computing card. It does not have traditional video output ports (HDMI/DP) and does not support gaming APIs like DirectX or OpenGL.
A6: It is passively cooled, so it has no onboard fan and relies on the high-velocity airflow of an enterprise-grade server chassis. The design is optimized for energy efficiency in that environment.
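
The answers above quote 128 TFLOPS of FP16 compute, 1.2TB/s of memory bandwidth, and a PCIe Gen4.0 x16 host interface. As a rough sketch of how these numbers relate, the Python snippet below computes two derived figures: the roofline "ridge point" (how many FP16 operations a kernel must perform per byte read from memory before compute, rather than memory, becomes the limit) and the theoretical PCIe Gen4.0 x16 bandwidth per direction. The PCIe constants (16 GT/s per lane, 128b/130b encoding) come from the PCIe 4.0 standard, not from this page, and the function names are hypothetical.

```python
# Figures quoted in the FAQ answers.
FP16_TFLOPS = 128.0        # peak FP16 throughput
HBM_BANDWIDTH_TBS = 1.2    # peak memory bandwidth

# PCIe 4.0 constants (assumption: taken from the PCIe standard, not this page).
PCIE4_GT_PER_LANE = 16.0   # giga-transfers per second per lane
PCIE_ENCODING = 128 / 130  # 128b/130b line-encoding efficiency
LANES = 16


def ridge_point_flop_per_byte(tflops: float, tbs: float) -> float:
    """Arithmetic intensity (FLOP per byte) at which compute and memory limits meet."""
    return tflops / tbs


def pcie_bandwidth_gbs(gt_per_lane: float, lanes: int, encoding: float) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s."""
    return gt_per_lane * lanes * encoding / 8


ridge = ridge_point_flop_per_byte(FP16_TFLOPS, HBM_BANDWIDTH_TBS)
pcie = pcie_bandwidth_gbs(PCIE4_GT_PER_LANE, LANES, PCIE_ENCODING)

print(f"Ridge point: {ridge:.1f} FLOP/byte")   # Ridge point: 106.7 FLOP/byte
print(f"PCIe x16:    {pcie:.1f} GB/s")         # PCIe x16:    31.5 GB/s
```

Both values are theoretical peaks. The gap between on-card memory bandwidth and the host link is one reason model weights are usually kept resident in the card's 32GB of memory rather than streamed over PCIe.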