On July 22, 2025, at the 8th Intelligent Assisted Driving Conference, Huang Ziliang, Product Director of Huawei Ascend Intelligent Vehicles & Robotics, pointed out that with surging data volumes and growing model complexity, computing power has become the key battleground among automakers. As intelligent assisted driving systems move from modular architectures through end-to-end designs to large models, parameter scale has grown from millions to tens of billions, daily data processing has reached the petabyte level, and training has accelerated to day-level iterations. These changes place higher demands on cloud AI computing power; the demand for AI computing power in China's automotive cloud is expected to reach 100 EFLOPS by 2028.
In response to these industry needs, Huawei Ascend accelerates multimodal large-model training through tools such as MindSpeed and the Driving SDK, with performance ahead of domestic competitors and support for rapid migration and development. Its high-availability architecture sustains 40 days of stable training for 100-billion-parameter models, with fault recovery in under 10 minutes, providing reliable computing power for intelligent assisted driving workloads. Ascend powers Qiankun intelligent driving, combining large-scale computing power with software enablement to build an industry-leading ADS assisted-driving system. Looking ahead, Ascend hopes to join hands with ecosystem partners to deliver intelligent solutions for automakers and advance automotive AI.
Huang Ziliang | Product Director of Huawei Ascend Intelligent Vehicles & Robotics
The following is a summary of the speech:
Intelligent driving business trends
With the gradual maturation of the technology, the penetration rate of L2 and L2+ assisted driving continues to rise and is expected to reach 80% this year. First, model architectures have evolved from modular pipelines to one-stage end-to-end designs, and further to VLM and VLA architectures, with parameter scale climbing to the tens-of-billions level; second, data collection has shifted from traditional methods to a combination of million-unit mass-produced vehicle fleets and AIGC-generated data, pushing daily training data volumes to the petabyte level; third, intensifying market competition has shortened training iteration cycles from weekly to daily. Going forward, as L3 and L4 autonomous driving policies land, manufacturers will shift to end-to-end and VLA technology routes, which demand still larger data scale and faster iteration, making computing power the core element of competition among automakers.
Source: Speaker material
We forecast that by 2028, cloud AI computing power demand in China's automotive industry will reach 100 EFLOPS. Based on the development path from L2/L2+ to L3 intelligent driving training, this implies tenfold scale expansion, tenfold reliability improvement, and extreme iteration capability, while processing 100 times the current data volume to optimize assisted-driving algorithms and stay competitive. The stability and sustainability of computing power supply will therefore become a key guarantee.
Both the E2E model and the VLA model place significant demands on computing power scale and network performance. Taking the VLA model as an example, it is expected to require deploying and scheduling 100,000 computing cards.
Efficient data mining and annotation verification form the basis of a closed-loop intelligent driving data system. First, in data mining, massive volumes of data have been accumulated, and the data-generation workflows of multimodal large models must be backed by high-performance AI computing platforms. Second, in data annotation, traditional manual labeling can no longer meet the accuracy and consistency requirements of end-to-end assisted-driving algorithms, and road testing alone cannot cover all hazardous scenarios; high-performance AI scenario-generation capabilities are therefore needed to improve test-scenario coverage.
Overall, building and operating large intelligent-driving computing clusters imposes higher requirements on architecture design, operational efficiency, resource scheduling, and ecosystem collaboration. For clusters at the 10,000- to 100,000-card scale, the backbone network must offer high scalability and excellent performance, together with rapid fault monitoring and highly reliable recovery. For training and inference efficiency, high-performance computing and high-speed communication should be pursued, and a healthy ecosystem should be built to support the industry's mainstream models.
Given that automakers have already deployed large amounts of diverse computing resources such as GPUs and NPUs, a unified training-and-inference resource management system is needed to schedule heterogeneous resources collaboratively and use them efficiently, ultimately achieving high system reliability. To address the technical difficulties of operator development, accuracy localization, and performance tuning, open-source fused-operator reference implementations should be provided, enabling automatic accuracy checking and tuning without service interruption. In addition, deep adaptation to mainstream framework ecosystems is needed so the system works out of the box.
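The core of the heterogeneous scheduling described above is matching each job against whatever compatible device pool still has free capacity. The following is a toy sketch of that idea only; the pool sizes, job names, and greedy policy are illustrative assumptions, not any vendor's actual scheduler.

```python
# Toy sketch of unified scheduling over heterogeneous accelerator pools.
# All pool sizes and job shapes below are made up for illustration.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cards: int      # number of accelerator cards requested
    kinds: tuple    # device types this job's kernels can run on

def schedule(jobs, pools):
    """Greedily place each job on the first compatible pool with capacity.

    pools: dict mapping device type -> free card count (mutated in place).
    Returns {job name: device type}; jobs that fit nowhere are left to wait.
    """
    placement = {}
    for job in jobs:
        for kind in job.kinds:
            if pools.get(kind, 0) >= job.cards:
                pools[kind] -= job.cards
                placement[job.name] = kind
                break
    return placement

jobs = [Job("pretrain", 8, ("npu", "gpu")),   # can run on either pool
        Job("label",    2, ("gpu",)),          # GPU-only workload
        Job("eval",     4, ("npu", "gpu"))]    # will wait if pools are full
pools = {"npu": 10, "gpu": 4}
```

A real system would add priorities, preemption, and topology awareness; the point here is only that a single placement loop over mixed device pools replaces per-vendor silos.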
Ascend intelligent driving solutions
Ascend AI adheres to a full-stack, open approach of hardware-software collaboration, providing a second computing power option for the domestic market. Its technology stack spans basic processors, Atlas series servers and clusters, the CANN heterogeneous computing architecture, and the MindSpore AI framework, benchmarking comprehensively against the technical standards and ecosystems of mainstream international vendors. Through full-stack hardware-software compatibility, Ascend AI supports 10,000-card clusters and trillion-parameter model training, with 40 days of uninterrupted operation and 95% of faults recovered within 10 minutes. On ecosystem openness, the system supports the industry's major framework acceleration libraries, has adapted 100+ foundation models, and offers a full-process toolchain enabling rapid migration and deployment of mainstream-scenario algorithms. Through hardware-software co-optimization, its performance indicators have reached industry-leading levels at home and abroad.
Since releasing its full-scenario, full-stack AI technology in 2018, Ascend has been firmly committed to building the AI industry. Recently, we launched a new generation of technologies, including a super-node cluster architecture and the intelligent driving Driving SDK, to further expand the application boundaries of AI.
Ascend AI computing power covers all intelligent-driving business processes, including training, annotation, and desensitization. For training, we offer the Atlas series of training servers to support large-scale training of large models; for central inference, data preprocessing, and annotation, we provide inference server solutions such as the Atlas 800I A2.
For intelligent-driving pre-training and post-training scenarios, Ascend has built a highly reliable, highly available, and easy-to-use AI training system. The system adopts a top-down technical architecture: the distributed acceleration suite MindSpeed, the intelligent-driving training SDK, the AI framework, and the chip-enabling layer sit atop the cluster hardware infrastructure, supported by the full-process development toolchain MindStudio for intelligent operations and maintenance and the cluster computing automation engine CCAE, jointly optimizing performance, availability, and ease of use. Lightweight hardware design and Transformer-architecture performance have reached or exceeded the industry average; for high availability, mean repair time for large clusters is kept within 30 minutes; for ease of use, operator development and model migration efficiency are significantly improved. In large-scale cluster deployments, the combined efficiency of compute, storage, and networking remains industry-leading.
Source: Speaker material
For the technical needs of intelligent driving at different stages of development, Ascend AI provides tiered development support. For modular and end-to-end architectures, it launched the Driving SDK development kit. For cockpit multimodal interaction and integrated end-to-end VLM/VLA architectures, in addition to dedicated SDKs, it has released the MindSpeed multimodal suite and an open-source high-performance RL framework to support multimodal content generation, understanding, and reinforcement-learning tasks.
The Ascend Driving SDK is a set of high-performance NPU operator interfaces and acceleration libraries designed for autonomous driving. It integrates seamlessly with the PyTorch framework and enables minute-level model adaptation through one-click path migration. Operator-level optimization improves network performance by 30%, and its open-source model raises developer efficiency by 20%. As an open-source project, Huawei invites industry developers and research institutions to join the ecosystem and drive algorithm innovation.
The Driving SDK covers the industry's mainstream perception, planning-and-control, and E2E algorithms, with typical model performance reaching or exceeding the industry average. Going forward, we will continue to expand the operator and model library and respond quickly to customer needs through commercial cooperation projects, ensuring out-of-the-box performance and continuous optimization.
The MindSpeed MM multimodal suite provides high-performance acceleration for the intelligent-driving data loop and large-model training. It ships with 10 mainstream multimodal large models preconfigured for peak performance, covering the full pipeline of pre-training, fine-tuning, online inference, and online evaluation. It also supports flexible construction of multimodal generation and understanding models through a scalable component architecture.
For scenario acceleration, we significantly improved multimodal performance by integrating the base acceleration algorithms of MindSpeed Core. The high-performance framework for RL post-training provides out-of-the-box training scripts and fully supports mainstream RL algorithms; it can stand up a training environment quickly, parallelize multimodal data processing, and train interactively against a sandbox environment. For high-performance RL acceleration, we developed a range of Ascend-affinity optimizations, including high-performance scheduling frameworks and optimization algorithms, with deep acceleration support at the RL operator layer.
MindSpeed is deeply optimized together with the PyTorch framework and, after multiple rounds of intensive performance tuning, fully supports multimodal generation and understanding models for intelligent driving. Testing shows the optimized mainstream models improve performance by more than 10%, with some key models gaining 20%.
MindCluster is Ascend's reliable-training solution for large-scale clusters. Its ultra-large-scale scheduling technology breaks through the roughly 5,000-node limit of a single Kubernetes cluster and starts large training jobs in minutes. For checkpoint resumption, it supports everything from transparent operator-level recovery to minute-level job-level recovery, significantly shortening fault recovery time. For elastic training, a flexible and reliable dynamic scaling mechanism raises overall cluster availability by 5%.
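MindCluster's internals are not described in the talk, but job-level checkpoint resumption in general reduces to a simple pattern: persist progress atomically at intervals, and on restart pick up from the last good state. A minimal sketch under those assumptions (file name, step counter, and toy "training" are all hypothetical):

```python
# Minimal sketch of job-level checkpoint-resume: not MindCluster's
# implementation, just the generic pattern it automates at scale.
import json
import os

CKPT = "train_state.json"  # hypothetical checkpoint path

def save_checkpoint(step, state, path=CKPT):
    """Persist training progress so a restarted job can resume."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)  # atomic rename: a crash never leaves a torn file

def load_checkpoint(path=CKPT):
    """Return (step, state) from the last checkpoint, or a fresh start."""
    if os.path.exists(path):
        with open(path) as f:
            ckpt = json.load(f)
        return ckpt["step"], ckpt["state"]
    return 0, {"loss": None}

def train(total_steps):
    step, state = load_checkpoint()       # resume from the last good step
    while step < total_steps:
        state["loss"] = 1.0 / (step + 1)  # stand-in for a real training step
        step += 1
        if step % 2 == 0:                 # checkpoint every N steps
            save_checkpoint(step, state)
    return step, state
```

If the process dies mid-run, the next invocation of `train` resumes from the last saved step rather than step 0; the engineering work in a real cluster lies in making the save cheap and the detection/restart fast.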
Ascend supports rapid migration and deployment of mainstream algorithms through a refined full-process toolchain. In practice, the system can generate an operator-support analysis report within 5 minutes, and with one-click code-migration tools, model accuracy verification takes on average only 1 working day, providing a key technical guarantee for day-level iteration.
Source: Speaker material
Through unified ONNX conversion, Ascend has realized the full path from cloud training to vehicle deployment. In central training, model performance on Ascend can be benchmarked against GPU training results; once converted to ONNX format, models can be deployed seamlessly to heterogeneous vehicle-side hardware. The industry has already verified engineering deployments across multiple cross-platform paths, including Ascend to Ascend, Ascend to Horizon, and Ascend to NVIDIA.
Ascend intelligent driving deployment cases and ecosystem cooperation
By empowering HUAWEI CLOUD and Qiankun ADS, Ascend has built a large-scale computing cluster that supported a 100-billion-parameter model through 40 days of continuous, stable training, with industry-leading fault recovery: relying on a full-stack failure-mode library, 95% of faults are detected within minutes. A three-tier rapid-recovery architecture enables transparent checkpoint-resume training in most business scenarios, ensuring 40 days of zero-interruption training with performance indicators above the industry average.
In addition, Ascend's massive computing power strongly supports data, annotation, and simulation services. In scenarios such as multimodal data fusion and massive data processing, problems can be located quickly and performance tuned; efficient conversion of real scenes into the simulation environment, combined with automatic annotation tools covering all scenario needs, significantly improves large-model annotation efficiency, while intelligent annotation further reduces manual intervention. 3D simulation scene reconstruction achieves centimeter-level accuracy, with fast response and strong adaptability for corner cases.
Source: Speaker material
In April this year, ADS 4.0 was officially released. To support L3 autonomous driving pilots in highway scenarios, the system introduces a new WEVA one-stage end-to-end architecture and realizes an AI self-evolution mechanism through a world model. As world-model behavioral complexity grows, higher demands are placed on the computing power and efficiency of intelligent-driving training and inference clusters. With its 10,000-card cluster and the Driving SDK development kit, Ascend ensured the rapid iteration and on-schedule release of ADS 4.0, which has effectively boosted sales of models such as the M9, M8, and Zunjie.
At the ecosystem level, Ascend is fully compatible with mainstream development frameworks, acceleration libraries, and third-party open-source communities in intelligent driving, including core components such as PyTorch, OpenMMLab, and DeepSpeed. This complete technology ecosystem is the result of four years of continuous investment.
In business ecosystem construction, Huawei invests dedicated ecosystem funds every year to help partners achieve sustainable commercial success. For technology enablement, we provide in-depth technical training and continuous innovation support to partner developers through a full-scenario curriculum system.
The automotive industry is accelerating its transformation toward intelligence and AI, with computing power demand surging across all business scenarios and application links. To address the pain points of siloed, chimney-style multi-system architectures, we recommend that the industry build a unified technical architecture, data specification, and development platform, centered on a standardized AI middle platform that enables rapid development of upper-layer applications and efficient integration with lower-layer business data. Huawei will provide the computing power base and AI hardware-software platforms, and work with ecosystem partners to deliver end-to-end intelligent solutions for automakers.
Focusing on AI innovation in the automotive industry, Huawei Ascend is willing to work with China's automotive sector to jointly build the technology ecosystem of the new intelligent-driving era, contributing core technologies and industry synergy to its intelligent transformation.
(The above content is from the keynote speech "Huawei Ascend Ecosystem, Collective Wisdom Jointly Helps the Development of AI in Intelligent Vehicles" delivered by Huang Ziliang, Product Director of Huawei Ascend Intelligent Vehicles & Robotics, at the 8th Intelligent Assisted Driving Conference on July 22, 2025.)