In the past few days, the retired AlphaGo has stormed back into the headlines, and not for a match against any of the world's top players. Its successor, the "new dog" AlphaGo Zero, learned entirely from self-play without human game data, surpassed the version of AlphaGo that beat Lee Sedol in just three days, and defeated the version that beat Ke Jie after 21 days. AlphaGo has shown us the real power of AI computing.
Currently, the two most widely used acceleration components in AI computing platforms are GPUs and FPGAs. GPUs suit computationally intensive, highly parallel SIMD (Single Instruction, Multiple Data) workloads such as training deep learning models, and they have built up an acceleration platform and ecosystem around CNN, DNN, RNN, LSTM, and reinforcement-learning algorithms.
However, FPGAs have recently won favor with the giants of the AI field: Microsoft, Baidu, and iFLYTEK are all betting on the future of FPGA applications. So if you were to choose the FPGA as the main computing force of an AI system, what concerns would you have?
Concern One: What advantages do FPGAs have? Which scenarios suit them best?
First, deep learning involves two computational phases: training and inference. GPUs are very efficient at training deep learning models, but when inference runs on small batches of data, their parallel-computing advantage cannot be exploited.
FPGAs offer both pipeline parallelism and data parallelism, so they process tasks with lower latency. For example, if processing a data packet takes 10 steps, an FPGA can build a 10-stage pipeline in which different stages work on different packets at the same time; each packet is fully processed after flowing through all 10 stages, and each stage completes a packet every cycle, so results can be output immediately. In general, FPGA acceleration adds only microsecond-level PCIe latency, and with Intel's Xeon + FPGA parts interconnected over the fast QPI link, CPU-to-FPGA latency can even drop below 100 nanoseconds.
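The latency and throughput benefit of that 10-stage pipeline can be sketched with a toy cycle-count model (a simplification assuming one new packet enters the pipeline every cycle; it is not a description of any actual Inspur design):

```python
# Toy model of FPGA pipeline parallelism:
# with S pipeline stages, N packets finish in S + N - 1 cycles
# instead of the S * N cycles a fully sequential design would need.

def sequential_cycles(stages: int, packets: int) -> int:
    """Cycles if each packet must finish all stages before the next one starts."""
    return stages * packets

def pipelined_cycles(stages: int, packets: int) -> int:
    """Cycles when a new packet enters the pipeline every cycle."""
    return stages + packets - 1

if __name__ == "__main__":
    S, N = 10, 1000  # the article's 10-step packet-processing example
    print(sequential_cycles(S, N))  # 10000 cycles without pipelining
    print(pipelined_cycles(S, N))   # 1009 cycles: throughput nears 1 packet/cycle
```

Once the pipeline is full, the steady-state throughput approaches one packet per cycle regardless of the number of stages; only the first packet pays the full 10-stage latency.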
Second, the FPGA is a programmable chip, so loading algorithms onto it is more flexible. Deep learning algorithms are not yet fully mature and are still iterating. If an algorithm changes substantially, the FPGA, as software-defined hardware, can switch algorithms flexibly and get to market quickly.
In the future, at least 95% of machine learning computation will go to inference and less than 5% to model training, and inference is exactly where the FPGA is strong: it greatly improves inference efficiency while minimizing the loss of accuracy.
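Much of that efficiency-with-little-accuracy-loss typically comes from running inference at reduced numeric precision. Below is a minimal sketch of linear int8 quantization, the kind of precision trade-off FPGA inference engines commonly exploit (the weight values are hypothetical, and real deployments use more elaborate schemes than this per-tensor scale):

```python
# Sketch of reduced-precision inference: weights are linearly quantized
# to int8 codes, then dequantized; the reconstruction error is bounded
# by half the quantization step (scale / 2).

def quantize_int8(values):
    """Map floats to int8 codes using a single per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]

weights = [0.42, -1.27, 0.05, 0.9]          # hypothetical layer weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"max error = {max_err:.4f}")
```

Because int8 multipliers need far fewer FPGA logic resources than float32 ones, many more of them fit on the chip, raising throughput while the bounded rounding error keeps accuracy loss small.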
Concern Two: Can FPGA computing performance meet my needs?
Unlike the CPU and GPU, the FPGA is a typical non-von Neumann architecture: a mode in which the hardware adapts to the software. It can flexibly adjust its parallelism according to system resources and algorithm characteristics to achieve an optimal fit, so its energy efficiency is higher than that of CPUs and GPUs.
Concern Three: FPGA development cycles used to run more than a year. How can that meet the needs of my business?
Traditional FPGA development resembles chip development: it uses a hardware description language (HDL), and, as with chip design, going from architecture design through simulation and verification to completion takes roughly a year.
However, Internet businesses iterate quickly and can accumulate a large user base within a few months, so the requirement on the data center is "fast": upgrades to the computing platform must keep pace with business growth. A traditional FPGA development cycle of six months to a year therefore cannot meet demand.
To this end, Inspur adopted an OpenCL high-level-language development approach, which encapsulates the underlying hardware (bus, IO interfaces, memory controller) and underlying software (drivers, function calls) into standard units that support the upper layers, letting developers focus on the algorithm itself. Logic written in OpenCL is mapped directly onto the FPGA by the compilation toolchain, shortening the development cycle from at least a year to under four months.
Concern Four: For companies with no FPGA experience, how can applications be launched quickly?
Perhaps you still have doubts: even if development efficiency improves greatly and the cycle shrinks dramatically, for small and medium-sized AI companies with limited technology and team resources, the FPGA still looks like an "unattainable" AI acceleration component.
But what if there were a solution that integrates software, algorithms, and hardware boards, delivered as FaaS (FPGA as a Service) in a combined hardware-and-software form? Would you still have concerns?
At present, Inspur is porting algorithms for the application scenarios with the most urgent market demand, building industry-leading IP for applications such as image compression, text data compression, and neural network acceleration. This eliminates the customer's algorithm development cycle, lowers the barrier to deploying FPGAs, and maximizes the efficiency of bringing FPGAs into production.
Neural network acceleration: an AI online-inference acceleration solution based on the F10A card, which optimizes and hardens CNN convolutional neural network algorithms and accelerates networks such as ResNet. It can be applied to image classification, object detection, and face recognition scenarios.
Measured data show that when running ResNet residual-network image classification, the F10A solution processes up to 742 images per second with a Top-5 recognition accuracy of 99.6%, and its energy efficiency is more than 3x that of a comparable GPU. Against general-purpose CPUs, the F10A's advantage is even more pronounced on such highly parallel, small-computation tasks.
WebP transcoding and compression acceleration: for image compression applications, a WebP codec optimized for the FPGA computing environment makes full use of hardware pipelining and task-level parallelism to greatly improve the performance of WebP image compression encoding. It converts JPEG to WebP quickly, improving overall processing efficiency by about 9.13x over the traditional implementation, with peak performance up to 14x that of a CPU.
Data compression acceleration: to overcome the shortcomings of traditional compression architectures, the Inspur GZip acceleration solution makes full use of the board's hardware pipelining and task-level parallelism, greatly increasing compression throughput while offloading the CPU. Its compression ratio (defined as 1 - compressed size / original size) reaches up to 94.8% at a compression speed of 1.2 GB/s, 10x the compression efficiency of the traditional scheme.
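The compression-ratio formula above can be demonstrated with Python's standard `gzip` module on the CPU. This only illustrates how the metric is computed on some sample repetitive data; it says nothing about the performance of the FPGA solution itself:

```python
# Compression ratio as defined in the text: ratio = 1 - compressed / original.
# Highly repetitive input (like log or protocol data) compresses very well.
import gzip

original = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n" * 200
compressed = gzip.compress(original, compresslevel=6)
ratio = 1 - len(compressed) / len(original)
print(f"original={len(original)}B compressed={len(compressed)}B ratio={ratio:.1%}")
```

Real-world ratios depend heavily on the data: repetitive text approaches the high-90% range, while already-compressed data (JPEG, video) barely shrinks at all.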
Concern Five: I run a cloud. Can FPGAs be managed? Do they support virtual machines?
FaaS means not only the integrated service of boards plus software algorithms, but also support for public clouds with online remote management and updates. The FPGA solution supports online reconfiguration of dynamic logic and remote update of static logic, and an optimized monitoring and management mechanism improves the reliability of remote board management: the FPGA die temperature, board memory status, and fan speed can be monitored in real time, and the FPGA's operating frequency adjusted accordingly.
At the same time, the FPGA supports direct virtual machine access. The board itself also adds many RAS (reliability, availability, scalability) features, such as high-reliability memory access and dual loading modes, parallel (FPP) and serial (AS): if loading fails in either mode, it quickly switches to the other, ensuring the board's availability in large-scale service.
With these features, the FPGA solution can quickly build the underlying computing platform an FPGA cloud needs. Whether providing public cloud services externally or rapidly allocating FPGA computing power internally, it can do so efficiently and reliably.
Concern Six: I don't want to be a guinea pig. Who has already used it?
For emerging AI computing devices like FPGAs, "wait and see" is often the prudent choice. After all, not every company wants to be the guinea pig for a new technology; once the first mover has eaten the crab, followers can ride the tailwind, for on the wind even pigs can fly.
At present, Inspur FPGAs are in volume deployment or deep testing at Baidu, Alibaba, Tencent, NetEase, and iFLYTEK. The energy-efficiency advantage of FPGAs for online AI inference has been recognized by most Internet and AI companies.
So in which fields can FPGAs be used? Listen to what the head of the Tencent Cloud FPGA team has said:
Machine learning, finance, big data, and gene sequencing all involve massive amounts of data to analyze and compute; these are areas where the FPGA's high throughput shines.
Network security demands stronger security and lower latency; such scenarios can exploit the FPGA's low-latency advantage.
Ultra-large-scale image processing can likewise be accelerated with FPGAs, with satisfying results.
Today's popular natural language processing and speech recognition are also scenarios where FPGAs excel.
When the FPGA becomes a computing-power service with efficient hardware, mature IP, and cloud-based management, what is left to worry about?
In the future, CPU+FPGA, as a new heterogeneous acceleration model, will be adopted in more and more application fields.