
Ping An Securities: GPT-4o's twin breakthroughs in performance and practicality are expected to accelerate the rollout of large model applications

Zhitong Finance ·  May 14 23:51

Large models worldwide are gradually shifting from competing on performance alone to placing equal emphasis on performance and practicality.

The Zhitong Finance App learned that Ping An Securities released a research report saying that large models worldwide are gradually shifting from competing on performance alone to placing equal emphasis on performance and practicality. Once large model capabilities reach a certain level, they will inevitably move into real-world applications. By improving the cost-effectiveness of their products and driving the rollout and deployment of downstream applications, large model vendors are expected to accelerate the formation of a closed commercial loop across the large model industry chain. The firm remains optimistic about AI-themed investment opportunities.

The main views of Ping An Securities are as follows:

GPT-4o's text, reasoning, and coding capabilities are on par with GPT-4 Turbo

GPT-4o can accept any combination of text, audio, and images as input, and can generate any combination of text, audio, and images as output. Its performance on English text and code is comparable to GPT-4 Turbo, and its performance on non-English text is significantly improved. At the same time, the API is faster and 50% cheaper. On text evaluation, according to OpenAI's official website, compared with mainstream models such as Llama 3 400b, GPT-4o set a new high score of 88.7% on 0-shot CoT MMLU (general knowledge questions), and a new high score of 87.2% on the traditional 5-shot no-CoT MMLU.

GPT-4o achieves a breakthrough in visual and audio understanding

According to OpenAI's official website, before GPT-4o, using voice mode to talk to ChatGPT involved an average latency of 2.8 seconds with GPT-3.5 and 5.4 seconds with GPT-4. GPT-4o, by contrast, can respond to audio input in as little as 232 ms, with an average of 320 ms, which is similar to human response time in a conversation. The main reason is that the previous voice mode was a pipeline of three independent models: a simple model transcribed audio into text, GPT-3.5 or GPT-4 took in text and output text, and a third simple model converted that text back into audio. GPT-4 lost a great deal of information in this process: it could not directly observe tone, multiple speakers, or background noise, and it could not output laughter, singing, or emotional expression. GPT-4o is a new model trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network.
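The difference between the two designs can be illustrated with a toy sketch. This is not OpenAI's implementation: every class and function name below is hypothetical, and each stage is a trivial stand-in for a real model. It only shows why a cascade that passes plain text between stages cannot carry tone or speaker information through to the output, while a single model that sees the whole input can.

```python
from dataclasses import dataclass


@dataclass
class AudioInput:
    """Hypothetical container for what an audio clip carries."""
    words: str
    tone: str
    speakers: int
    background_noise: bool


def transcribe(audio: AudioInput) -> str:
    # Stage 1 stand-in: speech-to-text keeps only the words.
    return audio.words


def text_model(prompt: str) -> str:
    # Stage 2 stand-in: a text-only model receives and returns text.
    return f"reply to: {prompt}"


def synthesize(text: str) -> dict:
    # Stage 3 stand-in: text-to-speech sees only text, so it has
    # nothing to base an expressive delivery on.
    return {"speech": text, "tone": "neutral"}


def cascaded_pipeline(audio: AudioInput) -> dict:
    # Three independent models chained through plain text.
    return synthesize(text_model(transcribe(audio)))


def end_to_end(audio: AudioInput) -> dict:
    # Stand-in for a single model that receives the full input,
    # so non-text signals remain available when producing output.
    return {"speech": f"reply to: {audio.words}", "tone": audio.tone}


clip = AudioInput(words="hello", tone="cheerful", speakers=2, background_noise=True)
cascaded = cascaded_pipeline(clip)
unified = end_to_end(clip)
```

In this sketch both paths produce the same words, but only the end-to-end path can reflect the speaker's tone, because the cascade discarded it at the transcription step.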

GPT-4o is faster and cheaper, and its high usability should boost application-side penetration of large models

According to OpenAI's official website, for consumer users: GPT-4o's text and image capabilities were made available in ChatGPT's free tier on launch day, and Plus users receive message limits up to 5 times higher. OpenAI will roll out a new alpha version of voice mode based on GPT-4o in ChatGPT Plus within the next few weeks. For developers: GPT-4o's text and vision capabilities are accessible in the API. Compared with GPT-4 Turbo, GPT-4o is 2 times faster and 50% cheaper. OpenAI plans to open GPT-4o's new audio and video capabilities to selected partners in the API over the next few weeks. The launch of GPT-4o is a major breakthrough in the practicality of OpenAI's large model products.
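As a rough illustration of what "50% cheaper" means for a downstream deployment, the sketch below compares how many tokens a fixed budget buys before and after the price cut. The baseline price and the budget are hypothetical placeholders, not figures from the report or from OpenAI's price list; only the 50% discount comes from the text above.

```python
def discounted_price(baseline_price: float, discount: float = 0.5) -> float:
    """Price after applying a fractional discount (0.5 means 50% cheaper)."""
    return baseline_price * (1.0 - discount)


def tokens_for_budget(budget: float, price_per_million: float) -> float:
    """Number of tokens a budget buys at a given price per million tokens."""
    return budget / price_per_million * 1_000_000


# Hypothetical values chosen only to make the arithmetic concrete.
BASELINE_PRICE = 10.0  # assumed price per million tokens for the older model
BUDGET = 100.0         # assumed spend, in the same currency unit

new_price = discounted_price(BASELINE_PRICE)
baseline_tokens = tokens_for_budget(BUDGET, BASELINE_PRICE)
new_tokens = tokens_for_budget(BUDGET, new_price)
capacity_ratio = new_tokens / baseline_tokens
```

Whatever the actual baseline price, halving the unit price doubles the volume a fixed budget can process, which is the mechanism the report points to when it links lower prices to faster application rollout.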

According to the official account of Magic Square AI, DeepSeek V2, the large model released on May 6, 2024 by the Chinese AI company DeepSeek, already delivered performance comparable to mainstream models. DeepSeek V2 achieved a significant cost reduction through comprehensive innovation in its model architecture, and its pricing offers a clear cost-effectiveness advantage over GPT-4 Turbo. Large models in China and abroad are gradually shifting from competing on performance alone to weighing both performance and practicality, with growing attention to cost-effectiveness. Lower deployment costs for downstream users are expected to accelerate the application of large models across a range of scenarios.

Recommended targets:
1) Computing power: Zhongke Shuguang (603019.SH), Ziguang (000938.SZ), Shenzhou Digital (000034.SZ), and Longxin Zhongke (688047.SH) are recommended; Cambrian (688256.SH), Jingjiawei (300474.SZ), Topway Information (002261.SZ), and Softcom (301236.SZ) are worth watching.
2) Algorithms: iFLYTEK (002230.SZ) is strongly recommended.
3) Application scenarios: Science and Technology Innovation (300496.SZ) and Shengshi Technology (002990.SZ) are strongly recommended; Jinshan Office is recommended; Wanxing Technology (300624.SZ), Tonghuashun (300033.SZ), and Caixun (300634.SZ) are worth watching.
4) Network security: Kai Mingchen (002439.SZ) is strongly recommended.

Risk warning: 1) The development of large model algorithms in China may fall short of expectations; 2) Supply chain risks for AI computing power are rising; 3) The rollout of large model product applications may fall short of expectations.

Disclaimer: This content is for informational and educational purposes only and does not constitute a recommendation or endorsement of any specific investment or investment strategy.