
Intel Executives Appear at Microsoft Build: Unleashing the Superpowers of the AI PC with an Innovative Platform for Optimized AI Model Execution

wallstreetcn ·  May 21 20:07

AI PCs ship with optimized builds of OpenVINO and DirectML that run generative AI models such as Phi-3 efficiently on CPUs, GPUs, and NPUs. Developers can deploy AI agents that reason and act with tools, run models efficiently on the AI PC using speculative decoding and quantization techniques, and target use cases such as personal assistants, secure local chat, code generation, and retrieval-augmented generation (RAG).

Microsoft's annual Build developer conference opened on Tuesday. Intel principal software architect Saurabh Tangri and AI application research team lead Guy Boudoukh presented the current state of AI PC development and application trends.

According to Tangri, AI agents and generative AI applications give PC users unparalleled capabilities. AI PCs include optimized builds of OpenVINO and DirectML that run generative AI models such as Phi-3 efficiently on CPUs, GPUs, and NPUs. Developers can deploy AI agents that reason and use tools to act, run models efficiently on the AI PC with speculative decoding and quantization, and apply them to use cases such as personal assistants, secure local chat, code generation, and retrieval-augmented generation (RAG).
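To make the speculative decoding technique concrete, here is a minimal sketch using Hugging Face transformers' assisted generation, in which a small "draft" model proposes tokens cheaply and the larger target model verifies them. The draft checkpoint name is a placeholder: assisted generation requires a draft that shares the target's tokenizer, and the exact models in Intel's stack are not specified in the talk.

```python
# Minimal sketch of speculative (assisted) decoding with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "microsoft/Phi-3-mini-4k-instruct"
draft_id = "my-org/phi3-draft-tiny"  # hypothetical tokenizer-compatible draft model

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

inputs = tokenizer("Summarize my spending this month:", return_tensors="pt")
# The draft proposes several tokens at a time; the target verifies them in a
# single forward pass and keeps the longest accepted prefix, cutting latency.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```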

Tangri said that current AI technology already allows some of these capabilities to be built into the platform. Because language models are trained once on static data, users need a way to run them against information the models never saw; retrieval-augmented generation (RAG) supplies that information at query time, extending what the AI can do.

As an example, he said that in a consumer scenario a question you often run into is "am I over budget?" AI now lets you bring in your private data, analyze it with an advanced LLM (large language model), and extract conclusions and actions from it.

"This element is very novel. I'm really excited about this; it's our first time showing this complete pipeline, from RAG to the LLM to reasoning and acting, all running on your PC. It's very interesting and very cutting edge."


Guy Boudoukh then demonstrated the Phi-3 small model family, including a multimodal variant, running on an Intel Core Ultra processor: responses from a Phi-3 AI agent, interaction with private data, and how users can talk to documents and generate answers through RAG.

Boudoukh explained that the front end of the Phi-3 ReAct agent is the prompt: the instructions and context the user provides to the language model to perform the required task, which can be chat or Q&A. ReAct prompting, he noted, was first introduced by Princeton University and Google last year; it is a new prompting method whose name stands for reasoning and acting.
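Intel's exact prompt is not shown in the talk; the following is an illustrative ReAct-style template in Python, following the format popularized by the original ReAct paper, to show what such a front end can look like.

```python
# Illustrative ReAct prompt template; the wording in Intel's demo may differ.
REACT_TEMPLATE = """Answer the question using the tools below.
Available tools: {tool_names}

Use this format:
Question: the user's question
Thought: reason about what to do next
Action: the tool to use, one of [{tool_names}]
Action Input: the input to the tool
Observation: the tool's output
... (Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the answer to the question

Question: {question}
"""
```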

This approach, he said, lets the LLM do more than simply generate text: it can use tools and take actions to better handle user input. The LLM can combine tools such as RAG, Gmail, Wikipedia, and Bing Search, some of which access private data on the device while others access the internet.

A user query first goes into the ReAct template, which is then fed to the Phi-3 agent, and the agent decides whether a tool is needed to answer the query. If so, it calls the tool, the tool's output is appended to the prompt dialog, and control returns to the agent. The agent may decide it needs another tool to answer the question, in which case the process repeats. Only when the agent determines it has enough information to answer the user's query does it generate a final answer.
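As a rough sketch of that loop, the following Python function (reusing the REACT_TEMPLATE above) alternates model generation with tool calls until the model emits a final answer. Here `llm` and the tool functions are placeholders for Phi-3 and real RAG/Gmail/search backends, not Intel's actual implementation.

```python
import re

def run_agent(llm, tools, question, max_steps=5):
    """llm(prompt, stop=...) -> str; tools maps tool name -> callable."""
    prompt = REACT_TEMPLATE.format(tool_names=", ".join(tools), question=question)
    for _ in range(max_steps):
        # The model continues the prompt; generation is stopped before it
        # would hallucinate an Observation on its own.
        reply = llm(prompt, stop=["Observation:"])
        prompt += reply
        if "Final Answer:" in reply:  # agent decided it has enough information
            return reply.split("Final Answer:", 1)[1].strip()
        action = re.search(r"Action: (.*)", reply)
        arg = re.search(r"Action Input: (.*)", reply)
        if action and arg and action.group(1).strip() in tools:
            # Call the chosen tool and feed its output back into the prompt.
            result = tools[action.group(1).strip()](arg.group(1).strip())
            prompt += f"\nObservation: {result}\n"
    return "Could not answer within the step limit."
```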


In the presentation, Boudoukh asked how many teams participated in the Champions League this year. The agent reasoned that RAG was needed to answer the question, so it searched 160 BBC Sport news articles; Boudoukh then asked the agent to send the answer via Gmail, so the agent called Gmail, another tool, to complete the task.

Boudoukh then walked through how the Phi-3 agent performs RAG. RAG, he said, lets LLMs access external knowledge by injecting retrieved information. First, the user indexes hundreds or even thousands of files on the device; the files are embedded and the embeddings saved to a vector database (vector DB). Then, when the user submits a query, relevant information is retrieved from the database and combined with the query into a new unified prompt, which is injected into the LLM to generate an answer.
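Here is a minimal sketch of that index-retrieve-prompt flow, using sentence-transformers embeddings and a brute-force cosine search in place of a production vector DB; the embedding model and chunk contents are illustrative, not what Intel used.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedder

# 1) Index: embed document chunks once and store them (the "vector DB").
chunks = ["Document text chunk 1 ...", "Document text chunk 2 ..."]
index = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(query, k=3):
    # 2) Retrieve: embed the query and find the nearest chunks by cosine score.
    q = embedder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(index @ q)[::-1][:k]
    return [chunks[i] for i in top]

def rag_prompt(query):
    # 3) Build a unified prompt from the query plus retrieved context; this
    # prompt is then injected into the LLM (generation call omitted here).
    context = "\n".join(retrieve(query))
    return f"Use the context to answer.\nContext:\n{context}\n\nQuestion: {query}"
```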

RAG has several advantages, he said. First, it extends the LLM's knowledge without retraining the model. Second, it uses data efficiently, because only the retrieved passages, not entire documents, are provided to the model. This reduces model hallucination and increases reliability, because the answer is grounded in the data retrieved to produce it.


In the demonstration that followed, Boudoukh bypassed the agent and asked the model directly how many teams participated in the Champions League this year, at first without RAG. The model generated the wrong answer, 32 teams, when in fact 36 teams participated this year. He then enabled RAG, asked the same question, and got the right answer.

Boudoukh said this shows developers how the software stack distributes work across the NPU, CPU, and integrated GPU. For example, Whisper, the speech recognition model in the demo, runs on the NPU; Phi-3 inference runs on the integrated GPU; and the database search runs on the CPU.
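With the OpenVINO runtime, pinning each model to a device looks roughly like the sketch below. The model file paths are placeholders, and the actual demo stack may wire this up differently.

```python
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU'] on a Core Ultra

# Compile each model for the device best suited to its workload.
whisper = core.compile_model("whisper_encoder.xml", device_name="NPU")
phi3 = core.compile_model("phi3.xml", device_name="GPU")
# Embedding/search workloads can stay on the CPU.
embedder = core.compile_model("embedder.xml", device_name="CPU")
```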

Finally, Boudoukh demonstrated the LLaVA Phi-3 multimodal model. He explained that the model is trained on both vision and language, so it can handle multimodal tasks involving text and images. He fed an image into the model and asked it to describe the scene. The model gave a detailed account of the scene and even recommended it as a spot for fishing and relaxing.
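As a sketch of that kind of multimodal inference via Hugging Face transformers, assuming the community LLaVA-Phi-3 checkpoint xtuner/llava-phi-3-mini-hf and its chat format (both assumptions, not confirmed by the talk):

```python
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "xtuner/llava-phi-3-mini-hf"  # assumed community checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("lake.jpg")  # illustrative input image
# Assumed Phi-3 style chat format with an <image> slot for the picture.
prompt = "<|user|>\n<image>\nDescribe this scene.<|end|>\n<|assistant|>\n"
inputs = processor(images=image, text=prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))
```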


He also showed one of the core parts of the demo code, the LLM inference section. Running Phi-3 LLM inference on an Intel Core Ultra processor is easy, he said: define the model name, define the quantization configuration, load the model, load the tokenizer, provide some example prompts, tokenize the input, and generate the results. The demo uses the optimized build of OpenVINO for AI PCs.
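Those steps map naturally onto the optimum-intel OpenVINO integration. A minimal sketch, assuming the public microsoft/Phi-3-mini-4k-instruct checkpoint and 4-bit weight quantization (the demo's exact code was not published):

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"        # define the model name
qconfig = OVWeightQuantizationConfig(bits=4)         # define quantization config
model = OVModelForCausalLM.from_pretrained(          # load (and export) the model
    model_id, export=True, quantization_config=qconfig
)
tokenizer = AutoTokenizer.from_pretrained(model_id)  # load the tokenizer

inputs = tokenizer("What is an AI PC?", return_tensors="pt")  # tokenize the input
outputs = model.generate(**inputs, max_new_tokens=64)         # generate results
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```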


Tangri called this a wonderful demonstration of an AI PC running LLMs. Real-world AI, he said, rests on four pillars: efficiency, security, the ability to work in concert with the network, and developer readiness. If you have the first three but developers aren't ready, you won't be able to innovate on the platform.

High efficiency, he said, means extending the device's battery life rather than simply chasing headline teraflops (trillions of floating-point operations per second). "At the end of the day, what we're really looking for is customer experience and user experience, which involves combining a natural language interface with a graphical user interface. So at the end of the day, we're looking for experience rather than false performance metrics."

Tangri said Intel has worked with Microsoft over the past few years to establish standards such as ONNX (Open Neural Network Exchange). On developer readiness, he said Intel now has working demonstrations of cutting-edge research that run entirely in a PC environment. "So we've really catered to developers' needs and lowered the threshold for innovating on our platform; none of it needs to run online or in the cloud, it can all be done on your PC."
