Does anyone know how to get ollama to use the NPU?
[Post 1] 虚化 [Senior Member]
2-10 20:52
I have an Ultra 185H laptop and a Mac mini M4 here, both of which have NPUs. I've been playing with ollama for the past couple of days and noticed that the CPU and GPU both get used, while the NPU, the one unit built specifically for AI, sits idle. How ironic.
Does anyone know how to make ollama use the NPU?
[Post 7] 一晴方觉夏深 [Member]
2-22 15:07
Looking forward to ollama supporting the NPU (the Apple Neural Engine) on M-series chips.
[Post 6] 闲聊状态 [Member]
2-22 12:38
For the NPU inside Intel CPUs, take a look at the ipex project on GitHub. It's Intel-only, and it includes NPU support. A minimal sketch of what that looks like follows below.
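A minimal sketch based on the ipex-llm NPU examples, assuming an Intel Core Ultra machine. The module path, the load_in_low_bit argument, and the install extra are taken from the project's NPU documentation and may change between releases, so treat them as assumptions and check the repo:

```python
# Sketch only: follows the ipex-llm NPU examples; verify against the repo.
# Assumed install: pip install --pre --upgrade ipex-llm[npu]  (Windows, Core Ultra)
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM  # NPU-specific entry point

model_path = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder: any model the project lists as supported

# load_in_low_bit quantizes the weights (here symmetric int4) so they fit the NPU.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("What is an NPU good for?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note this runs the model through ipex-llm directly rather than through ollama; as of this thread, ollama itself has no switch that routes work to the Intel NPU.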
[Post 5] 虚化 [Senior Member]
2-21 22:49
Microsoft provides NPU support on Copilot+ PCs; I'll try it out on my Ultra 185H later.

https://learn.microsoft.com/en-us/windows/ai/toolkit/toolkit-getting-started?tabs=rest

Running Distilled DeepSeek R1 models locally on Copilot+ PCs, powered by Windows Copilot Runtime

The Neural Processing Unit (NPU) on Copilot+ PCs offers a highly efficient engine for model inferencing, unlocking a paradigm where generative AI can execute not just when invoked, but enable semi-continuously running services. This empowers developers to tap into powerful reasoning engines to build proactive and sustained experiences.

With our work on Phi Silica, we were able to harness highly efficient inferencing – delivering very competitive time to first token and throughput rates, while minimally impacting battery life and consumption of PC resources. Running models on the NPU is about speed and efficiency. For example, as mentioned in previous posts, the Phi Silica token iterator on the NPU exhibits a 56% improvement in power consumption compared to operating on the CPU. Such efficiency enables new experiences that demand such state-of-the-art models to be in the main loop of the program without draining your battery or overly heating your device. The optimized DeepSeek models for the NPU take advantage of several of the key learnings and techniques from that effort, including how we separate out the various parts of the model to drive the best tradeoffs between performance and efficiency, low bit rate quantization, and mapping transformers to the NPU. Additionally, we take advantage of Windows Copilot Runtime (WCR) to scale across the diverse Windows ecosystem with the ONNX QDQ format.
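The quoted post mentions the ONNX QDQ format and the Windows Copilot Runtime. For illustration, here is a hedged sketch of how a QDQ-quantized ONNX model is typically pointed at an NPU through ONNX Runtime. "model.onnx" is a placeholder; QNNExecutionProvider and its backend_path option target the Qualcomm NPU in Snapdragon Copilot+ PCs per the ONNX Runtime docs, while an Intel NPU like the one in the Ultra 185H would go through a different execution provider:

```python
# Sketch: route an ONNX QDQ model to an NPU via ONNX Runtime's provider list.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder: a QDQ-quantized model such as the distilled DeepSeek R1 releases
    providers=["QNNExecutionProvider", "CPUExecutionProvider"],  # NPU first, CPU as fallback
    provider_options=[{"backend_path": "QnnHtp.dll"}, {}],  # QNN's HTP (NPU) backend
)
print(session.get_providers())  # shows which providers were actually enabled
```

If the QNN provider isn't available, session creation may fail or fall back to CPU depending on the ONNX Runtime build, so it's worth printing the active providers as above.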
[Post 4] CATWK [Member]
2-11 09:08
An NPU doesn't use CUDA, right? It probably needs its own driver.
Edited by CATWK on 2025-02-11 09:09
[Post 3] qingcai [Senior Member]
2-10 23:01
I feel like getting a Mac Pro with 96 GB of RAM would be the strongest option.
[Post 2] notebook1 [Veteran Member]
2-10 22:58
ollama does its computation purely on the GPU.
Honestly, the Mac's unified memory is the right fit here: a graphics card with tens of GB of VRAM is very expensive, while on a Mac the system RAM and video memory are one and the same. Some rough numbers below.
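A back-of-the-envelope Python sketch of why those tens of GB matter; the parameter counts, bit widths, and the 1.2x runtime overhead factor are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope: memory needed just to hold an LLM's weights.
def model_memory_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Weights in GiB, padded by an assumed 1.2x for KV cache and runtime overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

for params, bits in [(8, 4), (70, 4), (70, 16)]:
    print(f"{params}B model @ {bits}-bit: ~{model_memory_gb(params, bits):.0f} GB")
# 8B  @ 4-bit:  ~4 GB   -> fits almost any GPU
# 70B @ 4-bit:  ~39 GB  -> beyond most consumer graphics cards, fine in 96 GB unified memory
# 70B @ 16-bit: ~156 GB -> needs workstation-class hardware either way
```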