Does anyone know how to get ollama to use the NPU?
[Post 1] 虚化 [Senior Member]
2-10 20:52
I have an Ultra 185H laptop and a Mac mini M4 here, both of which have NPUs. I've been playing with ollama for the past couple of days and noticed that the CPU and GPU both get used, while the NPU, the one unit built specifically for AI, sits idle. How ironic.
Does anyone know how to make ollama use the NPU?
[Post 7] 一晴方觉夏深 [Member]
2-22 15:07
Looking forward to ollama supporting the NPU (the Apple Neural Engine) on M-series chips.
[Post 6] 闲聊状态 [Member]
2-22 12:38
For the NPU inside Intel CPUs, take a look at the ipex project on GitHub. It's Intel-only, and it includes NPU support. A minimal sketch of what that looks like follows below.
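A minimal sketch based on the ipex-llm NPU examples, assuming an Intel Core Ultra machine. The module path, the load_in_low_bit argument, and the install extra are taken from the project's NPU documentation and may change between releases, so treat them as assumptions and check the repo:

```python
# Sketch only: follows the ipex-llm NPU examples; verify against the repo.
# Assumed install: pip install --pre --upgrade ipex-llm[npu]  (Windows, Core Ultra)
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM  # NPU-specific entry point

model_path = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder: any model the project lists as supported

# load_in_low_bit quantizes the weights (here symmetric int4) so they fit the NPU.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("What is an NPU good for?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note this runs the model through ipex-llm directly rather than through ollama; as of this thread, ollama itself has no switch that routes work to the Intel NPU.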
[Post 5] 虚化 [Senior Member]
2-21 22:49
Microsoft provides NPU support on Copilot+ PCs; I'll try it out on my Ultra 185H later.

https://learn.microsoft.com/en-us/windows/ai/toolkit/toolkit-getting-started?tabs=rest

Running Distilled DeepSeek R1 models locally on Copilot+ PCs, powered by Windows Copilot Runtime

The Neural Processing Unit (NPU) on Copilot+ PCs offers a highly efficient engine for model inferencing, unlocking a paradigm where generative AI can execute not just when invoked, but enable semi-continuously running services. This empowers developers to tap into powerful reasoning engines to build proactive and sustained experiences.

With our work on Phi Silica, we were able to harness highly efficient inferencing – delivering very competitive time to first token and throughput rates, while minimally impacting battery life and consumption of PC resources. Running models on the NPU is about speed and efficiency. For example, as mentioned in previous posts, the Phi Silica token iterator on the NPU exhibits a 56% improvement in power consumption compared to operating on the CPU. Such efficiency enables new experiences that demand such state-of-the-art models to be in the main loop of the program without draining your battery or overly heating your device. The optimized DeepSeek models for the NPU take advantage of several of the key learnings and techniques from that effort, including how we separate out the various parts of the model to drive the best tradeoffs between performance and efficiency, low bit rate quantization, and mapping transformers to the NPU. Additionally, we take advantage of Windows Copilot Runtime (WCR) to scale across the diverse Windows ecosystem with the ONNX QDQ format.
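The quoted post mentions the ONNX QDQ format and the Windows Copilot Runtime. For illustration, here is a hedged sketch of how a QDQ-quantized ONNX model is typically pointed at an NPU through ONNX Runtime. "model.onnx" is a placeholder; QNNExecutionProvider and its backend_path option target the Qualcomm NPU in Snapdragon Copilot+ PCs per the ONNX Runtime docs, while an Intel NPU like the one in the Ultra 185H would go through a different execution provider:

```python
# Sketch: route an ONNX QDQ model to an NPU via ONNX Runtime's provider list.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder: a QDQ-quantized model such as the distilled DeepSeek R1 releases
    providers=["QNNExecutionProvider", "CPUExecutionProvider"],  # NPU first, CPU as fallback
    provider_options=[{"backend_path": "QnnHtp.dll"}, {}],  # QNN's HTP (NPU) backend
)
print(session.get_providers())  # shows which providers were actually enabled
```

If the QNN provider isn't available, session creation may fail or fall back to CPU depending on the ONNX Runtime build, so it's worth printing the active providers as above.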
[Post 4] CATWK [Member]
2-11 09:08
An NPU doesn't use CUDA, right? It probably needs its own driver.
Edited by CATWK on 2025-02-11 09:09
[Post 3] qingcai [Senior Member]
2-10 23:01
I feel like getting a Mac Pro with 96 GB of RAM would be the strongest option.
[Post 2] notebook1 [Veteran Member]
2-10 22:58
ollama does its computation purely on the GPU.
Honestly, the Mac's unified memory is the right fit here: a graphics card with tens of GB of VRAM is very expensive, while on a Mac the system RAM and video memory are one and the same. Some rough numbers below.
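A back-of-the-envelope Python sketch of why those tens of GB matter; the parameter counts, bit widths, and the 1.2x runtime overhead factor are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope: memory needed just to hold an LLM's weights.
def model_memory_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Weights in GiB, padded by an assumed 1.2x for KV cache and runtime overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

for params, bits in [(8, 4), (70, 4), (70, 16)]:
    print(f"{params}B model @ {bits}-bit: ~{model_memory_gb(params, bits):.0f} GB")
# 8B  @ 4-bit:  ~4 GB   -> fits almost any GPU
# 70B @ 4-bit:  ~39 GB  -> beyond most consumer graphics cards, fine in 96 GB unified memory
# 70B @ 16-bit: ~156 GB -> needs workstation-class hardware either way
```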