Ollama on Windows: Analysis of Unexpected CPU Usage Instead of GPU

👤 Author: Cosmic Lounge AI Team
📅 Updated: 6/1/2025
⏱️ Read Time: 15 min
Topics: #llm #ai #model #gpu #cuda #docker #api #server #introduction #design


1. Introduction

Ollama has emerged as a valuable open-source tool that enables users to run large language models (LLMs) locally across various operating systems, including Windows [1]. By simplifying the complexities associated with LLM deployment, Ollama facilitates easy downloading, management, and serving of models through a user-friendly command-line interface and a straightforward API [1]. A critical aspect of achieving satisfactory performance with these computationally intensive models on personal computers is the effective utilization of hardware accelerators, particularly Graphics Processing Units (GPUs) [6]. Recognizing this need, Ollama on Windows incorporates built-in support for NVIDIA GPUs, aiming to harness their parallel processing capabilities to significantly accelerate the inference process [11]. However, recent reports from the Ollama user community on Windows indicate a concerning trend: the unexpected utilization of the central processing unit (CPU) for model processing instead of the GPU.

This shift in resource allocation results in a noticeable decline in performance, characterized by considerably slower response times, a substantial increase in CPU load, and in some instances, overall system sluggishness [6]. This deviation from the expected GPU-accelerated performance has understandably led to frustration among users who previously experienced the benefits of GPU utilization. The fact that users report this as a recent change in behavior is particularly noteworthy. This report addresses the issue by thoroughly investigating the reported instances of CPU usage in place of GPU by Ollama on Windows. The objectives are to identify the potential causes contributing to this behavior, to provide a structured and comprehensive set of troubleshooting steps that affected Windows users can follow, and to offer a range of actionable workarounds and fixes designed to restore the intended GPU utilization and achieve optimal performance.

2. Recent Reports of CPU Usage Issues on Windows

Analysis of recent user reports across various online platforms reveals a consistent theme of Ollama on Windows unexpectedly relying on the CPU for processing, despite the presence of capable GPUs. These reports span different GPU vendors and hardware configurations, indicating a potentially widespread issue.

For users with Nvidia GPUs, the problem manifests in several ways. On the Ollama GitHub repository, discussions in issues such as #3201 and #6008 describe scenarios where Ollama might initially show some level of GPU activity, but this quickly diminishes, leading to the CPU taking over the primary processing load [6]. This results in slow response times that are characteristic of CPU-bound operations. Similarly, on Reddit forums like r/LocalLLaMA and r/ollama, users actively seek assistance in getting their Nvidia GPUs to be recognized and effectively utilized by Ollama on Windows [12]. Many describe very slow inference speeds, which strongly suggests that the CPU is bearing the brunt of the computational workload. The recurrence of these reports across diverse Nvidia GPU models and driver versions points to a potential systemic issue in how Ollama interacts with Nvidia GPUs within the Windows environment.

Users with Intel Arc GPUs also report significant difficulties. Discussions on Intel’s community forums indicate that individuals with Intel Arc GPUs, such as the A750, are finding that Ollama defaults to using the CPU [15]. This occurs despite users attempting to follow available online guides and resources aimed at enabling GPU acceleration. The official support status for Intel Arc GPUs within Ollama appears to be a point of confusion and concern for these users. Their efforts to apply general guidelines or adapt instructions intended for other GPU models, like the A770, often prove unsuccessful, further highlighting the lack of specific support for their hardware.

Even users with AMD GPUs, for which Ollama announced preview support on Windows [16], are encountering challenges. Reports on platforms like NixOS Discourse and Reddit indicate that proper GPU utilization is not always straightforward [17]. In some cases, users on Linux have found the need for specific packages, such as ollama-rocm, which hints at potential underlying issues with ROCm (AMD’s platform for GPU computing) integration on Windows as well. A YouTube video demonstrates how to identify CPU versus GPU usage and even discusses methods to force Ollama to utilize officially unsupported AMD cards [8].

Beyond these vendor-specific issues, a considerable number of users, regardless of their specific GPU, report general performance problems with Ollama on Windows that are highly indicative of CPU usage [9]. These users describe slow response times during model inference and high CPU utilization as reported by system monitoring tools. These symptoms are classic signs that the computationally demanding work of running LLMs is being handled by the CPU, which is inherently less efficient for these workloads than a dedicated GPU.

3. Investigating GitHub Issues

A review of the issue tracker on the Ollama GitHub repository provides valuable insights into the history and nature of GPU utilization problems on Windows. Several issues reported by users shed light on potential causes and attempted solutions.

Issue #3201, reported between March and April 2024, describes a scenario where a user with an Nvidia RTX 3050 on Windows observed an initial increase in GPU usage (around 25%) when running a model. However, for subsequent queries, the GPU usage dropped dramatically to between 0 and 1 percent, resulting in significantly slower response times [6]. Developers suggested that this behavior could be related to Ollama’s memory prediction algorithms and the offloading of model layers to the CPU when the GPU’s VRAM is nearing its capacity. While this issue was eventually closed as “completed” following improvements to memory management, the continued emergence of similar user reports suggests that the underlying conditions leading to CPU fallback might still occur under specific circumstances or with particular models. This early issue highlights the critical role of efficient memory management in ensuring consistent GPU utilization.

Issue #6008, reported in July 2024, details a user experience on a high-end Windows system equipped with an Nvidia RTX 4090. The user observed that both the CPU and GPU were being utilized during model inference, while their expectation was that the GPU would handle the processing exclusively for optimal speed [7]. Developers clarified that if the entirety of the loaded language model cannot be accommodated within the dedicated VRAM of the GPU, then the remaining portions of the model will be processed by the CPU. Furthermore, on Windows, when the dedicated VRAM is full, the system can utilize shared GPU memory, which is a portion of the system RAM allocated to the GPU. However, the bandwidth of system RAM is considerably lower than that of dedicated VRAM, leading to a performance penalty. This explains why even with a powerful GPU and seemingly sufficient total memory, CPU involvement can occur if the model’s footprint exceeds the dedicated VRAM. This issue underscores the crucial relationship between the size of the LLM being used and the available dedicated VRAM on the GPU.

A particularly telling report is Issue #8961, from February 2025. A user experienced a severe drop in performance after upgrading their Ollama installation. They observed high CPU usage and, notably, no GPU memory usage as reported by the nvidia-smi tool, even though the ollama ps command indicated 100% GPU utilization [8]. The fact that reverting their system to the previous Ollama version resolved the performance issue strongly suggests that the upgrade introduced a regression that negatively impacted GPU utilization, possibly in the way the newer version detects or manages GPU resources. This experience highlights that software updates to Ollama can inadvertently introduce bugs or alter internal configurations in a manner that diminishes or prevents GPU utilization on Windows.
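One quick way to see this CPU/GPU split in practice is Ollama’s own process listing. A minimal check using the `ollama ps` command referenced in these reports:

```powershell
# After loading a model, show how Ollama split it between CPU and GPU.
# The PROCESSOR column reports e.g. "100% GPU" or a split such as
# "41%/59% CPU/GPU"; any CPU share means layers spilled out of dedicated VRAM.
ollama ps
```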

Issue #2794, discussed between February and May 2024, addresses scenarios where users reported low GPU utilization even when the amount of VRAM appeared to be sufficient for the loaded model [24]. This hinted at the possibility that Ollama might be unnecessarily offloading some processing tasks to the CPU. A suggested workaround that proved effective for some users involved setting the environment variable OLLAMA_MAX_VRAM to explicitly specify the GPU’s total VRAM capacity. This action reportedly improved the utilization of the GPU’s memory.
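A sketch of that workaround, assuming the variable is read in bytes as in the examples from the issue thread (12 GB shown here; adjust to your card, and restart Ollama afterwards so the server picks up the new value):

```powershell
# Persist OLLAMA_MAX_VRAM for the current user (value assumed to be in bytes;
# 12 GB = 12 * 1024^3 = 12884901888). Restart the Ollama app afterwards.
setx OLLAMA_MAX_VRAM 12884901888
```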

Issue #4984, from June 2024, describes a user who found that while Ollama could successfully utilize their GPU immediately after installation, the GPU was no longer detected by Ollama after a system reboot [25]. This suggests potential instability in how Ollama initializes or maintains its connection with the GPU drivers on Windows, possibly being disrupted by system events such as reboots or other software interactions.

Similar to Issue #8961, Issue #6826, from September and October 2024, also points to performance regressions experienced after upgrading Ollama [26]. These regressions are potentially linked to modifications in how GPU memory is handled internally within the application. While primarily focused on a Docker environment, Issue #7006, reported between September and October 2024, underscores the critical importance of having compatible Nvidia driver components installed on the host system and within the container runtime [27]. Mismatches in driver versions can prevent a containerized instance of Ollama from properly accessing and utilizing the GPU. This principle of driver compatibility is also directly applicable to native Windows installations of Ollama.

A more recent report, Issue #9791, from March 2025, details a significant performance drop specifically when using the Gemma language model after an Ollama upgrade [28]. The user also encountered crashes when attempting to use higher context lengths with this model. This suggests that GPU utilization problems might sometimes be specific to certain language models or arise from complex interactions between particular models and specific versions of Ollama.

Finally, while specific to an Intel-optimized build of Ollama known as IPEX-LLM, Issue #12831 reveals a potential limitation in the reporting capabilities of the ollama ps command on Windows [29]. This issue indicates that ollama ps might incorrectly report CPU usage even when the GPU is actively engaged in processing, particularly for Intel GPUs. Users are advised to cross-reference the output of ollama ps with system-level monitoring tools like the Windows Task Manager to obtain a more accurate assessment of resource utilization.
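A minimal cross-check along those lines, assuming an Nvidia card so that nvidia-smi is available:

```powershell
# Compare Ollama's own report against the driver's view of VRAM usage.
# If ollama ps claims "100% GPU" but nvidia-smi shows no memory in use,
# the reporting is wrong and the model is likely running on the CPU.
ollama ps
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```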

4. Potential Causes for the Shift to CPU Usage

The observed shift from GPU to CPU usage in Ollama on Windows can be attributed to a variety of interconnected factors. Understanding these potential causes is crucial for effective troubleshooting and resolution. One significant factor is the impact of recent Ollama software updates [23]. As highlighted by the GitHub issue analysis, upgrades can sometimes introduce unintended consequences, such as regressions in GPU support or alterations in internal mechanisms that affect how Ollama utilizes hardware resources. Changes in memory management routines, CUDA integration, or even updates to the underlying llama.cpp library can all potentially lead to a shift towards CPU processing [11].

Another common cause of GPU-related issues is changes in GPU drivers [13]. Outdated drivers can lack the necessary support for newer software versions, while even recent driver updates can sometimes introduce bugs or alter the way applications interact with the GPU. It is possible that a recent update to the user’s Nvidia, AMD, or Intel GPU drivers has inadvertently created a compatibility problem with the current version of Ollama, leading to the observed CPU fallback. Ensuring that the correct and most up-to-date drivers are installed for the specific GPU model is therefore a fundamental troubleshooting step.

The memory requirements of the language model being used and the limitations of the GPU’s VRAM are also critical factors [6]. Large language models have substantial memory footprints, and if the model’s size exceeds the dedicated Video RAM (VRAM) of the GPU, Ollama might offload some of the model’s layers to the system RAM and process them on the CPU. While Windows can utilize shared GPU memory (a portion of system RAM allocated for GPU use), the significantly lower bandwidth compared to dedicated VRAM can create performance bottlenecks that might be perceived as the CPU doing the primary work. The size of the model, often indicated by the number of parameters and its file size, along with the quantization level used (which affects memory usage), directly impacts the VRAM requirements. Users should be mindful of their GPU’s VRAM capacity and select models that fit comfortably within it for optimal GPU-accelerated performance.
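As a rough illustration of how parameter count and quantization drive VRAM needs, the back-of-the-envelope estimate below uses approximate multipliers; Ollama’s actual accounting also includes the KV cache and runtime buffers, so treat this as a lower bound, not an exact figure:

```powershell
# Back-of-the-envelope VRAM estimate: weights ≈ parameters × bytes per weight.
# Q4 quantization stores roughly 0.5 bytes per parameter; leave extra headroom
# (often 1-2 GB) for the KV cache and CUDA buffers, more at long context lengths.
$parameters     = 7e9    # e.g. a 7B-parameter model
$bytesPerWeight = 0.5    # ~4-bit (Q4) quantization
$weightsGB      = ($parameters * $bytesPerWeight) / 1GB
"Estimated weights: {0:N1} GB; keep the total below your dedicated VRAM." -f $weightsGB
```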

Windows-specific configurations or resource management could also play a role, although perhaps less directly [7]. The way Windows manages GPU resources, including the allocation and utilization of shared GPU memory, might influence how efficiently Ollama can utilize the GPU for server processes running in the background. Systems equipped with both integrated (iGPU) and dedicated (dGPU) graphics cards can sometimes experience issues where Ollama mistakenly attempts to use the less powerful iGPU or inefficiently splits the workload between the two [13]. This is particularly common in laptop configurations. If Ollama is not correctly identifying or prioritizing the dedicated GPU, which is designed for more demanding tasks like LLM inference, the user will likely observe poor performance indicative of CPU or iGPU usage.

The level of official support for certain GPU vendors within Ollama on Windows is another potential cause [15]. While NVIDIA GPUs are officially supported, the support for other vendors like AMD and Intel Arc might be less mature or even absent in official builds. In the absence of specific optimizations or compatibility layers for a particular GPU architecture, Ollama might default to using the CPU as the primary processing resource. Users with GPUs from vendors other than Nvidia should consult the official Ollama documentation and community forums to ascertain the current status of support for their specific hardware.

Finally, incorrect configuration settings or environment variables could inadvertently lead to CPU usage [33]. Users might have unintentionally set environment variables, such as OLLAMA_USE_GPU=0, which explicitly instruct Ollama to use the CPU. Similarly, incorrect or outdated CUDA or ROCm configurations (if the user has attempted manual setup or is using specific builds) could prevent proper GPU utilization.
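A quick way to audit for such stray settings is to list the relevant variables in the current session; a minimal PowerShell sketch (the OLLAMA_USE_GPU name comes from this guide’s sources, the others are standard CUDA/ROCm selectors):

```powershell
# Surface any Ollama-related variables; an unexpected OLLAMA_USE_GPU=0
# or a CUDA_VISIBLE_DEVICES=-1 would explain a forced CPU fallback.
Get-ChildItem Env: | Where-Object Name -like 'OLLAMA*'
Get-ChildItem Env: | Where-Object { $_.Name -in 'CUDA_VISIBLE_DEVICES','HIP_VISIBLE_DEVICES' }
```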

5. Troubleshooting Steps for Windows Users

To effectively diagnose and resolve the issue of Ollama on Windows unexpectedly using the CPU instead of the GPU, users should follow a systematic set of troubleshooting steps. The first crucial step is to check the Ollama server logs [39]. These logs often contain valuable information about GPU detection and utilization. On Windows, they are typically located in the %LOCALAPPDATA%\Ollama directory. Users should look for entries indicating whether the GPU was detected (e.g., “Nvidia GPU detected”), the number of model layers offloaded to the GPU (e.g., “offloaded X/Y layers to GPU”), and any error messages related to CUDA or GPU initialization. If the logs contain messages like “no compatible GPUs were discovered” [8], it suggests a fundamental problem with GPU detection.

Next, users should utilize Windows Task Manager or other system monitoring tools [6]. The “Performance” tab in Task Manager provides real-time CPU and GPU utilization data. Observing high CPU usage and minimal to no activity on the dedicated GPU while Ollama is running a model confirms that the CPU is handling the processing.
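The log check is easy to script; the sketch below tails the server log for GPU-related lines, assuming the default server.log file name used by the Windows build:

```powershell
# Tail the last 200 lines of the Ollama server log and surface entries
# about GPU detection, CUDA initialization, and layer offload.
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 200 |
    Select-String -Pattern 'GPU|CUDA|offloaded|VRAM'
```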

Verifying GPU driver compatibility and version is another essential step [15]. Users can check their installed driver version in the Windows Device Manager. Comparing this version with the latest recommended drivers available on the official websites of Nvidia, AMD, or Intel is crucial. Performing a clean installation of the latest drivers is often recommended to rule out any issues with previous installations. Users should also examine the Ollama version information and release notes [11]. Running `ollama --version` in the command prompt or PowerShell will display the installed version. Checking the official Ollama website or GitHub releases page for the release notes associated with that version and any more recent updates might reveal information about GPU support changes or known issues on Windows. Both checks are sketched below.
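A minimal sketch of these version checks (the nvidia-smi line assumes an Nvidia card; the WMI query works for any vendor):

```powershell
ollama --version    # installed Ollama version
nvidia-smi          # Nvidia driver version and supported CUDA version in the header
Get-CimInstance Win32_VideoController | Select-Object Name, DriverVersion
```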

Testing with different language models can help isolate the problem [1]. Trying models of varying sizes and quantization levels can indicate whether the issue is model-specific or a more general problem with GPU utilization on the system. If the GPU is utilized correctly with smaller models but not larger ones, it often points to a VRAM limitation.
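For example (the model tags below are illustrative; substitute models available in your own library):

```powershell
# If only the small model reports "100% GPU" in ollama ps,
# a VRAM limit is the likely culprit.
ollama run llama3.2:1b "Hello"    # small model, fits in most dedicated VRAM
ollama ps
ollama run llama3.1:70b "Hello"   # large model, likely to spill onto the CPU
ollama ps
```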

Checking for relevant environment variables is also important [13]. Users should open the System Properties in Windows and look for any variables related to Ollama or GPU computing in the “System variables” or “User variables” section. Ensuring these variables are set to the desired values (e.g., OLLAMA_USE_GPU should be 1 or not set for default GPU usage) is crucial. Users can also temporarily set these variables in the terminal before running Ollama to test their effect, as sketched at the end of this section.

Finally, users should verify their GPU’s compatibility with Ollama [32]. For Nvidia GPUs, Ollama requires a compute capability of 5.0 or higher. Users can find a list of CUDA-enabled GPUs and their compute capabilities on Nvidia’s official website. For AMD and Intel GPUs, checking the official Ollama documentation or community discussions for any specific compatibility requirements is necessary.
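To test a variable for a single session rather than system-wide, set it in the terminal and run the server in the foreground so the effect is visible in its startup log; a sketch assuming an Nvidia card (quit the tray app first so it does not already hold the default port 11434):

```powershell
# Session-only setting: does not persist after the terminal closes.
$env:CUDA_VISIBLE_DEVICES = '0'   # pin Ollama to the first Nvidia GPU
ollama serve                      # run in the foreground and watch the GPU detection lines
```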

6. Suggested Workarounds and Fixes on Windows

Based on the analysis of user reports and GitHub issues, several workarounds and fixes can be suggested for Windows users experiencing unexpected CPU usage with Ollama:

  • If the issue started after a recent Ollama update, downgrading to a previous version might resolve the problem [23]. Older installers can usually be found on the Ollama GitHub releases page.

  • Ensure that the latest compatible GPU drivers are installed [32]. Visit the official website of your GPU vendor, download the appropriate drivers, and consider performing a clean installation.

  • Experiment with environment variables that can force GPU utilization [13]. Trying OLLAMA_USE_GPU=1, or setting CUDA_VISIBLE_DEVICES=0 (for Nvidia) or HIP_VISIBLE_DEVICES=0 (for AMD), might be effective.

  • If VRAM limitations are suspected, adjust model parameters or select smaller models [1]. Models with fewer parameters or more aggressive quantization (lower bit-width) reduce memory usage.

  • On systems with both integrated and dedicated graphics, force the use of the dedicated GPU [13]. This can often be done through the system BIOS/UEFI settings or the control panel of the dedicated graphics card.

  • Advanced users might consider running Ollama in Docker with GPU support [27], which provides a more controlled environment for managing dependencies and ensuring proper GPU access (see the sketch below).

  • As a temporary measure, force CPU usage by setting OLLAMA_USE_GPU=0 or CUDA_VISIBLE_DEVICES=-1 [32]. This will be slower, but it allows continued use of Ollama while troubleshooting the GPU issue.
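A minimal Docker sketch, following the pattern documented for Ollama’s official image and assuming Docker Desktop with WSL2 plus the NVIDIA Container Toolkit are installed (the model tag is illustrative):

```powershell
# Start the official Ollama image with all GPUs passed through.
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Run a model inside the container; check the container logs and ollama ps
# afterwards to confirm the GPU was detected and used.
docker exec -it ollama ollama run llama3.2
```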

7. Summary of Findings and Recommendations

The investigation into reports of Ollama on Windows unexpectedly using the CPU instead of the GPU reveals a complex issue with multiple potential contributing factors. These include recent Ollama software updates, changes in GPU drivers, limitations in GPU VRAM, Windows-specific resource management, incorrect GPU detection on dual-GPU systems, varying levels of official support for different GPU vendors, and improper user configurations.

To diagnose this problem, Windows users should systematically check the Ollama server logs, monitor CPU and GPU utilization using system tools, verify the compatibility and version of their GPU drivers, examine the installed Ollama version and release notes, test with different language models, and check for relevant environment variable settings. Based on the findings, the following workarounds and fixes are recommended:

  • Consider downgrading to a previous version of Ollama if the issue arose after an update.

  • Ensure the latest compatible GPU drivers are installed, potentially performing a clean installation.

  • Experiment with setting environment variables like OLLAMA_USE_GPU, CUDA_VISIBLE_DEVICES (for Nvidia), and HIP_VISIBLE_DEVICES (for AMD).

  • Try using smaller or more quantized language models that fit within your GPU’s VRAM.

  • On systems with both integrated and dedicated GPUs, configure the system to prioritize the dedicated GPU for Ollama.

  • Explore running Ollama within a Docker container with GPU support for a more controlled environment.

For further assistance and to contribute to the resolution of this issue, users are encouraged to consult the Ollama GitHub repository and join the Ollama Discord server. The optimal solution may vary depending on the user’s specific hardware and software configuration. A systematic approach to troubleshooting, combined with active engagement with the Ollama community, will be crucial in addressing this performance-impacting problem.

🔧 Works cited

1. Unplugging the Cloud: My Journey Running LLMs Locally with Ollama - Naveed Ul Mustafa, accessed on March 27, 2025, https://numustafa.medium.com/unplugging-the-cloud-my-journey-running-llms-locally-with-ollama-faacd75d17e7
2. ollama/ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models. - GitHub, accessed on March 27, 2025, https://github.com/ollama/ollama
3. Ollama, accessed on March 27, 2025, https://ollama.com/
4. Exploring Ollama & LM Studio - dasarpAI, accessed on March 27, 2025, https://dasarpai.com/dsblog/exploring-ollama
5. 3-ways to Set up LLaMA 2 Locally on CPU (Part 2 — Ollama) | by Antoine Frd | Medium, accessed on March 27, 2025, https://medium.com/@fradin.antoine17/3-ways-to-set-up-llama-2-locally-on-cpu-part-2-ollama-c9d5d71612c9
6. Ollama is not using my GPU (Windows) · Issue #3201 - GitHub, accessed on March 27, 2025, https://github.com/ollama/ollama/issues/3201
7. Ollama is running on both CPU and GPU - expected to use GPU only · Issue #6008 - GitHub, accessed on March 27, 2025, https://github.com/ollama/ollama/issues/6008
8. Four Ways to Check if Ollama is Using Your GPU or CPU - YouTube, accessed on March 27, 2025, https://www.youtube.com/watch?v=on3rtyPWSgA
9. Ollama on CPU Performance - Some Data and a Request for More, accessed on March 27, 2025, https://forum.level1techs.com/t/ollama-on-cpu-performance-some-data-and-a-request-for-more/214896
10. Four Ways to Check if Ollama is Using Your GPU or CPU - YouTube, accessed on March 27, 2025, https://www.youtube.com/watch?v=on3rtyPWSgA&pp=0gcJCfcAhR29_xXO
11. Windows preview · Ollama Blog, accessed on March 27, 2025, https://ollama.com/blog/windows-preview
12. Need help with gpu usage : r/ollama - Reddit, accessed on March 27, 2025, https://www.reddit.com/r/ollama/comments/1h5s15v/need_help_with_gpu_usage/
13. How do I enable my GPU for Ollama LLMs on Windows? - Reddit, accessed on March 27, 2025, https://www.reddit.com/r/ollama/comments/1dh54cz/how_do_i_enable_my_gpu_for_ollama_llms_on_windows/
14. GPU issues with windows : r/ollama - Reddit, accessed on March 27, 2025, https://www.reddit.com/r/ollama/comments/1jau6m9/gpu_issues_with_windows/
15. Re: Re:Ollama on intel arc 750 - Intel Community, accessed on March 27, 2025, https://community.intel.com/t5/Intel-ARC-Graphics/Ollama-on-intel-arc-750/m-p/1660876
16. Blog · Ollama, accessed on March 27, 2025, https://ollama.com/blog
17. Ollama isn’t using GPU ( AMD RX 7900 XTX ) - Help - NixOS Discourse, accessed on March 27, 2025, https://discourse.nixos.org/t/ollama-isnt-using-gpu-amd-rx-7900-xtx/59596
18. Windows 10 GPU usage : r/ollama - Reddit, accessed on March 27, 2025, https://www.reddit.com/r/ollama/comments/1ige721/windows_10_gpu_usage/
19. Ollama using too much CPU : r/LocalLLaMA - Reddit, accessed on March 27, 2025, https://www.reddit.com/r/LocalLLaMA/comments/1f3m0sc/ollama_using_too_much_cpu/
20. How to optimize Ollama for CPU-only inference? : r/LocalLLaMA - Reddit, accessed on March 27, 2025, https://www.reddit.com/r/LocalLLaMA/comments/1csgnbh/how_to_optimize_ollama_for_cpuonly_inference/
21. GPU is not being fully utilized and ollama/qwen2.5:32b is slow - oTTomator Community, accessed on March 27, 2025, https://thinktank.ottomator.ai/t/gpu-is-not-being-fully-utilized-and-ollama-qwen2-5-32b-is-slow/870
22. Model occasionally continues to use CPU despite having finished responding. : r/ollama, accessed on March 27, 2025, https://www.reddit.com/r/ollama/comments/1iljej9/model_occasionally_continues_to_use_cpu_despite/
23. After ollama upgrade, severe performance drop with deepseek …, accessed on March 27, 2025, https://github.com/ollama/ollama/issues/8961
24. Window version not fully utilize gpu · Issue #2794 - GitHub, accessed on March 27, 2025, https://github.com/ollama/ollama/issues/2794
25. Ollama not using GPU after OS Reboot · Issue #4984 - GitHub, accessed on March 27, 2025, https://github.com/ollama/ollama/issues/4984
26. Massive performance regression on 0.1.32 -> GGML_CUDA_FORCE_MMQ: (SET TO NO, after 0.1.31) · Issue #6826 · ollama/ollama - GitHub, accessed on March 27, 2025, https://github.com/ollama/ollama/issues/6826
27. Ollama can’t use my Nvidia GPU anymore? · Issue #7006 - GitHub, accessed on March 27, 2025, https://github.com/ollama/ollama/issues/7006
28. Out of memory errors when running gemma3 · Issue #9791 - GitHub, accessed on March 27, 2025, https://github.com/ollama/ollama/issues/9791
29. Ollama reports model is 100% on CPU when actually running on GPU · Issue #12831 · intel/ipex-llm - GitHub, accessed on March 27, 2025, https://github.com/intel/ipex-llm/issues/12831
30. Releases · ollama/ollama - GitHub, accessed on March 27, 2025, https://github.com/ollama/ollama/releases
31. ollama 0.6.2 Released With Support For AMD Strix Halo - Phoronix, accessed on March 27, 2025, https://www.phoronix.com/news/ollama-0.6.2
32. Ollama Use GPU Windows Guide | Restackio, accessed on March 27, 2025, https://www.restack.io/p/ollama-answer-gpu-windows-cat-ai
33. Ollama Disable Gpu Settings | Restackio, accessed on March 27, 2025, https://www.restack.io/p/ollama-answer-disable-gpu-cat-ai
34. Ollama GPU Settings Guide | Restackio, accessed on March 27, 2025, https://www.restack.io/p/ollama-answer-gpu-settings-cat-ai
35. Ollama GPU Support - Reddit, accessed on March 27, 2025, https://www.reddit.com/r/ollama/comments/1b35im0/ollama_gpu_support/
36. Fix Common Issues with Ollama - Easy Explanation - YouTube, accessed on March 27, 2025, https://www.youtube.com/watch?v=2bTHQx5qW8s&pp=0gcJCfcAhR29_xXO
37. Ollama doesn’t use GPU pls help - Reddit, accessed on March 27, 2025, https://www.reddit.com/r/ollama/comments/1c8ddv8/ollama_doesnt_use_gpu_pls_help/
38. Ollama not using my gpu whatsoever. · Issue #2012 - GitHub, accessed on March 27, 2025, https://github.com/jmorganca/ollama/issues/2012
39. Ollama Discord Integration | Restackio, accessed on March 27, 2025, https://www.restack.io/p/ollama-answer-discord-integration-cat-ai
40. Ollama Discord Server Community | Restackio, accessed on March 27, 2025, https://www.restack.io/p/ollama-answer-discord-server-cat-ai
41. Ollama Discord Bot GitHub | Restackio, accessed on March 27, 2025, https://www.restack.io/p/ollama-answer-discord-bot-github-cat-ai
42. Ollama Without GPU On Windows | Restackio, accessed on March 27, 2025, https://www.restack.io/p/ollama-answer-windows-without-gpu-cat-ai
43. Ollama Not Opening Issues | Restackio, accessed on March 27, 2025, https://www.restack.io/p/ollama-answer-not-opening-cat-ai
44. Ollama-cuda not using gpu acceleration - Applications - EndeavourOS, accessed on March 27, 2025, https://forum.endeavouros.com/t/ollama-cuda-not-using-gpu-acceleration/50479
45. How to run Ollama only on a dedicated GPU? (Instead of all GPUs) · Issue #1813 - GitHub, accessed on March 27, 2025, https://github.com/ollama/ollama/issues/1813
46. Ollama in Docker reports 100% GPU, but runs on CPU instead - Reddit, accessed on March 27, 2025, https://www.reddit.com/r/ollama/comments/1ip9772/ollama_in_docker_reports_100_gpu_but_runs_on_cpu/
47. 238SAMIxD/discord-ai-bot: Discord AI chatbot using Ollama and Stable Diffusion - GitHub, accessed on March 27, 2025, https://github.com/238SAMIxD/discord-ai-bot