A local alternative to Cursor's Ctrl-K
why
I have been using Cursor for a while now; it's a great product with a lot of functionality.
I am a heavy terminal user, but I have a bad memory for terminal commands: I think some command-line arguments of tools like scp, tar or docker are just unlearnable for a sane human. Luckily, LLMs are good at memorizing stuff. One of my favorite Cursor features is being able to prompt a model to interact directly with the terminal: press Ctrl-K in the terminal, write a query, and run the proposed command.
This works great most of the time, and when it does not, it usually means I need a multi-turn chat session with the model to make it understand what I want.
But it has several flaws imo:
- It is slow: it takes multiple seconds to get a command, mainly because it relies on a big model behind an API.
- It only works inside Cursor's integrated terminal, but I fall back to the Alacritty terminal whenever I am outside of a coding environment.
So I figured that both problems would be fixed if I could just call a small model from within my terminal.
One could argue that even though small models are faster to run locally, their latency/smartness ratio is really bad compared to calling an API.
This is true for most use cases, but I would argue that remembering terminal commands and arguments is not that hard for a model.
what
So let's set up Qwen2.5 Coder 0.5B locally and use it from the terminal :)
I am going to use the llm CLI, which makes everything super easy. Yeah, this blog post is basically just a wrapper around the llm documentation.
Before I show you how to install it, let's look at what the end usage looks like.
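Here is an illustrative example (the query and the exact command you get back will of course vary with the model):

>>> llmk untar a tar.gz archive
tar -xzf archive.tar.gz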
Pretty nice: it's short, it works in any terminal, and it is fast.
PS: I am using Qwen 0.5B, but one could use any llama.cpp model, as well as OpenAI or Claude models.
how
First you need to install llm.
I recommend using uv, as it is the fastest way to install CLI tools and manage Python.
If you don't have uv installed, you can install it with:
curl -LsSf https://astral.sh/uv/install.sh | sh
Then do:
uv tool install llm
If you prefer to use pip (not recommended :( ), feel free to use:
pip install llm
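Either way, you can check that the CLI is on your PATH with:

llm --version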
Once llm is installed, you need to install the gguf plugin, which is a wrapper around llama.cpp:
llm install llm-gguf
Let's now download the whole Qwen family. (Feel free to only download the 0.5B model if you just want to test.)
llm gguf download-model https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF/resolve/main/qwen2.5-coder-0.5b-instruct-fp16.gguf --alias qwen-coder-2.5-0.5b --alias qw0.5
llm gguf download-model https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF/resolve/main/qwen2.5-coder-1.5b-instruct-q8_0.gguf --alias qwen-coder-2.5-1.5b --alias qw1.5b
llm gguf download-model https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct-GGUF/resolve/main/qwen2.5-coder-3b-instruct-q8_0.gguf --alias qwen-coder-2.5-3b --alias qw3b
llm gguf download-model https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q8_0.gguf --alias qwen-coder-2.5-7b --alias qw7b
To check that the models are properly downloaded, you can do:
llm -m qw0.5 "hello"
You can also list models using:
llm models
OpenAI Chat: gpt-4o (aliases: 4o)
OpenAI Chat: gpt-4o-mini (aliases: 4o-mini)
OpenAI Chat: gpt-4o-audio-preview
OpenAI Chat: gpt-4o-audio-preview-2024-12-17
OpenAI Chat: gpt-4o-audio-preview-2024-10-01
OpenAI Chat: gpt-4o-mini-audio-preview
OpenAI Chat: gpt-4o-mini-audio-preview-2024-12-17
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
OpenAI Chat: gpt-4-1106-preview
OpenAI Chat: gpt-4-0125-preview
OpenAI Chat: gpt-4-turbo-2024-04-09
OpenAI Chat: gpt-4-turbo (aliases: gpt-4-turbo-preview, 4-turbo, 4t)
OpenAI Chat: o1
OpenAI Chat: o1-2024-12-17
OpenAI Chat: o1-preview
OpenAI Chat: o1-mini
OpenAI Chat: o3-mini
OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instruct)
GgufChatModel: gguf/qwen2.5-coder-0.5b-instruct-fp16 (aliases: qwen-coder-2.5-0.5b, qw0.5)
GgufChatModel: gguf/qwen2.5-coder-1.5b-instruct-fp16 (aliases: qwen-coder-2.5-1.5b, qw1.5b)
GgufChatModel: gguf/qwen2.5-coder-1.5b-instruct-q8_0 (aliases: qwen-coder-2.5-1.5b, qw1.5b)
GgufChatModel: gguf/qwen2.5-coder-3b-instruct-q8_0 (aliases: qwen-coder-2.5-3b, qw3b)
GgufChatModel: gguf/qwen2.5-coder-7b-instruct-q8_0 (aliases: qwen-coder-2.5-7b, qw7b)
Default: gguf/qwen2.5-coder-3b-instruct-q8_0
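The Default: line at the bottom is the model llm uses when you don't pass -m. If you want to change it, you can set it with llm's default command, for example to use the 3B model:

llm models default qw3b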
Let's now set up a template named cmd with a system prompt.
For Linux (Ubuntu) users:
llm --system 'reply with ubuntu terminal commands only, no extra information' --model qw0.5 --save cmd
For macOS:
llm --system 'reply with macOS terminal commands only, no extra information' --model qw0.5 --save cmd
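If you want to double-check what got saved, llm can print the stored template back:

llm templates show cmd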
At this stage you should be able to do
>>> llm -t cmd "git clone torch"
git clone https://github.com/pytorch/pytorch.git
But I prefer an even shorter alias, as I want this to be quick to type.
The last step is to add this to your .bashrc or .zshrc (or whatever you use). The function wrapper joins everything you type after llmk into a single quoted query, so you don't need to quote the prompt yourself:
alias llmk='f() { llm -t cmd "\"$*\""; }; f'
Finally, you can try:
llmk git clone torch
git clone https://github.com/pytorch/torch.git
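PS: if a query is too tricky for the small model, the same template should work with a hosted model too, since -m overrides the model stored in the template (assuming you have configured an API key with llm keys set openai):

llm -t cmd -m 4o-mini "find all files larger than 1GB"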
the end.