Installation
LINUS-AI ships as a single self-contained binary with no runtime dependencies.
Download the appropriate build for your platform, mark it executable, and place it
on your PATH. That's all there is to it.
System Requirements
Linux (x86_64 & ARM64)
v4.0.0 ships headless CLI binaries for Linux, macOS, and Windows; CUDA-enabled builds include GPU support out of the box. GUI installers (DMG / NSIS) are available on the Download page. On Linux, consider setting up a systemd service for automatic startup after install.
systemd Service (Linux)
To run LINUS-AI automatically on boot as a background server, create a systemd
unit file. Replace your-user with your actual Linux username.
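A minimal sketch of such a unit follows. The binary location (/usr/local/bin/linus-ai) is an assumption for illustration; adjust it and the username for your system.

```ini
# /etc/systemd/system/linus-ai.service
# Minimal sketch; binary path is assumed, replace your-user with your username.
[Unit]
Description=LINUS-AI local inference server
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=your-user
ExecStart=/usr/local/bin/linus-ai --serve --host 127.0.0.1 --port 8080
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Reload and enable it with systemctl daemon-reload followed by systemctl enable --now linus-ai.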
License Activation
LINUS-AI includes a 30-day free trial. After that, a license is required.
Licenses are issued in the format LNAI-XXXX-XXXX-XXXX-XXXX and
delivered to the email address used at checkout.
30-Day Trial
No action is required to start the trial. On first launch, LINUS-AI automatically enters trial mode. A banner is displayed at startup showing the remaining trial days. All features available on the Community plan are accessible during the trial.
CLI Activation
Activate with your license key and the email address used at purchase. An internet connection is required only for the initial activation handshake. After that, the binary operates fully offline.
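Using the flags from the CLI reference, a typical activation looks like this (key and email are placeholders):

```shell
# One-time activation; internet is required only for this handshake
linus-ai --activate LNAI-XXXX-XXXX-XXXX-XXXX --activate-email you@example.com

# Confirm the result
linus-ai --license-status
```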
Config File Activation
You can also add the license key directly to the configuration file. This is convenient for automated deployments or when managing multiple machines.
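A sketch of the config entry, assuming a [license] table with key and email fields; the exact table and key names are assumptions, so check your generated config.toml for the canonical names.

```toml
# ~/.linus_ai/config.toml
# Table and key names below are illustrative assumptions
[license]
key = "LNAI-XXXX-XXXX-XXXX-XXXX"
email = "you@example.com"
```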
Environment Variable
For CI/CD pipelines or containerised deployments, supply the key via an environment variable. The email is not required when using the environment variable method after the initial machine binding has been completed.
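For example, in a container entrypoint or CI job; the variable name LINUS_AI_LICENSE_KEY is an assumption for illustration:

```shell
# Variable name is assumed; consult the configuration reference
export LINUS_AI_LICENSE_KEY="LNAI-XXXX-XXXX-XXXX-XXXX"
linus-ai --serve
```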
Check License Status
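Query the current plan, seat count, and machine binding at any time:

```shell
linus-ai --license-status
```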
Trial Expiry
When the 30-day trial expires, LINUS-AI enters read-only mode. Existing conversations in the vault remain accessible, but new inference requests are blocked until a valid license key is activated. The control panel GUI continues to load and will display an activation prompt.
Machine Binding (Perpetual Licenses)
Perpetual licenses are bound to the machine on which they are first activated. The binding uses a hardware fingerprint derived from CPU serial, motherboard UUID, and primary network interface MAC address. Virtual machines generate stable fingerprints. Each seat in your license can be bound to one machine at a time.
Deactivate & Move to Another Machine
To move a license seat to a new machine, deactivate it on the current machine first. This releases the machine binding and allows activation on a different device.
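The two steps, using the documented flags (key and email are placeholders):

```shell
# On the old machine: release the seat
linus-ai --deactivate

# On the new machine: bind the seat
linus-ai --activate LNAI-XXXX-XXXX-XXXX-XXXX --activate-email you@example.com
```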
Model Management
LINUS-AI uses the Ollama model registry for downloading and managing models.
Models are stored locally in ~/.linus_ai/models/ and are never
uploaded or shared. Once downloaded, models work fully offline.
Pulling, Listing, and Deleting Models
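The three model-management flags from the CLI reference in action (model names are examples):

```shell
# Download a model from the Ollama registry
linus-ai --pull-model llama3.2

# List locally installed models with size, quantization, and date
linus-ai --list-models

# Delete a locally installed model and free disk space
linus-ai --delete-model mistral:7b
```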
Quantization Levels
Quantization reduces model size and memory usage at a small cost to output quality.
Q4_K_M is the recommended default for most users — it offers an
excellent balance of speed, memory, and quality.
| Level | Size (7B model) | VRAM (7B) | Speed | Quality | Best for |
|---|---|---|---|---|---|
| Q2_K | ~1.6 GB | ~2 GB | Fastest | Lowest | Memory-constrained devices, quick prototyping |
| Q3_K_M | ~2.1 GB | ~3 GB | Very fast | Fair | 4 GB RAM systems, edge inference |
| Q4_K_M | ~2.8 GB | ~4 GB | Fast | Good | Recommended default for most users |
| Q5_K_M | ~3.4 GB | ~5 GB | Moderately fast | Very good | Higher quality without full precision overhead |
| Q8_0 | ~6.7 GB | ~8 GB | Moderate | Excellent | Near-lossless inference, 8 GB VRAM GPUs |
| F16 | ~13.5 GB | ~16 GB | Slowest | Best | Reference quality, fine-tuning, benchmarking |
Recommended Models
All models below are available from the Ollama registry. Pull them with
linus-ai --pull-model <name>.
| Model | Size | RAM (Q4_K_M) | Strengths | Best for |
|---|---|---|---|---|
| llama3.2 | 7B | 4 GB | General reasoning, instruction following | Daily chat, code assistance, Q&A |
| mistral:7b | 7B | 4 GB | Fast, efficient, strong on structured output | Production inference, API workloads |
| qwen2.5:7b | 7B | 5 GB | Multilingual, strong coding, long context | Multilingual apps, coding, 128K context |
| phi3:medium | 14B | 8 GB | Exceptional reasoning relative to size | Complex reasoning, academic tasks |
| qwen2.5:32b | 32B | 20 GB | Top-tier multilingual, coding, math | Professional workloads, 24+ GB VRAM |
| llama3.1:70b | 70B | 40 GB | Near GPT-4 class on many benchmarks | Enterprise, multi-GPU tensor parallel |
| qwen2.5:72b | 72B | 44 GB | State-of-the-art open model, multilingual | Highest quality, distributed inference |
Models are stored in ~/.linus_ai/models/ (Linux/macOS) or %USERPROFILE%\.linus_ai\models\ (Windows). You can symlink this directory to a larger volume if needed.

Inference Modes
LINUS-AI automatically selects the best inference backend for your hardware.
You can override this with the backend config key or the --backend
CLI flag.
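For example, to force a specific backend when starting the server:

```shell
# Explicitly select CUDA (requires CUDA 12+ drivers)
linus-ai --serve --backend cuda

# Force CPU-only inference, e.g. to rule out GPU driver issues
linus-ai --serve --backend cpu
```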
| Backend | Flag | Hardware | Notes |
|---|---|---|---|
| auto | --backend auto | Any | Recommended. Detects Metal → CUDA → ROCm → CPU in order. |
| cpu | --backend cpu | Any CPU | Force CPU-only. Works everywhere, slowest for large models. |
| cuda | --backend cuda | NVIDIA GPU | Requires CUDA 12+ drivers. Fastest on NVIDIA hardware. |
| metal | --backend metal | Apple Silicon | Uses Apple GPU via Metal. Unified memory — no VRAM limit. |
| rocm | --backend rocm | AMD GPU | Requires ROCm 5.7+ (Linux). RX 6000/7000 series recommended. |
Backend Detection Output
GPU Layers
The gpu_layers setting controls how many model layers are offloaded to
the GPU. -1 offloads all possible layers (maximum GPU acceleration).
Reduce this number if you hit out-of-memory errors.
CPU Threads
CPU thread count is automatically detected from your system's logical core count.
Override it with threads = N in config or --threads N
on the command line. For best performance, set threads to the number of
physical (not logical) cores.
Context Length
Context length (in tokens) determines how much conversation history and document content can be kept in a single inference pass. Larger contexts use more memory. Models such as Qwen2.5 and Llama 3.1 support contexts of up to 128K tokens.
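The performance settings described above can be combined in the config file. The values below are illustrative only, and whether these keys live at the top level or under a section such as [model] may differ; check your generated config for the canonical layout.

```toml
# ~/.linus_ai/config.toml (illustrative values)
backend = "auto"      # auto | cpu | cuda | metal | rocm
gpu_layers = -1       # -1 offloads all possible layers to the GPU
threads = 8           # set to your physical (not logical) core count
context_len = 8192    # tokens; larger contexts use more KV cache memory
```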
AI Profiles
AI Profiles are pre-tuned system prompt configurations optimised for specific industry verticals and use cases. Each profile shapes the model's behaviour, tone, terminology, and response style without requiring any prompt engineering on your part.
Available Profiles
Each profile operates under one of four compliance tiers that govern PII handling, consent requirements, and audit logging. The tier is displayed alongside the profile in the GUI and CLI.
| Profile ID | Display Name | Compliance Tier | Description | Availability |
|---|---|---|---|---|
| OPEN — No restrictions | | | | |
| general | General Assistant | OPEN | Balanced, helpful assistant suitable for any task. Default profile. | All plans |
| creative | Creative Writing | OPEN | Higher temperature, narrative-focused. Ideal for copywriting, stories, and marketing. | All plans |
| reasoning | Deep Reasoning | OPEN | Chain-of-thought prompting, step-by-step analysis. Best with larger models (32B+). | Pro / Team / Enterprise |
| code | Software Development | OPEN · injection detection | Code-first responses, test-driven suggestions, multi-language support, documentation generation. Prompt injection attempts are detected and flagged. | Pro / Team / Enterprise |
| engineering | Systems Engineering | OPEN · injection detection | Precise, spec-driven. Covers distributed systems, hardware, embedded, and infrastructure. Prompt injection attempts are detected and flagged. | Pro / Team / Enterprise |
| AUDIT — PII scanned & activity logged | | | | |
| education | Education & Tutoring | AUDIT · FERPA / COPPA | Socratic method, scaffolded explanations, avoids giving direct answers to homework. PII is scanned and all activity is logged. | All plans |
| support | Customer Support | AUDIT | Empathetic, concise responses tuned for help-desk and customer-facing interactions. PII is scanned and all activity is logged. | All plans |
| sales | Sales & CRM | AUDIT · GDPR / CAN-SPAM | Persuasive, goal-oriented. Assists with outreach drafts, objection handling, and pipeline notes. PII is scanned and all activity is logged. | All plans |
| data_science | Data Science & ML | AUDIT · GDPR / CCPA | Statistical reasoning, experiment design, model evaluation, Python/R code assistance. PII is scanned and all activity is logged. | Pro / Team / Enterprise |
| REGULATED — One-time consent required · PII blocked | | | | |
| medical | Medical & Clinical | REGULATED · HIPAA | Conservative, evidence-based. Follows clinical terminology. Always disclaims professional advice. Blocking-category PII (SSNs, MRNs) is rejected before reaching the model. | Pro / Team / Enterprise |
| legal | Legal Research | REGULATED · Attorney-Client Privilege | Formal language, citation-aware. Suited for contract review, case research, and legal drafting. Blocking-category PII is rejected before reaching the model. | Pro / Team / Enterprise |
| finance | Finance & Accounting | REGULATED · SOX / PCI-DSS / FINRA | Precise numeric reasoning, regulatory awareness, financial terminology. Credit card numbers and CVVs are blocked before reaching the model. | Pro / Team / Enterprise |
| hr | Human Resources | REGULATED · EEOC / GDPR / CCPA / FCRA | Policy-aware tone. Handles onboarding, policy queries, and HR documentation. Blocking-category PII is rejected before reaching the model. | All plans |
| RESTRICTED — Professional licence required | | | | |
| security | Cybersecurity | RESTRICTED · SOC2 / ISO27001 / NIST / CFAA | Technical depth in threat analysis, CVE research, security architecture. Responsible-disclosure framing. Prompt injection causes immediate request block. | Pro / Team / Enterprise |
The Community plan includes the OPEN profiles (general, creative) and the AUDIT profiles (support, education, sales), plus the REGULATED hr profile. Upgrade to Professional, Team, or Enterprise to unlock all 14 profiles, including reasoning, code, engineering, data_science, medical, legal, finance, and security.

Compliance & Consent
Profiles in the REGULATED and RESTRICTED tiers
require a one-time acknowledgement before first use on each machine. When you first
activate one of these profiles, a consent dialog appears with the full legal
disclaimer specific to that profile's regulatory context. You must scroll through
and explicitly accept before proceeding. Your acceptance is stored locally
per-machine per-profile in ~/.linus_ai/consents/ — you will not be
prompted again on the same machine unless the disclaimer is updated.
PII Detection & Handling
All prompts in AUDIT, REGULATED, and RESTRICTED profiles are scanned for personally identifiable information before being sent to the model. PII is classified into two categories:
| Category | Examples | Action |
|---|---|---|
| Blocking | SSN, credit card numbers, CVVs, medical record numbers (MRNs), passport numbers | Request is rejected before it reaches the model. An error is returned to the caller and the attempt is logged. |
| Non-blocking | Email addresses, phone numbers, postal addresses | Automatically redacted in the prompt before inference. The redacted prompt is used; the original is not logged. |
Prompt Injection Detection
LINUS-AI scans all incoming prompts for injection patterns — attempts to override
system instructions, exfiltrate context, or manipulate the model's role. Detected
attempts are flagged and recorded in the audit log. On the
Security (security) profile, a detected injection
attempt causes the request to be immediately blocked and the session to be
suspended pending review.
Audit Logging
All activity in AUDIT, REGULATED, and RESTRICTED profiles is written to a
tamper-evident audit log at ~/.linus_ai/audit/. Each log entry is
HMAC-chained to the previous entry — any modification, deletion, or reordering
of records breaks the chain and is detected on next verification. Log entries
include timestamp, user identity, profile, model, PII scan results, consent
status, and (for non-blocking PII) the redaction map.
Verify the audit log at any time with linus-ai --audit-verify. The command checks the full HMAC chain and reports the first broken link if tampering is detected.

Document Access Control (RAG)
When you upload documents for Retrieval-Augmented Generation (RAG), each document is automatically registered in the RAG registry with a classification level. Access decisions are enforced per-user at query time — you only receive content from documents you are authorised to access.
Document Classification Levels
| Classification | Description |
|---|---|
| PUBLIC | Accessible to all authenticated users within the organisation. |
| INTERNAL | Accessible to employees of the owning company or division. |
| CONFIDENTIAL | Restricted to authorised departments or specific roles. |
| RESTRICTED | Restricted to named individuals or explicit role grants. |
| TOP_SECRET | Highest classification. Requires explicit clearance level on the user record. |
Access Decision Factors
Each RAG retrieval call performs a per-document access check against the requesting user's identity attributes: department, division, company, roles, and clearance level. Documents that the user is not authorised to access are silently withheld from the retrieved context — the response is generated only from documents the user can see. No indication is given to the user that additional documents exist but were withheld.
All access decisions — both grants and denials — are recorded in the RAG audit
trail at ~/.linus_ai/audit/rag/. This trail is also HMAC-chained for
tamper-evidence and can be verified with linus-ai --rag-audit-verify.
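Both audit chains can be checked from the CLI:

```shell
# Verify the main audit log chain
linus-ai --audit-verify

# Verify the RAG access-decision trail
linus-ai --rag-audit-verify
```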
Set a document's classification with the --doc-class flag or the GUI Setup tab when uploading documents. If no classification is specified, documents default to INTERNAL. Classification can only be changed by an administrator.

Selecting a Profile
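Profiles can be applied per request with the documented --profile flag (the prompt text is an example):

```shell
# One-shot request using the legal research profile
linus-ai --chat "Summarise the indemnification clause risks" --profile legal
```

REGULATED and RESTRICTED profiles prompt for their one-time consent acknowledgement on first use.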
Custom System Prompts
Every profile ships with a carefully crafted default system prompt. You can override it at any level — globally via config, per-request via the API, or interactively in the GUI — without affecting the profile selection for other sessions.
Override via GUI
Open the control panel at http://localhost:8080, navigate to the
Setup tab, and find the System Prompt Customization
card. Edit the text area and click Save. The override is stored
per-profile in your browser's localStorage, so each profile remembers its own
custom prompt independently. Click Reset to Default at any time to
restore the profile's built-in prompt.
Override via API
Pass a system field in the request body when calling
POST /agent/stream or the OpenAI-compatible
POST /v1/chat/completions endpoint.
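A sketch of such a request against the agent endpoint, assuming the server is running on the default port. The top-level system field is as documented; the prompt field name and overall body shape are assumptions for illustration.

```shell
# "prompt" field name is an assumption; "system" overrides the profile prompt
curl -s http://localhost:8080/agent/stream \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.2",
        "system": "You are a terse release-notes writer.",
        "prompt": "Summarise the v4.0.0 changes."
      }'
```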
Override via Config File
Set a global system prompt in ~/.linus_ai/config.toml. This applies
to all CLI and API requests that do not supply their own system field.
Profile-level prompts take precedence over this global setting unless you explicitly
set the profile to none.
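A sketch of the global override; the key name below mirrors the API's system field but is an assumption, so confirm it against your generated config.

```toml
# ~/.linus_ai/config.toml
# Key name is an illustrative assumption
system = "You are a concise assistant for internal engineering docs."
```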
GUI overrides are stored in your browser's localStorage under per-profile keys (e.g. lnai_prompt_code). Clearing browser data will reset them. For permanent overrides, use the config file method instead.

Control Panel GUI
LINUS-AI includes a built-in web control panel. Start it with
linus-ai --serve and open http://localhost:8080
in any modern browser. No additional installation or configuration is needed.
Chat Tab
Standard multi-turn conversation interface with streaming token output. Select a model and profile from the top toolbar, then type your message to begin.
- Real-time streaming responses — tokens appear as they are generated
- Model selector — switch models without restarting the server
- Profile dropdown — apply any available AI profile instantly
- Conversation history — persisted in the encrypted vault across sessions
- Copy, export, and share individual responses
Agent Tab
Multi-turn agentic mode where the model can use tools to complete complex tasks. The agent can browse the web, read files, execute code, and call external APIs — all locally, all private.
- Tool-use loop — model autonomously plans, acts, observes, and iterates
- Built-in tools: file reader, web search (local), code execution sandbox
- Step-by-step trace — view every reasoning step and tool call
- Interrupt and redirect — stop mid-task and give new instructions
- Available on Professional, Team, and Enterprise plans
Setup Tab
Full configuration interface — no config file editing required. Changes take effect immediately without restarting the server.
- Model config — backend, GPU layers, threads, context length, quantization
- Profile selection — choose from all available AI vertical profiles
- System prompt editor — customize and preview the active system prompt
- License management — view status, activate, or deactivate a seat
- Server settings — host, port, CORS origins, API key management
- Vault settings — encryption key rotation, conversation export
CLI Reference
All flags can also be set via ~/.linus_ai/config.toml or environment
variables. CLI flags take highest precedence, followed by environment variables,
then the config file.
| Flag | Type / Default | Description |
|---|---|---|
| Server | | |
| --serve | flag | Start the HTTP server and control panel GUI. |
| --host HOST | string · 127.0.0.1 | Bind address. Use 0.0.0.0 to expose on the network. |
| --port PORT | int · 8080 | HTTP port for the server and control panel. |
| Chat & Inference | | |
| --chat MESSAGE | string | Send a one-shot message and print the response, then exit. |
| --model NAME | string · config default | Model to load (e.g. llama3.2, mistral:7b). |
| --profile NAME | string · general | AI vertical profile. See the Profiles section for all 14 IDs. |
| --backend MODE | string · auto | Inference backend: auto \| cpu \| cuda \| metal \| rocm. |
| --threads N | int · auto | CPU threads for inference. Defaults to physical core count. |
| Model Management | | |
| --pull-model NAME | string | Download a model from the Ollama registry (e.g. llama3.2, qwen2.5:32b). |
| --list-models | flag | List all locally installed models with size, quantization, and date. |
| --delete-model NAME | string | Delete a locally installed model and free disk space. |
| License | | |
| --activate KEY | string | Activate a license key (LNAI-XXXX-XXXX-XXXX-XXXX format). |
| --activate-email EMAIL | string | Email address associated with the license (required with --activate). |
| --license-status | flag | Display current license status, plan, seats, and machine binding. |
| --deactivate | flag | Release the machine binding for this seat so it can be used elsewhere. |
| Distributed / Mesh | | |
| --tensor-parallel N | int · 1 | Split model across N GPUs using tensor parallelism. |
| --mesh-role ROLE | string | Mesh role: coordinator or worker. |
| --mesh-join HOST:PORT | string | Join an existing mesh cluster (worker nodes use this to connect to the coordinator). |
| General | | |
| --config PATH | string · ~/.linus_ai/config.toml | Path to a custom configuration file. |
| --quiet | flag | Suppress all non-essential output. Useful in scripts. |
| --verbose | flag | Enable detailed debug logging including backend detection and token stats. |
| --version | flag | Print version, build platform, and active backend, then exit. |
Troubleshooting
Common issues and their solutions. If you cannot find a fix here, check the log
file at ~/.linus_ai/logs/linus-ai.log and include the relevant
excerpts when reporting an issue.
Model not found / "no such model: llama3.2"
This error means the model has not been downloaded yet. Pull it from the registry:
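For example:

```shell
linus-ai --pull-model llama3.2
```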
If you already have the model, check that the name matches exactly (case-sensitive, including any tag suffix such as :7b or :Q4_K_M). Also verify that ~/.linus_ai/models/ is on the same filesystem as expected — symbolic links to different volumes are supported.
Out of memory / "CUDA out of memory" / model crashes
The model requires more GPU VRAM than available. Try one or more of the following:
- Reduce gpu_layers to offload fewer layers (e.g., gpu_layers = 20)
- Use a lower quantization level: switch from Q8_0 to Q4_K_M
- Reduce context_len — long contexts use significant KV cache memory
- Close other GPU workloads (browsers with WebGPU, other ML frameworks)
- Switch to a smaller model (e.g., 7B instead of 13B)
License not recognized / "invalid license key format"
Check the following:
- The key must be in the exact format LNAI-XXXX-XXXX-XXXX-XXXX — four groups of four characters separated by hyphens, prefixed with LNAI-
- The email address must exactly match the one used at checkout (case-insensitive)
- Check for accidental leading/trailing whitespace if pasting from email
- If you receive "key already activated on another machine", run linus-ai --deactivate on the old machine first, or contact support
Port 8080 already in use
Another process is using port 8080. Start LINUS-AI on a different port:
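For example:

```shell
linus-ai --serve --port 8081
```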
Or set it permanently in config: [server] port = 8081. To find the conflicting process on macOS/Linux: lsof -i :8080.
Metal backend crashes on Apple Silicon
Metal-related crashes are typically caused by one of:
- macOS version too old — update to macOS 14 (Sonoma) or later. Metal shader features used by LINUS-AI require macOS 14+.
- Xcode Command Line Tools not installed — run xcode-select --install in Terminal.
- Metal driver regression — if a recent macOS update broke Metal, add backend = "cpu" to your config temporarily as a workaround.
Log Location
Detailed logs are written to ~/.linus_ai/logs/linus-ai.log. The log
rotates daily and keeps 7 days of history. Pass --verbose at startup
to increase log verbosity for debugging. When reporting an issue on GitHub,
please include the last 50 lines of the log.