
LightWeight
Run the world's heaviest AI models right on your everyday laptop. No expensive servers, no privacy worries.

"Lightweight AI leverages advanced 4-bit quantization and GGUF-based inference to run large-scale language models locally on consumer-grade hardware. Pull optimized weights from our registry and serve them via a minimalist CLI that bypasses the need for high-end GPUs or cloud dependencies."


"Our platform streamlines the AI lifecycle with performance monitoring tools that track token-per-second (TPS) and VRAM usage in real-time. Integrate compressed models into your local workspace with zero-config setup, ensuring full data privacy and low-latency inference across all your devices."

01. Why LightWeight?
Running a massive AI model like Llama or Qwen usually requires a $5,000+ server. We built an engine that squeezes these giants so they run cool and fast on your normal laptop.
| Model Class | Normally Requires | With LightWeight |
|---|---|---|
| Big AI Model | 64GB+ VRAM (a ~$5,000+ server) | 16GB RAM on your normal laptop |
| Ultra-Heavy Model | 140GB+ VRAM (a massive server room) | 32GB RAM on your personal computer |
| Giant Model (Kimi) | 300GB+ VRAM (a massive server room) | 24-32GB RAM on your personal computer |
02. How is it possible?
01 / Built for Your PC
Our engine scans your RAM and GPU to create a custom 'fit', like a tailor making a custom suit for your laptop's power (see the sketch after this list).
02 / No Heat, No Noise
We intelligently limit CPU usage so your fans don't scream. Your laptop stays cool and silent even during deep AI thinking.
03 / Smart Memory Logic
LightWeight doesn't hog your RAM. If other apps need space, it instantly releases memory back to your computer. No more crashing.
04 / Truly Portable
One file, zero setup headaches. It works anywhere from a basic office laptop to a high-end gaming beast without extra drivers.
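LightWeight's actual probing code isn't shown here, but the scan-then-fit idea from point 01 can be sketched in a few lines. A rough sketch assuming the `psutil` package, with tier thresholds invented purely for illustration:

```python
import psutil  # stand-in for the engine's real hardware probe

def pick_fit() -> str:
    """Pick a model tier from available RAM. Thresholds are illustrative."""
    avail_gb = psutil.virtual_memory().available / 1e9
    if avail_gb >= 24:
        return "large (14B-32B, 4-bit)"
    if avail_gb >= 12:
        return "medium (7B-8B, 4-bit)"
    return "small (1B-3B, 4-bit)"

print(f"Suggested fit: {pick_fit()}")
```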
03. Real-world Speed
| Your Device | Model Size | Experience |
|---|---|---|
| Old Office Laptop | Medium (7B-8B) | Perfectly readable |
| Modern Laptop | Large (14B-32B) | Fast human typing |
| Gaming PC / Mac | Giant (70B-671B) | Instant responses |
04. Simple Start
Windows (PowerShell):

```powershell
irm https://lightweight.zecoryx.uz/install.ps1 | iex
```

macOS / Linux:

```sh
curl -fsSL https://lightweight.zecoryx.uz/install.sh | sh
```

Pull a model, then start chatting immediately:

```sh
lightweight pull qwen:32b
lightweight chat qwen:32b
```

View all downloaded models on your machine:

```sh
lightweight list
```

Remove a model to free up disk space:

```sh
lightweight rm llama3:8b
```

Turn your machine into an OpenAI-compatible local API endpoint:

```sh
lightweight serve --port 8000
# Server starts at http://0.0.0.0:8000
```

Send a request from any app:

```sh
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen:32b", "messages": [{"role": "user", "content": "Hello!"}]}'
```

Endpoints
POST /v1/chat/completions
GET /v1/models
GET /v1/health
Options
--port 8000 — custom port
--host 0.0.0.0 — network access
--help — all options
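Because the server exposes the OpenAI-compatible endpoints above, any standard OpenAI client can talk to it. A minimal sketch using the official `openai` Python package, assuming the local server ignores the API key so a placeholder string works:

```python
from openai import OpenAI  # pip install openai

# Point the standard client at the local LightWeight server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="qwen:32b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```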
Check if your hardware can run a specific model:
```sh
lightweight check llama3:70b
```

Get detailed information about any model:

```sh
lightweight info qwen:32b
```

Configure global CLI settings:

```sh
lightweight config --edit
```