Eidolon: How We Got GPT‑OSS 120B Running on a Normal Computer

Eidolon is a platform that lets you run text, music, image, and video generative models locally—right on your own computer

By retemedia

10 December 2025


Contents

What is Eidolon? A fully offline, local AI Hub
The Heart of the Project: Running Massive Models Locally
How We Did It (Explained Simply)
Key Optimizations
Why This Matters
What Eidolon Can Already Do
Why We Believe in Offline AI
Why Kickstarter?
In Conclusion
Bonus: What Is Quantization?

In recent years, artificial intelligence has become part of our daily lives, yet it remains tied to a fundamental limitation: it lives in the cloud, and users have no real control over the fate of the data collected during interactions with various models.

Every time we ask a model something, our data is sent to external servers. It’s convenient, sure. But also:

expensive,
slow,
out of the user’s control,
dependent on companies and subscriptions,
often not transparent.

With Eidolon, our goal is to completely reverse this usage model.

What is Eidolon? A fully offline, local AI Hub

Eidolon is a platform that lets you run text, music, image, and video generative models locally—right on your own computer.

Initially, we focused on open-source models small enough to run on standard machines, so we used relatively lightweight models with 4 to 7 billion parameters. These aren’t as powerful as GPT-4, Claude, or Gemini, but they can still do a lot, albeit with some clear limitations.

The advantages were obvious:

No cloud.
No subscriptions.
No data collection.

Everything generated by the AI stayed on your device.

But this wasn’t just about privacy—it was a new way of using AI.
No longer as an “external service”, but as a personal tool like a DAW, an IDE, or graphics software. Large public models could still be used when needed, but the default would be your own local models.

And let's be honest: for many everyday tasks, a 4B or 7B model is often enough. It's like cars: in city traffic, a small Smart often does the job better than a bulky SUV. What changes is mostly the user's perception of status and power.

Same story with AI.

The Heart of the Project: Running Massive Models Locally

That said, for more advanced professional use, larger models (27B, 70B, 120B+) are generally better, both in reasoning depth and in how much knowledge they retain from training.

So we decided to prove that offline AI is not a limitation by attempting something that seemed impossible until recently:

Running GPT‑OSS 120B locally.

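GPT‑OSS 120B is the larger of the two open-weight models OpenAI released in August 2025. It uses a mixture-of-experts design: of roughly 117 billion total parameters, only about 5 billion are active for any given token, which is a big part of why running it outside a datacenter is realistic at all.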

Models of this scale were once considered runnable only in datacenters or on custom-built supercomputers.

Yet we got it working on a high-end home workstation.

How We Did It (Explained Simply)

Despite the model’s massive size, modern techniques make it possible to run even behemoths like this on consumer hardware.

Key Optimizations:

Quantization — Compressing Without Killing Quality

At 16-bit precision, a 120-billion-parameter model needs roughly 240 GB for its weights alone (2 bytes per parameter). That is not feasible on a home PC.

Quantization reduces the numerical precision of the model:

FP16 → INT8 → down to INT4

It’s like going from a WAV file to a high-quality MP3:
lighter, faster, nearly the same quality.

Result:
GPT‑OSS 120B fits in your combined RAM + VRAM.
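To make the arithmetic concrete, here is a minimal sketch that estimates the size of the weights at each precision. It uses the round figure of 120 billion parameters from this article and counts weights only. Real quantized files also store scaling factors, and the runtime needs extra memory for context, so actual numbers will be somewhat higher. The function name is ours, chosen for illustration.

```python
# Rough, weight-only memory estimate.
# Ignores quantization scales, the KV cache and runtime overhead.
BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_footprint_gb(n_params: float, bits: int) -> float:
    """Return the size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    n_params = 120e9  # round figure used in this article
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {weight_footprint_gb(n_params, bits):.0f} GB")
```

At 4 bits the weights drop to about a quarter of their 16-bit size, which is what brings the model within reach of a workstation's combined RAM and VRAM.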

GPU + CPU Together — It’s Not Just About the GPU

Eidolon splits the model across the hardware components of your machine:

the heaviest parts go to the GPU,
others run on the CPU,
and some segments are loaded on demand only when needed.

This reduces VRAM usage and makes full use of your system.
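The idea can be sketched as a simple greedy placement: fill the GPU first, then system RAM, and leave whatever does not fit on disk to be loaded on demand. This is an illustration only, not Eidolon's actual scheduler. The `Block` type, the `place_blocks` function, and all sizes are hypothetical, and the caller is assumed to list blocks in priority order.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A chunk of the model (for example a group of layers). Hypothetical."""
    name: str
    size_gb: float


def place_blocks(blocks, vram_gb, ram_gb):
    """Greedy placement: GPU first, then CPU RAM, otherwise leave on disk."""
    free = {"gpu": vram_gb, "cpu": ram_gb}
    placement = {}
    for block in blocks:  # blocks are assumed to arrive in priority order
        for device in ("gpu", "cpu"):
            if block.size_gb <= free[device]:
                free[device] -= block.size_gb
                placement[block.name] = device
                break
        else:
            # Nothing had room: this block is loaded on demand from disk.
            placement[block.name] = "disk"
    return placement


if __name__ == "__main__":
    # Made-up example: ten 6 GB blocks, 24 GB of VRAM, 24 GB of free RAM.
    blocks = [Block(f"block_{i}", 6.0) for i in range(10)]
    for name, device in place_blocks(blocks, vram_gb=24.0, ram_gb=24.0).items():
        print(name, "->", device)
```

Real runtimes make this decision with more information (which layers are compute-heavy, how much memory the context needs), but the budget-filling logic is the same in spirit.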

NVMe Offloading — Your SSD Becomes Extra Memory

Thanks to modern high-speed SSDs, the disk can act as additional memory.

Just a few years ago, this would have been too slow. With NVMe drives reading several gigabytes per second, it now works surprisingly well, although it is still slower than RAM.
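One common way to treat the disk as extra memory is memory mapping: the file is mapped into the address space, and the operating system reads only the pages that are actually touched. The sketch below shows the principle with Python's standard `mmap` module and a tiny stand-in file. The file layout and the function names are assumptions made for illustration, not Eidolon's storage format.

```python
import mmap
import os
import struct
import tempfile

FLOAT_SIZE = 4  # bytes per float32 value


def write_weights(path, values):
    """Write float32 values to a flat binary file (stand-in for a model file)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))


def read_slice(path, start, count):
    """Map the file and read only `count` floats starting at index `start`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = start * FLOAT_SIZE
            chunk = mm[offset:offset + count * FLOAT_SIZE]
            return list(struct.unpack(f"<{count}f", chunk))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "weights.bin")
    write_weights(path, [float(i) for i in range(1000)])
    # Only the pages holding these three values need to be read from disk.
    print(read_slice(path, 500, 3))
```

With a multi-gigabyte model file the same pattern lets rarely used parts stay on the SSD until they are needed.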

Optimized Runtimes

Eidolon relies on modern libraries like:

ggml (the tensor library behind llama.cpp)
ExLlama
vLLM, run locally

These make it possible for even massive models to run efficiently on serious workstations.

Why This Matters

This experiment proves three key points:

1. Offline AI is Truly Possible — Even with Huge Models

There’s no longer any need to rely on the cloud to access advanced generative power.

2. The User Regains Full Control Over Their AI

All data remains on your device.
No subscriptions. No usage limits. No data sent to third parties.

3. A New Market Is Emerging: Personalized AI at Home

Eidolon isn’t just a model—it’s a modular AI hub that lets you combine multiple models for:

writing,
video generation,
music composition,
offline RPG NPCs,
autonomous agents,

all without external connections.

What Eidolon Can Already Do

Thanks to open-source integration, Eidolon can already:

generate video clips with WAN 2.0 and LTX Video,
create high-quality images with ComfyUI + Z-Image Turbo, Stable Diffusion, or Flux 1.0,
compose music with MusicGen and other copyright-free music models,
write complex texts with GPT‑OSS 20B and 120B,
analyze documents and images,
create offline AI agents,
build private multimodal interfaces,
generate conversational characters for games and storytelling,
hold voice conversations with users (speech-to-text and text-to-speech).

All this, without sending a single byte outside your computer.

Why We Believe in Offline AI

For years, centralized AI has dominated the market—but it can’t be the only path forward.

The future of AI must be:

local,
personalized,
private,
powerful,
modular,
integrated into everyday creative tools.

Eidolon is a bridge to that future.

Why Kickstarter?

Most of the development work behind Eidolon is already complete.

But distributing the various modules—professional chat, entertainment chat, image/video/music generation—requires massive web storage for models ranging from 1 GB to 120 GB, plus a fast download experience for users.

We are a non-profit cultural association, and we don’t have the resources to provide all of this on our own.

That’s why we launched this Kickstarter campaign—to raise the minimum funding needed to make it all possible.

If you want to understand where consumer AI is really going, don’t stay on the sidelines. Follow the Eidolon project, join the pre-launch list, and see how far an offline, private, user-controlled AI system can actually go.
Eidolon is built to be used, explored and challenged. Jump in now and help shape the next generation of personal AI.

In Conclusion

We’ve come a long way.

From the “philosophical” GPT-3 demos that felt like science fiction,
to tangible, usable AI that can be installed on your own PC.

And now we can run enormous models locally.
This isn’t theory—we’ve done it.

Eidolon is built on this conviction:

Artificial Intelligence doesn’t need to live in the cloud.
It can live with you.

Bonus: What Is Quantization?

Quantization is what allows a device with the energy footprint of a lightbulb to run models that used to require server farms.

It works by converting the model’s weights from higher to lower numerical precision—like going from 32-bit to 8-bit or even 4-bit.

This drastically reduces model size and computing requirements, making it possible to run large language models (LLMs) on consumer hardware—without sacrificing much accuracy.
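Here is a minimal sketch of the idea, assuming the simplest possible scheme: symmetric quantization with one scale for the whole tensor. Production formats are more refined (they typically use a separate scale per small group of weights), and the function names and sample weights below are ours, chosen for illustration.

```python
def quantize(weights, bits=4):
    """Symmetric per-tensor quantization to signed integers of `bits` width."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate floating-point weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.07, -0.21]  # made-up sample values
    for bits in (8, 4):
        q, scale = quantize(weights, bits)
        restored = dequantize(q, scale)
        worst = max(abs(a - b) for a, b in zip(weights, restored))
        print(f"{bits}-bit: ints={q} worst error={worst:.4f}")
```

Each weight is stored as a small integer plus one shared scale, so the file shrinks while the restored values stay within half a quantization step of the originals. Fewer bits mean bigger steps, which is why 4-bit loses a little more precision than 8-bit.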

Why does this matter?

Because as LLMs grow exponentially in size and resource demands, quantization becomes the key to making them accessible across mobile devices, industry tools, and personal AI hubs.
