Migrating 250K+ Embeddings with Zero Downtime

Embeddings are the lifeblood of modern AI systems, enabling everything from semantic search to recommendation engines. But what happens when you need to migrate a quarter of a million embeddings from one model to another? For us at Quartalis, this wasn’t just a theoretical exercise—it was a real-world challenge involving 257,000+ chunks and zero downtime. In this post, I’ll walk through the planning, execution, and verification stages of our recent migration from nomic-embed-text to Qwen3-Embedding, sharing lessons learned along the way.
Planning the Migration
The first step in any major technical project is planning. We knew we had to migrate embeddings without causing downtime or affecting user experience. Here’s how we approached it:
1. Choosing the Right Replacement Model
We evaluated several models before settling on Qwen3-Embedding. Our criteria were:
- Semantic similarity rankings had to remain consistent with our existing system (raw scores from different models aren’t directly comparable).
- Performance (speed and resource usage) had to be comparable.
- Integration with our existing vector database, ChromaDB, was a must.
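In practice, "consistent" similarity behaviour across models means the new model should preserve the relative ordering of query–document pairs, not reproduce the old raw scores. Here is a minimal sketch of that kind of check — the helper functions and sample scores are illustrative, not our actual evaluation harness:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def ranking_agreement(old_scores, new_scores):
    # Fraction of pairwise orderings on which the two models agree
    n = len(old_scores)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    agree = sum(
        1 for i, j in pairs
        if (old_scores[i] - old_scores[j]) * (new_scores[i] - new_scores[j]) > 0
    )
    return agree / len(pairs)
```

If pairwise agreement on a held-out set of query–document pairs stays close to 1.0, the new model ranks results much like the old one, even when the absolute scores shift.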
2. Resource Planning
We allocated significant compute resources for the migration, using Kubernetes pods to handle the load. Each pod was configured with:
```yaml
resources:
  requests:
    memory: "8Gi"
    cpu: "4"
```

This ensured we could process embeddings in parallel without overwhelming our infrastructure.
3. Fallback Strategy
We knew there was always a risk of failure, so we built a fallback mechanism into our pipeline. This involved:
- Versioning the embedding files to allow for easy rollback.
- Maintaining a read-only copy of the old embeddings during the migration period.
- Setting up automated alerts for any discrepancies between the new and old systems.
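The versioning piece can be as simple as never overwriting a collection: each model writes under its own versioned name, and rollback is just pointing reads back at the old one. A sketch of the idea — the naming convention here is illustrative, not our exact scheme:

```python
def versioned_collection_name(base, model_tag, version):
    # One collection per (model, version); old data is never overwritten
    return f"{base}__{model_tag}__v{version}"

ACTIVE_COLLECTION = versioned_collection_name("chunks", "qwen3-embedding", 2)
ROLLBACK_COLLECTION = versioned_collection_name("chunks", "nomic-embed-text", 1)

def resolve_collection(use_fallback=False):
    # Reads go to the new collection unless we flip the fallback switch
    return ROLLBACK_COLLECTION if use_fallback else ACTIVE_COLLECTION
```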
Execution: Migrating with Minimal Impact
The execution phase was divided into three stages: evaluation, parallel processing, and incremental migration.
1. Evaluation
We started by migrating a small subset of data to test our pipeline. This allowed us to:
- Validate the embedding conversion process.
- Identify any edge cases or anomalies.
- Fine-tune our resource allocation based on observed performance.
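A pilot run is most useful when it is reproducible, so a failed run can be replayed against the exact same chunks. A minimal sketch of that kind of sampling — the helper name, fraction, and seed are illustrative:

```python
import random

def sample_pilot_batch(chunk_ids, fraction=0.01, seed=42):
    # Deterministic sample: the same seed always yields the same pilot set
    rng = random.Random(seed)
    k = max(1, int(len(chunk_ids) * fraction))
    return sorted(rng.sample(chunk_ids, k))
```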
2. Parallel Processing
With confidence in our approach, we scaled up. Using Python’s threading module, we processed embeddings in bounded batches of worker threads. Threads work well here because the workload is I/O-bound—calls to the embedding service and writes to ChromaDB—so the GIL isn’t a bottleneck:

```python
import threading

def migrate_embeddings(chunk_id):
    # Re-embed the chunk with the new model and store the result
    pass

chunk_ids = list(range(1000))  # IDs of the chunks in this run
BATCH = 32  # cap on concurrent threads so we don't exhaust memory or connections

for start in range(0, len(chunk_ids), BATCH):
    threads = [
        threading.Thread(target=migrate_embeddings, args=(cid,))
        for cid in chunk_ids[start:start + BATCH]
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

This method allowed us to handle the load without overburdening our systems.
3. Incremental Migration
Once the bulk migration was complete, we switched to an incremental approach. Any new embeddings were automatically processed by the new model, ensuring seamless integration into our system.
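Conceptually, the routing during this phase is a small dispatch: anything created after the cutover, or already migrated, is served by the new model. A sketch of that logic — the cutover timestamp and helper are illustrative:

```python
from datetime import datetime, timezone

CUTOVER = datetime(2025, 6, 1, tzinfo=timezone.utc)  # hypothetical cutover time

def pick_model(created_at, migrated_ids, chunk_id):
    # New or already-migrated chunks use the new model; the rest stay on the old one
    if created_at >= CUTOVER or chunk_id in migrated_ids:
        return "qwen3-embedding"
    return "nomic-embed-text"
```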
Verification and Validation
Verification is where the real work begins. You can’t afford to cut corners here—any discrepancy could lead to degraded performance or user dissatisfaction.
1. Checksum Comparison
We computed a checksum for each embedding as the model produced it, then recomputed it after reading the vector back from ChromaDB. (Note that the old and new embeddings themselves can’t be compared byte-for-byte—they come from different models, so their values necessarily differ.)

```python
import hashlib

import numpy as np

def compute_checksum(embedding):
    # Serialize deterministically before hashing: md5 operates on bytes
    return hashlib.md5(np.asarray(embedding, dtype=np.float32).tobytes()).hexdigest()

source_checksum = compute_checksum(generated_embedding)  # as produced by the model
stored_checksum = compute_checksum(stored_embedding)     # as read back from ChromaDB
if source_checksum != stored_checksum:
    raise ValueError("Embedding mismatch detected.")
```

This ensured that every vector in the destination database was byte-for-byte identical to the one the new model generated.
2. Manual Spot-Checking
Even with automated checks, we manually verified a random sample of embeddings. This step is crucial—algorithmic verification can sometimes miss subtle issues that human oversight would catch.
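One useful signal when spot-checking is how much the top-k nearest neighbours for a sample query overlap between the old and new indexes—low overlap flags queries worth a closer human look. A sketch using Jaccard overlap (an illustrative helper, not a hard acceptance criterion):

```python
def topk_overlap(old_neighbors, new_neighbors, k=10):
    # Jaccard overlap of the top-k neighbour IDs returned by each index
    a, b = set(old_neighbors[:k]), set(new_neighbors[:k])
    return len(a & b) / len(a | b)
```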
3. Monitoring During Live Traffic
Once the migration was complete, we monitored the system under live traffic for at least 48 hours. This gave us confidence that everything was functioning as expected.
Lessons Learned
Migrating embeddings at scale is a complex task, but it’s also an opportunity to learn and improve your processes. Here are our key takeaways:
1. Over-communicate with Your Team
Ensure everyone understands the risks and procedures. Miscommunication can lead to unnecessary downtime or data loss.
2. Test Thoroughly (and Then Test Again)
We can’t stress this enough: no amount of testing is too much. The last thing you want is a surprise on migration day.
3. Keep It Simple, But Not Too Simple
While it’s tempting to over-engineer solutions, stick to proven methods. Our approach using Python’s threading module was effective without being overly complex.
What’s Next?
Looking ahead, we’re exploring ways to automate this process for future migrations. The insights gained from this project will inform our work on the Quartalis ecosystem, ensuring that our tools are robust and reliable for users worldwide.
In the meantime, if you’ve got any questions or want to share your own experiences with embedding migrations, drop us a line on Twitter or LinkedIn. We’d love to hear from you.