Migrating 250K+ Embeddings with Zero Downtime

Embeddings are the lifeblood of modern AI systems, enabling everything from semantic search to recommendation engines. But what happens when you need to migrate a quarter of a million embeddings from one model to another? For us at Quartalis, this wasn’t just a theoretical exercise—it was a real-world challenge involving 257,000+ chunks and zero downtime. In this post, I’ll walk through the planning, execution, and verification stages of our recent migration from nomic-embed-text to Qwen3-Embedding, sharing lessons learned along the way.
Planning the Migration
The first step in any major technical project is planning. We knew we had to migrate embeddings without causing downtime or affecting user experience. Here’s how we approached it:
1. Choosing the Right Replacement Model
We evaluated several models before settling on Qwen3-Embedding. Our criteria were:
- Semantic similarity rankings had to remain consistent with our existing system (raw scores from different models aren’t directly comparable).
- Performance (speed and resource usage) had to be comparable.
- Integration with our existing vector database, ChromaDB, was a must.
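In practice, "consistent" similarity behaviour across models means the new model should preserve the relative ordering of query–document pairs, not reproduce the old raw scores. Here is a minimal sketch of that kind of check — the helper functions and sample scores are illustrative, not our actual evaluation harness:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def ranking_agreement(old_scores, new_scores):
    # Fraction of pairwise orderings on which the two models agree
    n = len(old_scores)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    agree = sum(
        1 for i, j in pairs
        if (old_scores[i] - old_scores[j]) * (new_scores[i] - new_scores[j]) > 0
    )
    return agree / len(pairs)
```

If pairwise agreement on a held-out set of query–document pairs stays close to 1.0, the new model ranks results much like the old one, even when the absolute scores shift.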
2. Resource Planning
We allocated significant compute resources for the migration, using Kubernetes pods to handle the load. Each pod was configured with:
```yaml
resources:
  requests:
    memory: "8Gi"
    cpu: "4"
```

This ensured we could process embeddings in parallel without overwhelming our infrastructure.
3. Fallback Strategy
We knew there was always a risk of failure, so we built a fallback mechanism into our pipeline. This involved:
- Versioning the embedding files to allow for easy rollback.
- Maintaining a read-only copy of the old embeddings during the migration period.
- Setting up automated alerts for any discrepancies between the new and old systems.
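The versioning piece can be as simple as never overwriting a collection: each model writes under its own versioned name, and rollback is just pointing reads back at the old one. A sketch of the idea — the naming convention here is illustrative, not our exact scheme:

```python
def versioned_collection_name(base, model_tag, version):
    # One collection per (model, version); old data is never overwritten
    return f"{base}__{model_tag}__v{version}"

ACTIVE_COLLECTION = versioned_collection_name("chunks", "qwen3-embedding", 2)
ROLLBACK_COLLECTION = versioned_collection_name("chunks", "nomic-embed-text", 1)

def resolve_collection(use_fallback=False):
    # Reads go to the new collection unless we flip the fallback switch
    return ROLLBACK_COLLECTION if use_fallback else ACTIVE_COLLECTION
```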
Execution: Migrating with Minimal Impact
The execution phase was divided into three stages: evaluation, parallel processing, and incremental migration.
1. Evaluation
We started by migrating a small subset of data to test our pipeline. This allowed us to:
- Validate the embedding conversion process.
- Identify any edge cases or anomalies.
- Fine-tune our resource allocation based on observed performance.
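A pilot run is most useful when it is reproducible, so a failed run can be replayed against the exact same chunks. A minimal sketch of that kind of sampling — the helper name, fraction, and seed are illustrative:

```python
import random

def sample_pilot_batch(chunk_ids, fraction=0.01, seed=42):
    # Deterministic sample: the same seed always yields the same pilot set
    rng = random.Random(seed)
    k = max(1, int(len(chunk_ids) * fraction))
    return sorted(rng.sample(chunk_ids, k))
```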
2. Parallel Processing
With confidence in our approach, we scaled up. Using Python’s threading module, we processed embeddings in bounded batches of worker threads. Threads work well here because the workload is I/O-bound—calls to the embedding service and writes to ChromaDB—so the GIL isn’t a bottleneck:

```python
import threading

def migrate_embeddings(chunk_id):
    # Re-embed the chunk with the new model and store the result
    pass

chunk_ids = list(range(1000))  # IDs of the chunks in this run
BATCH = 32  # cap on concurrent threads so we don't exhaust memory or connections

for start in range(0, len(chunk_ids), BATCH):
    threads = [
        threading.Thread(target=migrate_embeddings, args=(cid,))
        for cid in chunk_ids[start:start + BATCH]
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

This method allowed us to handle the load without overburdening our systems.
3. Incremental Migration
Once the bulk migration was complete, we switched to an incremental approach. Any new embeddings were automatically processed by the new model, ensuring seamless integration into our system.
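Conceptually, the routing during this phase is a small dispatch: anything created after the cutover, or already migrated, is served by the new model. A sketch of that logic — the cutover timestamp and helper are illustrative:

```python
from datetime import datetime, timezone

CUTOVER = datetime(2025, 6, 1, tzinfo=timezone.utc)  # hypothetical cutover time

def pick_model(created_at, migrated_ids, chunk_id):
    # New or already-migrated chunks use the new model; the rest stay on the old one
    if created_at >= CUTOVER or chunk_id in migrated_ids:
        return "qwen3-embedding"
    return "nomic-embed-text"
```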
Verification and Validation
Verification is where the real work begins. You can’t afford to cut corners here—any discrepancy could lead to degraded performance or user dissatisfaction.
1. Checksum Comparison
We computed a checksum for each embedding as the model produced it, then recomputed it after reading the vector back from ChromaDB. (Note that the old and new embeddings themselves can’t be compared byte-for-byte—they come from different models, so their values necessarily differ.)

```python
import hashlib

import numpy as np

def compute_checksum(embedding):
    # Serialize deterministically before hashing: md5 operates on bytes
    return hashlib.md5(np.asarray(embedding, dtype=np.float32).tobytes()).hexdigest()

source_checksum = compute_checksum(generated_embedding)  # as produced by the model
stored_checksum = compute_checksum(stored_embedding)     # as read back from ChromaDB
if source_checksum != stored_checksum:
    raise ValueError("Embedding mismatch detected.")
```

This ensured that every vector in the destination database was byte-for-byte identical to the one the new model generated.
2. Manual Spot-Checking
Even with automated checks, we manually verified a random sample of embeddings. This step is crucial—algorithmic verification can sometimes miss subtle issues that human oversight would catch.
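One useful signal when spot-checking is how much the top-k nearest neighbours for a sample query overlap between the old and new indexes—low overlap flags queries worth a closer human look. A sketch using Jaccard overlap (an illustrative helper, not a hard acceptance criterion):

```python
def topk_overlap(old_neighbors, new_neighbors, k=10):
    # Jaccard overlap of the top-k neighbour IDs returned by each index
    a, b = set(old_neighbors[:k]), set(new_neighbors[:k])
    return len(a & b) / len(a | b)
```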
3. Monitoring During Live Traffic
Once the migration was complete, we monitored the system under live traffic for at least 48 hours. This gave us confidence that everything was functioning as expected.
Lessons Learned
Migrating embeddings at scale is a complex task, but it’s also an opportunity to learn and improve your processes. Here are our key takeaways:
1. Over-communicate with Your Team
Ensure everyone understands the risks and procedures. Miscommunication can lead to unnecessary downtime or data loss.
2. Test Thoroughly (and Then Test Again)
We can’t stress this enough: no amount of testing is too much. The last thing you want is a surprise on migration day.
3. Keep It Simple, But Not Too Simple
While it’s tempting to over-engineer solutions, stick to proven methods. Our approach using Python’s threading module was effective without being overly complex.
What’s Next?
Looking ahead, we’re exploring ways to automate this process for future migrations. The insights gained from this project will inform our work on the Quartalis ecosystem, ensuring that our tools are robust and reliable for users worldwide.
In the meantime, if you’ve got any questions or want to share your own experiences with embedding migrations, drop us a line on Twitter or LinkedIn. We’d love to hear from you.