# Research Integration: PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving
## Your Mission
Create a new branch and develop a detailed implementation plan for integrating this research idea into the codebase. Do NOT implement yet — focus on understanding, planning, and identifying risks.
## Branch Setup
```bash
git checkout -b experiment/prefillshare-a-shared-prefill-module-for-kv-reuse-
```
## The Research
**Paper**: [PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving](https://arxiv.org/abs/2602.12029)
**PDF**: https://arxiv.org/pdf/2602.12029
**Core Achievement**:
Reduces tail latency by up to 45% in multi-agent orchestration scenarios by sharing one prefill module, and the KV cache it produces, across multiple LLMs.
**Why This Matters for carlcrm**:
Multi-agent systems like the 'beads' workflow often load the same context redundantly, once per agent. This paper offers an architectural pattern for sharing that 'global state' context efficiently.
**Suggested Integration Approach**:
When spawning agents via AgentMail, include a shared context ID that points to a pre-computed KV cache of the current AGENTS.md and project schema to reduce latency.
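One way to picture the shared context ID is as a content hash over the documents that make up the shared prefix, so that any change to those documents yields a new ID and a stale cache is never addressed by mistake. The sketch below is an illustration under assumptions: `shared_context_id`, `spawn_payload`, and the payload field names are hypothetical and are not part of AgentMail or the paper, and an in-memory dict with placeholder text stands in for the real files.

```python
import hashlib


def shared_context_id(documents: dict[str, str]) -> str:
    """Derive a deterministic ID from the shared context documents.

    Hypothetical helper: the ID changes whenever any document's name or
    content changes.
    """
    digest = hashlib.sha256()
    # Sort by name so the ID does not depend on dict insertion order.
    for name in sorted(documents):
        for part in (name, documents[name]):
            data = part.encode("utf-8")
            # Length-prefix each field so field boundaries are unambiguous.
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()[:16]


def spawn_payload(agent_name: str, task: str, context_id: str) -> dict[str, str]:
    """Build a hypothetical spawn message that carries the shared context ID."""
    return {"agent": agent_name, "task": task, "shared_context_id": context_id}


# Placeholder contents; real code would read the files from disk.
docs = {
    "AGENTS.md": "# Agents\nPlaceholder rules.",
    "schema.sql": "CREATE TABLE contacts (id INTEGER PRIMARY KEY);",
}
ctx_id = shared_context_id(docs)
payload = spawn_payload("planner", "draft PLAN.md", ctx_id)
print(payload)
```

Whether a serving layer can actually look up a KV cache by such an ID depends on the local inference stack, which is an open question for the plan rather than something this sketch settles.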
**Estimated Effort**: multi-week
## Strategic Context (Crux)
This paper addresses a deeper strategic problem:
> How do you maintain a coherent 'global consciousness' of a project's state when the individual actors (agents) only possess fleeting, local, and potentially conflicting snapshots of reality?
Consider how this solution might unlock other improvements beyond the immediate tactical goal.
## Model Requirements
**Commercial APIs available**: GPT-4, Gemini, Claude
**Open-source (local GPU)**: Llama, Qwen, DeepSeek, Mistral, Phi...
🖥️ **This requires local model hosting** — GPU infrastructure needed.
*The paper's core technique, PrefillShare, requires a 'cache-conditioned fine-tuning procedure' that freezes the prefill module and modifies the decode module weights. This level of architectural manipulation and weight modification cannot be performed via standard commercial APIs.*
## Your Task: Create the Integration Plan
### Phase 1: Understand the Codebase Context
1. **Identify the integration surface**: Which files/modules would this touch?
2. **Map dependencies**: What existing code would this interact with?
3. **Find similar patterns**: Is there existing code that does something similar we can learn from?
### Phase 2: Design the Integration
Create a detailed plan covering:
1. **Architecture**: How does this fit into the existing system?
2. **Data flow**: What inputs does it need? What outputs does it produce?
3. **Configuration**: What new settings/parameters are needed?
4. **Testing strategy**: How will we validate this works?
### Phase 3: Premortem — What Could Go Wrong?
**Think about this integration failing 2 weeks from now. Why did it fail?**
Consider:
- **Performance**: Could this slow down critical paths?
- **Complexity**: Are we adding too much complexity for the benefit?
- **Maintenance**: Will this be hard to maintain or debug?
- **Dependencies**: Are we adding risky dependencies?
- **Edge cases**: What inputs or states could break this?
- **Rollback**: If this doesn't work, how easily can we revert?
For each risk, note:
- Likelihood (low/medium/high)
- Impact (low/medium/high)
- Mitigation strategy
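To keep the premortem comparable across risks, the three attributes above can be turned into a simple ordering. The following is a minimal sketch, assuming a likelihood-times-impact score on a 1 to 3 scale; the `Risk` class, the scoring rule, and the example entries are illustrative assumptions, not findings about this codebase.

```python
from dataclasses import dataclass

# Assumed numeric mapping for the low/medium/high labels.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: str  # "low", "medium", or "high"
    impact: str      # "low", "medium", or "high"
    mitigation: str

    @property
    def score(self) -> int:
        # Likelihood x impact product, ranging from 1 to 9.
        return LEVELS[self.likelihood] * LEVELS[self.impact]


def rank(risks: list[Risk]) -> list[Risk]:
    """Order risks from highest to lowest score; ties keep input order."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


def to_table_rows(risks: list[Risk]) -> list[str]:
    """Format ranked risks as rows for a Markdown risk table."""
    return [
        f"| {r.name} | {r.likelihood} | {r.impact} | {r.mitigation} |"
        for r in rank(risks)
    ]


# Illustrative entries only.
risks = [
    Risk("Hard rollback", "low", "medium", "Gate behind a flag"),
    Risk("Stale shared cache", "medium", "high", "Key the cache by content hash"),
    Risk("GPU hosting cost", "high", "medium", "Time-box the experiment"),
]
for row in to_table_rows(risks):
    print(row)
```

The rows it prints match the column order of the Risks & Mitigations table in the `PLAN.md` template, so they can be pasted in directly.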
### Phase 4: Define Success Criteria
Before implementing, define:
1. **Minimum viable test**: What's the simplest way to prove this works?
2. **Quantitative metrics**: What numbers should improve? By how much?
3. **Qualitative checks**: What should "feel" better?
4. **Failure signals**: What would tell us to abandon this approach?
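Since the headline claim concerns tail latency, a quantitative criterion will likely compare a high percentile before and after the change. The sketch below shows one way to compute that, assuming the nearest-rank percentile definition; the function names are hypothetical and the sample values are made-up placeholders, not measurements.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fractional reduction versus baseline (0.25 means 25% lower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline


# Placeholder latencies in milliseconds.
baseline_ms = [100.0, 110.0, 120.0, 130.0, 400.0]
candidate_ms = [100.0, 105.0, 110.0, 120.0, 300.0]

p95_before = percentile(baseline_ms, 95)
p95_after = percentile(candidate_ms, 95)
print(f"p95 reduction: {relative_reduction(p95_before, p95_after):.0%}")
```

With only a handful of samples the p95 is just the maximum, so a real comparison needs enough requests per run for the tail to be meaningful.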
## Output Format
Create a `PLAN.md` file in the repo root with:
```markdown
# Experiment: [Title]
## Summary
[1-2 sentence summary of what we're trying]
## Integration Points
- [ ] File 1: description of changes
- [ ] File 2: description of changes
## Architecture Decision
[Explain the chosen approach and why]
## Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| ... | ... | ... | ... |
## Success Criteria
- [ ] Criterion 1
- [ ] Criterion 2
## Open Questions
- Question 1?
- Question 2?
## Next Steps
1. First implementation step
2. Second implementation step
```
## Important Guidelines
- **Read the paper first** — skim the abstract, intro, and methodology sections
- **Don't over-engineer** — start with the simplest version that could work
- **Preserve optionality** — design so we can easily extend or remove this later
- **Document decisions** — future you will thank present you
- **Ask questions** — if something is unclear, note it rather than assuming
---
*This prompt was generated by DSI (Daily Session Intelligence) to help you systematically explore research ideas.*