DSI Research Ideas
2026-02-13
Papers (5)
Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based
Score: 9/10
carlcrm
API OK
Commercial:
GPT-4, Claude, Gemini, Claude Opus 4, Gemini 3 Pro, Gemini 3 Flash
Open-source:
Llama, Qwen, DeepSeek...
Agentic Test-Time Scaling for WebAgents
Score: 9/10
daily-session-intel
API OK
Commercial:
GPT-4, GPT-4o, Claude, Gemini
Open-source:
Llama, Qwen, DeepSeek...
PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving
Score: 9/10
carlcrm
Local GPU
Commercial:
GPT-4, Gemini, Claude
Open-source:
Llama, Qwen, DeepSeek...
Structured Context Engineering for File-Native Agentic Systems
Score: 9/10
carlcrm
API OK
Commercial:
Claude, GPT, Gemini
FlowMind: Execute-Summarize for Structured Workflow Generation from LLM Reasoning
Score: 9/10
carlcrm
API OK
Commercial:
GPT-4, Claude
Open-source:
Toolformer, HuggingGPT, SciAgent...
Repos (2)
n8n
⭐ 174,401
AutoGPT
⭐ 181,775