DSI Research Ideas

2026-02-13

Papers (5)

Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based
Score: 9/10 carlcrm API OK
Commercial: GPT-4, Claude, Gemini, Claude Opus 4, Gemini 3 Pro, Gemini-3-Flash
Open-source: Llama, Qwen, DeepSeek...
Agentic Test-Time Scaling for WebAgents
Score: 9/10 daily-session-intel API OK
Commercial: GPT-4, GPT-4o, Claude, Gemini
Open-source: Llama, Qwen, DeepSeek...
PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving
Score: 9/10 carlcrm Local GPU
Commercial: GPT-4, Gemini, Claude
Open-source: Llama, Qwen, DeepSeek...
Structured Context Engineering for File-Native Agentic Systems
Score: 9/10 carlcrm API OK
Commercial: Claude, GPT, Gemini
FlowMind: Execute-Summarize for Structured Workflow Generation from LLM Reasoning
Score: 9/10 carlcrm API OK
Commercial: GPT-4, Claude
Open-source: Toolformer, HuggingGPT, SciAgent...

Repos (2)

n8n
⭐ 174401

AutoGPT
⭐ 181775
