Evaluation & Benchmarks
Testing frameworks and benchmarking tools for AI systems.
6 resources

Evaluation & Benchmarks (6)
├── ADE-Bench (dbt Labs) · GitHub
├── LLM Council (Karpathy) · GitHub
├── Origon: Production-Grade AI Agent Platform · Origon
├── Arize Phoenix: AI Observability & Evaluation · GitHub
├── Better Agents: Standards for Agent Building · GitHub
└── The Importance of Agent Harness in 2026 · Philschmid

Other Tools (11)
├── Advanced Tool Use (Anthropic) · Anthropic
├── Consistent Hero Images Workflow · Open
├── CocoIndex: Data Transformation Framework for AI · CocoIndex
├── Tool Search is Dead, Long Live Skills · Nicolaygerold
├── Gemini Deep Research · Gemini
├── 21st.dev: AI Product Designer · 21st
├── Custom AI Agent Builder's Guide · MotherDuck
├── Towards a Disaggregated Agent Filesystem on Object Storage · Penberg
├── Agentic Coding Flywheel Setup · GitHub
├── Convex AI Chat Template · Convex
└── Workflow DevKit for AI Agents · Workflowdevkit

CLI & Utilities (8)
├── mdflow: Executable Markdown · GitHub
├── summarize.sh: Web Content Extraction CLI · Summarize
├── Tigma: Terminal-Based ASCII Design Tool · GitHub
├── Agentic Coding Flywheel: VPS Bootstrap System · GitHub
├── Peaky Panes: TUI Project Manager · GitHub
├── Tigma: AI-Driven Design Tool · GitHub
├── PeakyPanes: Cursor Window Manager · GitHub
└── Dots: Lightweight Task Tracking · GitHub

Memory Systems (5)
├── claude-mem: Persistent Memory · GitHub
├── OpenMemory: Local Memory Store for LLM Apps · GitHub
├── Semantic Memory: Local Vector Search with PGlite · GitHub
├── Memory Lane: Persistent Memory for Claude Code · Gist
└── Context Field for Cursor · GitHub

MCP & Protocols (6)
├── MCP Deep Dive: The USB-C Layer for AI · Newsletter
├── mcp-use: Full-Stack MCP Framework · GitHub
├── Agent Skills: Open Format for Agent Capabilities · Agentskills
├── MCP Apps: Interactive UI Extension Specification · Blog
├── MotherDuck MCP Server · MotherDuck
└── MCP UI Desktop Client · GitHub

Claude Code Tools (10)
├── Oh-My-OpenCode: Multi-Agent Plugin System · GitHub
├── 5 Fixes for Claude Skills Failures · Open
├── Claude Use Cases Directory · Claude
├── Claude Code Templates: Stack Builder · Aitmpl
├── Compound Engineering: Claude Code Plugin · GitHub
├── Continuous Claude: Context Management System · GitHub
├── Coding Tutor: Personalized AI Learning Plugin · GitHub
├── Writing a Good CLAUDE.md · HumanLayer
├── Awesome Claude: Marketplace Directory · GitHub
└── Claude Commands and Prompts Guide · Nurijanian

Agent Platforms (8)
├── Simular AI: Autonomous Computer Use Agent · Simular
├── Agentic Data Scientist · GitHub
├── Markdown Site: AI-Ready Publishing Framework · GitHub
├── Swarms: Enterprise Multi-Agent Orchestration · GitHub
├── Agent-Native Architectures: How to Build Apps After Code Ends · Every
├── ClawdBot: Claude Discord Bot · GitHub
├── Claude Delegator: Task Routing Agent · GitHub
└── AgentFS: Filesystem for AI Agents · GitHub

Documentation & Knowledge (5)
├── CodeWiki: Repository-Level Documentation Framework · GitHub
├── AI Builder's Guide: Building Analytics Agents · MotherDuck
├── Markdown Site Generator · GitHub
├── Obsidian Skills: AI Agent Behaviors · GitHub
└── SpecStory: AI Coding Session History · SpecStory