Changelog
What's new in AI Company. Every feature, improvement, and fix.
Mobile Sticky Header Fix, Improvement Cycle Hang Fix & QC Hardening
Generated output headers no longer stick to the top of the screen on mobile; both Streamdown prose and HtmlPreview iframe outputs now scroll fully. Improvement-cycle LLM calls that could hang indefinitely are now capped with a 90-second timeout. A ChatGPT agent also performed a comprehensive QC audit covering content accuracy, navigation, trust messaging, and mobile UX.
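Below is a minimal sketch of how a per-call cap like the 90-second limit can be enforced. It assumes the LLM call is a Promise that is raced against a timer. withTimeout and TimeoutError are hypothetical names, not the app's code.

```typescript
// Hypothetical cap for improvement-cycle LLM calls (value from this release).
const IMPROVEMENT_CALL_TIMEOUT_MS = 90_000;

class TimeoutError extends Error {
  constructor(label: string, ms: number) {
    super(`${label} timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Resolve with `work` if it settles within `ms`; otherwise reject with a TimeoutError.
function withTimeout<T>(work: Promise<T>, ms: number, label = "LLM call"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(label, ms)), ms);
  });
  // Whichever settles first wins; always clear the timer so it cannot fire late.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// e.g. withTimeout(runImprovementCycle(), IMPROVEMENT_CALL_TIMEOUT_MS, "improvement cycle")
// where runImprovementCycle is a hypothetical placeholder.
```

Racing only stops the caller from waiting. The underlying request keeps running unless it also accepts an abort signal.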
Bold New Logo, One-Click Install, Performance & Accessibility
Complete logo overhaul: a bold white 'A' on a deep blue background, clearly visible at all icon sizes, including on the Android home screen. A smart PWA install banner auto-detects installability and offers one-tap install. Lazy loading splits 17 routes into separate chunks, raising the Lighthouse Performance score from 36 to 54. The Accessibility score is now a perfect 100. The app background is fixed to clean white.
QA Bug Fixes + PWA Cache Busting
Fixed the Reviewer FAIL label, which showed misleading 'In Progress' text; updated the service worker cache to force the new logo to propagate on PWA reinstall; and verified all items in the v3.22 QA report.
New 'A' Circuit Logo + Educational Loading Screen
Brand-new 'A' Circuit logo across the entire app: a stylized letter A morphing into neural network circuits with glowing blue nodes. The loading screen now teaches you about Run, Build, and Pro modes with rotating example cards.
New Logo, No Splash Screen & PWA Improvements
New circuit-brain app icon replaces the generic white circle. PWA splash screen eliminated: the dark background now loads instantly, with no white flash when the app opens.
Auto Deploy Verification & Owner Notification
After every publish, the server automatically runs the full 12-check Deploy Verification suite (database, APIs, version, changelog, LLM connectivity, etc.) and sends the results as a notification to the project owner. No manual steps are needed: just publish, and you'll be notified if anything breaks.
CEO Revision Creates New Build Versions (V4, V5, etc.)
CEO revisions in Build mode now create new Build versions instead of overwriting the latest. Each revision saves as V4, V5, V6, etc. so users can compare all iterations side-by-side in the Build Versions viewer. The new version appears instantly in the left panel after the revision completes.
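Here is one way the V4, V5, ... numbering could be derived. It assumes version labels are stored as strings of the form V<n>. nextVersionLabel is a hypothetical helper.

```typescript
// Hypothetical: derive the next Build version label from the existing ones.
// A revision never overwrites an existing version; it always gets max + 1.
function nextVersionLabel(existing: string[]): string {
  let max = 0;
  for (const label of existing) {
    const match = /^V(\d+)$/.exec(label.trim());
    if (match) max = Math.max(max, Number(match[1]));
  }
  return `V${max + 1}`;
}
```

Taking the maximum, rather than counting entries, keeps numbering stable if an older version is ever deleted.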
Shared Footer Component, Playwright E2E Smoke Tests & QA Automation
Extracted the 8-link footer into a shared SiteFooter component used by all 7 pages (Home, About, Examples, Instructions, Changelog, Builds, ComponentShowcase), eliminating duplication and ensuring consistent navigation. Created a comprehensive Playwright e2e smoke test suite (pre-deploy-smoke.spec.ts) with 22 automated tests covering footer links, version badges, mobile overflow, changelog integrity, about page model names, invite flow, and cooldown persistence. All tests run before every publish to catch regressions automatically.
Perplexity QA Fixes: Cooldown Persistence, Run Badge, Footer & Share Toast
Addresses all findings from the Perplexity QA v3.22 test report. The 15-second abort cooldown now persists across page refreshes via localStorage, preventing bypass by reloading. The blue 'Run' badge now appears in logged-in user history (userHistory endpoint was missing runMode). Rate-limit errors from the server now auto-trigger a client-side cooldown countdown. The /builds page now includes the standard 8-link footer, and all copy-to-clipboard actions show a consistent toast notification.
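Below is a minimal sketch of persisting the 15-second cooldown. It assumes the deadline is stored as an absolute timestamp, so a reload resumes the same countdown instead of restarting it. The key name and helper names are hypothetical. In the browser, window.localStorage satisfies the KeyValueStore interface.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const COOLDOWN_KEY = "abortCooldownUntil"; // hypothetical key name
const COOLDOWN_MS = 15_000;

// Store the absolute expiry time, not a countdown, so a page reload cannot reset it.
function startCooldown(store: KeyValueStore, now: number = Date.now()): void {
  store.setItem(COOLDOWN_KEY, String(now + COOLDOWN_MS));
}

// Whole seconds remaining (0 when no cooldown is active); expired or corrupt entries are cleared.
function cooldownSecondsLeft(store: KeyValueStore, now: number = Date.now()): number {
  const raw = store.getItem(COOLDOWN_KEY);
  const until = raw === null ? NaN : Number(raw);
  if (!Number.isFinite(until) || until <= now) {
    store.removeItem(COOLDOWN_KEY);
    return 0;
  }
  return Math.ceil((until - now) / 1000);
}
```

A server-side rate limit is still needed, because anything in localStorage can be cleared by the user.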
Mobile Versions Tab, Revision Timeout Fix & Time Remaining Estimate
Fixed the missing Versions tab on mobile Build mode — V1/V2/V3 website version buttons now appear on mobile just like desktop. Increased CEO revision timeout from 5 to 8 minutes and added per-operation timeouts (30s per search, 20s per image scrape) to prevent indefinite hangs. The modification button now shows an estimated time remaining during each phase.
CEO Revision Progress Tracking, Retry Buttons & Stale Connection UX
Major improvement to the CEO modification flow: the button now shows real-time progress during research, image scraping, and LLM generation phases with an elapsed time counter. If a revision fails, a Retry button appears inline. All three stale connection warning banners now include a Retry button for immediate manual recovery. Progress events are streamed from the server during the research phase so users see exactly what the CEO is doing.
Abort Cooldown Enforcement, Rate Limit Error Handling & Run Badge Fix
Fixes three bugs found during QA testing: (1) After aborting a run, buttons now show a visible 15-second cooldown countdown preventing rapid re-runs. (2) 'Rate exceeded' plain-text errors from upstream proxies are now caught and shown as a friendly message instead of a JSON parse crash. (3) The blue 'Run' badge now correctly appears in the history sidebar for all Run-mode tasks.
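Here is a sketch of tolerating non-JSON error bodies. It assumes the client reads the response as text before parsing it. parseBodySafely and its messages are hypothetical.

```typescript
// Result of reading a response body that is usually JSON but may be plain text.
type ParsedBody =
  | { kind: "json"; data: unknown }
  | { kind: "error"; message: string };

function parseBodySafely(status: number, bodyText: string): ParsedBody {
  let data: unknown;
  try {
    data = JSON.parse(bodyText);
  } catch {
    // Not JSON (e.g. "Rate exceeded" from an upstream proxy): return a friendly message instead of crashing.
    if (status === 429 || /rate exceeded/i.test(bodyText)) {
      return { kind: "error", message: "Too many requests right now. Please wait a moment and try again." };
    }
    return { kind: "error", message: `Unexpected server response (HTTP ${status}).` };
  }
  return { kind: "json", data };
}
```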
Mobile Stall Watchdog, SSE Reconnection UX & Share Fixes
Critical fix for mobile execution stalls: added a client-side watchdog that detects when no server events arrive for 45+ seconds, shows a visible warning, and auto-aborts at 90 seconds with a retry option. SSE reconnection now shows toast notifications ('Reconnecting...' / 'Reconnected!'). Stale connection warning banner appears on both Task and Log tabs. Share button uses on-demand S3 upload so blob URLs are never shared.
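The watchdog thresholds above can be expressed as a small pure function. This assumes the client records the time of the last SSE event and re-evaluates it on a one-second interval. The names are hypothetical.

```typescript
// Thresholds from this release: warn after 45s without server events, auto-abort at 90s.
const STALL_WARN_MS = 45_000;
const STALL_ABORT_MS = 90_000;

type WatchdogState = "healthy" | "warning" | "abort";

// lastEventAt is reset whenever any SSE message arrives; `now` comes from a 1s interval tick.
function watchdogState(lastEventAt: number, now: number): WatchdogState {
  const silentFor = now - lastEventAt;
  if (silentFor >= STALL_ABORT_MS) return "abort";
  if (silentFor >= STALL_WARN_MS) return "warning";
  return "healthy";
}
```

Keeping the decision pure makes the thresholds easy to test without fake timers. The interval callback only needs to show the warning or trigger the abort-and-retry flow.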
On-Demand S3 Upload for Share & Run Mode Badge
Share and Open buttons now upload HTML to S3 on-demand via a new tRPC endpoint, ensuring shared links are always real CloudFront URLs. Added 'Run' mode badge (blue) to history sidebar so all three modes have visible badges.
Share Link Blob URL Fix & Cooldown Reduction
Fixed all Open/Share buttons to use real CloudFront URLs when available instead of blob: URLs. Reduced abort cooldown from 60s to 15s for faster re-testing.
Cooldown Reduction & Planning Latency Investigation
Reduced run cooldown from 60 seconds to 15 seconds for faster iteration. Investigated 'stuck in planning' report — confirmed it was normal LLM latency with heartbeat working correctly.
Share Button Blob URL Fix
Fixed Share button to only appear when a real CloudFront URL exists. Open in new tab now uses the hosted URL instead of blob URL.
Dedicated Share Button & Button Clarity
Added a dedicated Share button to Build Versions and HtmlPreview toolbar that copies the CloudFront website URL. Renamed 'Play' to 'Preview' and 'View & Play' to 'View Latest' for clarity.
Image Scraping for CEO Revisions & Scroll Bug Fix
CEO revision flow can now scrape real images from websites mentioned in modification instructions. When users say 'add pictures from example.com', the system fetches actual image URLs from the HTML instead of guessing. Defensive CSS fixes applied to all marketing pages to prevent white-page scroll issues across browsers.
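Below is a simplified sketch of pulling image URLs out of fetched HTML and resolving relative paths against the page URL. Using a regular expression instead of an HTML parser is a deliberate simplification. extractImageUrls is a hypothetical name.

```typescript
// Simplification (assumption): a regex over <img src>; a real scraper would use an HTML
// parser and also consider srcset, lazy-loading attributes, and CSS backgrounds.
function extractImageUrls(html: string, pageUrl: string): string[] {
  const urls = new Set<string>();
  // Require whitespace before "src" so attributes like data-src are not matched.
  const imgSrc = /<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = imgSrc.exec(html)) !== null) {
    try {
      const absolute = new URL(match[1], pageUrl); // resolves "/a.png" against the page
      if (absolute.protocol === "http:" || absolute.protocol === "https:") {
        urls.add(absolute.href); // skips data: URIs and other schemes
      }
    } catch {
      // Ignore values that are not valid URLs.
    }
  }
  return Array.from(urls);
}
```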
Mode Suggestion Toast, Heartbeat & Public Changelog
A mode suggestion toast appears when a task would work better in a different mode. Heartbeat entries show progress during long operations. The Changelog page is now public. Footer navigation updated on most pages. Pricing chips are now visible below the status bar.
CEO Revision Web Research & Build Timeout Increase
CEO revision flow now does live web research when users mention URLs or ask to improve content. Build timeout increased to 20 minutes with automatic retry. Better error messages for skipped cycles.
Streaming CEO Revisions & Activity Log Improvements
CEO revision process now streams changes in real-time instead of waiting for the full response. Activity log entries improved with better formatting and timestamps.
CEO Modification Flow & Send to CEO Button
New 'Send to CEO for Revision' button lets users request modifications to completed outputs. CEO analyzes the request and applies targeted changes to the existing draft.
Quick Test, Automated Verification & Revision Upgrades
Major admin tooling update: Quick Test button runs lightweight smoke tests ($0.15-0.30) to verify the full Run + Build pipeline without external agents. All 12 manual deploy checklist items are now automated server-side checks. Revision process upgraded to modify existing drafts instead of rewriting from scratch. Credit dashboard crash fixed.
Stale Run Recovery & Deployment Resilience
Fixed the 'Unable to start run' error that occurred after server deployments. When a run was interrupted by a deployment (stuck in PLAN/EXECUTE state but no longer active in memory), the system now automatically recovers it by restarting with the stored task brief. Users see a seamless restart instead of an error message.
Connection Health Indicator & Retry for Stuck Steps
Added a real-time 'Last update: Xs ago' indicator with color-coded connection health (green/yellow/red). When no server event is received for 90+ seconds, a 'Retry Step' button appears to abort the stuck step and let users re-submit. Visible on both desktop and mobile, on all tabs.
Heartbeat Progress Updates During Long LLM Calls
Added heartbeat progress updates every 15 seconds during all long-running LLM calls. Users now see live status messages like 'CEO is analyzing the task (30s)...' instead of a frozen screen. Applied to all 7 LLM call points, including CEO planning, Operator drafting, Reviewer checking, Replanning, Improvement cycles, and Finalization.
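Here is a sketch of wrapping a long LLM call with periodic heartbeats. It assumes progress is pushed through a callback, such as an SSE sender. withHeartbeat and its parameters are hypothetical.

```typescript
// Emit a status message every `intervalMs` while `work` is pending.
async function withHeartbeat<T>(
  work: Promise<T>,
  describe: (elapsedSec: number) => string,
  onHeartbeat: (message: string) => void,
  intervalMs = 15_000,
): Promise<T> {
  const startedAt = Date.now();
  const timer = setInterval(() => {
    const elapsedSec = Math.round((Date.now() - startedAt) / 1000);
    onHeartbeat(describe(elapsedSec));
  }, intervalMs);
  try {
    return await work;
  } finally {
    clearInterval(timer); // stop heartbeats whether the call succeeds or fails
  }
}
```

A call might look like withHeartbeat(planTask(brief), (s) => `CEO is analyzing the task (${s}s)...`, sendEvent). Here, planTask and sendEvent are placeholders.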
Mobile UX: Live Timer, Progress on All Tabs, Bigger Fonts
Fixed the frozen elapsed timer so it counts up every second (not just on server events). Added a sticky progress bar visible on the Plan and Outputs tabs. Increased font sizes for the credit balance and settings on mobile. Confusing dual ETAs are now hidden during active runs.
LLM Retry Logic & Invite Code Fix
Added retry logic with exponential backoff to all three LLM clients (Built-in, ChatGPT, Gemini) to handle transient upstream errors. Fixed the 'Change Invite Code' button so it properly clears the saved session and navigates to the invite page.
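Below is a generic sketch of retry with exponential backoff. The attempt count, the delays, and the test for transient errors are assumptions. They are not the settings used by the Built-in, ChatGPT, and Gemini clients.

```typescript
interface RetryOptions {
  maxAttempts: number; // total tries, including the first
  baseDelayMs: number; // backoff cap before the 2nd attempt; doubles each time
  maxDelayMs: number;
  isTransient: (err: unknown) => boolean;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on permanent errors or once the attempt budget is spent.
      if (attempt >= opts.maxAttempts || !opts.isTransient(err)) throw err;
      const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (attempt - 1));
      // "Full jitter": wait a random time up to the cap so clients do not retry in lockstep.
      await sleep(Math.random() * cap);
    }
  }
}
```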
Run Duration Tracking & Completion Notifications
Accurate run duration tracking with completedAt timestamps. Notification sound and vibration on mobile when runs complete so you don't have to watch the screen.
Pre-Deploy Checklist & ETA
New admin deploy checklist page with automated health checks and manual verification items. Estimated completion time shown near Run buttons based on historical run durations.
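Here is a sketch of deriving an estimate from historical durations. It assumes each stored run has a mode plus start and completion timestamps. Taking the median and using a five-minute fallback are both assumptions.

```typescript
interface PastRun { mode: string; startedAt: number; completedAt: number }

// Median duration of completed runs of the same mode; the median resists a few very slow runs.
function estimateDurationMs(history: PastRun[], mode: string, fallbackMs = 5 * 60_000): number {
  const durations = history
    .filter((r) => r.mode === mode && r.completedAt > r.startedAt)
    .map((r) => r.completedAt - r.startedAt)
    .sort((a, b) => a - b);
  if (durations.length === 0) return fallbackMs; // no history yet
  const mid = Math.floor(durations.length / 2);
  return durations.length % 2 === 1 ? durations[mid] : (durations[mid - 1] + durations[mid]) / 2;
}
```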
Run History Mobile UX
Shows dollar amount alongside remaining runs in the Run History sidebar. Trash/delete icon is now always visible on mobile instead of requiring hover.
Build Quality & Credit UX Overhaul
Major fix: CEO and Operator now produce working HTML code instead of design documents for build tasks. Added low balance warnings, per-run cost tracking in history, and auto-refreshing credit balance.
Invite Code Cleanup & Credit Balance Display
Replaced old invite codes (BETA/GAMMA/DELTA/EPSILON/ZETA) with shorter names. Enhanced the credit balance display next to the Run buttons with a visual progress bar, color-coded warnings, and estimated cost per run type.
Mobile Reliability Fix — Task Brief Persistence
Fixed 'Invalid run state or missing task brief' error on mobile devices (Samsung S24 Plus and similar). Task briefs are now persisted to the database immediately when a run starts, eliminating race conditions between the run mutation and SSE connection on slow mobile networks.
Gemini Fallback, Operator Code Output & Builder Auto-Trigger
Gemini API now retries with fallback models when overloaded. Operator prompt rewritten to always produce working code (games, apps, tools) instead of design documents. Builder auto-triggers immediately when 'Run + Build' completes — no manual click needed.
Owner-Only Changelog, Instructions Rewrite & Code Update
Changelog page is now restricted to the project owner only (verified by identity, not invite code). Instructions page completely rewritten to reflect two run modes, credit system, and current features. Invite code simplified for easier entry.
Credit System, Two Run Modes & Branding Update
Major update introducing a credit system with per-invite budgets, two run modes (Operator for fast AI runs, Builder for full website builds), admin credit dashboard, and complete removal of third-party branding from all user-facing text.
Admin Health Dashboard, E2E Browser Tests & Post-Publish Verification
Added an admin-only health dashboard at /admin/health with live status indicators for database, API keys, environment, and system health. Set up Playwright E2E browser tests (25 tests) covering all critical user flows. Created a post-publish auto-verification suite (23 tests) that can run against production after every deploy.
Deep Testing, Health Check & Regression Guards
Added comprehensive test suites covering the full run lifecycle, post-deployment verification, and regression guards for every version since v2.26. New /api/trpc/version.health endpoint provides real-time system health monitoring with database, API key, and environment checks.
Time Management: Countdown, Configurable Timeout & Estimated Completion
Added real-time elapsed/remaining time countdown to the progress bar, configurable per-run timeout (via Max Time setting), and estimated completion time displayed in the Cost Estimate card based on token complexity.
Increased Hard Timeout to 10 Minutes
Doubled the hard timeout from 5 minutes to 10 minutes, giving complex tasks with multiple improvement cycles enough time to complete without hitting the time limit.
Graceful Timeout Handling for Improvement Cycles
Fixed a critical issue where runs that completed successfully (reviewer-approved) would show as ERROR when improvement cycles couldn't start due to time limits. Now, if time runs out during improvements, remaining cycles are gracefully skipped and the reviewer-approved output is saved. Finalization also gracefully handles timeouts by using the draft directly.
QC Smoke Tests, Plan Step Badge & Progress Improvements
Added a comprehensive pre-deployment QC smoke test suite with 41 automated checks covering model selection, run creation, progress tracking, plan steps, version consistency, SSE events, and more. Plan step count is now shown as a badge next to the Execution Plan header. Progress display improved with replanning and improving phases.
GPT-5.2 API Compatibility Fix
Fixed a critical API error where GPT-5.2 was rejecting the max_tokens parameter. All GPT-5 family models (5, 5-mini, 5-nano, 5.2, 5.2-pro) now correctly use max_completion_tokens as required by OpenAI's API. Legacy models (GPT-4o) still use max_tokens.
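Below is a sketch of selecting the token-limit field by model family, as this entry describes. Detecting the GPT-5 family from a "gpt-5" prefix in the model id is an assumption about naming.

```typescript
// Assumption: GPT-5 family ids start with "gpt-5" not followed by another digit.
function isGpt5Family(model: string): boolean {
  return /^gpt-5(?![0-9])/.test(model);
}

// GPT-5 family -> max_completion_tokens; legacy models such as GPT-4o -> max_tokens.
function tokenLimitParam(model: string, limit: number): Record<string, number> {
  return isGpt5Family(model) ? { max_completion_tokens: limit } : { max_tokens: limit };
}
```

Centralizing the choice in one helper lets the standard and streaming call paths share the same rule.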
PWA Popup Fix + QC Testing
Fixed the browser's automatic PWA install popup that was appearing on every page. The install prompt is now fully suppressed — users can only install the app by tapping the link at the bottom of the About page. No more intrusive banners.
PWA Install — Add to Home Screen
AI Company can now be installed as a Progressive Web App (PWA) on your phone or desktop. Visit the About page and tap 'Install AI Company as an App' at the bottom. Works offline with service worker caching. No app store needed.
Auto-Migrate Existing User Model Defaults
Existing users who had old/retired models cached in localStorage (like GPT-4o or Gemini 2.0 Flash) are now automatically upgraded to the new defaults: GPT-5.2 for CEO and Gemini 3 Pro Preview for Reviewer. No manual action needed — the migration happens silently on page load.
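Here is a sketch of the silent migration. It assumes saved preferences are a small object with ceo and reviewer model ids. The id strings and the function name are hypothetical.

```typescript
// Hypothetical model ids; only values in RETIRED are rewritten, so other user choices survive.
const DEFAULT_MODELS = { ceo: "gpt-5.2", reviewer: "gemini-3-pro-preview" } as const;
const RETIRED_MODELS = new Set(["gpt-4o", "gemini-2.0-flash"]);

interface ModelPrefs { ceo?: string; reviewer?: string }

function migrateModelPrefs(prefs: ModelPrefs): { prefs: ModelPrefs; changed: boolean } {
  const next: ModelPrefs = { ...prefs };
  let changed = false;
  for (const role of ["ceo", "reviewer"] as const) {
    const current = next[role];
    if (current !== undefined && RETIRED_MODELS.has(current)) {
      next[role] = DEFAULT_MODELS[role];
      changed = true;
    }
  }
  // Callers write back to storage only when `changed` is true.
  return { prefs: next, changed };
}
```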
Builds Gallery + Access Code Protection
New public Builds Gallery page at /builds showing all Builder builds with status, progress, and credit usage. Each build is protected by a unique 8-character access code — you need the code to view details, URLs, and results. The Examples page remains fully public. Access codes are shown to the build creator after triggering a build, and can be shared with others for controlled access.
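Below is a sketch of generating an 8-character code with Node's cryptographically secure crypto.randomInt. The alphabet, which leaves out easily confused characters, is an assumption. The entry only specifies the length.

```typescript
import { randomInt } from "node:crypto";

// Assumed alphabet: no 0/O, 1/I/L, so codes are easy to read aloud and type.
const ACCESS_CODE_ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789";
const ACCESS_CODE_LENGTH = 8;

function generateAccessCode(): string {
  let code = "";
  for (let i = 0; i < ACCESS_CODE_LENGTH; i++) {
    // randomInt(max) returns a uniform integer in [0, max), avoiding modulo bias.
    code += ACCESS_CODE_ALPHABET[randomInt(ACCESS_CODE_ALPHABET.length)];
  }
  return code;
}
```

With 31 symbols and 8 positions there are about 8.5 × 10^11 possible codes. Rate-limiting failed lookups is still advisable.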
Builder Agent — CEO Output → Real Websites
The CEO's output can now be sent directly to the Builder agent for real-world execution. After a run completes, use 'Run + Build' mode to have the Builder actually build what the CEO planned — websites, code, documents, anything. Real-time progress tracking with polling, result links, and shareable URLs. This is the first step toward the full AI Council vision.
Default Model Upgrade — GPT-5.2 & Gemini 3 Pro
Upgraded default AI models to the latest and most capable versions. CEO agent now defaults to GPT-5.2 (OpenAI's best model for complex tasks) instead of GPT-5 Mini. Reviewer agent now defaults to Gemini 3 Pro instead of Gemini 2.5 Flash. Fixed model detection logic so GPT-5 base is correctly treated as a reasoning model, while GPT-5.2 uses standard API parameters. Also fixed 3 hardcoded fallback model references that were still pointing to retired GPT-4o.
Mobile UX Improvements
Fixed mobile readability issues on Samsung Galaxy S24+ and similar devices. Log entry text is no longer truncated — full descriptions are visible without needing to expand. Transcript sections now use larger font sizes with better line spacing for comfortable reading on mobile screens.
GPT-5 Temperature Fix
Fixed API error where GPT-5 models rejected custom temperature values. GPT-5 Mini and other GPT-5 models only support the default temperature (1), so the temperature parameter is now omitted for all GPT-5+ models. Legacy models like GPT-4o still support custom temperature.
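Here is a sketch of omitting the temperature field for GPT-5+ models, as this entry describes. Parsing the major version from the model id is an assumption about naming.

```typescript
interface SamplingOptions { temperature?: number }

// Assumption: model ids look like "gpt-<major>...". GPT-5+ accepts only the default
// temperature, so the field is left out entirely instead of being sent.
function samplingOptions(model: string, requestedTemperature: number): SamplingOptions {
  const match = /^gpt-(\d+)/.exec(model);
  const major = match ? Number(match[1]) : 0;
  return major >= 5 ? {} : { temperature: requestedTemperature };
}
```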
GPT-5 API Compatibility Fix
Fixed a critical API compatibility issue where GPT-5 models rejected the legacy 'max_tokens' parameter. GPT-5 and newer models now correctly use 'max_completion_tokens' while legacy models (GPT-4o) continue using 'max_tokens'. The fix applies to both standard and streaming API calls.
Smarter AI Models — GPT-5 & Gemini 3
Upgraded to the latest AI models for dramatically better output quality. CEO agent now defaults to GPT-5 Mini with GPT-5.2 and GPT-5.2 Pro available for complex tasks. Reviewer agent defaults to Gemini 2.5 Flash with Gemini 3 Pro Preview available for the deepest analysis. All models are selectable per-run in the Run Settings dropdown. Legacy models (GPT-4o, Gemini 2.0 Flash) still available but marked as deprecated.
Agent Transcripts, Progress Stepper & Bug Fixes
Full transparency into what each AI agent said and did. Every log entry now has an expandable 'Agent Transcript' showing the actual prompt sent and response received, with model name and token estimates. The progress bar is replaced with a visual phase stepper showing Planning → Executing → Reviewing → Complete with checkmarks. Five Tier 1 bugs fixed including long brief validation, CEO prompt disambiguation, clarification handling, SSE race condition, and data retention cleanup.
Reviewer Details, Improvement Cycles & History Fixes
The Reviewer's activity log now shows detailed findings — every issue, veto reason, and suggestion is visible instead of just a summary count. A new 'Post-Complete Improvements' setting (default 3) makes the CEO do additional research and improvement cycles even after the Reviewer approves, ensuring higher quality output. History sidebar no longer shows duplicate or non-clickable entries.
Execution Progress & What's Next
All output tabs now show live progress during execution instead of blank placeholders. The Final tab displays which agent is currently working (CEO planning, Operator building, Reviewer checking) with recent activity feed. Draft and Review tabs show contextual loading states. After run completion, a 'What's Next' section suggests modifications, new tasks, or export options.
Streaming Revisions & Extreme QC
CEO revisions now stream in real time instead of waiting for the full response: you see the output being written word by word with a typing cursor. This also eliminates mobile timeout issues on slow connections. An extreme quality-control pass covered all pages, APIs, and features.
Smarter CEO — Build vs Display Intelligence
Major upgrade to CEO agent intelligence. The CEO now correctly distinguishes between 'build this' and 'display this' instructions — pasting game dev instructions will build the actual game, not a documentation page. Added clarification mechanism for ambiguous tasks. Improved log visibility with AI model labels. Better error handling for revision requests.
Unlimited Runs — Daily Limit Removed
Removed the 10 runs/day per-invite limit and the 30 runs/day per-IP limit. Users can now run unlimited tasks with only a 60-second cooldown between runs. Cleaned up unused rate limit constants.
Quality Check & Mobile Error Visibility
Full quality check across all pages, APIs, and mobile layouts. Error messages now display on all mobile tabs (Task, Plan, Outputs) instead of only the Task tab. Auto-switches to Task tab when an error occurs. Fixed all changelog dates from 2025 to 2026.
Friendly Error Messages & 20K Character Limit
Error messages are now human-readable instead of showing raw JSON. Task brief character limit increased from 5,000 to 20,000 characters. Validation errors display clear, actionable messages like 'Task brief is too long (max 20,000 characters)' instead of cryptic Zod error objects.
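Below is a plain TypeScript sketch of turning the 20,000-character limit into the friendly message quoted above. The app reportedly validated with Zod, and this version only illustrates the mapping. The empty-brief message is hypothetical.

```typescript
const MAX_BRIEF_CHARS = 20_000;

type BriefCheck = { ok: true } | { ok: false; message: string };

// Note: String.length counts UTF-16 code units, so some emoji count as 2 characters.
function checkTaskBrief(brief: string): BriefCheck {
  if (brief.trim().length === 0) {
    return { ok: false, message: "Please describe the task before running." }; // hypothetical wording
  }
  if (brief.length > MAX_BRIEF_CHARS) {
    return {
      ok: false,
      message: `Task brief is too long (max ${MAX_BRIEF_CHARS.toLocaleString("en-US")} characters)`,
    };
  }
  return { ok: true };
}
```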
Mobile-First Headers & Version Visibility
All pages now have fully responsive headers that work on Samsung S24+ and other mobile devices. Version badge is always visible on every screen. Home page has a hamburger menu on mobile. Navigation buttons are compact on small screens with abbreviated labels.
Try This, History Search, Cost Tracking & Mobile Fixes
Three new features plus critical mobile fixes. 'Try This' buttons on instruction examples auto-fill the app with pre-configured tasks. Run history now has search and status filtering. Activity Log shows per-agent cost estimates with a running total breakdown. Mobile session restore no longer gets stuck, and viewport overflow is fixed.
Enhanced Instructions & Ready-to-Use Commands
Completely revamped the instructions page with 4 detailed, copy-paste-ready example prompts covering deep research, brainstorming, website creation, and competitive analysis. Each example includes a full task description, recommended settings with dollar budgets, and an explanation of what each AI agent does. Added a new Website Template, a Pro Tips section, and a Budget column to the Quick Reference table.
Activity Log, HTML Preview & Post-Run Summary
New Activity Log tab shows every agent action in real-time — see what the CEO planned, what the Operator searched and drafted, and what the Reviewer decided. Draft tab now renders HTML websites as live previews instead of raw code. Post-run summary card shows completion stats with quick navigation links.
Revision File Attachments
You can now attach files (images, PDFs, documents) when requesting modifications to a completed run. The CEO agent uses attached files as context for more accurate revisions. Both mobile and desktop layouts support the new file upload UI.
Run Reliability & Refresh
Fixed the 'Invalid run state' error when running tasks with long briefs. Task briefs are now stored server-side before SSE connection, eliminating URL length limits. Added a Refresh button to reload run data without a full page refresh.
Live Progress, Thumbnails & Cross-Device Access
Real-time progress indicators show cycle count, current phase, and estimated time. Projects gallery now shows live thumbnail previews. Long task briefs no longer cause errors. Runs persist in history and are accessible from any device via invite code.
History Fix, Start New Task & Tailwind CDN
Run history is now fully clickable across sessions, a 'Start New Task' button lets you reset and begin fresh, and all generated websites automatically include Tailwind CSS + Inter font for polished styling.
Website Preview & Full HTML Pipeline
The AI CEO now generates complete, self-contained HTML websites when you ask for a website, landing page, or portfolio. Preview them live in an iframe with code view, fullscreen, and download options.
Auto-Save Website Projects
When the CEO generates a website (HTML output), it is automatically saved as a project with a permanent shareable URL at /projects/:slug.
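Here is a sketch of deriving a /projects/:slug path from a project title. The slug rules and the unique suffix are assumptions, and projectPath is a hypothetical helper.

```typescript
// Assumed rules: strip accents, lowercase, collapse non-alphanumerics to "-", cap the length.
function slugify(title: string): string {
  const base = title
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accents: "Café" -> "Cafe"
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into one hyphen
    .slice(0, 60)
    .replace(/^-+|-+$/g, ""); // trim hyphens left at either end
  return base || "project";
}

// Hypothetical: append a short unique suffix so two projects with the same title do not collide.
function projectPath(title: string, uniqueSuffix: string): string {
  return `/projects/${slugify(title)}-${uniqueSuffix}`;
}
```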
Versions & Website Preview
Added Versions tab to mobile layout, View Website button for HTML outputs, and export buttons (Copy, .txt, .md, PDF) on every version.
Simplified Settings & Reliability
Redesigned run settings as a clean flat list, added delete run capability, and improved stability across the board.
Persistent Run History
Run history now persists across version updates and re-authentication — your past runs will never disappear again.
Run Modes, Projects & Reliability
Four flexible run modes, live project previews, persistent invite codes, and critical stability fixes.
Draft Modifications & CEO Attribution
Iterative draft refinement — ask the CEO to revise outputs, with clear attribution showing which agent wrote each draft.
File Intelligence & Polish
Major upgrade to file handling with content extraction, drag-and-drop uploads, and version tracking across the app.
File Uploads & Run History
Added file upload support, run history sidebar, and configurable run settings.
Initial Release
First public release of AI Company: the AI CEO that thinks before acting.