TaylorIQ: Building an AI Intelligence Service
A year-long technical and strategic retrospective
Built and operated a production AI pipeline that delivered weekly intelligence briefs to commercial medical teams. Owned every function: product strategy, engineering, customer discovery, sales, and operations. The product worked and users found real value, but enterprise sales cycles and scaling constraints made solo operation untenable. Shut down deliberately after a year of structured experimentation.
Background
I spent a year building TaylorIQ: an AI-powered intelligence monitoring service for commercial professionals in fast-moving industries. The goal was to test whether a real unmet need I'd seen firsthand (inconsistent competitive and clinical monitoring) could be solved with AI search tooling that was just becoming available.
I came in with a business analytics background, professional experience writing PRDs and managing projects, and some coding ability, but no experience building production software systems. I used AI-assisted development extensively, which meant the quality of my requirements documents directly determined the quality of the output. Writing detailed, unambiguous PRDs became both the skill I leaned on most and the one I developed fastest.
TaylorIQ ran from January through December 2025. It produced working software, real pilot users, and a deep understanding of what it takes to build with AI at production quality. It also produced a data-informed decision to stop.
The Problem
Commercial teams in the medical industry (product specialists, sales managers, market access roles) needed to stay current on competitor activity, clinical guideline shifts, FDA decisions, and new research. In practice, this monitoring was inconsistent and manual. Google Alerts were too noisy. Keyword tools surfaced irrelevant results. Internal research meetings were irregular.
The failure wasn't effort; it was reliability. No tool could consistently surface the right information for a specific person's role, at the right time, without requiring significant manual filtering.
Think of it as a personal researcher who runs a defined set of searches every week on your behalf: most weeks returning nothing, but reliably catching the ones that matter, and summarizing them with context specific to your role.
Product Overview
TaylorIQ delivered a weekly email brief containing cited, categorized updates across clinical, regulatory, and competitive domains. Each brief was tailored to a specific user's role, company, and tracked topics.
- ~100 targeted research queries per user, defined through a discovery interview process
- No dashboard, no login, no app. Just a concise email arriving every Tuesday
- Most queries returned no result in a given week, by design. The system watched for changes rather than generating content for its own sake
- Role-specific context injection: summaries were tailored to the user's company, role, and focus areas
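The brief items described above can be modeled roughly like this (a sketch; the class and field names are illustrative, not the actual TaylorIQ schema):

```python
from dataclasses import dataclass, field

# Illustrative data model for one item in a weekly brief.
# Names here are assumptions, not taken from the real codebase.
@dataclass
class BriefItem:
    title: str
    summary: str                 # role-specific summary text
    category: str                # "clinical" | "regulatory" | "competitive"
    citations: list[str] = field(default_factory=list)

@dataclass
class WeeklyBrief:
    user_email: str
    items: list[BriefItem] = field(default_factory=list)

    def by_category(self, category: str) -> list[BriefItem]:
        # Group items so the email can render one section per domain.
        return [i for i in self.items if i.category == category]
```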
How quality evolved: Early briefs were noisy, with too many tangential results, inconsistent formatting, and summaries that didn't match how users consumed the information. I iterated on two dimensions in parallel: tuning query configurations and prompts to improve content quality, and restructuring the brief format based on what users read and skipped. By the final months, the briefs were substantially tighter and more useful than at the start.
Technical Architecture
The production system ran on three Python scripts scheduled via GitHub Actions, backed by a Supabase (PostgreSQL) database:
- fetch.py (Research Phase): runs the research queries against the Perplexity Sonar API
- generate.py (Summarization Phase): produces role-specific summaries via the Gemini API
- send_emails.py (Delivery Phase): sends the finished briefs via the Gmail API
Development Timeline
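A minimal sketch of how the three stages hand off to one another. In production each script ran independently on a GitHub Actions schedule, reading and writing the shared database; the stubs below stand in for the actual API calls:

```python
# Sketch of the three-stage pipeline. Each stub stands in for one
# script: fetch.py (Perplexity Sonar), generate.py (Gemini),
# send_emails.py (Gmail). Function names are illustrative.
def fetch(queries: list[str]) -> list[dict]:
    # Would call the Perplexity Sonar API; most queries return nothing.
    return [{"query": q, "result": f"raw result for {q}"} for q in queries]

def generate(raw: list[dict], role_context: str) -> list[str]:
    # Would call the Gemini API with role-specific prompt context.
    return [f"[{role_context}] {r['result']}" for r in raw]

def send(summaries: list[str], recipient: str) -> str:
    # Would send via the Gmail API; here we just format the brief body.
    return f"To: {recipient}\n" + "\n".join(f"- {s}" for s in summaries)

brief = send(
    generate(fetch(["FDA approvals"]), role_context="Product Specialist"),
    recipient="user@example.com",
)
```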
Tested workflow tools, built an RSS-based pipeline in Python, then discovered a better approach.
Started with visual workflow tools (n8n, Zapier, Make.com) to test what AI could produce and what was noise. Moved into Python and built a structured pipeline: RSS feeds, keyword detection, Gemini summarization. It worked, but had problems: too much noise, missed signals outside RSS feeds, and high API costs relative to useful output.
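The keyword-detection stage of that early RSS pipeline can be sketched like this (stub feed entries stand in for real parsing; the matching logic is illustrative):

```python
# Illustrative keyword stage from the early RSS pipeline: flag feed
# entries whose title or summary mentions any tracked keyword.
def matches(entry: dict, keywords: list[str]) -> bool:
    text = (entry.get("title", "") + " " + entry.get("summary", "")).lower()
    return any(kw.lower() in text for kw in keywords)

entries = [
    {"title": "FDA grants approval for new diagnostic", "summary": "..."},
    {"title": "Quarterly earnings recap", "summary": "no clinical content"},
]
hits = [e for e in entries if matches(e, ["FDA", "guideline"])]
```

Plain substring matching like this is exactly what produced the noise described above: it has no notion of relevance to a user's role, only term presence.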
This is where I started developing evaluation frameworks: structured benchmarks for assessing whether AI-generated content was accurate, relevant, and useful. This became foundational to everything that came after.
Built custom benchmarks for a barely-documented API, attended weekly Perplexity office hours, and implemented per-topic configurations as new features shipped.
The Sonar API was brand new and sparsely documented. Making it work for a structured weekly research pipeline with hundreds of per-user queries required building my own benchmarks: running the same queries repeatedly, comparing outputs, and documenting what configurations produced consistent results.
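One way to benchmark configuration consistency in the spirit described here: run the same query several times and score how much the returned sources overlap between runs (a simplified sketch; the real benchmarks are not shown in this document):

```python
# Score output consistency across repeated runs of the same query as
# the average Jaccard overlap of cited domains between pairs of runs.
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

def consistency(runs: list[set]) -> float:
    pairs = list(combinations(runs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Three runs of the same query; 1.0 would mean identical sources every run.
runs = [{"fda.gov", "nejm.org"}, {"fda.gov", "nejm.org"}, {"fda.gov", "asco.org"}]
score = consistency(runs)  # (1.0 + 1/3 + 1/3) / 3 = 5/9
```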
In April, Perplexity started hosting weekly developer office hours; I found the announcement through my own tool. I attended every week, gaining direct access to their engineering team and early visibility into upcoming features.
As new capabilities shipped (domain filtering, academic search mode, SEC search mode, recency filters, reasoning effort controls), I implemented each one per-topic, not globally. A topic tracking FDA approvals used entirely different settings than one tracking competitor sales activity.
Rearchitected from single-user prototype to multi-user production system. Moved to managed infrastructure.
The early pipeline was built around a single user. Expanding to multi-user required a meaningful rebuild: new database schema for users, topics, and user-topic mappings; CLI tooling for user and topic management; per-user custom prompts and delivery settings.
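The rebuilt schema can be sketched with the three core tables named above: users, topics, and the mapping table that attaches per-user overrides to a topic (column names are assumptions, not the production Supabase schema):

```python
import sqlite3

# Sketch of the multi-user schema: users, topics, and user_topics,
# where per-user prompt overrides live on the mapping row.
# Column names are illustrative, not the real production schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    role TEXT
);
CREATE TABLE topics (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    query_config TEXT      -- per-topic search settings, e.g. JSON
);
CREATE TABLE user_topics (
    user_id INTEGER REFERENCES users(id),
    topic_id INTEGER REFERENCES topics(id),
    custom_prompt TEXT,    -- per-user summarization override
    PRIMARY KEY (user_id, topic_id)
);
""")
conn.execute("INSERT INTO users (email, role) VALUES (?, ?)",
             ("user@example.com", "Product Specialist"))
```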
Moved from SQLite + local cron jobs to Supabase (managed PostgreSQL) + GitHub Actions for production scheduling. SQLite stayed as the local development database for fast iteration.
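The SQLite-for-development, Postgres-for-production split can be handled with a single environment switch, roughly like this (variable names and defaults are assumptions):

```python
import os

# Pick the database per environment: Supabase's Postgres connection
# string in production, a local SQLite file for fast iteration.
# DATABASE_URL and the dev.db default are illustrative names.
def database_url(env=os.environ) -> str:
    return env.get("DATABASE_URL", "sqlite:///dev.db")
```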
Each new Perplexity model release changed output behavior, sometimes in undocumented ways, requiring re-testing and re-tuning against quality benchmarks.
Shifted to sales and pilot work. Gathered enough evidence to make a deliberate, data-informed decision to sunset.
The latter part of the year shifted toward sales and pilot work, with continued technical maintenance. LinkedIn outreach with custom demo briefs generated interest, but enterprise procurement timelines and the difficulty of quantifying "not missing competitive updates" made closing slow.
By December, the evidence was clear: the product worked, users found value, but the path to a sustainable business required either significant capital or a structural rethink of the go-to-market motion. I chose to stop deliberately rather than continue on momentum alone.
Go-to-Market
The primary channel was LinkedIn outreach targeted at VP Sales, CCOs, Product Owners, and competitive intelligence roles at oncology and diagnostics companies: audiences who already understood the problem.
The hook: a custom-built sample brief for each target account, generated using AI tools, showing the prospect exactly what their weekly report would look like before any commitment.
What I learned: The demos surfaced a real limitation: generating a convincing sample without a proper discovery interview produced something that looked right but wasn't calibrated to the user's needs. Prospects could tell. The demos worked better as a hook than as a close.
Early in the go-to-market phase I invested time and money in a video ad. I cast too wide a net; the message was too generic to resonate with any specific audience.
The core problem: TaylorIQ was difficult to explain to people who don't already think in terms of AI-augmented workflows. "A personal researcher that runs searches for you every week" is accurate but abstract.
Lesson: With a product this concept-dependent, showing is far more effective than telling. Custom demos were the right direction. The ad was a waste.
An early strategic decision was to pilot across multiple industries: oncology, private equity, pharmacy, and others. The reasoning was that the underlying problem is industry-agnostic, and testing broadly would show what was generalizable.
This was a mistake. Every sector required a completely different discovery process. The question set, the relevant sources, the evaluation criteria: all of it was domain-specific. Breadth spread my time across contexts where I couldn't develop deep expertise quickly.
Lesson: Specializing in oncology was the right call, and it came later than it should have. The domain knowledge from my time as a product specialist was the actual edge: it let me define better queries, evaluate results accurately, and speak credibly during sales.
Pitfalls
OpenPerplex Integration
I investigated an open-source project that claimed to replicate Perplexity's search capabilities as an alternative backend and spent significant time integrating it. The output quality was substantially lower, the implementation was brittle, and the maintenance burden was high.
Prompt Engineering Drift
Each new Perplexity model release changed output behavior in ways that weren't always documented. Prompts that produced reliable output on one model version would produce degraded output on the next.
Per-User Onboarding Bottleneck
Each user required a custom discovery interview to define their query set, followed by iterative testing to tune configurations. Generic queries produced generic output.
Why It Ended
After a year, I made a deliberate decision to stop. The product worked. Users found value in the reports. But several factors made continued solo operation untenable:
Enterprise Sales Cycles
Target buyers operate on long procurement timelines. As a solo founder without brand recognition, converting interest into contracts was slow.
Invisible ROI
The value of not missing a competitive update is real but abstract. Prospects understood the problem but struggled to justify budget for avoiding a cost that never appeared on a spreadsheet.
Scaling Bottleneck
The per-user onboarding process was irreducibly manual. Automating it would have required a product redesign, not just an engineering fix.
Platform Dependency
Building core infrastructure on a single API from an early-stage company in a competitive market was a material long-term risk.
Advancing Agent Capabilities
General-purpose AI agent tooling advanced significantly over 2025. TaylorIQ offered higher-quality output, but the gap was narrowing against a moving target.
Fundraising Calculus
The AI landscape moved fast enough that a well-funded competitor could eliminate the market at any point. I preferred a bounded one-year test over taking outside capital into that uncertainty.
The shutdown was a data-informed decision, not a failure of validation. The problem was real. The product worked. The path to a sustainable business required either significant capital or a structural rethink of the go-to-market motion.
Skills Developed
AI Pipeline Engineering
Quality Evaluation
Python
Database & Infrastructure
API Integration
B2B Go-to-Market
Product Development
Strategic Decision-Making
What I Took Away
TaylorIQ was a deliberate bet on timing: domain expertise in oncology, early access to new search AI tooling, and a real unmet need I understood firsthand. I built it, tested it, sold it, iterated on it, and made a clear decision to stop when the evidence pointed that direction.
The experience gave me a working understanding of what it takes to build with AI at production quality: not just connecting APIs, but designing evaluation frameworks, managing model drift, building for reliability, and navigating the gap between "this works in a demo" and "this works every week without breaking."
Operating every function solo (product strategy, engineering, customer discovery, sales, and operations) forced me to make real tradeoffs rather than theoretical ones. When you own the entire problem, you learn quickly which decisions compound and which ones don't matter yet.
It also taught me the difference between a product people find valuable and a business that works. Those are not the same thing, and knowing when you have one but not the other is a useful skill.