TaylorIQ: Building an AI Intelligence Service

A year-long technical and strategic retrospective

  • Duration: 12 months
  • Role: Solo operator
  • Architecture: 3-stage pipeline
  • Scale: ~100 queries/user weekly
  • Quality improvement: [X]%
  • Outcome: Sunset Dec 2025

Built and operated a production AI pipeline that delivered weekly intelligence briefs to commercial medical teams. Owned every function: product strategy, engineering, customer discovery, sales, and operations. The product worked and users found real value, but enterprise sales cycles and scaling constraints made solo operation untenable. Shut down deliberately after a year of structured experimentation.

Background

I spent a year building TaylorIQ: an AI-powered intelligence monitoring service for commercial professionals in fast-moving industries. The goal was to test whether a real unmet need I'd seen firsthand (inconsistent competitive and clinical monitoring) could be solved with AI search tooling that was just becoming available.

I came in with a business analytics background, professional experience writing PRDs and managing projects, and some coding ability, but no experience building production software systems. I used AI-assisted development extensively, which meant the quality of my requirements documents directly determined the quality of the output. Writing detailed, unambiguous PRDs became both the skill I leaned on most and the one I developed fastest.

TaylorIQ ran from January through December 2025. It produced working software, real pilot users, and a deep understanding of what it takes to build with AI at production quality. It also produced a data-informed decision to stop.

The Problem

Commercial teams in the medical industry (product specialists, sales managers, market access roles) needed to stay current on competitor activity, clinical guideline shifts, FDA decisions, and new research. In practice, this monitoring was inconsistent and manual. Google Alerts were too noisy. Keyword tools surfaced irrelevant results. Internal research meetings were irregular.

The failure wasn't effort; it was reliability. No tool could consistently surface the right information for a specific person's role, at the right time, without requiring significant manual filtering.

Think of it as a personal researcher who runs a defined set of searches every week on your behalf: most weeks returning nothing, but reliably catching the ones that matter, and summarizing them with context specific to your role.

Product Overview

TaylorIQ delivered a weekly email brief containing cited, categorized updates across clinical, regulatory, and competitive domains. Each brief was tailored to a specific user's role, company, and tracked topics.

  • ~100 targeted research queries per user, defined through a discovery interview process
  • No dashboard, no login, no app. Just a concise email arriving every Tuesday
  • Most queries returned no result in a given week, by design: the system watched for changes rather than generating content for its own sake
  • Role-specific context injection: summaries were tailored to the user's company, role, and focus areas
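The role-specific context injection can be sketched as a prompt-building step. This is a minimal illustration, not the production prompt; the field names and wording are assumptions.

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    """Illustrative per-user context captured during the discovery interview."""
    name: str
    company: str
    role: str
    focus_areas: list

def build_summary_prompt(profile: UserProfile, raw_result: str) -> str:
    """Inject the user's role and focus areas into the summarization prompt,
    so the same source result reads differently for different users."""
    focus = ", ".join(profile.focus_areas)
    return (
        f"You are preparing a weekly intelligence brief for a {profile.role} "
        f"at {profile.company}. Their tracked focus areas are: {focus}.\n"
        f"Summarize the following result in 2-3 sentences, emphasizing what "
        f"it means for their role. If it is not relevant, reply 'SKIP'.\n\n"
        f"{raw_result}"
    )

prompt = build_summary_prompt(
    UserProfile("A. Example", "Acme Oncology", "Product Specialist",
                ["FDA approvals", "competitor trials"]),
    "FDA grants accelerated approval to ...",
)
```

The same upstream search result yields a different brief entry per user because the role and focus areas travel with every summarization call.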

How quality evolved: Early briefs were noisy: too many tangential results, inconsistent formatting, and summaries that didn't match how users consumed the information. I iterated on two dimensions in parallel: tuning query configurations and prompt engineering to improve content quality, and restructuring the brief format based on what users read and skipped. By the final months, the briefs were substantially tighter and more useful than at the start.

Technical Architecture

The production system ran on three Python scripts scheduled via GitHub Actions, backed by a Supabase (PostgreSQL) database:

  1. fetch.py: research phase (Perplexity Sonar API)
  2. generate.py: summarization phase (Gemini API)
  3. send_emails.py: delivery phase (Gmail API)

Stack: Python, PostgreSQL (Supabase), SQLite, GitHub Actions, Perplexity Sonar API, Gemini API, Gmail API, Google Cloud OAuth
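A minimal sketch of what one research call in fetch.py could look like. It assumes the Sonar API's chat-completions request shape and its `search_recency_filter` parameter; the system prompt, model choice, and environment variable name are illustrative, not the production values.

```python
# Endpoint for Perplexity's chat-completions-style Sonar API.
SONAR_ENDPOINT = "https://api.perplexity.ai/chat/completions"

def build_sonar_request(query: str, recency: str = "week") -> dict:
    """Build one research request; the recency filter limits results to the
    past week so the pipeline surfaces only new developments."""
    return {
        "model": "sonar",
        "messages": [
            {"role": "system",
             "content": "Report only developments from the requested period, "
                        "with citations. If there are none, say so."},
            {"role": "user", "content": query},
        ],
        "search_recency_filter": recency,
    }

payload = build_sonar_request("New FDA approvals in non-small cell lung cancer")

# Sending the request (needs an API key) would look roughly like:
# requests.post(SONAR_ENDPOINT, json=payload,
#               headers={"Authorization": f"Bearer {os.environ['PPLX_API_KEY']}"})
```

Because most topics only care about the current week, the recency filter does much of the noise suppression before any summarization happens.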

Development Timeline

Jan–Feb

Tested workflow tools, built an RSS-based pipeline in Python, then discovered a better approach.

Started with visual workflow tools (n8n, Zapier, Make.com) to test what AI could produce and what was noise. Moved into Python and built a structured pipeline: RSS feeds, keyword detection, Gemini summarization. It worked, but had problems: too much noise, missed signals outside RSS feeds, and high API costs relative to useful output.

This is where I started developing evaluation frameworks: structured benchmarks for assessing whether AI-generated content was accurate, relevant, and useful. This became foundational to everything that came after.

Key decision: Switched to Perplexity's Sonar API after discovering it through my own pipeline. The RSS approach was producing diminishing returns; Sonar offered search-first, cited web results that were a fundamentally better fit. I found it because one of my preset searches was tracking AI API updates; the tool surfaced the thing that replaced it.
Mar–Jun

Built custom benchmarks for a barely-documented API, attended weekly Perplexity office hours, and implemented per-topic configurations as new features shipped.

The Sonar API was brand new and sparsely documented. Making it work for a structured weekly research pipeline with hundreds of per-user queries required building my own benchmarks: running the same queries repeatedly, comparing outputs, and documenting what configurations produced consistent results.
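One benchmark of this kind can be sketched as a consistency check: run the same query several times and measure how stable the cited sources are. The response shape (a `citations` list of URLs) and the scoring choice (pairwise Jaccard overlap of cited domains) are illustrative assumptions.

```python
def citation_domains(output: dict) -> set:
    """Extract the set of cited domains from one API response (shape assumed)."""
    return {url.split("/")[2] for url in output.get("citations", [])}

def consistency_score(runs: list) -> float:
    """Mean pairwise Jaccard overlap of cited domains across repeated runs
    of the same query; 1.0 means identical sources every time."""
    scores = []
    for i in range(len(runs)):
        for j in range(i + 1, len(runs)):
            a, b = citation_domains(runs[i]), citation_domains(runs[j])
            if a or b:
                scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores) if scores else 1.0

runs = [
    {"citations": ["https://www.fda.gov/x", "https://www.nejm.org/y"]},
    {"citations": ["https://www.fda.gov/z", "https://www.nejm.org/y"]},
    {"citations": ["https://www.fda.gov/x"]},
]
score = consistency_score(runs)  # 1.0, 0.5, 0.5 across the three pairs
```

A configuration that scored poorly here was unreliable regardless of how good any single output looked, which is what made repeated-run benchmarks more informative than spot checks.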

In April, Perplexity started hosting weekly developer office hours; I found the announcement through my own tool. I attended every week, gaining direct access to their engineering team and early visibility into upcoming features.

As new capabilities shipped (domain filtering, academic search mode, SEC search mode, recency filters, reasoning effort controls), I implemented each one per-topic, not globally. A topic tracking FDA approvals used entirely different settings than one tracking competitor sales activity.
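Per-topic configuration can be sketched as a small dataclass that maps onto API parameters. The field and parameter names mirror the kinds of controls described above (search modes, recency filters, domain filters) but are assumptions, not the exact production schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TopicConfig:
    """Illustrative per-topic search settings; each topic carries its own
    configuration rather than inheriting a global one."""
    query: str
    search_mode: Optional[str] = None        # e.g. "academic" or "sec"
    recency_filter: str = "week"
    domain_allowlist: list = field(default_factory=list)

    def to_api_params(self) -> dict:
        """Translate topic settings into request parameters."""
        params = {"search_recency_filter": self.recency_filter}
        if self.search_mode:
            params["search_mode"] = self.search_mode
        if self.domain_allowlist:
            params["search_domain_filter"] = self.domain_allowlist
        return params

# An FDA-approvals topic and a competitor-activity topic use different settings:
fda = TopicConfig("new oncology drug approvals", domain_allowlist=["fda.gov"])
sales = TopicConfig("competitor sales team announcements", recency_filter="month")
```

Keeping the configuration per-topic meant a new API capability could be adopted where it helped without disturbing topics that were already tuned.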

Key decision: Narrowed from multi-industry pilots to oncology only. Each sector demanded its own discovery process, question set, source list, and evaluation criteria, and oncology was where my years as a product specialist gave me a genuine edge in query design, result evaluation, and sales credibility.
Jul–Sep

Rearchitected from single-user prototype to multi-user production system. Moved to managed infrastructure.

The early pipeline was built around a single user. Expanding to multi-user required a meaningful rebuild: new database schema for users, topics, and user-topic mappings; CLI tooling for user and topic management; per-user custom prompts and delivery settings.
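The core of that schema (users, topics, and a user-topic mapping with per-user overrides) can be sketched as follows. Table and column names are illustrative; the sketch uses SQLite, matching the local development setup.

```python
import sqlite3

# Minimal sketch of the multi-user schema: users, topics, and a mapping
# table carrying per-user customization. Names are illustrative.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL,
    company TEXT,
    role TEXT
);
CREATE TABLE topics (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    query_config TEXT  -- per-topic API settings, stored as JSON
);
CREATE TABLE user_topics (
    user_id INTEGER REFERENCES users(id),
    topic_id INTEGER REFERENCES topics(id),
    custom_prompt TEXT,  -- per-user summarization overrides
    PRIMARY KEY (user_id, topic_id)
);
"""

conn = sqlite3.connect(":memory:")  # SQLite locally; Postgres/Supabase in prod
conn.executescript(SCHEMA)
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
```

The mapping table is what makes the system multi-user: two users can track the same topic while receiving differently framed summaries.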

Moved from SQLite + local cron jobs to Supabase (managed PostgreSQL) + GitHub Actions for production scheduling. SQLite stayed as the local development database for fast iteration.
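The GitHub Actions scheduling for the three-script pipeline could look roughly like the workflow below. The cron expression, secret names, and file layout are assumptions for illustration, not the actual production config.

```yaml
# Illustrative workflow (e.g. .github/workflows/weekly_brief.yml)
name: weekly-brief
on:
  schedule:
    - cron: "0 6 * * 2"   # Tuesdays, 06:00 UTC, ahead of the morning brief
  workflow_dispatch: {}    # manual runs for testing

jobs:
  pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python fetch.py
        env:
          PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }}
      - run: python generate.py
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
      - run: python send_emails.py
```

Running the three stages as sequential steps in one job keeps a failed fetch from triggering an empty send.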

Each new Perplexity model release changed output behavior, sometimes in undocumented ways, requiring re-testing and re-tuning against quality benchmarks.

Key decision: Rearchitected for multi-user before having paying customers. This was a bet that the product would need to scale, and that building the foundation now would be cheaper than retrofitting later. It was the right call: the multi-user infrastructure made pilot onboarding significantly smoother.
Oct–Dec

Shifted to sales and pilot work. Gathered enough evidence to make a deliberate, data-informed decision to sunset.

The latter part of the year shifted toward sales and pilot work, with continued technical maintenance. LinkedIn outreach with custom demo briefs generated interest, but enterprise procurement timelines and the difficulty of quantifying "not missing competitive updates" made closing slow.

By December, the evidence was clear: the product worked, users found value, but the path to a sustainable business required either significant capital or a structural rethink of the go-to-market motion. I chose to stop deliberately rather than continue on momentum alone.

Key decision: Shut down a working product. The hardest call wasn't any technical pivot; it was recognizing that a product people find valuable and a business that works are not the same thing, and acting on that distinction.

Go-to-Market

The primary channel was LinkedIn outreach targeted at VP Sales, CCOs, Product Owners, and competitive intelligence roles at oncology and diagnostics companies: an audience that already understood the problem.

The hook: a custom-built sample brief for each target account, generated using AI tools, showing the prospect exactly what their weekly report would look like before any commitment.

What I learned: The demos surfaced a real limitation: generating a convincing sample without a proper discovery interview produced something that looked right but wasn't calibrated to the user's needs. Prospects could tell. The demos worked better as a hook than as a close.

Early in the go-to-market phase I invested time and money in a video ad. I cast too wide a net; the message was too generic to resonate with any specific audience.

The core problem: TaylorIQ was difficult to explain to people who didn't already think in terms of AI-augmented workflows. "A personal researcher that runs searches for you every week" is accurate but abstract.

Lesson: With a product this concept-dependent, showing is far more effective than telling. Custom demos were the right direction. The ad was a waste.

An early strategic decision was to pilot across multiple industries: oncology, private equity, pharmacy, and others. The reasoning was that the underlying problem is industry-agnostic, and testing broadly would show what was generalizable.

This was a mistake. Every different sector required a completely different discovery process. The question set, the relevant sources, the evaluation criteria: all of it was domain-specific. Breadth spread my time across contexts I couldn't develop deep expertise in quickly.

Lesson: Specializing in oncology was the right call, and it came later than it should have. The domain knowledge from my time as a product specialist was the actual edge: it let me define better queries, evaluate results accurately, and speak credibly during sales.

Pitfalls

OpenPerplex Integration

I investigated an open-source project that claimed to replicate Perplexity's search capabilities as an alternative backend. Spent significant time integrating it. The output quality was substantially lower, the implementation was brittle, and the maintenance burden was high.

Lesson: The concern about platform dependency was legitimate. The solution was wrong. The right answer: architect for multiple search backends from the start, not bet on an unproven open-source clone.
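What "architect for multiple search backends" could look like is a thin interface that fetch.py talks to, with each provider as an adapter behind it. The class and method names here are hypothetical; the stub backend stands in for any concrete provider.

```python
from abc import ABC, abstractmethod

class SearchBackend(ABC):
    """Provider-agnostic search interface; the pipeline depends on this,
    not on any one vendor's API."""
    @abstractmethod
    def search(self, query: str, **params) -> dict:
        """Return {'answer': str, 'citations': [urls]}."""

class SonarBackend(SearchBackend):
    def search(self, query: str, **params) -> dict:
        # A real implementation would call the Perplexity Sonar API here.
        return {"answer": f"[sonar] {query}", "citations": []}

class StubBackend(SearchBackend):
    """Deterministic stand-in for local testing and provider outages."""
    def search(self, query: str, **params) -> dict:
        return {"answer": f"[stub] {query}", "citations": []}

def run_query(backend: SearchBackend, query: str) -> dict:
    return backend.search(query)

result = run_query(StubBackend(), "FDA approvals this week")
```

Swapping providers then becomes a one-line change at the call site, and benchmarks can run against every backend through the same interface.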

Prompt Engineering Drift

Each new Perplexity model release changed output behavior in ways that weren't always documented. Prompts that produced reliable output on one model version would produce degraded output on the next.

Lesson: This required re-testing against benchmarks after every model update, an ongoing maintenance cost that was easy to underestimate. Benchmarks aren't optional; they're infrastructure.

Per-User Onboarding Bottleneck

Each user required a custom discovery interview to define their query set, followed by iterative testing to tune configurations. Generic queries produced generic output.

Lesson: The thing that made the product good was also the thing that made it hard to grow. This is the same quality-vs-scale tension any product team faces. Recognizing it early is better than discovering it after committing to a growth plan that assumes it away.

Why It Ended

After a year, I made a deliberate decision to stop. The product worked. Users found value in the reports. But several factors made continued solo operation untenable:

Enterprise Sales Cycles

Target buyers operate on long procurement timelines. As a solo founder without brand recognition, converting interest into contracts was slow.

Invisible ROI

The value of not missing a competitive update is real but abstract. Prospects understood the problem but struggled to justify budget for a cost that never appeared on a spreadsheet.

Scaling Bottleneck

The per-user onboarding process was irreducibly manual. Automating it would have required a product redesign, not just an engineering fix.

Platform Dependency

Building core infrastructure on a single API from an early-stage company in a competitive market was a material long-term risk.

Advancing Agent Capabilities

General-purpose AI agent tooling advanced significantly over 2025. TaylorIQ offered higher-quality output, but the gap was narrowing against a moving target.

Fundraising Calculus

The AI landscape moved fast enough that a well-funded competitor could eliminate the market at any point. I preferred a bounded one-year test over taking outside capital into that uncertainty.

The shutdown was a data-informed decision, not a failure of validation. The problem was real and the product worked; what was missing was a route to a sustainable business that didn't demand outside capital or a rebuilt go-to-market motion.

Skills Developed

AI Pipeline Engineering

Multi-model pipelines, per-query API configuration, production prompt engineering

Quality Evaluation

Benchmark design, quantitative output assessment, cross-model consistency testing

Python

Production pipeline scripts, CLI tooling, database abstraction layers

Database & Infrastructure

PostgreSQL (Supabase), SQLite, schema design, GitHub Actions

API Integration

Perplexity Sonar API, Gemini API, Gmail API, Google Cloud OAuth

B2B Go-to-Market

Direct outreach, demo-driven sales, customer discovery, pricing model design

Product Development

User research, iterative development, onboarding design, multi-user architecture, PRD-driven AI development

Strategic Decision-Making

Market timing, build-vs-buy tradeoffs, knowing when to stop

What I Took Away

TaylorIQ was a deliberate bet on timing: domain expertise in oncology, early access to new search AI tooling, and a real unmet need I understood firsthand. I built it, tested it, sold it, iterated on it, and made a clear decision to stop when the evidence pointed that direction.

The experience gave me a working understanding of what it takes to build with AI at production quality: not just connecting APIs, but designing evaluation frameworks, managing model drift, building for reliability, and navigating the gap between "this works in a demo" and "this works every week without breaking."

Operating every function solo (product strategy, engineering, customer discovery, sales, and operations) forced me to make real tradeoffs rather than theoretical ones. When you own the entire problem, you learn quickly which decisions compound and which ones don't matter yet.

It also taught me the difference between a product people find valuable and a business that works. Those are not the same thing, and knowing when you have one but not the other is a useful skill.