Every startup has a content problem. There's too much being published, too many signals to track, and not enough time to synthesize what matters. Trends emerge in newsletters, Twitter threads, Reddit discussions, blog posts, and earnings calls — often weeks before they surface in mainstream coverage. The founders who catch them early win.
TrendTalks was our attempt to solve that problem with AI. Here's the honest build story: what we were trying to do, how we built it, what actually worked, and where we stumbled. The code is open source at github.com/v12labs-engineering/trendtalks.
Table of Contents
- The Problem We Were Trying to Solve
- Why We Built It Instead of Using Existing Tools
- The Tech Stack Decisions
- How TrendTalks Works Under the Hood
- What Worked
- What Didn't Work
- What We'd Do Differently
- Open Source: Why We Released the Code
- Ready to Build?
The Problem We Were Trying to Solve
I kept getting the same request from founders we work with: "I need to stay on top of what's happening in my space, but I don't have time to read 50 sources every day."
The existing solutions were either too broad (Google Alerts, which surfaces noise, not signal) or too expensive (enterprise intelligence platforms that cost $500+/month and are built for analyst teams, not solo founders). There was nothing in the middle that a pre-seed founder could actually use.
I also noticed that the tools that existed were keyword-based. They'd tell you when a word appeared in content. But trends aren't about keywords — they're about patterns. Multiple sources discussing the same underlying topic in the same week, even if they use different words. A shift in sentiment around an established concept. An emerging theme that doesn't have a name yet.
That's a semantic problem, not a keyword problem. And semantic problems are exactly what embedding models and vector search are good at.
Why We Built It Instead of Using Existing Tools
Three reasons:
1. We needed to validate the approach. Building TrendTalks was partly about proving that this semantic trend detection approach actually worked — not just in theory, but on real content from real sources that founders actually care about.
2. We wanted a portfolio piece. Showing founders that we can build AI products with actual utility, not just demos, matters. TrendTalks is something people can use and evaluate. That's worth more than any case study.
3. It solved our own problem. I run V12 Labs. I need to track what's happening in AI, startups, and developer tooling. I wanted a tool I'd actually use every day, not a toy I'd demo once.
The best internal tools are the ones that solve a real problem for the people building them. That's how you know the product intuition is genuine.
The Tech Stack Decisions
Frontend: Next.js 14 with App Router
We could have used something simpler, but Next.js 14's App Router gives us server-side rendering for the trend feed, which matters for load performance. We're fetching from Supabase on the server, rendering the content, and hydrating the interactive parts on the client. Clean, fast, and we know the stack well.
Embedding pipeline: OpenAI text-embedding-3-small
We evaluated a few embedding models. OpenAI's text-embedding-3-small hits the right balance of quality, speed, and cost for this use case. At $0.02 per million tokens, embedding several thousand content pieces per day is essentially free at the volume we're running.
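"Essentially free" is easy to sanity-check with back-of-envelope arithmetic. A small sketch (the volume figures below are illustrative, not our actual numbers):

```typescript
// Back-of-envelope embedding cost estimate for text-embedding-3-small.
const PRICE_PER_MILLION_TOKENS = 0.02; // USD, per OpenAI's published pricing

function dailyEmbeddingCost(piecesPerDay: number, avgTokensPerPiece: number): number {
  const totalTokens = piecesPerDay * avgTokensPerPiece;
  return (totalTokens / 1_000_000) * PRICE_PER_MILLION_TOKENS;
}

// 5,000 pieces/day at ~400 tokens each is 2M tokens,
// i.e. dailyEmbeddingCost(5000, 400) === 0.04 — about four cents a day.
```

Even at 10x that volume, embedding cost stays under fifty cents a day, which is why the storage side of the bill (more on that below the fold) dominated instead.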
Vector database: Pinecone
For semantic similarity search at scale, Pinecone is our default. We're storing embeddings with metadata (source, date, content type, source URL) and querying by similarity with metadata filters. The managed service means no infrastructure maintenance.
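To make "querying by similarity with metadata filters" concrete, here's a sketch of the parameter object a query like ours would pass to the Pinecone client's `index.query()`. The metadata field names (`publishedAt`) and the `topK` value are our own schema choices, not anything Pinecone mandates:

```typescript
// Shape of a metadata-filtered similarity query against Pinecone.
// Field names in `filter` follow our metadata schema; the $gte operator
// is Pinecone's Mongo-style metadata filtering syntax.
interface TrendQuery {
  vector: number[];
  topK: number;
  includeMetadata: boolean;
  filter: Record<string, unknown>;
}

function buildTrendQuery(embedding: number[], sinceEpochDays: number): TrendQuery {
  return {
    vector: embedding,
    topK: 50,
    includeMetadata: true,
    filter: { publishedAt: { $gte: sinceEpochDays } },
  };
}
```

The date filter is what keeps clustering scoped to the recent window instead of the whole index.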
Trend detection: LangChain + Claude 3.5 Sonnet
The trend synthesis layer is where the interesting work happens. We embed incoming content, cluster similar pieces using cosine similarity in Pinecone, and then pass the clusters to Claude 3.5 Sonnet to generate trend summaries. Claude is unusually good at identifying the underlying theme in a cluster of semantically similar content and articulating it in plain language.
Backend: Node.js with a content ingestion queue
A lightweight Node.js API handles the content ingestion pipeline. Content from configured RSS feeds, newsletters (via email parsing), and manual URL submissions goes into a queue, gets fetched, chunked, embedded, and stored.
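The queue's control flow is simple enough to sketch. This is a deliberately simplified synchronous version — the real worker persists the queue, fetches over the network, and calls the embedding service; here the fetch and store steps are injected so the flow itself is the focus:

```typescript
// Simplified sketch of the ingestion queue: fetch each queued URL,
// split the body into pieces, hand each piece to the store step.
type QueueItem = { url: string };

function drainQueue(
  queue: QueueItem[],
  fetchBody: (url: string) => string,
  store: (url: string, piece: string) => void
): number {
  let stored = 0;
  for (const item of queue) {
    const body = fetchBody(item.url);
    // stand-in for the real chunker: split on blank lines
    for (const piece of body.split("\n\n").filter(Boolean)) {
      store(item.url, piece);
      stored++;
    }
  }
  return stored;
}
```

Keeping fetch, chunk, embed, and store as separate stages made it easy to retry a single failed stage without re-running the whole item.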
Scheduled processing: Vercel Cron + Edge Functions
We run the ingestion pipeline every 4 hours and the trend synthesis every 24 hours. Vercel Cron triggers the jobs, Edge Functions handle the lightweight orchestration, and a separate Node.js worker handles the heavy lifting.
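That schedule translates to a small `crons` block in `vercel.json`. The route paths below are illustrative, not the actual ones in the repo:

```json
{
  "crons": [
    { "path": "/api/cron/ingest", "schedule": "0 */4 * * *" },
    { "path": "/api/cron/synthesize", "schedule": "0 6 * * *" }
  ]
}
```

Standard cron syntax: ingestion fires at the top of every fourth hour, synthesis once a day.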
Database: Supabase
Postgres via Supabase stores the structured data (user preferences, source configurations, trend history). We use Supabase's Row Level Security to scope data per user, and the Supabase JS client makes the integration trivially easy from Next.js.
How TrendTalks Works Under the Hood
The pipeline has four stages:
Stage 1: Ingestion
Content from configured sources (RSS feeds, newsletters, URLs) is fetched, cleaned, and chunked into semantic units (typically 200–500 tokens). Each chunk gets embedded and stored in Pinecone with metadata.
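A minimal sketch of the sizing half of that chunking step. It approximates tokens by word count (roughly 0.75 words per token is close enough for sizing) and ignores the semantic-boundary logic; the function name and the default words-per-chunk are illustrative:

```typescript
// Splits cleaned text into fixed-size chunks, using word count as a
// rough proxy for tokens. The production chunker also respects paragraph
// boundaries so chunks stay semantically coherent.
function chunkText(text: string, maxWords = 375): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}
```

375 words lands near the middle of the 200–500 token target range.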
Stage 2: Clustering
Every 24 hours, we query Pinecone for the most recent content and cluster pieces by semantic similarity. We use a simple cosine similarity threshold to group pieces that are clearly about the same underlying topic. A cluster of 5+ pieces that appeared in the same 7-day window is a candidate trend signal.
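The threshold-based grouping can be sketched as a greedy single pass: each embedding joins the first cluster whose seed (first member) is similar enough, otherwise it starts a new cluster. This is the idea, not the production implementation:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedy threshold clustering: returns clusters as arrays of indices
// into `embeddings`.
function clusterByThreshold(embeddings: number[][], threshold: number): number[][] {
  const clusters: number[][] = [];
  for (let i = 0; i < embeddings.length; i++) {
    const home = clusters.find(
      (c) => cosine(embeddings[c[0]], embeddings[i]) >= threshold
    );
    if (home) home.push(i);
    else clusters.push([i]);
  }
  return clusters;
}
```

The threshold is the tuning knob discussed later: raise it and clusters fragment, lower it and unrelated content merges.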
Stage 3: Trend synthesis
Candidate clusters go to Claude 3.5 Sonnet with a prompt asking it to:
- Identify the underlying theme (not just the surface topic)
- Summarize why this is appearing now
- Assess whether this is a new trend or an acceleration of an existing one
- Rate the signal strength on a simple scale
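The four asks above translate to a prompt builder along these lines. The exact wording here is illustrative, not our production prompt:

```typescript
// Assembles the synthesis prompt sent to Claude for one candidate cluster.
function buildSynthesisPrompt(excerpts: string[]): string {
  return [
    "The following content pieces appeared within the same 7-day window",
    "and cluster together semantically:",
    "",
    ...excerpts.map((e, i) => `--- Piece ${i + 1} ---\n${e}`),
    "",
    "1. Identify the underlying theme (not just the surface topic).",
    "2. Summarize why this is appearing now.",
    "3. Is this a new trend or an acceleration of an existing one?",
    "4. Rate the signal strength: weak / moderate / strong.",
  ].join("\n");
}
```

Numbering the asks and constraining the rating to a fixed vocabulary made the responses far easier to parse into structured trend records.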
Stage 4: Delivery
Synthesized trends are stored in Supabase and surfaced in the Next.js UI. Users can browse trends by category, see the source content that drove each trend, and save trends for follow-up.
What Worked
Semantic clustering over keyword matching. This was the core hypothesis and it held up. The semantic approach surfaces trends from different sources using different vocabulary — which is how real trends actually work. We'd catch a trend appearing in a developer newsletter, a VC blog, and a startup Twitter thread, all using different words but clearly about the same underlying shift.
Claude for synthesis. We tried GPT-4o for the trend synthesis and the output was more verbose and less precise. Claude consistently produced cleaner, more specific trend summaries with less noise. The instruction-following quality shows up in this kind of structured synthesis task.
The 24-hour batch cycle. Real-time trend detection sounds appealing but it's premature optimization. Most trends worth tracking develop over days, not hours. The 24-hour synthesis cycle produces better signal-to-noise than trying to update in real time.
Supabase for everything structured. We did not need a separate Postgres instance. Supabase's managed Postgres with RLS handles our data perfectly and the dev experience is excellent.
What Didn't Work
Content quality variance was brutal. RSS feeds and newsletters have wildly different quality levels. Some sources are dense, information-rich, and embed well. Others are thin promotional content that creates noise clusters. We ended up building a source quality scoring system that weights high-quality sources more heavily — which added time we hadn't planned for.
The clustering threshold was hard to calibrate. Too tight a threshold (high similarity required) and you miss trends that express themselves in varied language. Too loose and you cluster unrelated content together. We went through three iterations of threshold tuning before it felt right, and it's still imperfect.
Email newsletter parsing was harder than expected. Some newsletters have complex HTML structures with tracked links, images, and embedded content that breaks simple HTML parsing. Building robust newsletter ingestion took significantly longer than the RSS feed ingestion. We ended up using a combination of JSDOM and custom parsing rules per newsletter domain.
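One of the simpler per-domain cleanup rules, sketched here: stripping common tracking parameters from extracted links before storing source URLs, using Node's built-in `URL`. The parameter list is our own and far from exhaustive:

```typescript
// Tracking query parameters we strip from newsletter links before
// storing them as source URLs.
const TRACKING_PARAMS = [
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "ref",
];

function cleanLink(href: string): string {
  const url = new URL(href);
  for (const p of TRACKING_PARAMS) url.searchParams.delete(p);
  return url.toString();
}
```

The harder cases (redirect-wrapped links that hide the real destination behind a click-tracking domain) needed per-newsletter rules on top of this.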
Cost modeling was off. Our embedding cost estimates held up, but Pinecone storage costs at scale came in higher than expected. For a free-tier product, this matters. We optimized by reducing chunk overlap and archiving older embeddings.
What We'd Do Differently
Start with source curation. The quality of TrendTalks is entirely dependent on the quality of the sources you feed it. We underinvested in source curation early on and it showed in the output quality. Next time, I'd spend the first week just identifying and categorizing 50 high-quality sources before writing a line of pipeline code.
Build the evaluation pipeline first. We didn't have a way to measure trend detection quality until we'd shipped v1 and were using it. That meant we were tuning the clustering and synthesis with vibes rather than metrics. Build your eval framework before you build your pipeline.
Design for source diversity. TrendTalks currently ingests text-based content well but struggles with video transcripts and podcast content. These are increasingly important signal sources. I'd build the multi-modal ingestion pipeline from the start rather than retrofitting it.
Open Source: Why We Released the Code
We open-source internal tools because it serves multiple goals simultaneously:
It proves we can build. Any founder evaluating V12 Labs can read the code. They can see how we architect systems, how we write tests, how we structure a production Next.js project. The code is better proof of capability than any testimonial.
It contributes to the ecosystem. The AI tooling community has been enormously valuable to us. Releasing tools that others can build on is how you participate in that value exchange, not just consume it.
It attracts aligned clients. The founders who find TrendTalks interesting, dig into the code, and reach out to us because they want to build something similar — those are exactly the founders we love working with.
The code for TrendTalks is at github.com/v12labs-engineering/trendtalks. Fork it, run it, improve it.
Ready to Build?
If you're building a product in the intelligence, research, or content synthesis space — this is exactly what we do. At V12 Labs, we've shipped AI products using Next.js, LangChain, Pinecone, and Claude across dozens of builds.
$6K flat fee. 15-day delivery. Full source code ownership.
Book a discovery call at v12labs.io and let's build something you'll actually use.