I build web scrapers, automation systems, and data products: from scrapers that handle dynamic sites at scale to full production platforms that serve structured data via REST APIs.
- High-performance web scraping (dynamic + static sites)
- Multi-source data aggregation pipelines
- Dataset cleaning, enrichment & normalization
- Secure REST API development
- Automated data collection systems
Production news aggregation platform that indexes 6,900+ articles per run from 250+ live sources, including BBC, Reuters, The Guardian, and TechCrunch.
Built with a 4-tier RSS fallback chain, a Playwright-powered scraper for paywalled and dynamic sources, and an hourly APScheduler pipeline. Features full-text search, keyword alerts, weekly digest emails, and a REST API.
FastAPI · PostgreSQL · Playwright · Next.js · APScheduler · Resend · Railway · Supabase
Production-grade OSINT dataset marketplace aggregating, enriching, and distributing structured social media data across Reddit, YouTube, GitHub, and Medium.
Features a scalable scraping architecture, automated enrichment pipelines, and secure subscription-based API delivery. Datasets cover AI training, financial sentiment, brand monitoring, and market intelligence.
FastAPI · PostgreSQL · Next.js · Paddle · JWT · AWS S3
Multi-page Python scraper extracting quotes, authors, and tags from Goodreads into structured datasets.
Python · BeautifulSoup · CSV
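As a rough illustration of the multi-page pattern, the sketch below walks pages until one comes back empty and writes the rows to CSV. It is an assumption-laden outline, not the scraper itself: `fake_fetch_page` and `FAKE_PAGES` are hypothetical stand-ins for the HTTP request and BeautifulSoup parsing step, and the `quote`/`author`/`tags` column names are inferred from the description above.

```python
import csv
import io
from typing import Callable, Iterable, Iterator

FIELDS = ["quote", "author", "tags"]  # assumed column names


def iter_pages(fetch_page: Callable[[int], list], max_pages: int = 100) -> Iterator[dict]:
    """Yield rows page by page until a page is empty or max_pages is reached."""
    for page in range(1, max_pages + 1):
        rows = fetch_page(page)
        if not rows:
            break
        yield from rows


def rows_to_csv(rows: Iterable[dict]) -> str:
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical stand-in for "download page N and parse it".
FAKE_PAGES = {
    1: [
        {"quote": "First quote", "author": "Author A", "tags": "life|humor"},
        {"quote": "Second quote", "author": "Author B", "tags": "books"},
    ],
    2: [
        {"quote": "Third quote", "author": "Author C", "tags": "love"},
    ],
}


def fake_fetch_page(page: int) -> list:
    return FAKE_PAGES.get(page, [])


csv_text = rows_to_csv(iter_pages(fake_fetch_page))
```

Passing the page fetcher in as a function keeps the pagination and export logic testable without touching the network.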
Scraping & Automation
- Playwright (Python & Node)
- Selenium
- BeautifulSoup
- Requests / HTTPX
- Asyncio
Data Processing
- Pandas · NumPy
- CSV / JSON / JSONL / Parquet exports
- lxml (XML/RSS repair)
Backend
- FastAPI
- PostgreSQL (asyncpg)
- JWT Authentication
- APScheduler
- Resend (transactional email)
Frontend
- Next.js 15 (App Router)
- Tailwind CSS
- TypeScript
Infrastructure
- Railway (API hosting)
- Vercel (frontend)
- Supabase (managed PostgreSQL)
- Docker
- pulseaggregator.com
- socialintel.io
- x.com/hexsyro