Summary
A sold-listings intelligence tool for eBay phone resellers. Given a search term, it gathers completed sale data — title, price, condition, sell-through velocity — into a SQLite database, applies client-side filters to strip out accessories and noise, and surfaces buy-under prices and best-margin opportunities. Used daily by a small group of resellers as their sourcing decision layer.
The Problem
Phone reselling lives or dies on sourcing prices. The question every reseller asks fifty times a day is: "What's this exact phone actually selling for, in this exact condition, right now?" eBay's sold-listings UI answers that, but slowly and imprecisely — it conflates accessories with phones, mixes conditions, and doesn't tell you sell-through velocity. Doing the analysis by hand on every potential buy is the bottleneck.
I needed a tool that could pull sold listings at scale, filter out the obvious noise (cases, chargers, screens), aggregate by similar product, and tell me — and a handful of other resellers — what to buy under.
The Approach
The system is a CLI-first Python tool backed by SQLite (~13K listings stored). Selenium handles the actual collection because the marketplace is hostile to lighter scraping — visible Chrome, deliberate page delays, anti-bot evasion, max 5 pages per term. Once the data is in SQLite, all analysis is SQL queries.
Search term → Collector (Selenium, page-delayed, anti-bot)
↓
Client-side filters (exclude/must-have/price range)
↓
SQLite (~13K listings, deduped)
↓
CLI analyzer → buy-under prices, best-margin signals
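The dedup stage in the diagram can be sketched with SQLite alone: a primary key on the marketplace item ID plus `INSERT OR IGNORE` means re-running a search term never duplicates rows. This is a minimal sketch; the table and column names here are illustrative, not the project's actual schema.

```python
import sqlite3

# Illustrative schema: the eBay item number is a natural dedup key,
# so a PRIMARY KEY plus INSERT OR IGNORE makes collection idempotent.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE listings (
        item_id   TEXT PRIMARY KEY,   -- natural dedup key
        title     TEXT NOT NULL,
        price     REAL NOT NULL,
        condition TEXT,
        sold_date TEXT
    )
""")
conn.execute("CREATE INDEX idx_listings_price ON listings (price)")

rows = [
    ("1001", "iPhone 13 128GB Unlocked", 420.0, "Used", "2024-05-01"),
    ("1002", "iPhone 13 128GB Unlocked", 435.0, "Used", "2024-05-02"),
    ("1001", "iPhone 13 128GB Unlocked", 420.0, "Used", "2024-05-01"),  # repeat scrape
]
conn.executemany("INSERT OR IGNORE INTO listings VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
print(count)  # 2 — the duplicate item_id was silently skipped
```

Because the insert is idempotent, overlapping search terms and repeated runs are harmless, which is what lets the corpus grow with every search.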
There's also a lightweight web dashboard for the multi-user remote deploy on Railway — three other resellers use it for their own sourcing — but the primary interface I use locally is the CLI plus direct SQL queries from a developer console. Splitting the daily-driver UX (CLI) from the multi-user UX (web dashboard) kept the tool fast for me without forcing remote users into a CLI.
What I Built
- Tiered collector pipeline — primary Selenium collector for reliability, faster HTTP-based collector for volume scans, batch orchestrator for multi-term runs
- Client-side filter engine — exclude keywords (case, screen, charger), must-have keywords (unlocked, 128GB, etc.), price-range filters applied during collection so only relevant rows hit the DB
- SQLite database with deduplication — ~13K listings, indexed for fast lookup, queryable directly from the CLI or a dev console
- CLI analyzer — groups similar products, calculates price distribution, surfaces "buy under" thresholds and top-margin opportunities
- Multi-user Railway dashboard — lightweight web dashboard with shared auth for 4 active users, frozen as a stable artifact while the local tool keeps evolving
- Sell-through velocity collector — tier-2 path that looks for "X sold" badges on active listings to catch what's moving right now, not just what already sold
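The collection-time filter engine above reduces to a single predicate applied before a row is written. A minimal sketch, assuming a pure-function shape; the function name and keyword lists are illustrative, not the module's real API:

```python
def passes_filters(title, price, *, exclude, must_have, price_range):
    """Return True if a listing survives the collection-time filters.

    exclude:     any of these substrings disqualifies the listing (accessory noise)
    must_have:   every one of these substrings must appear (e.g. "unlocked")
    price_range: (lo, hi) inclusive bounds that drop junk and outliers
    """
    t = title.lower()
    if any(kw in t for kw in exclude):
        return False
    if not all(kw in t for kw in must_have):
        return False
    lo, hi = price_range
    return lo <= price <= hi

# Accessory noise is rejected before it ever reaches SQLite:
print(passes_filters("iPhone 13 case - fits all models", 9.99,
                     exclude=("case", "screen", "charger"),
                     must_have=(), price_range=(100, 1500)))   # False
print(passes_filters("Apple iPhone 13 128GB Unlocked", 420.0,
                     exclude=("case", "screen", "charger"),
                     must_have=("unlocked", "128gb"),
                     price_range=(100, 1500)))                 # True
```

Keeping the rule sets as plain data is what makes them tweakable per search term without a redeploy.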
Engineering Highlights
- Collection-time filtering, not post-hoc. Excluding accessories during the collection pass instead of after the fact dropped DB pollution and analysis runtime substantially. The filter rules live in one module and can be tweaked per search term without redeploying.
- CLI + AI-assisted developer console as the primary interface. Replaced what would otherwise be a custom dashboard with raw SQLite queries from a developer console. Asking "what are the best-margin phones in the database right now?" in plain language and getting an answer in seconds is faster than any UI I could have built.
- Two-track stability model. The Railway dashboard is frozen — same Dockerfile, same code, no churn for the existing users. All experimentation happens in the local CLI tool. Kept multi-user reliability without slowing down personal iteration.
- Anti-bot pragmatism. Visible Chrome with deliberate delays beats headless-with-tricks for this marketplace. Boring is reliable. The collector has not been blocked with any meaningful frequency, and the tradeoff (slower runs) is acceptable for the use case.
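One way to picture the analyzer's buy-under logic: take a model's recent sold prices, use a conservative percentile as the resale estimate, and back off a target margin. A sketch under stated assumptions; the lower-quartile estimate and 15% margin are hypothetical parameters, not the tool's actual numbers.

```python
import sqlite3
import statistics

# Toy dataset standing in for the real ~13K-row listings table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (model TEXT, price REAL)")
conn.executemany("INSERT INTO listings VALUES (?, ?)", [
    ("iphone 13 128gb", 400.0), ("iphone 13 128gb", 420.0),
    ("iphone 13 128gb", 435.0), ("iphone 13 128gb", 450.0),
])

def buy_under(model, margin=0.15):
    """Illustrative buy-under threshold for one model group."""
    prices = [p for (p,) in conn.execute(
        "SELECT price FROM listings WHERE model = ?", (model,))]
    # Pessimistic resale estimate: lower quartile of recent sold prices,
    # so a buy at the threshold still clears the margin on a weak sale.
    resale = statistics.quantiles(prices, n=4)[0]
    return round(resale * (1 - margin), 2)

print(buy_under("iphone 13 128gb"))
```

The same shape works as a single SQL aggregation per model group; the point is that once the data is clean in SQLite, "what should I buy under?" is one cheap query away.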
Outcome
Three other resellers actively use the Railway deployment for their own sourcing decisions. My personal sourcing decisions go through the CLI daily. The database has accumulated a corpus of ~13K listings that grows, and gets more useful, with every search.
Tech footprint
- Backend — Python CLI + Selenium + httpx, optional lightweight web layer
- Data — SQLite (~13K rows, indexed)
- Browser automation — visible Chrome with explicit waits and anti-bot evasion
- Deploy — Docker image on Railway with shared auth, frozen for stability
- Interface — CLI as primary driver; lightweight web dashboard for remote multi-user access; direct SQL queries from a developer console for ad-hoc analysis