Case — Rocester (Intelligent Digital Catalog of Industrial Parts)
Type: AutoU (client: Rocester, industrial parts)
Role: Full Stack / AI Developer, involved in founding the project (architecture and initial functional base)
Status: Live (in production); further evolution on the roadmap
Stack: Python, FastAPI, React + TypeScript, PostgreSQL (pgvector enabled), Gemini Vision (PDF extraction), JWT with roles, Docker Compose, ReportLab (roadmap), full-text search (RAG)
Confidentiality note: AutoU client project; confirm what can be made public before disclosing the client's name or details.
Context and problem
Rocester's salespeople looked up parts in scanned supplier PDF catalogs: finding a part, checking its OEM code and assembling a quote were slow and error-prone. Digitizing the catalogs by hand was infeasible given their volume.
Solution
A platform that turns catalog PDFs into an AI-searchable database:
- PDF ingestion pipeline with Gemini Vision: catalog upload → structured part extraction (code, OEM, description, category, equipment) with temperature=0.1, JSON mode and retries
- Confidence score per part (0.0–1.0) with textual reasoning: extracted parts enter as pending, and a review panel with bulk actions (approve/reject) supports efficient human curation
- Search and filters by code, OEM, description, category and equipment
- Quotes: cart, automatic code (ORC-AAMM-XXXX) and history
- AI assistant (chat): part suggestions from free-text requests, with RAG over the real catalog
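A minimal sketch of the validation step behind the ingestion pipeline, assuming the model returns a JSON array of part objects. The Gemini call itself is abstracted as a plain callable (that is where temperature=0.1 and JSON mode would be configured); `ExtractedPart`, `parse_parts`, `extract_with_retries` and the field names are hypothetical, not the project's code.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical schema; the project's real extraction fields may differ.
REQUIRED_FIELDS = ("code", "oem", "description", "category", "equipment")


class ExtractionError(ValueError):
    """Raised when the model output does not match the expected shape."""


@dataclass
class ExtractedPart:
    code: str
    oem: str
    description: str
    category: str
    equipment: str
    confidence: float  # clamped to 0.0-1.0
    reasoning: str  # the model's textual justification
    status: str = "pending"  # every extracted part starts pending review


def parse_parts(raw: str) -> list[ExtractedPart]:
    """Validate JSON-mode output and turn it into pending parts."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, list):
        raise ExtractionError("expected a JSON array of parts")
    parts = []
    for item in data:
        if not isinstance(item, dict):
            raise ExtractionError("each part must be a JSON object")
        missing = [f for f in REQUIRED_FIELDS if f not in item]
        if missing:
            raise ExtractionError(f"missing fields: {missing}")
        # A missing score counts as zero confidence; out-of-range values are clamped.
        confidence = min(max(float(item.get("confidence", 0.0)), 0.0), 1.0)
        parts.append(
            ExtractedPart(
                **{f: str(item[f]) for f in REQUIRED_FIELDS},
                confidence=confidence,
                reasoning=str(item.get("reasoning", "")),
            )
        )
    return parts


def extract_with_retries(call_model: Callable[[], str], max_attempts: int = 3) -> list[ExtractedPart]:
    """Call the model again whenever its output fails validation."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return parse_parts(call_model())
        except (ValueError, TypeError) as exc:  # bad JSON, bad shape or bad score
            last_error = exc
    raise ExtractionError(f"gave up after {max_attempts} attempts: {last_error}")
```

Treating a malformed response as a retryable failure, rather than storing partial data, keeps the review queue limited to well-formed items.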
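Search and filters could map to a parameterized PostgreSQL query along these lines. This is a sketch: the `parts` table, its columns, the `'simple'` text-search configuration and the `PartFilters`/`build_search_query` names are assumptions; `to_tsvector`, `plainto_tsquery` and `concat_ws` are standard PostgreSQL functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartFilters:
    text: Optional[str] = None  # free text matched against code, OEM and description
    category: Optional[str] = None
    equipment: Optional[str] = None


def build_search_query(filters: PartFilters, limit: int = 50) -> tuple[str, list]:
    """Return SQL with psycopg-style %s placeholders plus its parameters."""
    clauses = ["status = 'approved'"]  # sellers only see curated parts
    params: list = []
    if filters.text:
        clauses.append(
            "to_tsvector('simple', concat_ws(' ', code, oem, description))"
            " @@ plainto_tsquery('simple', %s)"
        )
        params.append(filters.text)
    if filters.category:
        clauses.append("category = %s")
        params.append(filters.category)
    if filters.equipment:
        clauses.append("equipment = %s")
        params.append(filters.equipment)
    sql = (
        "SELECT id, code, oem, description, category, equipment FROM parts"
        " WHERE " + " AND ".join(clauses) + " ORDER BY code LIMIT %s"
    )
    params.append(limit)
    return sql, params
```

Passing every user value as a parameter, never by string interpolation, keeps free-text input out of the SQL itself.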
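The quote code takes only a few lines to generate. The sketch assumes AA is the two-digit year, MM the month and XXXX a zero-padded sequence that restarts each month; the source gives only the pattern ORC-AAMM-XXXX, and `quote_code`/`next_quote_code` are hypothetical helpers. Under this reading, the seventh quote of March 2025 would be ORC-2503-0007.

```python
from datetime import date


def quote_code(issued: date, sequence: int) -> str:
    """Format ORC-AAMM-XXXX, assuming AA=year, MM=month, XXXX=sequence."""
    if not 1 <= sequence <= 9999:
        raise ValueError("sequence must fit in four digits")
    return f"ORC-{issued:%y%m}-{sequence:04d}"


def next_quote_code(issued: date, existing: list[str]) -> str:
    """Next code for the month of `issued`, given the codes already issued."""
    prefix = f"ORC-{issued:%y%m}-"
    used = [int(code[len(prefix):]) for code in existing if code.startswith(prefix)]
    return quote_code(issued, max(used, default=0) + 1)
```

Computing `max + 1` in application code races when two quotes are saved at once; a database sequence or a row-locked counter per month would be the safer source for XXXX.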
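For the assistant, RAG amounts to retrieving catalog entries that match the request and constraining the model to them. In this sketch, naive token overlap stands in for the PostgreSQL full-text search listed in the stack; `Part`, `retrieve`, `build_prompt` and the prompt wording are illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Part:
    code: str
    oem: str
    description: str


def tokenize(text: str) -> set[str]:
    """Lowercased word tokens; hyphens split codes like 'R-100' into 'r', '100'."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, catalog: list[Part], k: int = 3) -> list[Part]:
    """Rank approved parts by token overlap (a stand-in for full-text search)."""
    q = tokenize(query)
    scored = []
    for part in catalog:
        overlap = len(q & tokenize(f"{part.code} {part.oem} {part.description}"))
        if overlap:
            scored.append((overlap, part))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep catalog order
    return [part for _, part in scored[:k]]


def build_prompt(query: str, parts: list[Part]) -> str:
    """Ground the model in retrieved entries so it cannot suggest unknown parts."""
    context = "\n".join(f"- {p.code} (OEM {p.oem}): {p.description}" for p in parts)
    return (
        "Suggest parts using ONLY the catalog entries below. If none fit, say so.\n"
        f"Catalog:\n{context or '(no matches)'}\n"
        f"Request: {query}"
    )
```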
Architecture and technical decisions
- Dependency inversion in the AI layer: the domain does not depend on the LLM provider, so swapping Gemini for another model leaves the business logic untouched (documented in the README's architecture section)
- Explicit part lifecycle: extracted → pending review → approved/rejected. The AI proposes, a human disposes; the public catalog contains only curated data
- Extraction with anti-hallucination controls: low temperature, strict JSON output, retries and a confidence score with a justification per item
- JWT with roles (admin/seller): sellers search and quote; admins import and curate
- pgvector already enabled in the database, ready for a move to semantic search via embeddings (on the documented roadmap, alongside asynchronous ingestion processing and PDF quote export)
- Technical-commercial proposal and project plan versioned alongside the code
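Dependency inversion in the AI layer can be expressed with a `typing.Protocol` owned by the domain. A minimal sketch with hypothetical names: `PartExtractor` is the port, `CatalogImportService` the domain service, and `FakeExtractor` a stand-in for a provider adapter such as a hypothetical `GeminiExtractor`.

```python
from dataclasses import dataclass
from typing import Protocol


class PartExtractor(Protocol):
    """Port owned by the domain: any LLM provider adapter can implement it."""

    def extract(self, page_image: bytes) -> list[dict]: ...


@dataclass
class CatalogImportService:
    extractor: PartExtractor  # injected; the service never imports a provider SDK

    def import_page(self, page_image: bytes) -> list[dict]:
        rows = self.extractor.extract(page_image)
        # Domain rule, independent of the provider: everything starts pending.
        return [{**row, "status": "pending"} for row in rows]


class FakeExtractor:
    """Test double; a provider adapter would satisfy the same port."""

    def extract(self, page_image: bytes) -> list[dict]:
        return [{"code": "A1", "confidence": 0.9}]
```

Because the service only sees the protocol, changing providers means writing a new adapter class, and the same seam lets tests run without network calls.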
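The lifecycle can be enforced as an explicit transition table, which also makes bulk approve/reject safe. This sketch treats approved and rejected as terminal states (an assumption; the source does not say whether decisions can be reverted), and `PartStatus`, `transition` and `bulk_review` are hypothetical names.

```python
from enum import Enum


class PartStatus(str, Enum):
    EXTRACTED = "extracted"
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


# Allowed moves; anything not listed here is rejected.
ALLOWED = {
    PartStatus.EXTRACTED: {PartStatus.PENDING},
    PartStatus.PENDING: {PartStatus.APPROVED, PartStatus.REJECTED},
    PartStatus.APPROVED: set(),
    PartStatus.REJECTED: set(),
}


def transition(current: PartStatus, target: PartStatus) -> PartStatus:
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def bulk_review(statuses: dict[str, PartStatus], ids: list[str], target: PartStatus) -> dict[str, PartStatus]:
    """Apply one decision to many parts; works on a copy, so any error aborts the whole batch."""
    updated = dict(statuses)
    for part_id in ids:
        updated[part_id] = transition(updated[part_id], target)
    return updated
```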
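The admin/seller split can be checked against the role claim of an already-verified JWT. Signature verification is out of scope here (a JWT library would handle it); the permission names and `is_allowed` are hypothetical, and giving admins the seller permissions as well is an assumption.

```python
from enum import Enum


class Role(str, Enum):
    ADMIN = "admin"
    SELLER = "seller"


# Hypothetical permission names mirroring "sellers search and quote; admins import and curate".
PERMISSIONS = {
    Role.SELLER: {"parts:search", "quotes:create"},
    Role.ADMIN: {"parts:search", "quotes:create", "catalogs:import", "parts:review"},
}


def is_allowed(claims: dict, permission: str) -> bool:
    """Authorize a decoded, signature-verified JWT payload for one permission."""
    try:
        role = Role(claims.get("role"))
    except ValueError:
        return False  # unknown or missing role: deny by default
    return permission in PERMISSIONS[role]
```

In a FastAPI backend, a check like this would typically sit in a dependency shared by the protected routes, so each endpoint declares the permission it needs.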
Challenges and solutions
- Heterogeneous supplier PDFs (tables, images, varying layouts): multimodal extraction with Gemini Vision plus bulk human review balances automation and accuracy
- Seller trust in the data: the per-part score and reasoning make the AI's confidence visible instead of hidden
Results and impact
- PDF catalogs become a searchable base with no manual data entry [number of parts/catalogs TO CONFIRM]
- Quotes assembled in the same flow as the search — less tool-switching for the seller
- Bulk curation reduces the human cost of validating AI extraction