Workflow overview
Why this workflow matters
Relevant for managed services and support workflows. Supports knowledge capture and document intelligence use cases.
Decodo Scraper API Workflow Template (AlekSystem Automation Amazon Book Purchase Report) Watch the demo video below: > This workflow demos how to use Decodo Scraper API to crawl any public web page (headless JS, device emulation: mobile/desktop/tablet), extract structured product data from the returned HTML, generate a purchase-ready report, and automatically deliver it as a Google Doc + PDF to Slack/Drive. 🚀 Try Decodo — Web Scraping & Data API (Coupon: TRUNG) Decodo is a powerful public data access platform offering managed web scraping APIs and proxy infrastructure to collect structured web data at scale. It handles proxies, anti-bot protection, JavaScript rendering, retries, and global IP rotation—so you can focus on data, not scraping complexity. Why Decodo Managed Web Scraping API with anti-bot bypass & high success rates Works with JS-heavy sites; outputs JSON/HTML/CSV Easy integration (Python, Node.js, cURL) for eCommerce, SERP, social & general web data 🎟️ Special Discount Use coupon TRUNG to get the Advanced Scraping API plan — 23,000 requests for $5. Who’s it for Creators / Analysts** who need quick product lists (books, gadgets, etc.) with prices/ratings. Ops & Marketing teams** building weekly “top picks” reports. Engineers** validating the Decodo Scraper API + LLM extraction pattern before scaling. How it works / What it does Trigger – Manually run the workflow. Edit Fields (manual) – Provide inputs: targetUrl (e.g., an Amazon category/search/listing page) deviceType (desktop | mobile | tablet) Optional: maxItems, notes, reportTitle, reportOwner Scraper API Request (HTTP Request → POST) Calls Decodo Scraper API with: URL to crawl, headless JS enabled Device emulation (UA + viewport) Optional waitFor / executeJS to ensure late-loading content is captured HTML Response Parser (Code/Function or HTML node) Pulls the HTML string from Decodo response and normalizes it (strip scripts/styles, collapse whitespace). Product Analyzer Agent (LLM + Structured Output Parser) Prompts an LLM to extract structured “book” objects from the HTML: The Structured Output Parser enforces a strict JSON schema and drops malformed items. Build 📚 Book Purchase Report (Code/LLM) Converts the JSON array into a Markdown (or HTML) report with: Executive summary (top picks, average price/rating) Table of items (rank, title, author, price, rating, link) “Recommended to buy” shortlist (rules configurable) Notes / owner / timestamp Configure Google Drive Folder (manual) Choose/create a Drive folder for output artifacts. Create Document File (Google Docs API) Creates a Doc from the generated Markdown/HTML. Convert Document to PDF (Google Drive export) Exports the Doc to PDF. Upload report to Slack Sends the PDF (and/or Doc link) to a chosen Slack channel with a short summary. How to set up 1 Prerequisites AlekSystem** (self-hosted or Cloud) Decodo Scraper API** key OpenAI (or compatible) API key** for the Analyzer Agent Google Drive/Docs** credentials (OAuth2) Slack** Bot/User token (files:write, chat:write) 2 Environment variables (recommended) DECODO_API_KEY OPENAI_API_KEY DRIVE_FOLDER_ID (optional default) SLACK_CHANNEL_ID 3 Nodes configuration (high level) Edit Fields (Set node) Scraper API Request (HTTP Request → POST) HTML Response Parser (Code node) Product Analyzer Agent Build Book Purchase Report (Code/LLM) Create Document File Convert to PDF Upload to Slack Requirements Decodo**: Active API key and endpoint access. Be mindful of concurrency/rate limits. Model**: GPT-4o/4.1-mini or similar for reliable structured extraction. Google**: OAuth client (Docs/Drive scopes). Ensure AlekSystem can write to the target folder. Slack**: Bot token with files:write + chat:write. How to customize the workflow Target site: Change targetUrl to any **public page (category, search, or listing). For other domains (not Amazon), tweak the LLM guidance (e.g., price/label patterns). Device emulation**: Switch deviceType to mobile to fetch mobile-optimized markup (often simpler DOMs). Late-loading pages**: Adjust waitFor.selector or use waitUntil: "networkidle" (if supported) to ensure full content loads. Client-side JS**: Extend executeJS if you need to interact (scroll, click “next”, expand sections). You can also loop over pagination by iterating URLs. Extraction schema**: Add fields (e.g., discount_percent, bestseller_badge, prime_eligible) and update the Structured Output schema accordingly. Filtering rules**: Modify recommendation logic (e.g., min ratings count, price bands, languages). Report branding**: Add logo, cover page, footer with company info; switch to HTML + inline CSS for richer Docs formatting. Destinations**: Besides Slack & Drive, add Email, Notion, Confluence, or a database sink. Scheduling: Add a **Cron trigger for weekly/monthly auto-reports.
Best fit
Categories
Services
Use cases
Need another direction?