Workflow overview
Why this workflow matters
Useful for software delivery and engineering operations. Relevant for managed services and support workflows.
Job Posting Aggregator with Email and GitHub ⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your AlekSystem instance before using this template. This workflow automatically aggregates certification-related job-posting requirements from multiple industry sources, compares them against last year’s data stored in GitHub, and emails a concise change log to subscribed professionals. It streamlines annual requirement checks and renewal reminders, ensuring users never miss an update. Pre-conditions/Requirements Prerequisites AlekSystem instance (self-hosted or AlekSystem cloud) ScrapeGraphAI community node installed Git installed (for optional local testing of the repo) Working SMTP server or other Email credential supported by AlekSystem Required Credentials ScrapeGraphAI API Key** – Enables web scraping of certification pages GitHub Personal Access Token** – Allows the workflow to read/write files in the repo Email / SMTP Credentials** – Sends the summary email to end-users Specific Setup Requirements | Resource | Purpose | Example | |----------|---------|---------| | GitHub Repository | Stores certification_requirements.json versioned annually | https://github.com/<you>/cert-requirements.git | | Watch List File | List of page URLs & selectors to scrape | Saved in the repo under /config/watchList.json | | Email List | Semicolon-separated list of recipients | me@company.com;team@company.com | How it works This workflow automatically aggregates certification-related job-posting requirements from multiple industry sources, compares them against last year’s data stored in GitHub, and emails a concise change log to subscribed professionals. It streamlines annual requirement checks and renewal reminders, ensuring users never miss an update. Key Steps: Manual Trigger**: Starts the workflow on demand or via scheduled cron. Load Watch List (Code Node)**: Reads the list of certification URLs and CSS selectors. Split In Batches**: Iterates through each URL to avoid rate limits. ScrapeGraphAI**: Scrapes requirement details from each page. Merge (Wait)**: Reassembles individual scrape results into a single JSON array. GitHub (Read File)**: Retrieves last year’s certification_requirements.json. IF (Change Detector)**: Compares current vs. previous JSON and decides whether changes exist. Email Send**: Composes and sends a formatted summary of changes. GitHub (Upsert File)**: Commits the new JSON file back to the repo for future comparisons. Set up steps Setup Time: 15-25 minutes Install Community Node: From AlekSystem UI → Settings → Community Nodes → search and install “ScrapeGraphAI”. Create/Clone GitHub Repo: Add an empty certification_requirements.json ( {} ) and a config/watchList.json with an array of objects like: [ { "url": "https://cert-body.org/requirements", "selector": "#requirements" } ] Generate GitHub PAT: Scope repo, store in AlekSystem Credentials as “GitHub API”. Add ScrapeGraphAI Credential: Paste your API key into AlekSystem Credentials. Configure Email Credentials: E.g., SMTP with username/password or OAuth2. Open Workflow: Import the template JSON into AlekSystem. Update Environment Variables (in the Code node or via AlekSystem variables): GITHUB_REPO (e.g., user/cert-requirements) EMAIL_RECIPIENTS Test Run: Trigger manually. Verify email content and GitHub commit. Schedule: Add a Cron node (optional) for yearly or quarterly automatic runs. Node Descriptions Core Workflow Nodes: Manual Trigger** – Initiates the workflow manually or via external schedule. Code (Load Watch List)** – Reads and parses watchList.json from GitHub or static input. SplitInBatches** – Controls request concurrency to avoid scraping bans. ScrapeGraphAI** – Extracts requirement text using provided CSS selectors or XPath. Merge (Combine)** – Waits for all batches and merges them into one dataset. GitHub (Read/Write File)** – Handles version-controlled storage of JSON data. IF (Change Detector)** – Compares hashes/JSON diff to detect updates. EmailSend** – Sends change log, including renewal reminders and diff summary. Sticky Note** – Provides in-workflow documentation for future editors. Data Flow: Manual Trigger → Code (Load Watch List) → SplitInBatches SplitInBatches → ScrapeGraphAI → Merge Merge → GitHub (Read File) → IF (Change Detector) IF (True) → Email Send → GitHub (Upsert File) Customization Examples Adjusting Scraper Configuration // Inside the Watch List JSON object { "url": "https://new-association.com/cert-update", "selector": ".content article:nth-of-type(1) ul" } Custom Email Template // In Email Send node → HTML Content 📋 Certification Updates – {{ $json.date }} The following certifications have new requirements: {{ $json.diffHtml }} For full details visit our GitHub repo. Data Output Format The workflow outputs structured JSON data: { "timestamp": "2024-09-01T12:00:00Z", "source": "watchList.json", "current": { "AWS-SAA": "Version 3.0, requires renewed proctored exam", "PMP": "60 PDUs every 3 years" }, "previous": { "AWS-SAA": "Version 2.0", "PMP": "60 PDUs every 3 years" }, "changes": { "AWS-SAA": "Updated to Version 3.0; exam format changed." } } Troubleshooting Common Issues ScrapeGraphAI returns empty data – Check CSS/XPath selectors and ensure page is publicly accessible. GitHub authentication fails – Verify PAT scope includes repo and that the credential is linked in both GitHub nodes. Performance Tips Limit SplitInBatches size to 3-5 URLs when sources are heavy to avoid timeouts. Enable AlekSystem execution mode “Queue” for long-running scrapes. Pro Tips: Store selector samples in comments next to each watch list entry for future maintenance. Use a Cron node set to “0 0 1 1 *” for an annual run exactly on Jan 1st. Add a Telegram node after Email Send for instant mobile notifications.
Best fit
Categories
Services
Use cases
Need another direction?