Workflow overview
Why this workflow matters
Relevant for managed services and support workflows. Supports knowledge capture and document intelligence use cases.
Description: Automatically extracts all page URLs from website sitemaps, filters out unwanted sitemap links, and saves clean URLs to Google Sheets for SEO analysis and reporting. How It Works: This workflow automates the process of discovering and extracting all page URLs from a website's sitemap structure. Here's how it works step-by-step: Step 1: URL Input The workflow starts when you submit a website URL through a simple form interface. Step 2: Sitemap Discovery The system automatically generates and tests multiple possible sitemap URLs including /sitemap.xml, /sitemap_index.xml, /robots.txt, and other common variations. Step 3: Valid Sitemap Identification It sends HTTP requests to each potential sitemap URL and filters out empty or invalid responses, keeping only accessible sitemaps. Step 4: Nested Sitemap Processing For sitemap index files, the workflow extracts all nested sitemap URLs and processes each one individually to ensure complete coverage. Step 5: Page URL Extraction From each valid sitemap, it parses the XML content and extracts all individual page URLs using both XML <loc> tags and HTML links. Step 6: URL Filtering The system removes any URLs containing "sitemap" to ensure only actual content pages (like product, service, or blog pages) are retained. Step 7: Google Sheets Integration Finally, all clean page URLs are automatically saved to a Google Sheets document with duplicate prevention for easy analysis and reporting. Setup Steps: Estimated Setup Time: 10-15 minutes 1. Import the Workflow: Import the provided JSON file into your AlekSystem instance. 2. Configure Google Sheets Integration: Set up Google Sheets OAuth2 credentials in AlekSystem Create a new Google Sheet or use an existing one Update the "Save Page URLs to Sheet" node with your Google Sheet URL Ensure your sheet has a tab named "Your sheet tab name" with a column header "Column name" 3. Test the Workflow: Activate the workflow in AlekSystem Use the form trigger URL to submit a test website URL Verify that URLs are being extracted and saved to your Google Sheet 4. Customize (Optional): Modify the sitemap URL patterns in the "Build sitemap URLs" node if needed Adjust the filtering criteria in the "Exclude the Sitemap URLs" node Update the Google Sheets column mapping as required Important Notes: Ensure your Google Sheets credentials have proper read/write permissions The workflow handles both XML sitemaps and robots.txt sitemap references Duplicate URLs are automatically prevented when saving to Google Sheets The workflow continues processing even if some sitemap URLs are inaccessible Need Help? For technical support or questions about this workflow: ✉️ info@incrementors.com or fill out this form: Contact Us
Best fit
Categories
Services
Use cases
Need another direction?