
Data Collection

Start from a public seed URL and synchronously collect a small set of same-origin pages, with regex filtering, deduplication, and normalized markdown or text output.

Networks: mainnet, testnet

  • Same-origin bounded crawl with max page and depth limits
  • Regex include/exclude filters
  • Canonical or final-URL dedupe
  • Mainnet and testnet route families
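
The crawl features above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the service's implementation: the actual matching rules are not documented here, so the sketch assumes patterns are applied with `re.search` against the full URL, that an empty include list admits every URL, and that dedupe works on the raw URL string (real canonical or final-URL dedupe needs the fetched page). The helper names `origin`, `keep_url`, and `select_pages` are hypothetical.

```python
import re
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return (scheme, host, port), filling in the default port when omitted."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def keep_url(url, seed_url, include_patterns=(), exclude_patterns=()):
    """Assumed filter order: same origin, then include, then exclude."""
    if origin(url) != origin(seed_url):
        return False
    if include_patterns and not any(re.search(p, url) for p in include_patterns):
        return False
    return not any(re.search(p, url) for p in exclude_patterns)


def select_pages(candidates, seed_url, include_patterns=(),
                 exclude_patterns=(), max_pages=5):
    """Keep at most max_pages distinct URLs that pass the filters, in order."""
    seen, kept = set(), []
    for url in candidates:
        if url in seen or not keep_url(url, seed_url,
                                       include_patterns, exclude_patterns):
            continue
        seen.add(url)  # simplified dedupe on the raw URL string
        kept.append(url)
        if len(kept) >= max_pages:
            break
    return kept
```

Under these assumptions, a link to `/blog/tag/x` is dropped by the `/tag/` exclude pattern even though it also matches the `/blog/` include pattern.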

2 endpoints available


POST /collect/run
mainnet $0.08 USD

Run a bounded same-origin collection starting from one public seed URL.

Example request (curl). The paid retry uses the exact same payload.

curl -X POST "https://xlm402.com/collect/run" -H "Content-Type: application/json" -d @- <<'JSON'
{
  "seed_url": "https://example.com/blog",
  "scope": "same_origin",
  "max_pages": 5,
  "max_depth": 1,
  "include_patterns": ["/blog/"],
  "exclude_patterns": ["/tag/"],
  "format": "markdown",
  "dedupe": "canonical_url",
  "max_chars_per_page": 30000
}
JSON
Parameters (a trailing ? marks an optional field):
  • seed_url: absolute http|https URL
  • scope?: same_origin
  • max_pages?: integer 1..10
  • max_depth?: integer 0..2
  • include_patterns?: string[]
  • exclude_patterns?: string[]
  • format?: text | markdown
  • dedupe?: canonical_url | final_url
  • max_chars_per_page?: integer 1000..50000
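
A client can check a payload against the documented limits before paying for a call. The following Python sketch mirrors the parameter list above. `validate_payload` is a hypothetical helper, and it assumes that fields marked with `?` may be omitted. The server's own validation may differ.

```python
from urllib.parse import urlsplit

# Limits copied from the parameter list; "?" fields are assumed optional.
INT_RANGES = {
    "max_pages": (1, 10),
    "max_depth": (0, 2),
    "max_chars_per_page": (1000, 50000),
}
ENUMS = {
    "scope": ("same_origin",),
    "format": ("text", "markdown"),
    "dedupe": ("canonical_url", "final_url"),
}
PATTERN_FIELDS = ("include_patterns", "exclude_patterns")


def validate_payload(payload):
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []

    seed = payload.get("seed_url")
    parts = urlsplit(seed) if isinstance(seed, str) else None
    if parts is None or parts.scheme not in ("http", "https") or not parts.netloc:
        errors.append("seed_url: must be an absolute http or https URL")

    for name, (low, high) in INT_RANGES.items():
        if name in payload:
            value = payload[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if (isinstance(value, bool) or not isinstance(value, int)
                    or not low <= value <= high):
                errors.append(f"{name}: must be an integer in {low}..{high}")

    for name, allowed in ENUMS.items():
        if name in payload and payload[name] not in allowed:
            errors.append(f"{name}: must be one of {list(allowed)}")

    for name in PATTERN_FIELDS:
        if name in payload:
            value = payload[name]
            if not isinstance(value, list) or not all(isinstance(p, str) for p in value):
                errors.append(f"{name}: must be a list of strings")

    return errors
```

Calling `validate_payload` on the example request body above returns an empty list, while a payload with `"max_pages": 11` yields one error for that field.
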
POST /testnet/collect/run
testnet $0.08 USD

Run a bounded same-origin collection starting from one public seed URL.

Example request (curl). The paid retry uses the exact same payload.

curl -X POST "https://xlm402.com/testnet/collect/run" -H "Content-Type: application/json" -d @- <<'JSON'
{
  "seed_url": "https://example.com/blog",
  "scope": "same_origin",
  "max_pages": 5,
  "max_depth": 1,
  "include_patterns": ["/blog/"],
  "exclude_patterns": ["/tag/"],
  "format": "markdown",
  "dedupe": "canonical_url",
  "max_chars_per_page": 30000
}
JSON
Parameters (a trailing ? marks an optional field):
  • seed_url: absolute http|https URL
  • scope?: same_origin
  • max_pages?: integer 1..10
  • max_depth?: integer 0..2
  • include_patterns?: string[]
  • exclude_patterns?: string[]
  • format?: text | markdown
  • dedupe?: canonical_url | final_url
  • max_chars_per_page?: integer 1000..50000
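
Because the two endpoints differ only in their route, and the paid retry reuses the same payload, a client can serialize the body once and build both attempts from the same bytes. The Python sketch below only constructs the requests and does not send them. `ROUTES` and `build_request` are hypothetical names, and how payment is attached to the retry is not described here, so that step is left out.

```python
import json
import urllib.request

BASE_URL = "https://xlm402.com"
# Route families as listed on this page.
ROUTES = {"mainnet": "/collect/run", "testnet": "/testnet/collect/run"}


def build_request(network, body):
    """Build (but do not send) a POST request for the chosen network."""
    return urllib.request.Request(
        BASE_URL + ROUTES[network],
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "seed_url": "https://example.com/blog",
    "max_pages": 5,
    "max_depth": 1,
    "format": "markdown",
}

# Serialize once so the first attempt and the paid retry carry identical bytes.
body = json.dumps(payload).encode("utf-8")
first_attempt = build_request("testnet", body)
paid_retry = build_request("testnet", body)  # payment details intentionally omitted
```

Trying a payload on the testnet route first and then switching the key to `"mainnet"` changes only the URL. The body stays byte-for-byte the same.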