AICollection Help

Web Crawler

To create a web crawler program in Go, follow these steps:

Step 1: Initialize the Go Module

First, create a new directory for your project and initialize a Go module in it.

mkdir webcrawler
cd webcrawler
go mod init github.com/username/webcrawler

Step 2: Install Dependencies

You will need the goquery library for parsing HTML.

go get github.com/PuerkitoBio/goquery
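
After this command, go.mod should list goquery as a dependency. The file will look roughly like the sketch below; the module path and version numbers are illustrative, and the exact contents depend on the version installed:

module github.com/username/webcrawler

go 1.21

require github.com/PuerkitoBio/goquery v1.9.2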

Step 3: Create the crawler.go File

Create a crawler/crawler.go file to handle the web crawling functionality.

// crawler.go
package crawler

import (
    "fmt"
    "log"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

// Crawl fetches the HTML content of a URL and prints the links found.
func Crawl(url string) {
    resp, err := http.Get(url)
    if err != nil {
        log.Fatalf("Failed to fetch URL: %v", err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != 200 {
        log.Fatalf("Failed to fetch URL: %s", resp.Status)
    }

    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatalf("Failed to parse HTML: %v", err)
    }

    doc.Find("a").Each(func(index int, item *goquery.Selection) {
        link, _ := item.Attr("href")
        fmt.Println(link)
    })
}
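
Note that Crawl calls log.Fatalf, which terminates the whole program on the first error. That keeps the tutorial short, but if you later want to reuse the crawler from other code, a variant that returns the links and an error is more idiomatic. The sketch below shows the same logic with error returns; CrawlLinks is a hypothetical alternative, not part of the files above:

// Alternative sketch for crawler.go: CrawlLinks is a hypothetical
// error-returning variant of Crawl, not required by the steps above.
package crawler

import (
    "fmt"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

// CrawlLinks fetches the page at url and returns every href found on it.
// Instead of exiting on failure, it reports errors to the caller.
func CrawlLinks(url string) ([]string, error) {
    resp, err := http.Get(url)
    if err != nil {
        return nil, fmt.Errorf("fetch %s: %w", url, err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("fetch %s: %s", url, resp.Status)
    }

    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        return nil, fmt.Errorf("parse HTML: %w", err)
    }

    var links []string
    doc.Find("a").Each(func(_ int, item *goquery.Selection) {
        if link, ok := item.Attr("href"); ok {
            links = append(links, link)
        }
    })
    return links, nil
}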

Step 4: Create the main.go File

Create a main.go file to start the web crawler.

// main.go
package main

import (
    "fmt"
    "os"

    "github.com/username/webcrawler/crawler"
)

func main() {
    if len(os.Args) != 2 {
        fmt.Println("Usage: webcrawler <url>")
        os.Exit(1)
    }

    url := os.Args[1]
    fmt.Printf("Crawling URL: %s\n", url)
    crawler.Crawl(url)
}
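
Keep in mind that many href values are relative (for example /about or ../index.html). If you plan to follow the printed links rather than just display them, resolve each one against the page's URL first. Below is a minimal standalone sketch using the standard net/url package; resolveLink is a hypothetical helper, separate from the main.go above:

// Standalone sketch: resolving a possibly-relative href against the
// URL of the page it was found on. Not part of the files above.
package main

import (
    "fmt"
    "net/url"
)

// resolveLink interprets href relative to base and returns an absolute URL.
func resolveLink(base, href string) (string, error) {
    b, err := url.Parse(base)
    if err != nil {
        return "", err
    }
    h, err := url.Parse(href)
    if err != nil {
        return "", err
    }
    // ResolveReference applies RFC 3986 resolution of h against b.
    return b.ResolveReference(h).String(), nil
}

func main() {
    abs, _ := resolveLink("https://example.com/docs/", "../about")
    fmt.Println(abs) // prints https://example.com/about
}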

Step 5: Run the Program

Run the program using the go run command.

go run main.go <url>

Replace <url> with the URL you want to crawl. The program will fetch that page and print every link it finds.
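
If you prefer a standalone binary instead of go run, you can build and run it like this (https://example.com is just a stand-in for a real URL):

go build -o webcrawler
./webcrawler https://example.com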

Last modified: 08 January 2025