Web Crawler
To create a web crawler program in Go, follow these steps:
Step 1: Initialize the Go Module
First, create a new directory for your project and initialize the Go module.
mkdir webcrawler
cd webcrawler
go mod init github.com/username/webcrawler
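After initialization, the directory contains a go.mod file along these lines (the go directive reflects your installed toolchain, so the version shown here is only an example):
module github.com/username/webcrawler

go 1.21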
Step 2: Install Dependencies
You will need the goquery library for parsing HTML.
go get github.com/PuerkitoBio/goquery
Step 3: Create the crawler.go File
Create a crawler/crawler.go file to handle the web crawling functionality.
// crawler.go
package crawler

import (
    "fmt"
    "log"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

// Crawl fetches the HTML content of a URL and prints the links found.
func Crawl(url string) {
    resp, err := http.Get(url)
    if err != nil {
        log.Fatalf("Failed to fetch URL: %v", err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        log.Fatalf("Failed to fetch URL: %s", resp.Status)
    }

    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatalf("Failed to parse HTML: %v", err)
    }

    // Print the href attribute of every anchor tag on the page,
    // skipping anchors that have no href at all.
    doc.Find("a").Each(func(index int, item *goquery.Selection) {
        if link, exists := item.Attr("href"); exists {
            fmt.Println(link)
        }
    })
}
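Note that href values are often relative (for example, /about), so the printed links may not be directly fetchable. If you want absolute URLs, one option is to resolve each link against the page it was found on. Below is a minimal sketch using the standard net/url package; resolveLink is a helper name introduced here for illustration, not part of the tutorial's code:
// resolve.go (illustrative helper, same package)
package crawler

import "net/url"

// resolveLink converts a possibly relative href into an absolute URL,
// using the page it was found on as the base. It returns the href
// unchanged if either URL fails to parse.
func resolveLink(base, href string) string {
    baseURL, err := url.Parse(base)
    if err != nil {
        return href
    }
    hrefURL, err := url.Parse(href)
    if err != nil {
        return href
    }
    return baseURL.ResolveReference(hrefURL).String()
}
You could then call resolveLink(url, link) inside the Each callback before printing.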
Step 4: Create the main.go File
Create a main.go file to start the web crawler.
// main.go
package main

import (
    "fmt"
    "os"

    "github.com/username/webcrawler/crawler"
)

func main() {
    if len(os.Args) != 2 {
        fmt.Println("Usage: webcrawler <url>")
        os.Exit(1)
    }

    url := os.Args[1]
    fmt.Printf("Crawling URL: %s\n", url)
    crawler.Crawl(url)
}
Step 5: Run the Program
Run the program using the go run command.
go run main.go <url>
Replace <url> with the URL you want to crawl. This will start the web crawler and print the links found on the specified URL.
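To try the crawler without depending on a live site, you can point it at a local test server. The sketch below uses the standard net/http/httptest package and assumes the Crawl function from Step 3; it is a quick harness, not part of the tutorial's program:
// crawler/crawler_test.go (illustrative test harness)
package crawler

import (
    "net/http"
    "net/http/httptest"
    "testing"
)

func TestCrawl(t *testing.T) {
    // Serve a fixed HTML page so no network access is needed.
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte(`<html><body><a href="https://example.com">example</a></body></html>`))
    }))
    defer srv.Close()

    // Crawl prints any links it finds; because the handler above
    // always succeeds, none of the log.Fatalf branches are hit.
    Crawl(srv.URL)
}
Run it with go test ./crawler and you should see the single link printed.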