Web Scraper
To create a web scraper program in Go, follow these steps:
Step 1: Initialize the Go Module
First, create a new directory for your project and initialize the Go module.
mkdir webscraper
cd webscraper
go mod init github.com/username/webscraper
Step 2: Install Dependencies
You will need the goquery library for parsing HTML.
go get github.com/PuerkitoBio/goquery
Step 3: Create the scraper.go File
Create a scraper directory inside the project, then add a scraper/scraper.go file to handle the web scraping functionality.
// scraper.go
package scraper

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

// Scrape fetches the HTML content of a URL and prints the text of the specified elements.
func Scrape(url string, selector string) {
	resp, err := http.Get(url)
	if err != nil {
		log.Fatalf("Failed to fetch URL: %v", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		log.Fatalf("Failed to fetch URL: %s", resp.Status)
	}

	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		log.Fatalf("Failed to parse HTML: %v", err)
	}

	doc.Find(selector).Each(func(index int, item *goquery.Selection) {
		text := item.Text()
		fmt.Println(text)
	})
}
Step 4: Create the main.go File
Create a main.go file in the project root to start the web scraper.
// main.go
package main

import (
	"fmt"
	"os"

	"github.com/username/webscraper/scraper"
)

func main() {
	if len(os.Args) != 3 {
		fmt.Println("Usage: webscraper <url> <selector>")
		os.Exit(1)
	}

	url := os.Args[1]
	selector := os.Args[2]

	fmt.Printf("Scraping URL: %s with selector: %s\n", url, selector)
	scraper.Scrape(url, selector)
}
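main only checks the number of arguments, so a malformed URL is not detected until the HTTP request fails. One option is to validate the first argument before scraping. The sketch below is a minimal example using the standard net/url package; validateURL is a hypothetical helper, and restricting input to http and https is an assumption about what the scraper should accept.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validateURL is a hypothetical helper: it accepts only absolute
// http or https URLs that include a host.
func validateURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parsing %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q in %q", u.Scheme, raw)
	}
	if u.Host == "" {
		return errors.New("URL has no host: " + raw)
	}
	return nil
}

func main() {
	// Try one valid URL and two that should be rejected.
	for _, raw := range []string{"https://example.com", "example.com", "ftp://example.com"} {
		if err := validateURL(raw); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", raw)
	}
}
```

In main.go, such a check could run right after the argument count test, printing the error and exiting with a non-zero status when validation fails.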
Step 5: Run the Program
Run the program using the go run command.
go run main.go <url> <selector>
Replace <url> with the URL you want to scrape and <selector> with the CSS selector of the elements you want to extract. Quote the selector if it contains spaces or characters your shell treats specially. The program fetches the page and prints the text of every element that matches the selector.
Last modified: 08 January 2025