Go in Action, Chapter 2: Reading Notes

These are my reading notes on Chapter 2 of Go in Action.
The chapter walks through a complete sample application written in Go.

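Q: What does the application do?

A: In short, it reads the RSS feeds listed in a configuration file, fetches the content of each feed, searches that content for a search term, and displays the matches.

Q: How are the files laid out?

A: The layout is as follows: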

sample/  # directory layout
├── data
│   └── data.json # RSS feed addresses, stored as JSON
├── main.go       # program entry point
├── matchers      # matchers; rss is one feed type, and more can be added later
│   └── rss.go
└── search        # core logic
    ├── default.go
    ├── feed.go
    ├── match.go
    └── search.go

Each file is analyzed in turn below:

main.go

First, the source:

package main

import (
    "log"
    "os"
    _ "sample/matchers"
    "sample/search"
)

func init() {
    log.SetOutput(os.Stdout)
}

// main is the entry point for the program
func main() {
    search.Run("president")
}

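A few points:

Every application has an entry function, here main. Note that main must live in package main; in any other package the code is not built into an executable.
The "_" on line 6 of the listing imports the package without referring to it explicitly. The import exists only to trigger that package's init function.
init functions run before main.
The log package writes to stderr by default; here init redirects it to stdout.
For imported packages, the compiler searches the directories named by the GOROOT and GOPATH environment variables. This is GOPATH mode, which the book uses; with Go modules, import paths are resolved through go.mod instead.
main calls search.Run, passing president as the search term.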
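
To make the ordering concrete, here is a minimal sketch that is not from the book. The events variable is a name introduced only for this illustration; it records that init runs before main does.

```go
package main

import (
	"log"
	"os"
)

// events is a hypothetical helper that records the order of execution.
var events []string

// init runs automatically when the package is initialized, before main.
func init() {
	log.SetOutput(os.Stdout) // log writes to stderr unless redirected
	events = append(events, "init")
}

func main() {
	events = append(events, "main")
	log.Println(events) // the message body is "[init main]"
}
```

The same mechanism is what makes the blank import useful: importing a package only for its side effects still runs every init function in that package.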

data.json

This file holds the address and name of each RSS feed:

[
    {
        "site" : "npr",
        "link" : "http://www.npr.org/rss/rss.php?id=1001",
        "type" : "rss"
    },
    {
        "site" : "npr",
        "link" : "http://www.npr.org/rss/rss.php?id=1008",
        "type" : "rss"
    },
    {
        "site" : "npr",
        "link" : "http://www.npr.org/rss/rss.php?id=1006",
        "type" : "rss"
    },
    {
        "site" : "npr",
        "link" : "http://www.npr.org/rss/rss.php?id=1007",
        "type" : "rss"
    },
    {
        "site" : "npr",
        "link" : "http://www.npr.org/rss/rss.php?id=1057",
        "type" : "rss"
    },
    {
        "site" : "npr",
        "link" : "http://www.npr.org/rss/rss.php?id=1021",
        "type" : "rss"
    },
    {
        "site" : "npr",
        "link" : "http://www.npr.org/rss/rss.php?id=1012",
        "type" : "rss"
    },
    {
        "site" : "npr",
        "link" : "http://www.npr.org/rss/rss.php?id=1003",
        "type" : "rss"
    },
    {
        "site" : "npr",
        "link" : "http://www.npr.org/rss/rss.php?id=2",
        "type" : "rss"
    },
    {
        "site" : "npr",
        "link" : "http://www.npr.org/rss/rss.php?id=3",
        "type" : "rss"
    } 
  ]

It is a JSON array; each element has three fields: site, link, and type.

feed.go

A Feed represents one RSS feed. Here is the source:

package search

import (
    "encoding/json"
    "log"
    "os"
)

const dataFile = "data/data.json"

// Feed contains information we need to process a feed.
type Feed struct {
    Name string `json:"site"`
    URI  string `json:"link"`
    Type string `json:"type"`
}

// RetrieveFeeds reads and unmarshals the feed data file
func RetrieveFeeds() ([]*Feed, error) {
    file, err := os.Open(dataFile)
    if err != nil {
        return nil, err
    }
    defer file.Close()

    var feeds []*Feed
    err = json.NewDecoder(file).Decode(&feeds)

    log.Printf("Retrieve feeds result: %v\n", feeds)
    return feeds, err
}

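Notes:

The package name is search, matching the directory name.
encoding/json is imported to decode JSON.
os is imported to read the file.
const declares a constant; note that it uses = rather than :=.
The Feed type is declared with a capitalized name, so it is exported and usable from other packages.
Each field of Feed carries a tag that tells the json package which JSON property maps to that field.
RetrieveFeeds fetches the feeds. It takes no arguments and returns a slice of Feed pointers plus an error.
The file is opened with os.Open.
defer schedules file.Close to run when the function returns.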

The keyword defer is used to schedule a function call to be executed right after a function returns. It’s our responsibility to close the file once we’re done with it. By using the keyword defer to schedule the call to the close method, we can guarantee that the method will be called. This will happen even if the function panics and terminates unexpectedly.

In other words, the deferred call runs even if the function terminates abnormally.
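
A small sketch of that guarantee, under the assumption that a plain closure stands in for file.Close. The function name runWithDefer and the event strings are made up for this illustration.

```go
package main

import "fmt"

// runWithDefer panics on purpose and reports which steps still ran.
func runWithDefer() (events []string) {
	// Deferred calls run in last-in, first-out order, so this one runs last.
	defer func() {
		_ = recover() // stop the panic so the caller can inspect events
		events = append(events, "recovered")
	}()
	// Stands in for file.Close(): it runs even though the function panics.
	defer func() { events = append(events, "cleanup") }()

	events = append(events, "work")
	panic("something went wrong")
}

func main() {
	fmt.Println(runWithDefer()) // [work cleanup recovered]
}
```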

json.NewDecoder(file) creates a Decoder, and its Decode method writes the values from the JSON file into feeds.

default.go

First, the source:

package search

// defaultMatcher implements the default matcher.
type defaultMatcher struct{}

func init() {
    var matcher defaultMatcher
    Register("default", matcher)
}

// Search implements the behavior for the default matcher.
func (m defaultMatcher) Search(feed *Feed, searchTerm string) ([]*Result, error) {
    return nil, nil
}

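Notes:

Because the file lives in the search directory, its package is still search.
The defaultMatcher type starts with a lowercase letter, so it is not exported.
The init function calls Register to register the matcher under the name default.
The call to Register needs no import because both live in the same package.
Search is declared as a method of defaultMatcher. Its signature matches the method of the Matcher interface declared in match.go, so defaultMatcher implements Matcher.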
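
Interface satisfaction in Go is implicit: there is no "implements" keyword. The sketch below uses a simplified Matcher (one string argument instead of a feed) and a made-up noopMatcher type, so none of it is the book's code.

```go
package main

import "fmt"

// Matcher is a simplified stand-in for the interface in match.go.
type Matcher interface {
	Search(searchTerm string) ([]string, error)
}

// noopMatcher plays the role of defaultMatcher: no fields, finds nothing.
type noopMatcher struct{}

// Search has exactly the signature Matcher requires, which is all Go needs.
func (noopMatcher) Search(searchTerm string) ([]string, error) {
	return nil, nil
}

// Compile-time check: the build fails if noopMatcher stops satisfying Matcher.
var _ Matcher = noopMatcher{}

func main() {
	var m Matcher = noopMatcher{}
	results, err := m.Search("president")
	fmt.Println(len(results), err) // 0 <nil>
}
```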

match.go

First, the source:

package search

import "log"

// Result contains the result of a search
type Result struct {
    Field   string
    Content string
}

// Matcher defines the behavior required by types that want
// to implement a new search type
type Matcher interface {
    Search(feed *Feed, searchTerm string) ([]*Result, error)
}

// Match is launched as a goroutine for each individual feed to run
// searches concurrently
func Match(matcher Matcher, feed *Feed, searchTerm string, results chan<- *Result) {
    searchResults, err := matcher.Search(feed, searchTerm)
    if err != nil {
        log.Println(err)
        return
    }

    for _, result := range searchResults {
        results <- result
    }
}

// Display writes results to the console window as they
// are received by the individual goroutines
func Display(results chan *Result) {
    // The channel blocks until a result is written to the channel.
    // Once the channel is closed the for loop terminates.
    for result := range results {
        log.Printf("%s:\n%s\n\n", result.Field, result.Content)
    }
}

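Notes:

Result is declared to hold a search result; it has two string fields.
The Matcher interface declares the search behavior: Search takes a feed and a search term and returns a slice of results plus an error. Why a slice? Because the search term may occur in several places within one feed.
Match calls the Search method of its matcher argument, gets back a slice of Result pointers, and sends each one to the results channel.
Display ranges over the results channel and prints each result.
Note the := operator, which declares and initializes a variable in one step.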
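
The channel handling in Match and Display can be reduced to the following sketch. The function names send and collect are hypothetical, and strings replace *Result to keep it short.

```go
package main

import "fmt"

// send writes every item to out and then closes it.
// chan<- string means this function may only send on the channel.
func send(items []string, out chan<- string) {
	for _, item := range items {
		out <- item // on an unbuffered channel this blocks until a receiver is ready
	}
	close(out)
}

// collect receives until the channel is closed, the way Display does.
func collect(in <-chan string) []string {
	var got []string
	for item := range in {
		got = append(got, item)
	}
	return got
}

func main() {
	ch := make(chan string)
	go send([]string{"a", "b", "c"}, ch)
	fmt.Println(collect(ch)) // [a b c]
}
```

Here the sender closes the channel itself because it is the only sender. The sample application has many senders, which is why it closes the channel from a separate goroutine instead.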

search.go

First, the source:

package search

import (
    "log"
    "sync"
)

var matchers = make(map[string]Matcher)

// Run performs the search: it retrieves the feeds and searches them concurrently.
func Run(searchTerm string) {
    feeds, err := RetrieveFeeds()
    if err != nil {
        log.Fatal(err)
    }

    // Create an unbuffered channel to receive match results to display
    results := make(chan *Result)

    // Setup a wait group so we can process all the feeds
    var waitGroup sync.WaitGroup

    // Set the number of go routines we need to wait for while
    // they process the individual feeds.
    waitGroup.Add(len(feeds))

    // Launch a goroutine for each feed to find the results.
    for _, feed := range feeds {
        // Retrieve a matcher for the search.
        matcher, exists := matchers[feed.Type]
        if !exists {
            matcher = matchers["default"]
        }

        // Launch the goroutine to perform the search
        go func(matcher Matcher, feed *Feed) {
            Match(matcher, feed, searchTerm, results)
            waitGroup.Done()
        }(matcher, feed)
    }

    // Launch a goroutine to monitor when all the work is done.
    go func() {
        waitGroup.Wait()
        //Close the channel to signal to the Display
        // function that we can exit the program
        close(results)
    }()

    Display(results)
}

// Register is called to register a matcher for use by the program.
func Register(feedType string, matcher Matcher) {
    if _, exists := matchers[feedType]; exists {
        log.Fatalln(feedType, "Matcher already registered")
    }

    log.Println("Register", feedType, "matcher")
    matchers[feedType] = matcher
}

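Notes:

var matchers = make(map[string]Matcher) creates a map with string keys and Matcher values. Matcher is the interface declared in match.go. matchers is declared outside any function, so it is a package-level variable. Register adds key/value pairs to it.
Run is declared next; it is the function that main calls.
It calls RetrieveFeeds to get the feeds.
log.Fatal prints its message and then terminates the program.
results := make(chan *Result) creates an unbuffered channel of Result pointers.
waitGroup is a counter: Add sets it to the number of goroutines, and each goroutine decrements it by one when it finishes.
The loop ranges over the feeds and looks up the Matcher for each feed's type in the matchers map, falling back to the default matcher when none is registered.
go func(){}() starts one goroutine per feed. The function is anonymous and is a closure: all of the closures share the same results, searchTerm, and waitGroup variables, while matcher and feed are passed as arguments so that each goroutine gets its own values.
Each goroutine calls Match from match.go to run the search, then calls waitGroup.Done().
A separate goroutine closes the results channel once waitGroup.Wait() returns.
Finally Display is called with the results channel. Display is declared in match.go.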
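
The concurrency pattern in Run can be sketched in isolation. Everything below is a simplified assumption: searchAll is a made-up name, plain strings stand in for feeds, and strings.Contains replaces the matcher.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// searchAll is a hypothetical, simplified Run: one goroutine per text,
// every match sent to a shared channel, and the channel closed once all finish.
func searchAll(texts []string, term string) []string {
	results := make(chan string)

	var wg sync.WaitGroup
	wg.Add(len(texts))

	for _, text := range texts {
		// text is passed as an argument so each goroutine has its own copy.
		go func(text string) {
			defer wg.Done()
			if strings.Contains(text, term) {
				results <- text
			}
		}(text)
	}

	// Close results only after every worker has called Done.
	go func() {
		wg.Wait()
		close(results)
	}()

	var matches []string
	for m := range results { // the loop ends when results is closed
		matches = append(matches, m)
	}
	return matches
}

func main() {
	texts := []string{"the president spoke", "weather report", "president elect"}
	fmt.Println(len(searchAll(texts, "president")), "matches") // 2 matches
}
```

The order of the matches is not deterministic, because the goroutines run concurrently; only the count is stable.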

rss.go

The source:

package matchers

import (
    "encoding/xml"
    "errors"
    "fmt"
    "log"
    "net/http"
    "regexp"
    "sample/search"
)

type (
    // item defines the fields associated with the item tag
    // in the rss document.
    item struct {
        XMLName     xml.Name `xml:"item"`
        PubDate     string   `xml:"pubDate"`
        Title       string   `xml:"title"`
        Description string   `xml:"description"`
        Link        string   `xml:"link"`
        GUID        string   `xml:"guid"`
        GeoRssPoint string   `xml:"georss:point"`
    }

    // image defines the fields associated with the image tag
    // in the rss document.
    image struct {
        XMLName xml.Name `xml:"image"`
        URL     string   `xml:"url"`
        Title   string   `xml:"title"`
        Link    string   `xml:"link"`
    }

    // channel defines the fields associated with the channel tag
    // in the rss document.
    channel struct {
        XMLName        xml.Name `xml:"channel"`
        Title          string   `xml:"title"`
        Description    string   `xml:"description"`
        Link           string   `xml:"link"`
        PubDate        string   `xml:"pubDate"`
        LastBuildDate  string   `xml:"lastBuildDate"`
        TTL            string   `xml:"ttl"`
        Language       string   `xml:"language"`
        ManagingEditor string   `xml:"managingEditor"`
        WebMaster      string   `xml:"webMaster"`
        Image          image    `xml:"image"`
        Item           []item   `xml:"item"`
    }

    // rssDocument defines the fields associated with the rss document.
    rssDocument struct {
        XMLName xml.Name `xml:"rss"`
        Channel channel  `xml:"channel"`
    }
)

// rssMatcher implements the Matcher interface
type rssMatcher struct{}

// init registers the matcher with the program
func init() {
    var matcher rssMatcher
    log.Println("register rss matcher")
    search.Register("rss", matcher)
}

func (m rssMatcher) Search(feed *search.Feed, searchTerm string) ([]*search.Result, error) {
    var results []*search.Result

    log.Printf("Search Feed Type[%s] Site[%s] For URI[%s]\n", feed.Type, feed.Name, feed.URI)

    // Retrieve the data to search.
    document, err := m.retrieve(feed)
    if err != nil {
        return nil, err
    }

    for _, channelItem := range document.Channel.Item {
        // Check the title for the search term.
        matched, err := regexp.MatchString(searchTerm, channelItem.Title)
        if err != nil {
            return nil, err
        }

        // If we found a match save the result
        if matched {
            results = append(results, &search.Result{
                Field:   "Title",
                Content: channelItem.Title, // note the trailing comma here; it is easy to forget
            })
        }

        // Check the description for the search Item
        matched, err = regexp.MatchString(searchTerm, channelItem.Description)
        if err != nil {
            return nil, err
        }

        if matched {
            results = append(results, &search.Result{
                Field:   "Description",
                Content: channelItem.Description,
            })
        }
    }
    return results, nil
}

func (m rssMatcher) retrieve(feed *search.Feed) (*rssDocument, error) {
    if feed.URI == "" {
        return nil, errors.New("No rss feed uri provided")
    }

    resp, err := http.Get(feed.URI)
    if err != nil {
        return nil, err
    }

    defer resp.Body.Close()

    if resp.StatusCode != 200 {
        return nil, fmt.Errorf("HTTP Response Error %d\n", resp.StatusCode)
    }

    var document rssDocument
    err = xml.NewDecoder(resp.Body).Decode(&document)
    return &document, err
}

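Notes:

type ( ... ) declares four types: rssDocument, channel, image, and item. rssDocument contains a channel, and channel contains an image and a slice of item.
rssMatcher is declared; its Search method, defined below it, makes it implement Matcher.
init calls search.Register to register the matcher.
retrieve starts with a lowercase letter, so it is not exported.
Search first calls retrieve, which sends the HTTP request, decodes the response, and builds an rssDocument. It then ranges over the items under the channel, matches the title and description of each against the search term as a regular expression, builds a Result for each match, and appends it to the results slice.
Finally it returns the results slice.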
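
The decode-then-match step can be exercised without the network. This sketch trims the types from rss.go down to what it needs and introduces a hypothetical helper, matchingTitles; it also compiles the pattern once with regexp.Compile rather than calling regexp.MatchString per item, which is a substitution of mine and not what the book does.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"regexp"
	"strings"
)

// Trimmed-down versions of the types in rss.go.
type item struct {
	Title string `xml:"title"`
}

type channel struct {
	Item []item `xml:"item"`
}

type rssDocument struct {
	XMLName xml.Name `xml:"rss"`
	Channel channel  `xml:"channel"`
}

// matchingTitles is a hypothetical helper: it decodes an RSS document from a
// string and returns the item titles that match searchTerm as a regular expression.
func matchingTitles(doc, searchTerm string) ([]string, error) {
	var parsed rssDocument
	if err := xml.NewDecoder(strings.NewReader(doc)).Decode(&parsed); err != nil {
		return nil, err
	}

	re, err := regexp.Compile(searchTerm)
	if err != nil {
		return nil, err
	}

	var titles []string
	for _, it := range parsed.Channel.Item {
		if re.MatchString(it.Title) {
			titles = append(titles, it.Title)
		}
	}
	return titles, nil
}

func main() {
	const doc = `<rss><channel>
<item><title>The president speaks</title></item>
<item><title>Weather update</title></item>
</channel></rss>`

	titles, err := matchingTitles(doc, "president")
	fmt.Println(titles, err) // [The president speaks] <nil>
}
```

Because the search term is treated as a regular expression, matching is case-sensitive: president would not match a title that begins with President.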

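Summary

A few points:

All of the files live under $GOPATH/src/sample. In GOPATH mode, that location is what makes import sample/... resolve.
To run it, cd into the sample directory and use go run .
Every file in the search directory belongs to package search.
The overall flow is:
- Each package's init function runs during initialization and calls Register in search.go, which stores the matcher in a map.
- main then calls Run in search.go.
- Run reads the feeds from data.json.
- It ranges over the feeds and starts one goroutine per feed.
- Each goroutine calls the rss matcher to search and writes the results to the results channel.
- Display in match.go shows the results.
- Display ranges over the channel, which blocks until a result arrives, so the goroutine running main is blocked too and the program does not exit right away. The loop ends only when the channel is closed, and then main returns. If the main goroutine did not block, main would return and every other goroutine would be terminated along with it.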
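
The registration step at the start of that flow can be sketched as follows. This is an illustration under stated assumptions: register returns an error where the book's Register calls log.Fatalln, and Matcher, namedMatcher, and lookup are simplified or made-up names.

```go
package main

import "fmt"

// Matcher is reduced to a single method for this sketch.
type Matcher interface {
	Name() string
}

// matchers is a package-level registry, initialized before any init function runs.
var matchers = make(map[string]Matcher)

// register is a hypothetical variant of Register that reports duplicates
// as an error instead of terminating the program.
func register(feedType string, m Matcher) error {
	if _, exists := matchers[feedType]; exists {
		return fmt.Errorf("%s matcher already registered", feedType)
	}
	matchers[feedType] = m
	return nil
}

// lookup falls back to the default matcher, as the loop in Run does.
func lookup(feedType string) Matcher {
	if m, ok := matchers[feedType]; ok {
		return m
	}
	return matchers["default"]
}

type namedMatcher string

func (n namedMatcher) Name() string { return string(n) }

// init plays the role of the init functions in default.go and rss.go.
func init() {
	_ = register("default", namedMatcher("default"))
	_ = register("rss", namedMatcher("rss"))
}

func main() {
	fmt.Println(lookup("rss").Name(), lookup("atom").Name()) // rss default
	fmt.Println(register("rss", namedMatcher("again")))      // rss matcher already registered
}
```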