Bookdown Example: "Tidy Modeling with R"

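It is the end of the year: harvest in autumn, store in winter, and now a time to settle down and read. I have several books on hand. I started "Three Hundred Tang Poems" (《唐诗三百首》) at the beginning of the year and still have not finished it, because most of this year went into technical work. That is a little embarrassing, and I mean to finish it. "A Practical Guide to Weakly Supervised Learning" (《弱监督学习实用指南》) is a very good book on machine learning problems that lack labeled data, and it has real practical value. "Graph Neural Networks: Foundations, Frontiers, and Applications" (《图神经网络：基础、前沿与应用》) covers a promising research direction that should have practical applications in the tax domain. I can study these two books in depth next year.
Of my previous two posts, one proposed an open-source, multi-tier, distributed B/S (browser/server) data analysis system, and the other got the Java→gRPC→Python path working. So in the spare time at the end of the year I can do some wrapping up. First, spend a few more days writing an example that calls business functions through Java→gRPC→Python, which completes that path. Second, use the remaining time to catch up on recent progress in machine learning with R. I last read "Machine Learning with R" (《R语言机器学习》) several years ago. Since then I have been learning Python and Shiny (last year mainly deep learning with Python and R, this year mainly Shiny), and I have probably forgotten most of it, so I need a new book for review. A few days ago people mentioned "Tidy Modeling with R" (source code), which pairs well with "Mastering Shiny". I decided to read it, and to start by getting its bookdown source to build so that I can read it at leisure afterwards. Many books from RStudio (now renamed Posit) are written with the bookdown package, an extension of R Markdown. The advantage is that the source is easy to run and verify: one run generates the whole book, including the online HTML version and the offline PDF and Word versions, and any revision is made directly in the source, followed by a rebuild. This is probably a way of writing that suits the IT field well, because code, results, explanations, and body text combine conveniently.
In the Python ecosystem people write with Jupyter Notebook, which, like R Markdown, combines code, results, and text. What bookdown adds is chapter handling: chapters and sections are numbered automatically, figures and tables can be numbered automatically, and all of them can be cross-referenced. For details see the book "bookdown: Authoring Books and Technical Documents with R Markdown". Bookdown can of course also be used to write books about Python and other languages, since R Markdown supports dozens of languages.
This post briefly covers how to build the book's source on Windows and on Linux.
For me, R and Python are the left hand and the right hand: each has its strengths and its suitable scenarios, so time and energy need to be balanced between them. Major Snow has passed and the Winter Solstice is near. In the whole year only the winter and summer solstices are pure yin or pure yang; after that, yang is born when yin is exhausted and yin is born when yang is exhausted, in an endless cycle. Taiji is the unity of the two polarities, and the technical field is probably similar: the open-source solution proposed in the earlier post integrates 5 platforms and 8 languages, and it is not limited to open-source products either. I write this paragraph in the hope of communicating and avoiding unnecessary misunderstanding and unpleasantness. Quiet years and harmonious coexistence are what everyone would like.
First, a brief walkthrough.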
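I. Install the dependencies
The book's "Using Code Examples" section states that it was built with R 4.2.2 and pandoc 2.19.2. The project home page gives the following commands for installing the environment: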

>install.packages("remotes")
>remotes::install_github("tidymodels/TMwR")

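However, a few of the packages still have to be installed by hand. In addition, both my Windows machine and my Linux virtual machine run R 4.1, and I do not want to upgrade to 4.2 for now because many examples run on them. So I wrote a small script that installs the packages listed in that section, 75 packages in total.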

# install packages needed to build the book.
packages <-"applicable (0.1.0, RSPM), av (0.8.2, RSPM), baguette (1.0.0, RSPM), beans (0.1.0, RSPM),
  bestNormalize (1.8.3, RSPM), bookdown (0.30, RSPM), broom (1.0.1, RSPM), censored (0.1.1.9001, Github),
  corrplot (0.92, RSPM), corrr (0.4.4, RSPM), Cubist (0.4.1, RSPM), DALEXtra (2.2.1, RSPM), dials (1.1.0, RSPM),
  dimRed (0.2.6, RSPM), discrim (1.0.0, RSPM), doMC (1.3.8, RSPM), dplyr (1.0.10, RSPM), earth (5.3.1, RSPM),
  embed (1.0.0, RSPM), fastICA (1.2-3, RSPM), finetune (1.0.1, RSPM), forcats (0.5.2, RSPM),
  ggforce (0.4.1, RSPM), ggplot2 (3.4.0, RSPM), glmnet (4.1-4, RSPM), gridExtra (2.3, RSPM), infer (1.0.3, RSPM),
  kableExtra (1.3.4, RSPM), kernlab (0.9-31, RSPM), kknn (1.3.1, RSPM), klaR (1.7-1, RSPM), knitr (1.40, RSPM),
  learntidymodels (0.0.0.9001, Github), lime (0.5.3, RSPM), lme4 (1.1-31, RSPM), lubridate (1.9.0, RSPM),
  mda (0.5-3, RSPM), mixOmics (6.20.0, Bioconduc~), modeldata (1.0.1, RSPM), multilevelmod (1.0.0, RSPM),
  nlme (3.1-160, CRAN), nnet (7.3-18, CRAN), parsnip (1.0.2.9005, Github), patchwork (1.1.2, RSPM),
  pillar (1.8.1, RSPM), poissonreg (1.0.1, RSPM), prettyunits (1.1.1, RSPM), probably (0.1.0, RSPM),
  pscl (1.5.5, RSPM), purrr (0.3.5, RSPM), ranger (0.14.1, RSPM), recipes (1.0.3, RSPM), rlang (1.0.6, RSPM),
  rmarkdown (2.18, RSPM), rpart (4.1.19, CRAN), rsample (1.1.0, RSPM), rstanarm (2.21.3, RSPM),
  rules (1.0.0, RSPM), sessioninfo (1.2.2, RSPM), stacks (1.0.0, RSPM), stringr (1.4.1, RSPM),
  svglite (2.1.0, RSPM), text2vec (0.6.2, RSPM), textrecipes (1.0.1, RSPM), themis (1.0.0, RSPM),
  tibble (3.1.8, RSPM), tidymodels (1.0.0, RSPM), tidyposterior (1.0.0, RSPM), tidyverse (1.3.2, RSPM),
  tune (1.0.1, RSPM), uwot (0.1.14, RSPM), workflows (1.1.0, RSPM), workflowsets (1.0.0, RSPM),
  xgboost (1.6.0.1, RSPM), yardstick (1.1.0, RSPM)"

# Split the list into entries; fixed = TRUE treats ")," as literal text, not a regex.
entries <- trimws(unlist(strsplit(packages, "),", fixed = TRUE)))
# Keep only the package name, i.e. the text before the first space.
pkg_names <- sub(" .*$", "", entries)
# Install whatever the configured repositories provide; the rest are handled below.
install.packages(pkg_names)

# These packages need some manual steps on Linux (CentOS 7 and similar).
# install ffmpeg & ffmpeg-devel  before installing package "av" on CentOS 7
# https://linoxide.com/install-ffmpeg-centos-7/
# # yum install ffmpeg  ffmpeg-devel
install.packages("av")

# For Chapters 10-16, concurrent processing.
# Chapters 10-16 need this package, so the unmodified book can only be built on Unix-like systems.
# UNIX only package
install.packages("doMC")

# Chapter 17
# Enable gcc-8 first, need gfortran 8
# # source /opt/rh/devtoolset-8/enable
# Need Fortran library to compile package float
# # yum install lapack-devel.x86_64
install.packages("float")
install.packages("text2vec")

# The following packages are needed on both Windows and Linux.
# For chapter 16
# https://bioconductor.org/packages/release/bioc/html/mixOmics.html
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# Install Bioconductor 3.14 for R-4.1, current 3.16 is for R-4.2
BiocManager::install(version = "3.14")
BiocManager::install("mixOmics", force = TRUE)

devtools::install_github("tidymodels/learntidymodels")
devtools::install_version("processx", version = "3.8.0")
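
After the script has run, it can help to confirm which of the listed packages are still missing, because install.packages() only emits a warning when a package is unavailable (for example doMC on Windows, or packages that are not hosted on CRAN). The following is a minimal sketch; `missing_packages` is a hypothetical helper name, not part of the book's tooling, and the example vector is made up for illustration.

```r
# Hypothetical helper: report which package names are not installed
# in any library on the current .libPaths().
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Example with a short, made-up list; in practice pass the full
# vector of package names parsed from the book's list.
wanted <- c("stats", "utils", "aPackageThatDoesNotExist")
missing_packages(wanted)
```

Anything the helper returns is a candidate for one of the manual installation steps above.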

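II. Download the project source and the data file
1. Download the project source
On Windows you can download the zip file and unpack it. A GitHub zip of the main branch unpacks to a directory named TMwR-main, which is the name used in the paths below; a git clone creates a directory named TMwR instead.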

$ cd /home/jean
$ git clone https://github.com/tidymodels/TMwR.git

In RStudio, open the project with File -> Open Project and select the .Rproj file.

Figure: The source project opened in RStudio

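2. Download the data
Only one place in the book downloads a data file from the network. To avoid a failed connection during rendering, I downloaded it in advance and put it in the project's RData directory.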

$ cd /home/jean/TMwR-main/RData
$ # Quote the URL so the shell does not treat "&" as a background operator; -O sets the output file name.
$ wget -O CTA_-_Ridership_-__L__Station_Entries_-_Daily_Totals.csv "https://data.cityofchicago.org/api/views/5neh-572f/rows.csv?accessType=DOWNLOAD&bom=true&format=true"
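
The same download can also be done from R, which avoids shell quoting issues and works on Windows as well. This is a minimal sketch, assuming the working directory is the project root; `fetch_once`, `cta_url`, and `cta_file` are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: download `url` to `dest` unless `dest` already exists.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    # mode = "wb" writes the bytes unchanged, which matters on Windows
    utils::download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

cta_url  <- "https://data.cityofchicago.org/api/views/5neh-572f/rows.csv?accessType=DOWNLOAD&bom=true&format=true"
cta_file <- file.path("RData", "CTA_-_Ridership_-__L__Station_Entries_-_Daily_Totals.csv")
# fetch_once(cta_url, cta_file)   # uncomment and run from the project root
```

Because the helper skips the download when the file is already present, it is safe to call repeatedly.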

III. Render the whole book
1. Linux.
A. Change the reference to the data file in 02-tidyverse.Rmd.

# url <- "https://data.cityofchicago.org/api/views/5neh-572f/rows.csv?accessType=DOWNLOAD&bom=true&format=true"
url <- "./RData/CTA_-_Ridership_-__L__Station_Entries_-_Daily_Totals.csv"
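
Instead of hard-coding one of the two locations, the chapter could prefer the local copy and fall back to the remote URL when the file is absent. This is only a sketch of that idea, not what the book's source does; `pick_source`, `local_csv`, and `remote_csv` are hypothetical names.

```r
# Assumed layout: the CSV was saved under RData/ in the project root.
local_csv  <- "./RData/CTA_-_Ridership_-__L__Station_Entries_-_Daily_Totals.csv"
remote_csv <- "https://data.cityofchicago.org/api/views/5neh-572f/rows.csv?accessType=DOWNLOAD&bom=true&format=true"

# Hypothetical helper: use the local file when it exists, else the remote URL.
pick_source <- function(local, remote) {
  if (file.exists(local)) local else remote
}

url <- pick_source(local_csv, remote_csv)
```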

B. Render the whole book.

t1<-proc.time()
bookdown::render_book()
t2<-proc.time()
cat("\n\n\n\n\n\n")
cat(t2-t1)

On my Linux virtual machine (4 cores, 8 GB of RAM) the build takes about 172 minutes. The output is index.html in the project's _book directory.

Figure: The rendered HTML version of the book

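2. Windows.
A. Change the statements in Chapters 10 to 16 that use the doMC package for parallel computing. The package runs only on Unix-like systems (Linux, macOS) and cannot be installed on Windows. Find every reference to registerDoMC(). Also change the data file reference as described above.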

Microsoft Windows [Version 10.0.19045.2251]
(c) Microsoft Corporation. All rights reserved.

C:\Users\Jean>d:

D:\>cd D:\books\TidyModelsWithR-main

D:\books\TidyModelsWithR-main>findstr /s /i "registerDoMC" *.Rmd
10-resampling.Rmd:# registerDoMC(cores = parallel::detectCores())
10-resampling.Rmd:# registerDoMC(cores = 2)
11-comparing-models.Rmd:#registerDoMC(cores = parallel::detectCores())
12-tuning-parameters.Rmd:# registerDoMC(cores = parallel::detectCores())
13-grid-search.Rmd:# registerDoMC(cores = parallel::detectCores(logical = TRUE))
14-iterative-search.Rmd:# registerDoMC(cores = parallel::detectCores(logical = TRUE))
15-workflow-sets.Rmd:   registerDoMC(cores = cores)
16-dimensionality-reduction.Rmd:# registerDoMC(cores = parallel::detectCores())

D:\books\TidyModelsWithR-main>
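
The same search can be run from R on any platform. The sketch below is an assumption-laden stand-in for findstr, not an exact equivalent: it matches the literal string case-sensitively, whereas findstr /i ignores case. `find_in_rmd` is a hypothetical helper name.

```r
# Hypothetical helper: list every line in *.Rmd files under `dir`
# that contains the literal string `needle`.
find_in_rmd <- function(dir, needle) {
  files <- list.files(dir, pattern = "\\.Rmd$", recursive = TRUE, full.names = TRUE)
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    idx <- grep(needle, lines, fixed = TRUE)
    if (length(idx) == 0) return(NULL)
    data.frame(file = basename(f), line = idx, text = trimws(lines[idx]),
               stringsAsFactors = FALSE)
  })
  # Returns NULL when nothing matches
  do.call(rbind, hits)
}

# find_in_rmd(".", "registerDoMC")   # run from the project root
```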

Chapter 15 (15-workflow-sets.Rmd) demonstrates how to do parallel processing on Windows:

cores <- parallel::detectCores()
if (!grepl("mingw32", R.Version()$platform)) {
   library(doMC)
   registerDoMC(cores = cores)
} else {
   library(doParallel)
   cl <- makePSOCKcluster(cores)
   registerDoParallel(cl)
}

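However, testing showed that this change alone is not enough for Chapter 10 (10-resampling.Rmd) and Chapter 16 (16-dimensionality-reduction.Rmd). In Chapter 10, extract_fit_parsnip(x) returned a list containing a parsnip model when run under library(doParallel), rather than the single parsnip model it returned under library(doMC); for a detailed description and a workaround, see the linked post. Chapters 11 to 14 can simply switch to the doParallel package for parallel processing, following Chapter 15. In each case the doMC lines are commented out: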

# Unix and macOS only
# library(doMC)
# registerDoMC(cores = parallel::detectCores())
# registerDoMC(cores = 2)

B. Additional changes to Chapter 10:

get_model <- function(x) {
  # Modified for library(doParallel) without tidy()
  # extract_fit_parsnip(x) %>% tidy()
  extract_fit_parsnip(x)
}

# Test it using: 
# get_model(lm_fit)

ctrl <- control_resamples(extract = get_model)
lm_res <- lm_wflow %>%  fit_resamples(resamples = ames_folds, control = ctrl)
# Added for library(doParallel)
# Stop parallel at last
stopCluster(cl)

lm_res

lm_res$.extracts[[1]]

# To get the results
# lm_res$.extracts[[1]][[1]]
tidy(lm_res$.extracts[[1]][[1]][[1]])

# all_coef <- map_dfr(lm_res$.extracts, ~ .x[[1]][[1]])
# Added by Jean 2022/12/26
get_coef<-function(model){
  return (tidy(model[[1]][[1]]))
}
all_coef<- bind_rows(lapply(lm_res$.extracts, FUN=get_coef))
# Show the replicates for a single predictor:
filter(all_coef, term == "Year_Built")

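C. Change one place in Chapter 16 (16-dimensionality-reduction.Rmd). On Windows the umap + fda model combination fails even when run serially and terminates the R process, under both R 4.1 and R 4.2, so it is left out for now (fda combined with the basic and pls recipes runs without problems). Then add the packages that parallel tuning needs to control_grid(); see reference 1 and reference 2. According to reference 2, doMC creates its parallel workers with fork, which copies the already loaded packages into the new processes (which is also why it has no Windows version), whereas doParallel and similar backends do not, so the packages have to be named in the pkgs argument.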

# ctrl <- control_grid(parallel_over = "everything")
ctrl <- control_grid(parallel_over = "everything", pkgs=c("bestNormalize","embed"))

bean_res <- 
  workflow_set(
    preproc = list(basic = class ~., pls = pls_rec, umap = umap_rec), 
    # models = list(bayes = bayes_spec, fda = fda_spec,
    models = list(bayes = bayes_spec, # fda = fda_spec,
                  rda = rda_spec, bag = bagging_spec,
                  mlp = mlp_spec)
  ) %>% 
  workflow_map(
    verbose = TRUE,
    seed = 1603,
    resamples = bean_val,
    grid = 10,
    metrics = metric_set(roc_auc),
    control = ctrl
  )

Output of parallel tuning on Windows:

i   No tuning parameters. `fit_resamples()` will be attempted
i  1 of 12 resampling: basic_bayes
v  1 of 12 resampling: basic_bayes (14.6s)
i  2 of 12 tuning:     basic_rda
v  2 of 12 tuning:     basic_rda (2.3s)
i   No tuning parameters. `fit_resamples()` will be attempted
i  3 of 12 resampling: basic_bag
v  3 of 12 resampling: basic_bag (4s)
i  4 of 12 tuning:     basic_mlp
v  4 of 12 tuning:     basic_mlp (13s)
i  5 of 12 tuning:     pls_bayes
v  5 of 12 tuning:     pls_bayes (4.3s)
i  6 of 12 tuning:     pls_rda
v  6 of 12 tuning:     pls_rda (5.9s)
i  7 of 12 tuning:     pls_bag
v  7 of 12 tuning:     pls_bag (4.5s)
i  8 of 12 tuning:     pls_mlp
v  8 of 12 tuning:     pls_mlp (13.5s)
i  9 of 12 tuning:     umap_bayes
v  9 of 12 tuning:     umap_bayes (1m 44.8s)
i 10 of 12 tuning:     umap_rda
v 10 of 12 tuning:     umap_rda (1m 43.4s)
i 11 of 12 tuning:     umap_bag
v 11 of 12 tuning:     umap_bag (1m 41.6s)
i 12 of 12 tuning:     umap_mlp
v 12 of 12 tuning:     umap_mlp (1m 56.9s)
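
The reason given in step C for the pkgs argument, namely that socket-based workers start as fresh R processes and do not inherit the packages of the main session, can be illustrated with base R alone. The sketch below uses the parallel package and the splines package (both ship with R) purely as stand-ins; it does not involve tune or doParallel, and the variable names are illustrative.

```r
library(parallel)
library(splines)   # attached in the main session only

# A PSOCK cluster launches a fresh R process (this also works on Windows).
cl <- makePSOCKcluster(1)

# Ask the worker whether splines is attached there.
before <- clusterEvalQ(cl, "package:splines" %in% search())[[1]]

# Make the package available on the worker explicitly; this is similar in
# spirit to what naming a package in pkgs asks the tuning code to do.
invisible(clusterEvalQ(cl, library(splines)))
after <- clusterEvalQ(cl, "package:splines" %in% search())[[1]]

stopCluster(cl)
c(before = before, after = after)
```

With default settings the worker is expected to report that the package is absent before the explicit library() call and present afterwards, which is the behavior that makes pkgs necessary with doParallel.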

D. Render the whole book.

t1<-proc.time()
bookdown::render_book()
t2<-proc.time()
cat("\n\n\n\n\n\n")
cat(t2-t1)

> cat(t2-t1)
38.28 1.81 4581.55 NA NA
>

On my laptop (16 logical cores, 8 physical; 24 GB of RAM) the build takes about 77 minutes.
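
The five numbers printed by cat(t2-t1) are user, system, and elapsed time in seconds, followed by the child-process times, so the third value is the wall-clock time. A small sketch for reporting it in minutes, with Sys.sleep() standing in for the actual build; `elapsed_minutes` is a hypothetical helper name.

```r
# Hypothetical helper: elapsed wall-clock minutes between two proc.time() values.
elapsed_minutes <- function(t_start, t_end) {
  unname((t_end - t_start)["elapsed"]) / 60
}

t1 <- proc.time()
Sys.sleep(0.1)            # stand-in for bookdown::render_book()
t2 <- proc.time()
elapsed_minutes(t1, t2)

# Converting the elapsed seconds from the Windows run above:
round(4581.55 / 60, 1)
```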

Figure: The rendered HTML version of the book

Now we have the whole book together with code that can be run and verified, so it can be read and studied at leisure.

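IV. A brief introduction to the bookdown project
For details see "Chapter 12 Books" in the book "R Markdown: The Definitive Guide" and the book "bookdown: Authoring Books and Technical Documents with R Markdown".
1. _bookdown.yml.
It defines the overall chapter structure of the book. index.Rmd is the default first Rmd file, and the rmd_files list contains all of the book's R Markdown source files. With new_session: yes, each chapter is rendered in its own R session. Here the _common.R script is loaded before each chapter is rendered, to apply book-wide settings such as code chunk options.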

new_session: yes

rmd_files: [
  "index.Rmd",
  
  "01-software-modeling.Rmd",
  "02-tidyverse.Rmd",
  "03-base-r.Rmd",
  
  "04-ames.Rmd",
  "05-data-spending.Rmd",
  "06-fitting-models.Rmd",
  "07-the-model-workflow.Rmd",
  "08-feature-engineering.Rmd",
  "09-judging-model-effectiveness.Rmd",

  "10-resampling.Rmd",
  "11-comparing-models.Rmd",
  "12-tuning-parameters.Rmd",
  "13-grid-search.Rmd",
  "14-iterative-search.Rmd",
  "15-workflow-sets.Rmd",

  "16-dimensionality-reduction.Rmd",
  "17-encoding-categorical-data.Rmd",
  "18-explaining-models-and-predictions.Rmd",
  "19-when-should-you-trust-predictions.Rmd",
  "20-ensemble-models.Rmd",
  "21-inferential-analysis.Rmd",
  
  "pre-proc-table.Rmd",
  "references.Rmd"
]

before_chapter_script: "_common.R"
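
Since a full build takes well over an hour, it can be convenient to rebuild a single chapter after editing it. The sketch below wraps bookdown::preview_chapter() for that purpose; it assumes bookdown is installed and that the working directory is the project root. `preview_one` is a hypothetical wrapper name, and cross-references to other chapters may not resolve in such a preview.

```r
# Hypothetical wrapper: rebuild one chapter for a quick look
# instead of rendering the whole book.
preview_one <- function(rmd) {
  stopifnot(file.exists(rmd))      # fail early on a wrong path
  bookdown::preview_chapter(rmd)   # requires the bookdown package
}

# preview_one("02-tidyverse.Rmd")
```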

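2. _output.yml.
It defines the settings for each output format, such as HTML and PDF. Here, for example, Cascading Style Sheets (CSS) files are specified for the HTML version; they determine the overall look of the web version.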

bookdown::gitbook:
  css: [style.css, TMwR.css]
  dev: png
  config:
    toc:
      collapse: section
      before: |
        <li><strong><a href="./">Tidy Modeling with R</a></strong></li>
    edit:
      link: https://github.com/tidymodels/TMwR/edit/main/%s
      text: "Edit"
    fontsettings: null  
    sharing: no

bookdown::pdf_book:
  latex_engine: pdflatex
  citation_package: natbib
  includes:
    in_header: latex_extras/preamble.tex
  keep_tex: yes
  highlight: tango

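3. index.Rmd.
Using the sample book as an example, here are a few words about index.Rmd. In the simplest case the two yml files above are not needed, since what they define is supplementary; a single index.Rmd file is enough to make a book. Each first-level heading in it is a chapter title, and each heading at the second level or below is a section, a subsection, and so on. Options can be added in braces after a heading, and chapters and sections are numbered by default (see the home page figure above). index.Rmd must be the first chapter of the book, and only it may contain the YAML metadata; the later chapters inherit its settings automatically.
The option "{-}" in the line below means that the section is not numbered.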

## Acknowledgments {-}

The cross-reference "\@ref(software-modeling)" refers to Chapter 1, 01-software-modeling.Rmd; other chapters are referenced in the same way.

In Chapter \@ref(software-modeling), we outline a taxonomy for models and highlight what good software for modeling is like.

The referenced label software-modeling is defined in braces in the first-level heading (the chapter title) of 01-software-modeling.Rmd.

# Software for modeling {#software-modeling}
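
Because each \@ref(label) has to match a {#label} defined somewhere in the book, a quick consistency check can be scripted. The sketch below covers only labels defined in headings; references to figures and tables, which take their labels from code chunks, would show up as unmatched. `extract_labels` and `extract_refs` are hypothetical helper names, and the two sample lines are taken from the examples above.

```r
# Hypothetical helper: labels defined as {#label} in heading lines.
extract_labels <- function(lines) {
  heads <- grep("^#+ .*\\{#[A-Za-z0-9:_-]+", lines, value = TRUE)
  sub("^.*\\{#([A-Za-z0-9:_-]+).*$", "\\1", heads)
}

# Hypothetical helper: labels used in \@ref(label) cross-references.
extract_refs <- function(lines) {
  hits <- unlist(regmatches(lines, gregexpr("\\\\@ref\\(([^)]+)\\)", lines)))
  sub("^\\\\@ref\\((.*)\\)$", "\\1", hits)
}

lines <- c(
  "# Software for modeling {#software-modeling}",
  "In Chapter \\@ref(software-modeling), we outline a taxonomy for models."
)

# References that have no matching heading label (empty when all resolve)
setdiff(extract_refs(lines), extract_labels(lines))
```

In practice `lines` would be the concatenation of readLines() over all files listed in rmd_files.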

Source of this chapter (index.Rmd):

---
knit: "bookdown::render_book"
title: "Tidy Modeling with R"
author: ["Max Kuhn and Julia Silge"]
date: "`r tmwr_version()`"
site: bookdown::bookdown_site
description: "The tidymodels framework is a collection of R packages for modeling and machine learning using tidyverse principles. This book provides a thorough introduction to how to use tidymodels, and an outline of good methodology and statistical practice for phases of the modeling process."
github-repo: tidymodels/TMwR
twitter-handle: topepos
cover-image: images/cover.png
documentclass: book
classoption: 11pt
bibliography: [TMwR.bib]
biblio-style: apalike
link-citations: yes
colorlinks: yes
---

# Hello World {-} 

<a href="https://amzn.to/35Hn96s"><img src="images/cover.png" width="350" height="460" alt="Buy from Amazon" class="cover" /></a>

Welcome to _Tidy Modeling with R_! This book is a guide to using a collection of software in the R programming language for model building called `r pkg(tidymodels)`, and it has two main goals: 

- First and foremost, this book provides a practical introduction to **how to use** these specific R packages to create models. We focus on a dialect of R called [the tidyverse](https://www.tidyverse.org/) that is designed with a consistent, human-centered philosophy, and demonstrate how the tidyverse and the `r pkg(tidymodels)` packages can be used to produce high quality statistical and machine learning models.

- Second, this book will show you how to **develop good methodology and statistical practices**. Whenever possible, our software, documentation, and other materials attempt to prevent common pitfalls. 

In Chapter \@ref(software-modeling), we outline a taxonomy for models and highlight what good software for modeling is like. The ideas and syntax of the tidyverse, which we introduce (or review) in Chapter \@ref(tidyverse), are the basis for the tidymodels approach to these challenges of methodology and practice. Chapter \@ref(base-r) provides a quick tour of conventional base R modeling functions and summarizes the unmet needs in that area. 

After that, this book is separated into parts, starting with the basics of modeling with tidy data principles. Chapters \@ref(ames) through \@ref(performance) introduces an example data set on house prices and demonstrates how to use the fundamental tidymodels packages: `r pkg(recipes)`, `r pkg(parsnip)`, `r pkg(workflows)`, `r pkg(yardstick)`, and others. 

The next part of the book moves forward with more details on the process of creating an effective model. Chapters \@ref(resampling) through \@ref(workflow-sets) focus on creating good estimates of performance as well as tuning model hyperparameters. 

Finally, the last section of this book, Chapters \@ref(dimensionality) through \@ref(inferential), covers other important topics for model building. We discuss more advanced feature engineering approaches like dimensionality reduction and encoding high cardinality predictors, as well as how to answer questions about why a model makes certain predictions and when to trust your model predictions.

We do not assume that readers have extensive experience in model building and statistics. Some statistical knowledge is required, such as random sampling, variance, correlation, basic linear regression, and other topics that are usually found in a basic undergraduate statistics or data analysis course. We do assume that the reader is at least slightly familiar with dplyr, ggplot2, and the `%>%` "pipe" operator in R, and is interested in applying these tools to modeling. For users who don't yet have this background R knowledge, we recommend books such as [*R for Data Science*](https://r4ds.had.co.nz/) by Wickham and Grolemund (2016). Investigating and analyzing data are an important part of any model process.

This book is not intended to be a comprehensive reference on modeling techniques; we suggest other resources to learn more about the statistical methods themselves. For general background on the most common type of model, the linear model, we suggest @fox08.  For predictive models, @apm and @fes are good resources. For machine learning methods, @Goodfellow is an excellent (but formal) source of information. In some cases, we do describe the models we use in some detail, but in a way that is less mathematical, and hopefully more intuitive. 


## Acknowledgments {-}

`\``{r, eval = FALSE, echo = FALSE}
library(tidyverse)
contribs_all_json <- gh::gh("/repos/:owner/:repo/contributors",
  owner = "tidymodels",
  repo = "TMwR",
  .limit = Inf
)
contribs_all <- tibble(
  login = contribs_all_json %>% map_chr("login"),
  n = contribs_all_json %>% map_int("contributions")
)
contribs_old <- read_csv("contributors.csv", col_types = list())
contribs_new <- contribs_all %>% anti_join(contribs_old, by = "login")
# Get info for new contributors
needed_json <- map(
  contribs_new$login, 
  ~ gh::gh("/users/:username", username = .x)
)
info_new <- tibble(
  login = contribs_new$login,
  name = map_chr(needed_json, "name", .default = NA),
  blog = map_chr(needed_json, "blog", .default = NA)
)
contribs_new <- contribs_new %>% left_join(info_new, by = "login")
contribs_all <- bind_rows(contribs_old, contribs_new) %>% arrange(login)
write_csv(contribs_all, "contributors.csv")
`\``

We are so thankful for the contributions, help, and perspectives of people who have supported us in this project. There are several we would like to thank in particular.

We would like to thank our RStudio colleagues on the `r pkg(tidymodels)` team (Davis Vaughan, Hannah Frick, Emil Hvitfeldt, and Simon Couch) as well as the rest of our coworkers on the RStudio open source team. Thank you to Desirée De Leon for the site design of the online work. We would also like to thank our technical reviewers, Chelsea Parlett-Pelleriti and Dan Simpson, for their detailed, insightful feedback that substantively improved this book, as well as our editors, Nicole Tache and Rita Fernando, for their perspective and guidance during the process of writing and publishing.


`\``{r, results = "asis", echo = FALSE, message = FALSE}
library(dplyr)
contributors <- read.csv("contributors.csv", stringsAsFactors = FALSE)
contributors <- contributors %>% 
  filter(!login %in% c("topepo", "juliasilge", "dcossyleon")) %>% 
  mutate(
    login = paste0("\\@", login),
    desc = ifelse(is.na(name), login, paste0(name, " (", login, ")"))
  )
cat("This book was written in the open, and multiple people contributed via pull requests or issues. Special thanks goes to the ", xfun::n2w(nrow(contributors)), " people who contributed via GitHub pull requests (in alphabetical order by username): ", sep = "")
cat(paste0(contributors$desc, collapse = ", "))
cat(".\n")
`\``

## Using Code Examples {-}

`\``{r pkg-list, echo = FALSE}
deps <- desc::desc_get_deps()
pkgs <- sort(deps$package[deps$type == "Imports"])
pkgs <- sessioninfo::package_info(pkgs, dependencies = FALSE)
df <- tibble::tibble(
  package = pkgs$package,
  version = pkgs$ondiskversion,
  source = pkgs$source
) %>% 
  mutate(
    source = stringr::str_split(source, " "),
    source = purrr::map_chr(source, ~ .x[1]),
    info = paste0(package, " (", version, ", ", source, ")")
    )
pkg_info <- knitr::combine_words(df$info)
`\``

This book was written with [RStudio](http://www.rstudio.com/ide/) using [bookdown](http://bookdown.org/). The [website](https://tmwr.org) is hosted via [Netlify](http://netlify.com/), and automatically built after every push by [GitHub Actions](https://help.github.com/actions). The complete source is available on [GitHub](https://github.com/tidymodels/TMwR). We generated all plots in this book using [ggplot2](https://ggplot2.tidyverse.org/) and its black and white theme (`theme_bw()`). 

This version of the book was built with `r R.version.string`, [pandoc](https://pandoc.org/) version `r rmarkdown::pandoc_version()`, and the following packages: `r pkg_info`.

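4. Math formulas.
Scientific and technical writing rarely does without math formulas, for example in statistics (the foundations of big-data algorithms are mainly probability theory and statistics). They are written in LaTeX syntax; a formula in Chapter 19, for example, is written in the source like this: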

$$
\mathrm{logit}(p) = -1 - 2x - \frac{x^2}{5} + 2y^2 
$$

It renders as follows:
$\mathrm{logit}(p) = -1 - 2x - \frac{x^2}{5} + 2y^2$
