The most important functions in rvest are:

- Create an HTML document from a URL, a file on disk, or a string containing HTML with `read_html()`.
- Select parts of a document using CSS selectors: `html_nodes(doc, "table td")` (or, if you're a glutton for punishment, use XPath selectors with `html_nodes(doc, xpath = "//table//td")`). If you haven't heard of selectorgadget, make sure to read `vignette("selectorgadget")` to learn about it.
  Note that the function here is `html_nodes()`: there is also an `html_node()` function, but it selects only a single node.
- Extract components with `html_tag()` (the name of the tag), `html_text()` (all text inside the tag), `html_attr()` (contents of a single attribute) and `html_attrs()` (all attributes).
  (You can also use rvest with XML files: parse with `xml()`, then extract components using `xml_node()`, `xml_attr()`, `xml_attrs()`, `xml_text()` and `xml_tag()`.)
- Parse tables into data frames with
  `html_table()`.
- Extract, modify and submit forms with `html_form()`, `set_values()` and `submit_form()`.
- Detect and repair encoding problems with `guess_encoding()` and `repair_encoding()`.
- Navigate around a website as if you're in a browser with `html_session()`, `jump_to()`, `follow_link()`, `back()`, `forward()`, `submit_form()` and so on. (This is still a work in progress, so I'd love your feedback.)
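
As a rough sketch of how the parsing, selecting and extracting functions listed above fit together, the example below runs against a small inline HTML string instead of a live site. The HTML content and the variable names are made up for illustration, and it assumes an installed rvest version that exports `read_html()`, `html_nodes()` and `html_node()`.

```r
library(rvest)

# Hypothetical inline HTML, so the example needs no network access.
page <- read_html('<html><body>
  <table id="scores">
    <tr><th>name</th><th>score</th></tr>
    <tr><td>Ann</td><td>90</td></tr>
    <tr><td>Bob</td><td>85</td></tr>
  </table>
  <a href="/about" class="nav">About</a>
</body></html>')

# html_nodes() returns every node matching the CSS selector ...
cells <- html_nodes(page, "table td")
# ... while html_node() returns only a single node (the first match).
first_cell <- html_node(page, "table td")

# The same selection expressed as an XPath query.
cells_xpath <- html_nodes(page, xpath = "//table//td")

# Extract components: text content, one attribute, all attributes.
cell_text  <- html_text(cells)
link       <- html_node(page, "a")
link_href  <- html_attr(link, "href")
link_attrs <- html_attrs(link)

# Parse the table node into a data frame (the <th> row becomes the header).
scores <- html_table(html_node(page, "table"))

print(cell_text)
print(link_href)
print(scores)
```

Passing a single table node to `html_table()` keeps the result a single data frame; passing a set of nodes would typically give one data frame per table.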