The most important functions in rvest are:

- Create an html document from a url, a file on disk or a string containing html with `read_html()`.
- Select parts of a document using css selectors: `html_nodes(doc, "table td")` (or if you're a glutton for punishment, use xpath selectors with `html_nodes(doc, xpath = "//table//td")`). If you haven't heard of selectorgadget, make sure to read `vignette("selectorgadget")` to learn about it. Note that the function here is `html_nodes()`, not `html_node()`: `html_node()` also exists, but it selects only a single node.
- Extract components with `html_tag()` (the name of the tag), `html_text()` (all text inside the tag), `html_attr()` (contents of a single attribute) and `html_attrs()` (all attributes).
- (You can also use rvest with XML files: parse with `xml()`, then extract components using `xml_node()`, `xml_attr()`, `xml_attrs()`, `xml_text()` and `xml_tag()`.)
- Parse tables into data frames with `html_table()`.
- Extract, modify and submit forms with `html_form()`, `set_values()` and `submit_form()`.
- Detect and repair encoding problems with `guess_encoding()` and `repair_encoding()`.
- Navigate around a website as if you're in a browser with `html_session()`, `jump_to()`, `follow_link()`, `back()`, `forward()`, `submit_form()` and so on. (This is still a work in progress, so I'd love your feedback.)
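As a rough illustration of the parsing, selecting and extracting steps listed above, here is a minimal sketch. The HTML string, the `example.com` URLs and the variable names are made up for this example; only `read_html()`, `html_nodes()`, `html_node()`, `html_text()`, `html_attr()` and `html_table()` come from the list, and the sketch assumes an rvest version in which they are all available.

```r
library(rvest)

# A small inline HTML document (invented for illustration); read_html()
# accepts a string containing html as well as a url or a file path.
doc <- read_html('
<html><body>
  <a href="https://example.com/a">First</a>
  <a href="https://example.com/b">Second</a>
  <table>
    <tr><th>name</th><th>value</th></tr>
    <tr><td>x</td><td>1</td></tr>
    <tr><td>y</td><td>2</td></tr>
  </table>
</body></html>')

# html_nodes() returns every node matching the css selector.
links <- html_nodes(doc, "a")
link_text <- html_text(links)          # text inside each <a>
link_href <- html_attr(links, "href")  # value of one attribute per node

# html_node() returns only a single node (the first match).
first_link <- html_node(doc, "a")
first_text <- html_text(first_link)

# Select the table cells, then parse the whole table into a data frame.
cells <- html_nodes(doc, "table td")
tbl <- html_table(html_node(doc, "table"))

print(link_text)
print(link_href)
print(first_text)
print(tbl)
```

The contrast between `links` and `first_link` is the point of the note about `html_nodes()` versus `html_node()`: the former keeps both links, the latter only the first.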