SQL Queries from inside R

In your life as a data scientist, you'll often be working with huge databases that contain tables with millions of rows. If you want to do some analyses on this data, it's possible that you only need a fraction of this data. In this case, it's a good idea to send SQL queries to your database, and only import the data you actually need into R.

dbGetQuery() is what you need. As usual, you first pass the connection object to it. The second argument is an SQL query in the form of a character string. This example selects theagevariable from thepeopledataset where gender equals "male":

dbGetQuery(con, "SELECT age FROM people WHERE gender = 'male'")

Apart from checking equality, you can also check forless thanandgreater thanrelationships, with, just like in R.

# Connect to the database

library(DBI)

con <- dbConnect(RMySQL::MySQL(),

dbname = "tweater",

host = "courses.csrrinzqubik.us-east-1.rds.amazonaws.com",

port = 3306,

user = "student",

password = "datacamp")

# Import post column of tweats where date is higher than '2015-09-21': latest

latest<-dbGetQuery(con,"select post from tweats where date>'2015-09-21'")

# Print latest

latest

# Create data frame specific

specific<-dbGetQuery(con,"select message from comments where tweat_id=77 and user_id>4")

# Print specific

specific

There are also dedicated SQL functions that you can use in theWHEREclause of an SQL query. For example,CHAR_LENGTH() returns the number of characters in a string.


Of course, SQL does not stop with the the three keywordsSELECT,FROMandWHERE. Another very often used keyword isJOIN, and more specificallyINNER JOIN. Take this call for example:

SELECT name, post

FROM users INNER JOIN tweats on users.id = user_id

WHERE date > "2015-09-19"

Here, theuserstable is joined with thetweatstable. This is possible because theidcolumn in theuserstable corresponds to theuser_idcolumn in thetweatstable. Also notice howname, from theuserstable, andpostanddate, from thetweatstable, can be referenced to without problems.

Can you predict the outcome of the following query?

SELECT post, message

FROM tweats INNER JOIN comments on tweats.id = tweat_id

WHERE tweat_id = 77


You've used dbGetQuery() multiple times now. This is a virtual function from the DBI package, but is actually implemented by the RMySQL package. Behind the scenes, the following steps are performed:

Sending the specified query with dbSendQuery();

Fetching the result of executing the query on the database with dbFetch();

Clearing the result with dbClearResult().

# Send query to the database

res <- dbSendQuery(con, "SELECT * FROM comments WHERE user_id > 4")

# Use dbFetch() twice

dbFetch(res, n = 2)

dbFetch(res)

# Clear res

dbClearResult(res)

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容