2021-01-18 Datacamp IntermediateR chapter4

Have another look at some useful math functions that R features:

abs(): Calculate the absolute value.
sum(): Calculate the sum of all the values in a data structure.
mean(): Calculate the arithmetic mean.
round(): Round the values to 0 decimal places by default. Try out ?round in the console for variations of round() and ways to change the number of digits to round to.

rev() function: reverse the vector

> x<- c(1:5,5:3)
> x
[1] 1 2 3 4 5 5 4 3
> rev(x)
[1] 3 4 5 5 4 3 2 1
> rev(1:7)
[1] 7 6 5 4 3 2 1

If you check out the documentation of mean(), you'll see that only the first argument, x, should be a vector. If you also specify a second argument, R will match the arguments by position and expect a specification of the trim argument. Therefore, merging the two vectors is a must!

R features a bunch of functions to juggle around with data structures::

seq(): Generate sequences, by specifying the from, to, and by arguments.
rep(): Replicate elements of vectors and lists.
sort(): Sort a vector in ascending order. Works on numerics, but also on character strings and logicals.
rev(): Reverse the elements in a data structures for which reversal is defined.
str(): Display the structure of any R object.
append(): Merge vectors or lists.
is.*(): Check for the class of an R object.
as.*(): Convert an R object from one class to another.
unlist(): Flatten (possibly embedded) lists to produce a vector.

# The linkedin and facebook lists have already been created for you
linkedin <- list(16, 9, 13, 5, 2, 17, 14)
facebook <- list(17, 7, 5, 16, 8, 13, 14)

# Convert linkedin and facebook to a vector: li_vec and fb_vec
li_vec<-unlist(linkedin)
fb_vec<-unlist(facebook)
# Append fb_vec to li_vec: social_vec
social_vec<-append(li_vec,fb_vec)
# Sort social_vec
rev(sort(social_vec))

fix the code:
the correct one:

rep(seq(1, 7, by = 2), times = 7)

the wrong one:

seq(rep(1,7,by=2),times=7))

The seq() function:
by=2 means the latter is 2 unites more than the form

# Create first sequence: seq1
seq1<-seq(1,500,by=3)
# Create second sequence: seq2
seq2<-seq(1200,900,by=-7)

# Calculate total sum of the sequences
print(sum(seq1,seq2))

if i want to create a sequence:
1-500 with 3 increment , the code is above.

In their most basic form, regular expressions can be used to see whether a pattern exists inside a character string or a vector of character strings. For this purpose, you can use:

grepl(), which returns TRUE when a pattern is found in the corresponding character string.
grep(), which returns a vector of indices of the character strings that contains the pattern.

Both functions need a pattern and an x argument, where pattern is the regular expression you want to match for, and the x argument is the character vector from which matches should be sought.

# The emails vector has already been defined for you
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Use grepl() to match for "edu"
grepl("edu",emails)

# Use grep() to match for "edu", save result to hits
hits<-grep("edu",emails)

# Subset emails using hits
emails[hits]

there's more that can be added to make the pattern more robust:
use//.edu$

 The emails vector has already been defined for you
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Use grepl() to match for .edu addresses more robustly
grepl("@.*\\.edu$",emails)

# Use grep() to match for .edu addresses more robustly, save result to hits
hits<-grep("@.*\\.edu$",emails)

# Subset emails using hits
emails[hits]

.* You can use them to match any character between the at-sign and the ".edu" portion of an email address.
\.edu$, to match the ".edu" part of the email at the end of the string. The \ part escapes the dot: it tells R that you want to use the . as an actual character.

use sub() to substitute the "@edu" with "@datacamp.edu"

# The emails vector has already been defined for you
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "global@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Use sub() to convert the email domains to datacamp.edu
sub("@.*\\.edu$","@datacamp.edu",emails)

.*: A usual suspect! It can be read as "any character that is matched zero or more times".
\\s: Match a space. The "s" is normally a character, escaping it (\\) makes it a metacharacter.
[0-9]+: Match the numbers 0 to 9, at least once (+).
([0-9]+): The parentheses are used to make parts of the matching string available to define the replacement. The \\1 in the replacementargument of sub() gets set to the string that is captured by the regular expression [0-9]+.

awards <- c("Won 1 Oscar.",
  "Won 1 Oscar. Another 9 wins & 24 nominations.",
  "1 win and 2 nominations.",
  "2 wins & 3 nominations.",
  "Nominated for 2 Golden Globes. 1 more win & 2 nominations.",
  "4 wins & 1 nomination.")
 
sub(".*\\s([0-9]+)\\snomination.*$", "\\1", awards)
[1] "Won 1 Oscar." "24"           "2"            "3"            "2"           
[6] "1"

Great! Can you explain why all of this happened? The ([0-9]+) selects the entire number that comes before the word “nomination” in the string, and the entire match gets replaced by this number because of the \1 that reference to the content inside the parentheses.

DATE:
To create a Date object from a simple character string in R, you can use the as.Date() function. The character string has to obey a format that can be defined using a set of symbols (the examples correspond to 13 January, 1982):

%Y: 4-digit year (1982)
%y: 2-digit year (82)
%m: 2-digit month (01)
%d: 2-digit day of the month (13)
%A: weekday (Wednesday)
%a: abbreviated weekday (Wed)
%B: month (January)
%b: abbreviated month (Jan)

The following R commands will all create the same Dateobject for the 13th day in January of 1982:

as.Date("1982-01-13")
as.Date("Jan-13-82", format = "%b-%d-%y")
as.Date("13 January, 1982", format = "%d %B, %Y")

Notice that the first line here did not need a format argument, because by default R matches your character string to the formats "%Y-%m-%d" or "%Y/%m/%d".

In addition to creating dates, you can also convert dates to character strings that use a different date notation. For this, you use the format() function. Try the following lines of code:

today <- Sys.Date()
format(Sys.Date(), format = "%d %B, %Y")
format(Sys.Date(), format = "Today is a %A!")

Similar to working with dates, you can use as.POSIXct() to convert from a character string to a POSIXct object, and format() to convert from a POSIXct object to a character string. Again, you have a wide variety of symbols:

%H: hours as a decimal number (00-23)
%I: hours as a decimal number (01-12)
%M: minutes as a decimal number
%S: seconds as a decimal number
%T: shorthand notation for the typical format %H:%M:%S
%p: AM/PM indicator

For a full list of conversion symbols, consult the strptimedocumentation in the console:

?strptime

Again,as.POSIXct() uses a default format to match character strings. In this case, it's %Y-%m-%d %H:%M:%S. In this exercise, abstraction is made of different time zones.

# Definition of character strings representing dates
str1 <- "May 23, '96"
str2 <- "2012-03-15"
str3 <- "30/January/2006"

# Convert the strings to dates: date1, date2, date3
date1 <- as.Date(str1, format = "%b %d, '%y")
date2<-as.Date("2012-03-15")
date3<-as.Date(str3,format="%d/%B/%Y")

# Convert dates to formatted strings
format(date1, "%A")
format(date2, "%d")
format(date3, "%b %Y")

# Definition of character strings representing times
str1 <- "May 23, '96 hours:23 minutes:01 seconds:45"
str2 <- "2012-3-12 14:23:08"

# Convert the strings to POSIXct objects: time1, time2
time1 <- as.POSIXct(str1, format = "%B %d, '%y hours:%H minutes:%M seconds:%S")
time2<-as.POSIXct("2012-3-12 14:23:08")

# Convert times to formatted strings
print(format(time1, "%M"))
format(time2, "%I:%M %p")

for the calculation of date and time:

# day1, day2, day3, day4 and day5 are already available in the workspace

# Difference between last and first pizza day
print(day5-day1)

# Create vector pizza
pizza <- c(day1, day2, day3, day4, day5)
# Create differences between consecutive pizza days: day_diff
day_diff<-diff(pizza)
day_diff
# Average period between two consecutive pizza days
mean(day_diff)

Calculations using POSIXct objects are completely analogous to those using Date objects. Try to experiment with this code to increase or decrease POSIXct objects:

now <- Sys.time()
now + 3600          # add an hour
now - 3600 * 24     # subtract a day
Adding or subtracting time objects is also straightforward:

birth <- as.POSIXct("1879-03-14 14:37:23")
death <- as.POSIXct("1955-04-18 03:47:12")
einstein <- death - birth
einstein

2021-01-18 Datacamp IntermediateR chapter4

推荐阅读更多精彩内容