Data frames (from STHDA)


A data frame is like a matrix but can have columns with different types (numeric, character, logical).
Rows are observations (individuals) and columns are variables.

Create a data frame

Use the function data.frame() to create a data frame, as follows:

# Create a data frame
friends_data <- data.frame(
  name = c("A", "B", "C", "D"),
  age = c(25, 27, 26, 29),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, FALSE, TRUE)
)
friends_data # Print
# To check whether an object is a data frame, use is.data.frame().
# It returns TRUE if the object is a data frame.
is.data.frame(friends_data)
Functions used: data.frame(), is.data.frame()
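
Since a data frame can mix column types, it can help to inspect which type each column ended up with. The following is a minimal sketch using base R only; the variable names dims and col_types are illustrative and do not come from the original tutorial.

```r
# Rebuild the example data frame so this block runs on its own
friends_data <- data.frame(
  name = c("A", "B", "C", "D"),
  age = c(25, 27, 26, 29),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, FALSE, TRUE)
)

# Compact overview: one line per column with its type and first values
str(friends_data)

# Number of rows and columns (observations and variables)
dims <- dim(friends_data)
dims

# Class of each column. Note: 'name' is character in R >= 4.0.0,
# but a factor in older versions (stringsAsFactors defaulted to TRUE).
col_types <- sapply(friends_data, class)
col_types
```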

col1 <- c(5, 6, 7, 8, 9) # Numeric vector
col2 <- c(2, 4, 5, 9, 8) # Numeric vector
col3 <- c(7, 3, 4, 8, 7) # Numeric vector
my_data <- cbind(col1, col2, col3) # Combine the vectors by column
my_data
is.data.frame(my_data) # FALSE: cbind() on vectors returns a matrix
Functions used: class(), as.data.frame()

The object “friends_data” is a data frame, but the object “my_data” is not. We can convert it to a data frame using the as.data.frame() function:

class(my_data) # What is the class of my_data? --> a matrix

my_data2 <- as.data.frame(my_data) # Convert it to a data frame

class(my_data2) # Check the class again --> "data.frame"
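
One practical consequence of the conversion is how columns are accessed. The sketch below, which rebuilds the same vectors so it runs on its own, shows that the column names carry over and that the $ operator works on the data frame, while the matrix needs bracket indexing.

```r
col1 <- c(5, 6, 7, 8, 9)
col2 <- c(2, 4, 5, 9, 8)
col3 <- c(7, 3, 4, 8, 7)

my_data <- cbind(col1, col2, col3)   # a numeric matrix
my_data2 <- as.data.frame(my_data)   # the same values as a data frame

is.matrix(my_data)        # TRUE
is.data.frame(my_data2)   # TRUE

# Column names are kept by the conversion
names(my_data2)

# Data frame: columns can be accessed with $
my_data2$col1

# Matrix: $ is not available, use [row, column] indexing instead
my_data[, "col1"]
```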

As described in the matrix section, you can use the function t() to transpose a data frame (note that the result is a matrix, not a data frame):

t(friends_data)
Function used: t()
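
Because t() converts the data frame to a matrix first, and a matrix can hold only one type, transposing a data frame with mixed columns turns every value into text. The sketch below illustrates this under the assumption that you only need the numeric columns afterwards; the names transposed and numeric_part are illustrative.

```r
friends_data <- data.frame(
  name = c("A", "B", "C", "D"),
  age = c(25, 27, 26, 29),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, FALSE, TRUE)
)

# Transposing the whole data frame gives a character matrix
transposed <- t(friends_data)
is.data.frame(transposed)   # FALSE
typeof(transposed)          # "character"

# Transposing only the numeric columns keeps the values numeric
numeric_part <- t(friends_data[, c("age", "height")])
typeof(numeric_part)        # "double"
numeric_part
```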

Subset a data frame

To select just certain columns from a data frame, you can either refer to the columns by name or by their location (i.e., column 1, 2, 3, etc.).

  1. Positive indexing by name and by location
    Select rows/columns by positive indexing, using row/column names or positions.
# Access the data in the 'name' column
# using the dollar sign ($)
friends_data$name

# or use this
friends_data[, 'name']

# Subset columns 1 and 3
friends_data[ , c(1, 3)]
Positive indexing: $, "colname", c(1, 2, 3, 4:5)
  2. Negative indexing
    Exclude rows/columns by negative indexing
# Exclude column 1
friends_data[, -1]

Negative indexing: -1, c(-1, -2)
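
The same idea extends to several columns and to rows. The following is a small sketch on the example data; the result names are illustrative. Note that positive and negative positions cannot be mixed in a single index vector.

```r
friends_data <- data.frame(
  name = c("A", "B", "C", "D"),
  age = c(25, 27, 26, 29),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, FALSE, TRUE)
)

# Exclude columns 1 and 2 (name and age)
without_two_cols <- friends_data[, c(-1, -2)]
without_two_cols

# Exclude the first row; the empty column slot keeps all columns
without_first_row <- friends_data[-1, ]
without_first_row

# Exclude rows 1 and 2 and column 4 in one step
trimmed <- friends_data[c(-1, -2), -4]
trimmed
```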
  3. Index by characteristics
    Selection by logical values: TRUE / FALSE
# We want to select all friends with age >= 27.

# Identify rows that meet the condition. This returns a logical vector:
# TRUE means the row has age >= 27, FALSE means it does not.
friends_data$age >= 27

friends_data[friends_data$age >= 27, ] # Select the rows that meet the condition

The R code above tells R to get all rows from friends_data where age >= 27, and then to return all the columns.

If you don’t want to see all the columns for the selected rows and are only interested in, for example, the names and ages of friends with age >= 27, you could use the following R code:

friends_data[friends_data$age >= 27, c(1, 2)] # Use column locations
# Or use column names
friends_data[friends_data$age >= 27, c("name", "age")]
Logical indexing

Indexing: positive, negative, logical
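
Logical conditions can also be combined. The sketch below is not part of the original tutorial; it uses the base R operators & (and) and | (or), plus which() to turn a logical vector into row positions.

```r
friends_data <- data.frame(
  name = c("A", "B", "C", "D"),
  age = c(25, 27, 26, 29),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, FALSE, TRUE)
)

# Both conditions must hold: age >= 27 AND married
older_married <- friends_data[friends_data$age >= 27 & friends_data$married, ]
older_married

# At least one condition must hold: age >= 27 OR height > 180
older_or_tall <- friends_data[friends_data$age >= 27 | friends_data$height > 180, ]
older_or_tall

# Row positions where the condition is TRUE
matching_rows <- which(friends_data$age >= 27)
matching_rows
```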

a. If your selection statement is becoming inconvenient, you can put your row and column selections into variables first.
b. Then you can select the rows and columns with those variables:

age27 <- friends_data$age >= 27
cols <- c("name", "age")

friends_data[age27, cols]
Indexing by variables
  4. Function: subset()
    It’s also possible to use the function subset(), as follows.
# Select friends data with age >= 27
subset(friends_data, age >= 27)
Function used: subset()
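
subset() can also pick columns through its select argument, which is part of base R but is not shown in the original text. A brief sketch, with illustrative result names:

```r
friends_data <- data.frame(
  name = c("A", "B", "C", "D"),
  age = c(25, 27, 26, 29),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, FALSE, TRUE)
)

# Rows with age >= 27, keeping only the name and age columns
older <- subset(friends_data, age >= 27, select = c(name, age))
older

# Combine conditions and drop a column with a minus sign
older_short <- subset(friends_data, age >= 27 & height < 175, select = -married)
older_short
```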
  5. Functions: attach() and detach()
    Another option is to use the functions attach() and detach().
    The function attach() takes a data frame and makes its columns accessible by simply giving their names.
    They are used as follows:
# Attach a data frame
attach(friends_data)
# === Data manipulation ====
friends_data[age >= 27, ]
# === End of data manipulation ====
# Detach the data frame
detach(friends_data)
Functions used: attach(), detach()
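
A possible alternative, not mentioned in the original text, is the base R function with(). It evaluates a single expression with the columns visible by name, so nothing stays attached afterwards and there is no detach() step to forget. A minimal sketch:

```r
friends_data <- data.frame(
  name = c("A", "B", "C", "D"),
  age = c(25, 27, 26, 29),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, FALSE, TRUE)
)

# Columns are visible by name only inside the with() call
mean_age <- with(friends_data, mean(age))
mean_age

# Build the row condition with with(), then subset as usual
older <- friends_data[with(friends_data, age >= 27), ]
older
```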

Extend a data frame

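a. Use the dollar sign ($) to add a new column to a data frame:

# friend_groups is not defined in this section. It must hold one value per row
# (4 here); the values below are hypothetical placeholders.
friend_groups <- c("grp1", "grp2", "grp1", "grp2")

# Add group column to friends_data
friends_data$group <- friend_groups
friends_data
Pattern: data_frame$colname <- values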

b. It’s also possible to use the functions cbind() and rbind() to extend a data frame.

cbind(friends_data, group = friend_groups)
General form: cbind(df1, df2)
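
The text mentions rbind() but shows only cbind(). The sketch below adds a row with rbind(); the new friend "E" and its values are hypothetical and exist only to illustrate the call. The new row needs the same column names as the existing data frame.

```r
friends_data <- data.frame(
  name = c("A", "B", "C", "D"),
  age = c(25, 27, 26, 29),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, FALSE, TRUE)
)

# A one-row data frame with matching column names (hypothetical values)
new_friend <- data.frame(name = "E", age = 30, height = 175, married = FALSE)

# Append the row; rbind() matches columns by name
extended <- rbind(friends_data, new_friend)
extended
```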

Calculations with data frame or matrix

With a numeric data frame, you can use the functions rowSums(), colSums(), colMeans(), rowMeans() and apply(), as described in the matrix section.
rowSums() and colSums() compute the total of each row and the total of each column, respectively.
It’s also possible to perform simple arithmetic on a matrix. For example, my_data * 2 multiplies each element of the matrix by 2.

Note that you can also use the function apply() to apply any statistical function to the rows or columns of a matrix.
Use apply() as follows:

my_data              # The numeric matrix created earlier with cbind()
my_data * 2          # Multiply each element by 2
log2(my_data)        # Compute the log2 values
rowSums(my_data)     # Total of each row
colSums(my_data)     # Total of each column

# apply(X, MARGIN, FUN)
#   X: your data matrix
#   MARGIN: possible values are 1 (for rows) and 2 (for columns)
#   FUN: the function to apply on rows/columns

apply(my_data, 1, mean)   # Compute row means
apply(my_data, 1, median) # Compute row medians
apply(my_data, 2, mean)   # Compute column means
Summary: row and column calculations on a numeric matrix
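
The same functions work on a data frame as long as the selected columns are numeric. The sketch below applies them to the age and height columns of the example data; the name num_cols is illustrative. Selecting the numeric columns first matters because apply() converts a data frame to a matrix, and mixed types would be coerced to character.

```r
friends_data <- data.frame(
  name = c("A", "B", "C", "D"),
  age = c(25, 27, 26, 29),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, FALSE, TRUE)
)

# Keep only the numeric columns
num_cols <- friends_data[, c("age", "height")]

col_means <- colMeans(num_cols)       # mean of each column
row_totals <- rowSums(num_cols)       # total of each row
col_max <- apply(num_cols, 2, max)    # maximum of each column

col_means
row_totals
col_max
```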