Advanced R 19.3 Missing values

Missing values

I didn't fully understand this section...

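Rcpp defines new C++ data types such as NumericVector, IntegerVector, CharacterVector, and Matrix, which correspond to R's numeric, character, and matrix types.

The most basic R data type in Rcpp is RObject. It is the base class of NumericVector, IntegerVector, and the other classes, and it is usually not used directly.

Because RObject is the base class, its member functions also work on NumericVector and the other classes. isNULL, isObject, and isS4 test whether an object is NULL, whether it is an object, and whether it is an S4 object. inherits tests whether an object inherits from a particular class. attributeNames, hasAttribute, and attr give access to an object's attributes. hasSlot and slot give access to the slots of an S4 object.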

Source: http://www.math.pku.edu.cn/teachers/lidf/docs/Rbook/html/_Rbook/rcpp-vectype.html

Scalars

The following code explores what happens when you take one of R's missing values, coerce it into a scalar, and then coerce it back to an R vector. This kind of experiment is a useful way to figure out what any operation does.

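#include <Rcpp.h>
using namespace Rcpp;

// Store each of R's missing values in a C++ scalar, then hand them back to R
// [[Rcpp::export]]
List scalar_missings(){
  int int_s = NA_INTEGER;     // NA_INTEGER is the smallest int
  String chr_s = NA_STRING;   // Rcpp's String understands NA
  bool lgl_s = NA_LOGICAL;    // bool has no NA: this becomes true
  double num_s = NA_REAL;     // NA_REAL is a special NaN

  return List::create(int_s, chr_s, lgl_s, num_s);
}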

> library(Rcpp)
> sourceCpp('scalar_missings.cpp')
> str(scalar_missings())
List of 4
 $ : int NA
 $ : chr NA
 $ : logi TRUE
 $ : num NA

With the exception of bool, things look pretty good here: all of the missing values have been preserved.
However, as we'll see in the following sections, things are not quite as straightforward as they seem.

1. Integers

With integers, missing values are stored as the smallest integer (INT_MIN, i.e. -2147483648 for a 32-bit int). If you don't do anything to them, they'll be preserved. But C++ doesn't know that the smallest integer has this special behaviour, so if you do anything to it you're likely to get an incorrect value.

So if you want to work with missing values in integers, either use a length-one IntegerVector or be very careful with your code.

For example, adding 1 to NA_INTEGER in C++ produces an ordinary integer rather than NA:

> evalCpp('NA_INTEGER + 1')
[1] -2147483647

evalCpp: evaluates a C++ expression. It creates a C++ function with cppFunction and calls it to get the result.
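You can reproduce the integer problem without R at all. The sketch below is plain C++ with no Rcpp. It assumes, as in R's C API, that NA_INTEGER is INT_MIN. The constant kNaInteger and both helper functions are hypothetical names used only for illustration. The code compares naive arithmetic with a version that checks for the NA sentinel first.

```cpp
#include <climits>

// Assumption: mirrors R's C API, where NA_INTEGER is defined as INT_MIN.
const int kNaInteger = INT_MIN;

// Plain C++ arithmetic: the NA sentinel is treated as an ordinary number.
int add_one_naive(int x) { return x + 1; }

// Hypothetical NA-aware version: check for the sentinel before doing arithmetic.
int add_one_na_aware(int x) {
  if (x == kNaInteger) return kNaInteger;  // keep NA as NA
  return x + 1;  // still overflows for INT_MAX, where R would return NA instead
}
```

Here add_one_naive(kNaInteger) gives -2147483647, the same value evalCpp printed for NA_INTEGER + 1. add_one_na_aware(kNaInteger) returns the sentinel unchanged. Checking for NA explicitly like this is one way of "being very careful with your code".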

2. Doubles

With doubles, you may be able to get away with ignoring missing values and working with NaNs (not a number). This is because R's NA is a special type of IEEE 754 floating-point NaN. So any comparison that involves a NaN (or, in C++, NAN), such as ==, < or >, always evaluates to FALSE; the only exception is !=, which is TRUE. In numeric contexts NaNs propagate, so any arithmetic involving NAN returns NaN:

> evalCpp("NAN + 1")
[1] NaN
> evalCpp("NAN - 1")
[1] NaN
> evalCpp("NAN / 1")
[1] NaN
> evalCpp("NAN * 1")
[1] NaN
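You can also check the comparison rule in plain C++. This sketch rests on one assumption taken from R's C source: NA_real_ is a NaN whose high 32-bit word is 0x7FF00000 and whose low word is 1954. make_r_style_na() and all_comparisons_false() are hypothetical helpers. Because the rebuilt bit pattern is a NaN, std::isnan() recognises it, every ==, < and > comparison is false, and arithmetic returns NaN, just as with C++'s own NAN.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Assumption: R builds NA_real_ as an IEEE 754 NaN whose high 32-bit word is
// 0x7FF00000 and whose low word is 1954; this rebuilds that bit pattern.
double make_r_style_na() {
  std::uint64_t bits = (std::uint64_t{0x7FF00000} << 32) | 1954u;
  double x;
  std::memcpy(&x, &bits, sizeof x);  // reinterpret the 64 bits as a double
  return x;
}

// True when ==, < and > against 1.0, and x == x, are all false,
// which IEEE 754 requires whenever x is a NaN.
bool all_comparisons_false(double x) {
  return !(x == 1.0) && !(x < 1.0) && !(x > 1.0) && !(x == x);
}
```

x != x is the one comparison that is true for a NaN, which is why it is a classic NaN test. std::isnan() cannot tell R's NA apart from an ordinary NaN, because only the payload bits differ.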

3. Strings

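No problems here: String is a scalar string class introduced by Rcpp, so it knows how to deal with missing values.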

4. Boolean

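R: TRUE, FALSE, NA
C++: true, false
C++'s bool has only two values. So if you coerce a length-one logical vector to bool, make sure it contains no NA; otherwise the NA will be converted to true.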
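Here is a plain C++ sketch of the bool problem. It assumes, as in R's C API, that logicals are stored as int and that NA_LOGICAL is INT_MIN. kNaLogical, naive_to_bool() and na_aware_to_bool() are hypothetical names. std::optional<bool> is just one possible way to keep a third "missing" state, and it requires C++17.

```cpp
#include <climits>
#include <optional>

// Assumption: mirrors R, where logicals are stored as int and NA_LOGICAL == INT_MIN.
const int kNaLogical = INT_MIN;

// Naive coercion: any nonzero int, including the NA sentinel, becomes true.
bool naive_to_bool(int lgl) { return static_cast<bool>(lgl); }

// Hypothetical NA-aware coercion: an empty optional stands in for NA.
std::optional<bool> na_aware_to_bool(int lgl) {
  if (lgl == kNaLogical) return std::nullopt;
  return lgl != 0;
}
```

naive_to_bool(kNaLogical) returns true, which is exactly the bool problem seen with scalar_missings(). The NA-aware version returns an empty optional instead, so the caller has to decide what a missing value means.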

Vectors

With vectors, you need to use a missing value specific to the type of vector, NA_REAL, NA_INTEGER, NA_LOGICAL, NA_STRING:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List missing_sampler(){
  return List::create(
    NumericVector::create(NA_REAL),
    IntegerVector::create(NA_INTEGER),
    LogicalVector::create(NA_LOGICAL),
    CharacterVector::create(NA_STRING)
  );
}

> sourceCpp('missing_sampler.cpp')
> str(missing_sampler())
List of 4
 $ : num NA
 $ : int NA
 $ : logi NA
 $ : chr NA

To check whether a value in a vector is missing, use the class method ::is_na() (for example, NumericVector::is_na()):

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
LogicalVector is_naC(NumericVector x){
  int n = x.size();
  LogicalVector out(n);
  
  for (int i = 0; i < n; ++i){
    out[i] = NumericVector::is_na(x[i]);
  }
  return out;
}

> is_naC(c(NA, 5.4, 3.2, NA))
[1]  TRUE FALSE FALSE  TRUE
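For comparison, here is the same loop written against std::vector<double> instead of Rcpp's NumericVector, so it compiles without R. This is a sketch. is_na_loop() is a hypothetical name, and std::isnan() stands in for NumericVector::is_na(). It treats both NA and NaN as missing, as R's is.na() does.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Element-wise missing-value check with an explicit index loop,
// mirroring the structure of is_naC() on a plain std::vector<double>.
std::vector<bool> is_na_loop(const std::vector<double>& x) {
  std::vector<bool> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = std::isnan(x[i]);  // true for NA and for any other NaN
  }
  return out;
}
```

Calling is_na_loop({NAN, 5.4, 3.2, NAN}) gives {true, false, false, true}, the same pattern that is_naC() printed.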

Another alternative is the sugar function is_na(), which takes a vector and returns a logical vector.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
LogicalVector is_naC2(NumericVector x){
  return is_na(x);
}

> is_naC2(c(NA, 5.4, 3.2, NA))
[1]  TRUE FALSE FALSE  TRUE
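The sugar version has a rough standard-library analogue. Instead of an explicit index loop, std::transform applies the test to the whole vector in one call. This is only an analogy, not how Rcpp sugar is implemented. is_na_vectorised() is a hypothetical name, and std::isnan() again counts both NA and NaN as missing.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Whole-vector version: no explicit index loop. std::transform applies the
// NaN test to every element, loosely analogous to Rcpp's sugar is_na().
std::vector<bool> is_na_vectorised(const std::vector<double>& x) {
  std::vector<bool> out(x.size());
  std::transform(x.begin(), x.end(), out.begin(),
                 [](double v) { return std::isnan(v); });
  return out;
}
```

As with sugar, the caller passes a vector and gets a logical vector back, so the loop bookkeeping disappears from user code.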