1. You are training a classification model with logistic regression. Which of the following statements are true? Check all that apply.【D】
A. Introducing regularization to the model always results in equal or better performance on the training set.
【Explanation】If the regularization parameter λ is too large, the model will underfit, which makes performance on the training set worse, not better.
B. Adding many new features to the model helps prevent overfitting on the training set.
【Explanation】Adding many new features lets the model fit the training data more closely, but too many features can cause overfitting, so the model fails to generalize to unseen data and its predictions become less accurate.
C. Adding a new feature to the model always results in equal or better performance on examples not in the training set.
【Explanation】Adding a new feature can cause overfitting, which leads to worse predictions on examples outside the training set, even as the fit to the training set improves.
D. Adding a new feature to the model always results in equal or better performance on the training set.
【Explanation】Adding a new feature makes the model more expressive, so it fits the training set at least as well. By adding a new feature, our model must be more (or just as) expressive, thus allowing it to learn more complex hypotheses to fit the training set.
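The effect described in option A can be checked numerically. Below is a minimal NumPy sketch (the synthetic data, λ values, learning rate, and step counts are illustrative assumptions, not from the quiz): with a very large λ, regularized logistic regression ends up with a worse training-set log-loss than the unregularized fit.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X @ np.array([2.0, -3.0, 1.5, 0.5]) > 0).astype(float)

def train_log_loss(X, y, lam, steps=3000, lr=0.3):
    """Train L2-regularized logistic regression by gradient descent;
    return the final log-loss on the training set."""
    Xb = np.c_[np.ones(len(X)), X]        # prepend bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xb @ w))
        grad = Xb.T @ (p - y) / len(y)
        grad[1:] += lam * w[1:] / len(y)  # the bias term is not regularized
        w -= lr * grad
    p = 1 / (1 + np.exp(-Xb @ w))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

loss_no_reg = train_log_loss(X, y, lam=0.0)
loss_big_lambda = train_log_loss(X, y, lam=500.0)
print(loss_no_reg, loss_big_lambda)
```

With λ = 500 the weights are shrunk toward zero, so the model underfits and its training log-loss is clearly higher than with λ = 0.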
2. Which of the following statements are true? Check all that apply.【BD】
A. Suppose you have a multi-class classification problem with three classes, trained with a 3 layer network. Let a_1^(3) = (h_Θ(x))_1 be the activation of the first output unit, and similarly a_2^(3) = (h_Θ(x))_2 and a_3^(3) = (h_Θ(x))_3. Then for any input x, it must be the case that a_1^(3) + a_2^(3) + a_3^(3) = 1.
B. In a neural network with many layers, we think of each successive layer as being able to use the earlier layers as features, so as to be able to compute increasingly complex functions.
C. If a neural network is overfitting the data, one solution would be to decrease the regularization parameter λ.
D. If a neural network is overfitting the data, one solution would be to increase the regularization parameter λ.
3. You are using the neural network pictured below and have learned the parameters Θ^(1) = [1 −1.5 3.7; 1 5.1 2.3] (used to compute a^(2)) and Θ^(2) = [1 0.6 −0.8] (used to compute a^(3) as a function of a^(2)). Suppose you swap the parameters for the first hidden layer between its two units, so Θ^(1) = [1 5.1 2.3; 1 −1.5 3.7], and also swap the output layer, so Θ^(2) = [1 −0.8 0.6]. How will this change the value of the output h_Θ(x)?【A】
A.It will stay the same.
B.It will increase.
C. It will decrease.
D.Insufficient information to tell: it may increase or decrease.
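The symmetry in question 3 is easy to verify directly: swapping the two hidden units together with their outgoing weights relabels the hidden units without changing the function the network computes. A minimal NumPy sketch (the test input x is an arbitrary assumption; any x gives the same result):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(Theta1, Theta2, x):
    """Forward pass of the 3-layer network; first column of each
    parameter matrix multiplies the bias unit."""
    a1 = np.concatenate(([1.0], x))          # input layer + bias
    a2 = sigmoid(Theta1 @ a1)                # hidden activations
    a2 = np.concatenate(([1.0], a2))         # add hidden-layer bias
    return sigmoid(Theta2 @ a2)              # output h_Theta(x)

# Original parameters (rows of Theta1 = hidden units).
Theta1 = np.array([[1.0, -1.5, 3.7],
                   [1.0,  5.1, 2.3]])
Theta2 = np.array([1.0, 0.6, -0.8])

# Swap the hidden-unit rows of Theta1 and the matching weights in Theta2.
Theta1_swap = Theta1[[1, 0], :]
Theta2_swap = np.array([1.0, -0.8, 0.6])

x = np.array([0.3, -2.0])                    # arbitrary input
h_original = forward(Theta1, Theta2, x)
h_swapped = forward(Theta1_swap, Theta2_swap, x)
print(h_original, h_swapped)
```

The two outputs are identical, confirming answer A: the output stays the same.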
4. Which of the following statements are true? Check all that apply.【BD】
A. Suppose you are training a logistic regression classifier using polynomial features and want to select what degree polynomial (denoted d in the lecture videos) to use. After training the classifier on the entire training set, you decide to use a subset of the training examples as a validation set. This will work just as well as having a validation set that is separate (disjoint) from the training set.
B. Suppose you are using linear regression to predict housing prices, and your dataset comes sorted in order of increasing sizes of houses. It is then important to randomly shuffle the dataset before splitting it into training, validation and test sets, so that we don't have all the smallest houses going into the training set, and all the largest houses going into the test set.
C. It is okay to use data from the test set to choose the regularization parameter λ, but not the model parameters (θ).
D. A typical split of a dataset into training, validation and test sets might be 60% training set, 20% validation set, and 20% test set.
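Options B and D can be combined into one short recipe: shuffle first, then cut the data 60/20/20. A minimal sketch using the standard library (the 100-example dataset is a stand-in assumption for houses sorted by size):

```python
import random

# Hypothetical dataset sorted by house size; indices stand in for examples.
data = list(range(100))

random.seed(0)
random.shuffle(data)      # shuffle BEFORE splitting (option B)

# 60% training / 20% validation / 20% test (option D)
n = len(data)
train = data[: int(0.6 * n)]
val = data[int(0.6 * n): int(0.8 * n)]
test = data[int(0.8 * n):]
print(len(train), len(val), len(test))
```

Without the shuffle, the smallest houses would all land in the training set and the largest in the test set, so the test set would not be representative.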
5. Suppose you have a dataset with n = 10 features and m = 5000 examples. After training your logistic regression classifier with gradient descent, you find that it has underfit the training set and does not achieve the desired performance on the training or cross validation sets. Which of the following might be promising steps to take? Check all that apply.【AC】
A. Use an SVM with a Gaussian Kernel.
【Explanation】An SVM with a Gaussian kernel can fit more complex decision boundaries, which can help correct the underfitting.
B. Use a different optimization method since using gradient descent to train logistic regression might result in a local minimum.
C. Create / add new polynomial features.
D. Increase λ.
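Option C can be demonstrated directly: when the true decision boundary is nonlinear, plain logistic regression underfits, and adding polynomial features fixes it. A minimal NumPy sketch (the circular synthetic data, learning rate, and step count are illustrative assumptions; the same idea motivates the Gaussian-kernel SVM in option A):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(float)  # circular boundary

def train_accuracy(F, y, steps=5000, lr=1.0):
    """Fit logistic regression by gradient descent on feature matrix F;
    return accuracy on the training set."""
    Fb = np.c_[np.ones(len(F)), F]        # prepend bias column
    w = np.zeros(Fb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Fb @ w))
        w -= lr * Fb.T @ (p - y) / len(y)
    p = 1 / (1 + np.exp(-Fb @ w))
    return np.mean((p > 0.5) == y)

acc_linear = train_accuracy(X, y)             # underfits: no linear separator
acc_poly = train_accuracy(np.c_[X, X ** 2], y)  # x^2 terms capture the circle
print(acc_linear, acc_poly)
```

The linear model cannot do better than predicting the majority class, while the squared features let the model represent the circular boundary and fit the training set well. Increasing λ (option D) would only make the underfitting worse.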