Coursera Machine Learning: Quiz Notes

Applying the cost function and gradient descent

/#1

Consider the following training set of m=4 training examples:

x y
1 0.5
2 1
4 2
0 0

Consider the linear regression model hθ(x)=θ0+θ1x. What are the values of θ0 and θ1 that you would expect to obtain upon running gradient descent on this model? (Linear regression will be able to fit this data perfectly.)

  • A. θ0=0.5,θ1=0

  • B. θ0=0.5,θ1=0.5

  • C. θ0=1,θ1=1

  • D. θ0=1,θ1=0.5

  • F. θ0=0,θ1=0.5

    Analysis: all four training examples lie exactly on the straight line y = 0.5x, so the perfect fit has θ0=0 and θ1=0.5. The answer is F.
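The result can also be checked numerically. The following is a minimal sketch of batch gradient descent on the Q1 training set; the learning rate, iteration count, starting point, and the function name `gradient_descent` are assumptions chosen for illustration, not part of the quiz.

```python
# Training set from Q1.
xs = [1.0, 2.0, 4.0, 0.0]
ys = [0.5, 1.0, 2.0, 0.0]


def gradient_descent(xs, ys, alpha=0.1, iterations=5000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # arbitrary starting point (assumption)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)  # expected to be very close to 0 and 0.5
```

With these settings the parameters should settle near θ0=0 and θ1=0.5, matching option F.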
    

/#2

Let f be some function so that f(θ0,θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ0,θ1) as a function of θ0 and θ1. Which of the following statements are true? (Check all that apply.)

  • A. If θ0 and θ1 are initialized at the global minimum, then one iteration will not change their values.

  • B. Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent.

  • C. If the first few iterations of gradient descent cause f(θ0,θ1) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value.

  • D. No matter how θ0 and θ1 are initialized, so long as α is sufficiently small, we can safely expect gradient descent to converge to the same solution.

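      Analysis: the learning rate α controls how large each update step is. A and C are true: at a minimum the gradient is zero, so an update leaves θ0 and θ1 unchanged, and an α that is too large can overshoot and make f increase. B is false because a very small α slows convergence. D is false because f may have local optima, so different initializations can end at different solutions.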
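The statements about initialization and the learning rate can be illustrated with a small experiment. This is a sketch only: the function `f` below is a hypothetical smooth function with two minima, at (1, 0) and (-1, 0), chosen for illustration; the quiz does not specify f, and the learning rates and starting points are likewise assumptions.

```python
def f(theta0, theta1):
    # Hypothetical smooth function with minima at (1, 0) and (-1, 0).
    return (theta0**2 - 1) ** 2 + theta1**2


def grad_f(theta0, theta1):
    # Partial derivatives of f with respect to theta0 and theta1.
    return 4 * theta0 * (theta0**2 - 1), 2 * theta1


def descend(theta0, theta1, alpha, iterations):
    for _ in range(iterations):
        g0, g1 = grad_f(theta0, theta1)
        theta0, theta1 = theta0 - alpha * g0, theta1 - alpha * g1
    return theta0, theta1


# Statement A: at a minimum the gradient is zero, so one step changes nothing.
at_minimum = descend(1.0, 0.0, alpha=0.1, iterations=1)

# Statement C: a learning rate that is too large makes f increase.
before = f(2.0, 1.0)
after = f(*descend(2.0, 1.0, alpha=1.0, iterations=1))

# Statement D: same small alpha, different starting points, different solutions.
right = descend(0.5, 1.0, alpha=0.05, iterations=500)
left = descend(-0.5, 1.0, alpha=0.05, iterations=500)

print(at_minimum, before, after, right, left)
```

Here the two runs for statement D end near (1, 0) and (-1, 0) respectively, which shows that a small α alone does not guarantee a single solution.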
    

/#3

For this question, assume that we are using the training set from Q1. Recall our definition of the cost function was $J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$. What is J(0,1)? In the box below, please enter your answer (Simplify fractions to decimals when entering answer, and '.' as the decimal delimiter e.g., 1.5).

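    Analysis: with θ0=0 and θ1=1 the hypothesis is hθ(x)=x, so the squared errors on the four Q1 examples are 0.25, 1, 4 and 0. Substituting into the formula gives J(0,1) = (0.25+1+4+0)/(2·4) = 5.25/8 = 0.65625.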
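The substitution can be verified with a few lines of code. This is a minimal sketch that evaluates the squared-error cost on the Q1 training set; the function name `cost` is an assumption for illustration.

```python
# Training set from Q1.
xs = [1.0, 2.0, 4.0, 0.0]
ys = [0.5, 1.0, 2.0, 0.0]


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = (1 / 2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    squared_errors = ((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)


j = cost(0.0, 1.0, xs, ys)
print(j)  # 0.65625
```

For comparison, `cost(0.0, 0.5, xs, ys)` is 0, consistent with the perfect fit from Q1.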

Multivariate linear regression

Suppose m=4 students have taken some class, and the class had a midterm exam and a final exam. You have collected a dataset of their scores on the two exams, which is as follows:

midterm exam    (midterm exam)^2    final exam
89              7921                96
72              5184                74
94              8836                87
69              4761                78

You'd like to use polynomial regression to predict a student's final exam score from their midterm exam score. Concretely, suppose you want to fit a model of the form hθ(x)=θ0+θ1x1+θ2x2, where x1 is the midterm score and x2 is (midterm score)^2. Further, you plan to use both feature scaling (dividing by the "max-min", or range, of a feature) and mean normalization.

What is the normalized feature $x_1^{(3)}$? (Hint: midterm = 94, final = 87 is training example 3.) Please round off your answer to two decimal places and enter in the text box below.

Formula: normalized feature = (value - mean)/(max - min)

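  Analysis: x1 is the midterm score, so the mean and range are taken over the midterm column.
  Mean = (89+72+94+69)/4 = 81
  Max-Min = 94-69 = 25
  (94-81)/25 = 0.52
  Rounded to two decimal places, the answer is 0.52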
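The same calculation can be sketched in code. Here x1 is taken from the midterm column, as the question defines it, and x2 from its square; the helper name `normalize` is an assumption for illustration.

```python
# Midterm scores of the four students.
midterm = [89, 72, 94, 69]


def normalize(values):
    """Mean normalization followed by scaling by the range (max - min)."""
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    return [(v - mean) / value_range for v in values]


x1 = normalize(midterm)                   # feature x1: midterm score
x2 = normalize([v**2 for v in midterm])   # feature x2: (midterm score)^2

# Training example 3 is at index 2 (zero-based).
print(round(x1[2], 2))  # 0.52
```

Each feature is normalized with its own mean and range, so x2 uses the statistics of the squared column rather than those of the raw midterm scores.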