### Entropy, Information Gain, Information Gain Ratio, and Gini Index
#### Entropy: a measure of the uncertainty (impurity) of a random variable
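Definition: let X be a discrete random variable that takes finitely many values, with probability distribution
P(X = x_i) = p_i, i = 1, 2, ..., n
The entropy of the random variable X is then: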
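Under the standard Shannon definition, which is assumed here, the entropy is

$$H(X) = -\sum_{i=1}^{n} p_i \log p_i$$

with the convention $0 \log 0 = 0$. The minimal Python sketch below evaluates this sum for a given distribution. The function name `entropy` and the default base-2 logarithm (entropy measured in bits) are illustrative assumptions, not something fixed by the definition above.

```python
import math


def entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution.

    `probs` is a sequence of probabilities p_1..p_n that should sum to 1.
    The name `entropy` and the base-2 default are illustrative choices.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 are skipped, following the convention 0 * log(0) = 0.
    return sum(-p * math.log(p, base) for p in probs if p > 0)
```

For example, `entropy([0.5, 0.5])` evaluates to 1 bit (a fair coin), while `entropy([1.0, 0.0])` evaluates to 0: a variable with a certain outcome carries no uncertainty. Entropy is largest when the distribution is uniform.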
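# startDate, endDate and sc (the SparkContext) are assumed to be defined earlier.
curDate = startDate
all_user_data = sc.parallelize([])  # start from an empty RDD
print("end Date:", endDate)
while curDate <= endDate:
    # Build the input path for the current day, e.g. .../orders_with_poiid_v3/20240101
    dateStr = curDate.strftime("%Y%m%d")
    inputpath = "/user/map_rec/rec/orders_with_poiid_v3/" + dateStr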