Introduction to Data Science (still reading...)

Introduction to Data Science

Before we start

Libraries that we are using

There's bunch of library that data scientist will use during the research, here are some important libraries:

  • NumPy: provides multidimensional array with basic operation, linear algebra function
  • SciPy: provides numerical algorithms and domain-specific toolbox. Signal processing, optimization, statistics...
  • Matplotlib: data visualization
  • SCIKIT-Learn: machine learning library, classification, regression, clustering, dimensiality reduction, model selection, and preprocessing.
  • PANDAS: data analysis library. Importing CSV, text file, xls, SQL databases, and some other files. It can handling missing data and intelligent data alignment.

Python version and import toolboxes

This book use Anaconda distribution of python. It has integrated some of the most imoportant package that we will use. Detail about installing will not be covered in this note. And we are using Jupyter Notebook Environment. All the code was on python 2.7. The reason why we don't use python 3.6 is the poor 3rd party module support. Everything runs fine with 2.7.

Begin with python

Remember to import the toolboxes that we will use in the program

import pandas as pd
import numpy as np
import matplotlib.pylot as plt
#so that we can use short cut pd, np, plt

To run the code in just one cell, use Cell -> Run or just press Ctrl+Enter
To add a cell below, use Insert -> Insert Cell Below or just press Ctrl+B

Data input and output

The data structure of Pandas called DataFrame object. It's basiclly a tabular data structure, that's rows and columns.
Colums are called series, consist of a list of serval values.
e.g.

data = { 'year': [
2010, 2011, 2012,
2010, 2011, 2012,
2010, 2011, 2012
],
'team': [
'FCBarcelona', 'FCBarcelona',
'FCBarcelona', 'RMadrid',
'RMadrid', 'RMadrid',
'ValenciaCF', 'ValenciaCF',
'ValenciaCF'
],
'wins':     [30, 28, 32, 29, 32, 26, 21, 17, 19],
'draws':    [6, 7, 4, 5, 4, 7, 8, 10, 8],
'losses':   [2, 3, 2, 4, 2, 5, 9, 11, 11]
}

football = pd.DataFrame(data, columns = [
'year', 'team', 'wins', 'draws', 'losses'
]
)
index year team wins draws losses
0 2010 FCBarcelona 30 6 2
1 2011 FCBarcelona 28 7 3
2 2012 FCBarcelona 32 4 2
3 2010 RMadrid 29 5 4
4 2011 RMadrid 32 4 2
5 2012 RMadrid 26 7 5
6 2010 ValenciaCF 21 8 9
7 2011 ValenciaCF 17 10 11
8 2012 ValenciaCF 19 8 11

this is what the result looks like

index year team wins draws losses
0 2010 FCBarcelona 30 6 2
1 2011 FCBarcelona 28 7 3
2 2012 FCBarcelona 32 4 2
3 2010 RMadrid 29 5 4
4 2011 RMadrid 32 4 2
5 2012 RMadrid 26 7 5
6 2010 ValenciaCF 21 8 9
7 2011 ValenciaCF 17 10 11
8 2012 ValenciaCF 19 8 11

We can not only use dictionary to input data, but also a CSV file.
e.g.

edu = pd.read_csv('files/Data.csv',
na_values = ':',
usecols = ["A","B","C"])
edu

na_values set the character that represents NaN
usecols set the name of the columns.
You can also use Pandas to read some other files, by using read_excel(), read_hdf(), read_table(), read_clipboard()
by calling edu.head() edu.tail() we can see the first 5 rows and the last 5 rows of the Data in edu.
If we want to know the name of the column or the index or the value, call attribute columns, index, value.
If you want a quick view how this statistic is like. call edu.describe()
It give us the count, mean, standard diviation, min, max, 25% 50% 75% quantile of each column or series

Selecting Data

Select or filter some of the data normally use square bracket [ ].

# select just the column name "Value"
edu[`Value`]

#select some rows
edu[10: 14]

# select some rows of some suluns
edu.i[90: 94, ['TIME', 'GEO']]

We can also fiter out some of the data that we don't want. Criteria for filtering can use all the Boolean operator.

# we want the Value larger than 6.5, and just last 5 row of the data
edu[edu['Value'] > 6.5].tail()

to represent NaN using isnull()

edu[edu["Value"].isnull]

Manipulating Data

Aggregation function

We can call the aggregation functions to get some information about the data that we want to know about.
Here're some commenly used functions:count(), sum(), mean(), median(), max(), min(), std(), var(), prod() -> product of values
They have a common parameter axis.
axis = 0 means function apply to the rows for each column.
axis = 1 means function apply to the columns for each row. (Normally this don't make sence)
Note that the max function of Pandas and of Python are different.
In Pandas, NaN are exclude.

max function of python actually regard NaN as maximum

print "Pandas max function:" edu['Value'].max()  #return 8.81
print "Python max function:" max(edu['Value'])  #return NaN

we can also apply functions in NumPy by append. apply(np.functionname)
e.g.

s = edu["Value"].apply(np.sqrt)
s.head()

we can also apply a lambda function

s = edu["Value"].apply(lambda d: d**2)

Lambda function is a easy and better way to define a simple function that with no name and will only be use one time.

def f(x):
return x**2
print f(8) # result: 64

#is equal to
g = lambda x: x**2
print g(8) # result: 64

Insert and remove

We can simply use assign operator over a DataFrame.

edu['ValueNorm'] = edu['Value']/edu['Value'].max()

Some of the data should be removed. Parameter axis shows whether we are removing the row or the column. Parameter inplace shows whether we are changing the duplicate or the original. If you don't want the old data , set True

edu.drop('ValueNorm', axis = 1, inplace = True)

Inssert new row at bottom using append.
remember to set ignore_index to True, otherwise it will be set to index 0, which will lead to an error if index 0 already exist.

edu = edu.append({"TIME": 2000, "Value": 5.00, "GEO": 'a'}, ignore_index = True)

deleting the last row would be easy if you write like this.

edu.drop(max(edu.index), axis = 0, inplace = True)

To delete NaN we can use isnull() function and set axis to 0 to delate the whole row.

eduDrop = edu.drop(edu['Value'].isnull(), axis = 0)

or we can use specific dropna() function

eduDrop = edu.dropna(how = 'any', subset = ['Value'])

If we want to set a default value to all the NaN we can use fillna()

eduFilled = edu.fillna(value = {"Value": 0})

Sorting

Data that sorted by column are commenly seen. We can use sort function.

edu.sort_values(by = 'Value', ascending = False,
inplace = True)

note that if the inplace was True, there will be no new DataFrame return.
Sorted data have their oringinal index, so we can just use sort_index to return the original order

edu.sort_index(axis = 0, ascending = True, inplace = True)

Grouping Data

We want to group data that meet some commen criteria. Pandas has the groupby function to do this.

# shows the mean of the value of each country
group = edu[["GEO", "Value"]].groupby("GEO").mean()
- Value
GEO
Austria 5.618333
Belgium 6.189091
Bulgaria 4.093333
Cyprus 7.023333
Czech Republic 4.16833

result:

- Value
GEO
Austria 5.618333
Belgium 6.189091
Bulgaria 4.093333
Cyprus 7.023333
Czech Republic 4.16833

Rearranging Data

Sometimes we need to make the index meaningful by rearranging data. The result is normally a new spreadsheet containing the data that we want and well arranged.

filtered_data = edu[edu["TIME"] > 2005]
pivedu = pd.pivot_table(filtered_data, values = 'Value',
index = ['GEO'],
columns = ['Timer'])
pivedu.head()
TIME 2006 2007 2008 2009 2010 2011
GEO
Austria ... ... ... ... .. ..
Belgium .... ... ... ... ... ...
Bulgaria ... ... .. ... ... ..
Cyprus ... .. .. .. ... ..
Czech Republic ... .... ... .... .. ...

result:

TIME 2006 2007 2008 2009 2010 2011
GEO
Austria ... ... ... ... .. ..
Belgium .... ... ... ... ... ...
Bulgaria ... ... .. ... ... ..
Cyprus ... .. .. .. ... ..
Czech Republic ... .... ... .... .. ...

and we can directly select specific row and column by lable of the new spreadsheet that we just made.

pivedu.ix[['Spain','Portugal'], [2006, 2011]]

Ranking

Ranking number is really useful. Forexample, we want to know the rank of each country by year.
in the example, we first rename the Germany that with a long long description. Then we drop all the ``NaN`. Finally we do rank.

pivedu = pivedu.rename(index = {'Germany(until 1990 former territory of the FRG)': Germany})
pivedu = pivedu.dropna()
pivedu.rank(ascending = False, method = 'first').head()

if we want to see the rank from top to bottom, we can first rank then sort.

totalSum = pivedu.sum(axis = 1)
totalSum.rank(ascending = False, method = 'dense')
.sort_values().head()
GEO -
Denmark 1
Cyprus 2
Finland 3
Malta 4
Belgium 5

result:

GEO -
Denmark 1
Cyprus 2
Finland 3
Malta 4
Belgium 5

Plotting

By using plot function, we can plot the graphic.

totalSum = pivedu.sum(axis = 1)
.sort_values(ascending = False)
totalSum.plot(kind = 'bar', style = 'b', alpha = 0.4, title = "Total Values for Country")

kind define which kind of graphic are we using.
Style refer to the style properties, in bar gragh means color of the bar.
alpha refer to percentage transparency of the color
if we want to add a legend, a description of the graph, use legend function.

myplot.legend(loc = 'center left', # set reletive location of the legend with respect to the plot
bbox_to_anchor = (1, .5)) # set a absolute position with respect to the plot

Basic of statistics in Python

First we introduce two concept. Population and sample. Population is a collection ob object, a sample is a part of the population that is observed.
Before the statistics we should prepare the data that suit our need.

Data Prepareation

Data prepareation include following operation:

  1. Obtaining the data: e.g. read a file, scraping from internet
  2. Parsing the data: to unwrap the datapackage
  3. Cleaning the data: deal with incomplete records
  4. Building data structure: e.g. a database
file = open('files/ch03/adult.data','r')
data = [] # set a empty list
for line in file:
data1 = line.split(', ')
if len(data1) == 15:
data.append([.., .., .., .., .., .., .., .., ..., ..., .., .., .., .., ..])

df = pd.DataFrame(data)
df.columns = ['..', ... ,'..']
# give them a data lable in column property

after fit it in DataFrame, we can command shape to know the exact number of the data samples, in rows and column.

df.shape
# return (32561, 15)

we can also group the dataFrame like we did earlier.

ml50 = df[(df.sex == 'Male') & (df.income == '>50K\n')]

Summerize some statistic value

  • Mean
    $$\mu=\frac{1}{n}\sum_{i=1}^nx_i$$
ml['age'].mean()
  • Sample Variance
    $$\sigma2=\frac{1}{n}\sum_{i}(x_i-\mu)2$$
    σ is the standard deviation
ml['age'].var()
ml['age'].std()
  • Sample Median: ml['age'].median()
  • Quantil: a fraction p of the data value is less or equal to $x_p$
  • Data distribution: commenly use histogram to show the distribution
maAge = ml['age']
mlAge.hist(normed = 0, histtype = 'stepfilled', bins = 20)
# bins: width of the bin
# normed: 1 to show the value of the density function at the bin.(0-1)
#         0 to show the number of the sample in the bin(normally a real number)

to make two histogram compareable, we can overlapping each other in a same graphic

import seaborn as sns
fmAge.hist(normed = 0, histtype = 'stepfilled', alpha = .5, bins = 20)
mlAge.hist(normed = 0, histtype = 'stepfilled', alpha = 0.5,
color = sns.desaturate("indianred", .75),
bin = 10)

if we want to show the cumulative histogram, add a parameter with cumulative = True

Outlier Treatment

First define the standard of the outlier. Normally the samples that far from the median and whose value exceed the mean by 2 or 3 standard deviation.

df2 = df.drop(df.index[
(df.income == '>50K\n') &
(df['age'] > df['age'].median() + 35) &
(df['age'] > df['age'].median() - 15)
])

you can check the removal of your data by overlapping the graphic before and after the removing.
histogram() returns

  • hist: value of the bar(according to normed), Array
  • bin.edge: edge of the bin, array of float

Measureing Asymmetry: Skewness and Pearson's Median Skewness Coefficient

Skewness: how asymmetric the distribution is.
Skewness of a set of n data samples, $x_i$:
$$ g_1 = \frac{1}{n}\frac{\sum_{i}(x_i - \mu3)}{\sigma3} $$
nagative skewness means the distribution "skews left". A normal distribution have a 0 skewness.
we can define a function to calculate the skewness of a data set

def skewness()
res = 0
m = x.mean()
s = x.std()
for i in x:
res += (i-m) * (i-m) * (i-m)
res /= (len(x) * s * s * s)
return res

Normal Distribution

Also called the Gaussian distribution,
$$ P(x) = \frac{1}{\sqrt{2\pi\sigma2}}e{- \frac{({x-\mu})2}{2\sigma2}} $$
really important in the statistics

Kernel Density

We don't really get every distribution's parameter of every data set.
Instead, we use kernel density estimation. We use Gaussian kernel to generate the density around the data. The sum of these Gaussian distribution can give us a continuous function. So that we can have a continuous representation of the original distribution

x1 = np.random.normal(-1, 0.5, 15)
# random.normal have 3 parameter
# loc: center of the normal distribution
# scale: standard deviation
# size: output shape, canbe a tuple of int or int
x2 = np.random.normal(6, 1, 10)
y = np.r_[x1, x2]
# r_[] merging all the input to a new Array.
x = np.linspace(min(y), max(y), 100)
# linspace return evenly distribute number, array

s = 0.4 # smoothing parameter

#calculate the kernels
kernels = np.transpose([norm.pdf(x, yi, s) for yi in y])
#norm.pdf(x, loc, scale)
#pdf stands for probability density function
plt.plot(x, kernels, 'k:')
plt.plot(x, kernels.sum(1), 'r')
plt.plot(y, np.zeros(len(y)), 'bo', ms = 10)

we can also call the build in function of SciPy

from scipy.stats import kde
density = kde.gaussian_kde(y)
# kde.gaussian_kde Representation of a kernel-density estimate using Gaussian kernels.
xgrid = np.linspace(x.min(), x.max(), 200)
plt.hist(y, bins = 28, normed = True)
plt.plot(xgrid, density(xgrid), 'r-')

Estimation

Mean

we guess the mean $\mu$ of the distribution using the sample mean. This process called estimation and the statistic called estimator.
The sample mean minimize the following mean squared error
$$ MSE = \frac{1}{n}\sum{(\bar{x}-\mu)^2} $$

NTs = 200
mu = 0.0
var = 1.0
err = 0.0
NPs = 1000
for i in range(NTs)
x = np.random.normal(mu, var, NPs)
err += (x.mean()-mu)**2
print 'MSE: ', err/NTests
Variance

If the sample number is large enough, we can use sample variance estimator
$$ \bar{\sigma}^2 = \frac{1}{n}\sum(x_i-\bar{x})^2 $$
if we don't have enough sample, this estimator won't works well, it will be biased. So better estimator is
$$ \bar{\sigma}^2 = \frac{1}{n-1}\sum(x_i-\bar{x})^2 $$

Standard Score

$$ z_i = \frac{(x_i-\mu)}{\sigma} $$
It's a normalize number set. After the normalizing, it has a distribution, mean of 0 and variace of 1. It inherits the shape of the orincginal dataset.

Covariance, Rank Correlation

Covarience only will be mention when two variable share the same tendency.
We center the data with respect to their mean, and look for the difference.
$$ Cov(X, Y) = \frac{1}{n}\sum_{i=1}^ndx_idy_i $$
Pearson's correlation is after we normalized the data and multiply them together.
$$ \rho_i = \frac{x_i-\mu_X}{\rho_X}\frac{y_i-\mu_Y}{\rho_Y} $$
$$ \rho = \frac{1}{n}\sum_{i=1}^n{\rho_i}=\frac{Cov(X,Y)}{\sigma_X\sigma_Y} $$
Pearson's correlation is always between -1 and +1. The magnitude shows the correlation. The sign shows positive or negative relation
$\rho = 0$ dose not mean that they have no correlation, it only shows the correlation of first order. And note that it dosen't work well in the presence of outliers.
Spearmans Rank Correlation is a robust solution especially when the data contain outliers. Use rank to sort sample data.
$$ r_s = 1-\frac{6\sum_i{d_i2}}{n\cdot(n2-1)} $$
with $$ d_i = rg(x_i)-rg(y_i)$$
is the difference of the rank of $x$ and $y$

Statistical Inference

Theres two way to address the problem of statistical inference.

  • frequentist approach
  • bayesian approach:

The frequentist approach

If we accept this asumption, that means we concern about the population parameters from analysis of sample.
There's three things we mostly concern about:

  • point estimation: a particular value that approximates some parameter of interest. e.g. mean or variance
  • confidence intervals, or set estimattes: range of the value that best represent the parameter of insterest
  • hypothesis: accept or reject

Estimation

Remember estimation is not truth. More data the estimation will be better.

Point estimation

Sampling

data = pd.read_csv("files/aaa.csv")
data['Date'] = data[u'Dia de mes'].apply(lambda x: str(x)) +
'-' +
data[u'Mes de any'].apply(lambda x: str(x))
data['Date'] = pd.to_datetime(data['Date'])
accidents = data.groupby(['Date']).size()
print accidents.mean()

How we estimate the sampling distribution of the sample mean

  1. Draw $s$ (a large number) independent samples $ { x^1,..., x^s} $ from the population where each element $x^j$ is composed of ${x_i^j}_{i=1,...,n}$.
  2. Evaluate the sample mean $\hat{\mu}j=\frac{1}{n}\sum_{i=1}nx_i^j$ of each sample
  3. Estimate the sampling distribution of $\hat{\mu}$ by the empirical distribution of the sample replications.
# population
df = accidents.to_frame() # convert accidents to DataFrame
nTest = 10000
elements = 200

# mean array of samples
means = [0] * nTest # build a sero array with nTest 0

#sample generation
for i in range(nTest):
rows = np.random.choice(df.index.values, elements)
sampled_df = df.ix[rows]
means[i] = sampled_df.mean()

Traditional Approach

But actually we can't always get access to the population, so we have to solve the problem by using some theoretical results from traditonal statistics.
Standard error
$$ SE = \frac{\sigma_x}{\sqrt{n}} $$
This formula use the standard deviation $\sigma_x$, which is not known. If the population distribution is not skewed and estimation is sufficiently good if $n>30$. THis allow us to use the standard error of the sample value if we can't get access to the population.

rows = np.random.choice(df.index.values, 200)
sampledDf = df.ix[rows]
estSigmaMean = sampledDf.std()/math.sqrt(200)

Computationally intensive approach

When the full dataset has super large population, a alternative to the traditional.
In the bootstrap, we draw n observation with replacement from the oringinal data to create a bootstrap sample or resample. Then, we can calculate the mean for this resample. By repeating this process more times, we can get a good approximation of the mean sampling distribution.

def meanBootStrap (X, numberb):
x = [0] * numberb
for i in range (numberb):
sample = [X[j]
for j in np.random.randint(len(X), size = len(X)
]
x[i] = np.mean(sample)
return x
m = meanBootstrap(accidents, 10000)
np.mean(m)

Confidence Interval

Confidence Level 90% 95% 99% 99.9%
z Value 1.65 1.96 2.58 3.291

A point estimate $\Theta$, is hardly to be perfect. So we want to provide a plausible range of value of the parameter. This range called confidence interval.
We care about the interval and the plausibility .
If the point estimate follow the normal model with standard error SE, then a confidence interval for the population parameter is $$\Theta \pm z \times SE$$

Confidence Level 90% 95% 99% 99.9%
z Value 1.65 1.96 2.58 3.291

Use bootstrapping to calculate the 95% confidence interval

m= meanBootstrap(accidents, 10000)
sampleMean = np.mean(m)
sampleSe = np.std(m)

# Confidence Interval
ci = [np.percentile(m, 2.5), np.precentile(m, 97.5)]

95% confidence means:

In 95% of the cases, when I compute the 95% confidence interval from this sample, the true mean of the population will fall within the interval define by the bounds: $\pm1.96\times SE$

Hypothesis Testing

Two kinds of hypotheses:

  • $H_0$: Two distribution is the same. Null hypothesis
  • $H_1$: Two distribution is different. Alternative hypothesis
n = len(counts2013)
mean = counts2013.mean()
s = counts2013.std()
ci = [ mean - s*1.96/np.sqrt(n), mean + s*1.96/np.sqrt()]

Interpretation of CI Test
The result canbe reject, or refuse to reject.

if we use a 95% confidence interval to test a problem where the null hypothesis is true, we will make an error whenever the point estimate is at least $1.96\sigma$ away from the population parameter. This hapen about 5% of the time.

©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 212,332评论 6 493
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 90,508评论 3 385
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 157,812评论 0 348
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 56,607评论 1 284
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 65,728评论 6 386
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 49,919评论 1 290
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,071评论 3 410
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 37,802评论 0 268
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,256评论 1 303
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,576评论 2 327
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 38,712评论 1 341
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,389评论 4 332
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,032评论 3 316
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 30,798评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,026评论 1 266
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 46,473评论 2 360
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 43,606评论 2 350

推荐阅读更多精彩内容

  • 曾经去在广州学西点有美术课而无聊画的一幅,临摹的图画起名:破壳
    星影520阅读 240评论 0 2
  • 有一次和他吵架了 我俩都很高冷谁也不让谁 然后他就特别生气 一脸严肃 喊我的大名 盯着我的眼睛 我有点害怕 接着他...
    徐青禾阅读 256评论 0 0
  • 阿晨是个天生的歌手,他唱起歌来说不上婉转动听,但是沧桑有力。所以他常常说以后想做一名民谣歌手。据阿晨了解的几个民谣...
    嘿哥儿阅读 766评论 0 0
  • 三月随想阅读 3,875评论 1 1