Clustering—Finding Related Posts

Preprocessing—similarity measured as a similar number of common words: the steps

  1. Firstly, tokenizing the text.
  2. Throwing away words that occur way too often to be of any help in
    detecting relevant posts.
  3. Throwing away words that occur so seldom that there is only a small
    chance that they occur in future posts.
  4. Counting the remaining words.
  5. Calculating TF-IDF values from the counts, considering the whole
    text corpus (a minimal sketch of this computation follows this list).
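
A minimal sketch of how a single TF-IDF value could be computed, assuming each document is simply a list of already tokenized words (the real code below relies on scikit-learn's TfidfVectorizer instead; the helper name tfidf and the toy corpus here are purely illustrative):

import math

def tfidf(term, doc, corpus):
    tf = doc.count(term) / len(doc)                        # term frequency within this document
    n_docs_with_term = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / n_docs_with_term)         # rarer terms get a larger weight
    return tf * idf

corpus = [["a", "abb", "abc"], ["a", "abb", "abb"], ["a", "abc"]]
print(tfidf("a", corpus[0], corpus))    # 0.0   -- "a" occurs in every document, so it carries no information
print(tfidf("abb", corpus[1], corpus))  # ~0.27 -- "abb" is rarer and frequent here, so it is weighted up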

Testing the code

  • The data
Post filename   Post content
01.txt          This is a toy post about machine learning. Actually, it contains not much interesting stuff.
02.txt          Imaging databases provide storage capabilities.
03.txt          Most imaging databases save images permanently.
04.txt          Imaging databases store data.
05.txt          Imaging databases store data. Imaging databases store data. Imaging databases store data.
  • Python code
import os
import sys

import scipy as sp

from sklearn.feature_extraction.text import CountVectorizer

from utils import DATA_DIR

TOY_DIR = os.path.join(DATA_DIR, "toy")
posts = [open(os.path.join(TOY_DIR, f)).read() for f in os.listdir(TOY_DIR)]  # read in the toy posts


new_post = "imaging databases"

import nltk.stem
english_stemmer = nltk.stem.SnowballStemmer('english')
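
# Quick check (a sketch, not part of the book's listing): the stemmer maps
# inflected forms onto a common stem, which is why "imaging" and "images"
# later collapse into the single feature 'imag'.
print(english_stemmer.stem("imaging"), english_stemmer.stem("databases"))
# imag databas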

# Extending the vectorizer with NLTK's stemmer
class StemmedCountVectorizer(CountVectorizer):
    def build_analyzer(self):
        analyzer = super(StemmedCountVectorizer, self).build_analyzer()
        return lambda doc: (english_stemmer.stem(w) for w in analyzer(doc))

# vectorizer = CountVectorizer(min_df=1, stop_words='english',
# preprocessor=stemmer)
vectorizer = StemmedCountVectorizer(min_df=1, stop_words='english')

from sklearn.feature_extraction.text import TfidfVectorizer
# Extending the vectorizer with NLTK's stemmer
class StemmedTfidfVectorizer(TfidfVectorizer):

    def build_analyzer(self):
        analyzer = super(StemmedTfidfVectorizer, self).build_analyzer()
        return lambda doc: (english_stemmer.stem(w) for w in analyzer(doc))

vectorizer = StemmedTfidfVectorizer(
    min_df=1, stop_words='english', decode_error='ignore')

X_train = vectorizer.fit_transform(posts)

num_samples, num_features = X_train.shape
print("#samples: %d, #features: %d" % (num_samples, num_features))

#samples: 5, #features: 17

new_post_vec = vectorizer.transform([new_post])

# Print the new post's sparse vector
print(new_post_vec, type(new_post_vec))

(0, 5) 0.7071067811865476
(0, 4) 0.7071067811865476 <class 'scipy.sparse.csr.csr_matrix'>

# Print the vector as a dense ndarray
print(new_post_vec.toarray())

[[0. 0. 0. 0. 0.70710678 0.70710678 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]]

# The feature names (tokenized and stemmed words)
print(vectorizer.get_feature_names())

['actual', 'capabl', 'contain', 'data', 'databas', 'imag', 'interest', 'learn', 'machin', 'perman', 'post', 'provid', 'save', 'storag', 'store', 'stuff', 'toy']
# Measure the similarity between two vectors
def dist_raw(v1, v2):
    delta = v1 - v2
    # return the norm (Euclidean length) of the difference vector
    return sp.linalg.norm(delta.toarray())


def dist_norm(v1, v2):
    v1_normalized = v1 / sp.linalg.norm(v1.toarray())  # normalize to unit length
    v2_normalized = v2 / sp.linalg.norm(v2.toarray())  # normalize to unit length

    delta = v1_normalized - v2_normalized

    return sp.linalg.norm(delta.toarray())

dist = dist_norm

best_dist = sys.maxsize
best_i = None

for i in range(0, num_samples):
    post = posts[i]
    if post == new_post:
        continue
    post_vec = X_train.getrow(i)
    d = dist(post_vec, new_post_vec)

    print("=== Post %i with dist=%.2f: %s" % (i, d, post)) //打印各text与new_post的文档比较结果

=== Post 0 with dist=1.41: This is a toy post about machine learning. Actually, it contains not much interesting stuff.
=== Post 1 with dist=1.08: Imaging databases provide storage capabilities.
=== Post 2 with dist=0.86: Most imaging databases save images permanently.

=== Post 3 with dist=0.92: Imaging databases store data.
=== Post 4 with dist=0.92: Imaging databases store data. Imaging databases store data. Imaging databases store data.=== Post 0 with dist=1.41: This is a toy post about machine learning. Actually, it contains not much interesting stuff.
=== Post 1 with dist=1.08: Imaging databases provide storage capabilities.
=== Post 2 with dist=0.86: Most imaging databases save images permanently.
=== Post 3 with dist=0.92: Imaging databases store data.
=== Post 4 with dist=0.92: Imaging databases store data. Imaging databases store data. Imaging databases store data.

    if d < best_dist:
        best_dist = d
        best_i = i
//找出与new_post最相近的text,以及对应的差异度
print("Best post is %i with dist=%.2f" % (best_i, best_dist))

Best post is 2 with dist=0.86

Summary

By building this simple yet efficient model, we can turn each text into a compact "feature vector" and then measure and compare how texts differ. The model does have some shortcomings, quoted here directly from the book (a short bigram sketch follows the list):

  • It does not cover word relations. With the previous vectorization
    approach, the text "Car hits wall" and "Wall hits car" will both have
    the same feature vector.
  • It does not capture negations correctly. For instance, the text "I will
    eat ice cream" and "I will not eat ice cream" will look very similar by
    means of their feature vectors, although they contain quite the
    opposite meaning. This problem, however, can be easily changed by
    not only counting individual words, also called unigrams, but also
    considering bigrams (pairs of words) or trigrams (three words in a
    row).
  • It totally fails with misspelled words. Although it is clear to the
    readers that "database" and "databas" convey the same meaning,
    our approach will treat them as totally different words.
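
As a quick illustration of the bigram remedy mentioned in the second point, scikit-learn's CountVectorizer accepts an ngram_range parameter. This is only a sketch, not part of the book's code:

from sklearn.feature_extraction.text import CountVectorizer

# With unigrams alone the two sentences share almost all features; adding
# bigrams (ngram_range=(1, 2)) introduces features such as "not eat" and
# "will eat" that keep the opposite meanings apart.
# get_feature_names() matches the older scikit-learn used throughout this
# post; newer versions call it get_feature_names_out().
bigram_vectorizer = CountVectorizer(ngram_range=(1, 2))
X = bigram_vectorizer.fit_transform(
    ["I will eat ice cream", "I will not eat ice cream"])
print(bigram_vectorizer.get_feature_names())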

Clustering

The following is adapted from Wang Tianyi's course 《人工智能基础课》 ("The Foundations of Artificial Intelligence"), lecture "物以类聚,人以群分:聚类分析" (cluster analysis):
Cluster analysis is an unsupervised learning method. Its goal is to learn from training samples that carry no class labels, so as to reveal the intrinsic structure and regularities of the data. Concretely, cluster analysis partitions a data set into a number of disjoint subsets such that, under some chosen measure, the elements of each subset are more similar to the other elements of that subset than to elements elsewhere.
The subsets produced by such a partition are the "clusters", and each cluster represents a potential class. This is exactly where classification and clustering differ: classification fixes the classes first and then assigns the data, whereas clustering partitions the data first and then determines the classes.
Cluster analysis is not a specific algorithm but a general task to be solved, and the name itself reveals its two core questions: how to decide which samples belong to the same "class", and how to make the samples of one class "cluster" together.
Deciding which samples belong to the same class requires a measure of similarity. Whatever criterion is chosen, the guiding principle of cluster analysis is to make the differences within a cluster as small as possible and the differences between clusters as large as possible. The simplest way to measure similarity is to introduce a distance metric: cluster analysis decides whether samples belong to the same cluster by computing the distances between them. From linear algebra, if every sample has N features, it can be viewed as a point in an N-dimensional space, and distances between different points can then be computed.
A distance in the mathematical sense must satisfy non-negativity (never less than 0), identity (the distance from any point to itself is 0), symmetry (swapping the two points does not change the distance), and the triangle inequality. The distance most commonly used in cluster analysis is the Minkowski distance, defined as

dist_p(x_i, x_j) = \left( \sum_{d=1}^{N} \lvert x_{i,d} - x_{j,d} \rvert^{p} \right)^{1/p}

Here p is a constant. When p = 2, the Minkowski distance reduces to the Euclidean distance, that is, length in the everyday sense.
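
A minimal sketch (not from the lecture) of computing this distance in NumPy for two common values of p:

import numpy as np

def minkowski(x, y, p):
    # Minkowski distance: the p-th root of the sum of |x_d - y_d|^p over all features
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])
print(minkowski(x, y, p=1))  # 7.0 -- Manhattan distance
print(minkowski(x, y, p=2))  # 5.0 -- Euclidean distance, the usual notion of length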

The k-means algorithm

Again adapted from Wang Tianyi's cluster-analysis lecture in 《人工智能基础课》:
The k-means algorithm is the classic prototype-based clustering algorithm; it turns clustering into an optimization problem. Concretely, it looks for k cluster centers and assigns every sample to its nearest center, the criterion being to minimize the sum of squared distances from all samples to their cluster centers. The smaller this sum, the more similar the samples within each cluster. The optimization problem cannot be solved exactly, so only approximate solutions can be searched for; k-means uses a greedy strategy and approximates the minimum sum of squares through iterative optimization.
The procedure is very intuitive. First, k samples are drawn at random from the data set to serve as the initial centers of the k clusters. Then, for every remaining sample, the distances to the k centers are computed and the sample is assigned to the cluster whose center is nearest. Once every sample has been assigned, the arithmetic mean of the samples in each cluster is computed and becomes that cluster's new center, and all samples are re-clustered against the k new centers. This "average, recompute centers, re-cluster" loop iterates until the clustering no longer changes.
Most k-means-style algorithms require the number of clusters k to be specified in advance, which is one of the main criticisms of the algorithm. In addition, because the optimization targets each cluster's center, k-means tends to produce clusters of similar size, which can blur the accuracy of cluster boundaries.
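
To make the iteration concrete, here is a minimal NumPy sketch of the "assign to the nearest center, recompute the means" loop described above (assumptions: Euclidean distance and random initial centers drawn from the data; the scikit-learn KMeans class used in the example below is the practical choice):

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # assign every sample to the cluster whose center is nearest
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each center as the mean of the samples assigned to it
        # (a production implementation would also handle clusters that become empty)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # stop when the clustering no longer changes
            break
        centers = new_centers
    return labels, centers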

A k-means example

"""
====================================
Demonstration of k-means assumptions
====================================

This example is meant to illustrate situations where k-means will produce
unintuitive and possibly unexpected clusters. In the first three plots, the
input data does not conform to some implicit assumption that k-means makes and
undesirable clusters are produced as a result. In the last plot, k-means
returns intuitive clusters despite unevenly sized blobs.
"""
print(__doc__)

# Author: Phil Roth <mr.phil.roth@gmail.com>
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

plt.figure(figsize=(12, 12))

n_samples = 1500
random_state = 170
X, y = make_blobs(n_samples=n_samples, random_state=random_state)

# Incorrect number of clusters
y_pred = KMeans(n_clusters=2, random_state=random_state).fit_predict(X)

plt.subplot(221)
plt.scatter(X[:, 0], X[:, 1], c=y_pred)
plt.title("Incorrect Number of Blobs")

# Anisotropically distributed data
transformation = [[0.60834549, -0.63667341], [-0.40887718, 0.85253229]]
X_aniso = np.dot(X, transformation)
y_pred = KMeans(n_clusters=3, random_state=random_state).fit_predict(X_aniso)

plt.subplot(222)
plt.scatter(X_aniso[:, 0], X_aniso[:, 1], c=y_pred)
plt.title("Anisotropicly Distributed Blobs")

# Different variance
X_varied, y_varied = make_blobs(n_samples=n_samples,
                                cluster_std=[1.0, 2.5, 0.5],
                                random_state=random_state)
y_pred = KMeans(n_clusters=3, random_state=random_state).fit_predict(X_varied)

plt.subplot(223)
plt.scatter(X_varied[:, 0], X_varied[:, 1], c=y_pred)
plt.title("Unequal Variance")

# Unevenly sized blobs
X_filtered = np.vstack((X[y == 0][:500], X[y == 1][:100], X[y == 2][:10]))
y_pred = KMeans(n_clusters=3,
                random_state=random_state).fit_predict(X_filtered)

plt.subplot(224)
plt.scatter(X_filtered[:, 0], X_filtered[:, 1], c=y_pred)
plt.title("Unevenly Sized Blobs")

plt.show()
(Figure: "Demonstration of k-means assumptions"—the four scatter subplots produced by the code above.)

Applying k-means

Data source

The 20newsgroups data set is a collection of nearly 20,000 newsgroup documents covering 20 topics; see the 20 Newsgroups website for details. The sklearn.datasets module can fetch the data set directly.

Code walkthrough

import sklearn.datasets
import scipy as sp

new_post = \
    """Disk drive problems. Hi, I have a problem with my hard disk.
After 1 year it is working only sporadically now.
I tried to format it, but now it doesn't boot any more.
Any ideas? Thanks.
"""

print("""\
Dear reader of the 1st edition of 'Building Machine Learning Systems with Python'!
For the 2nd edition we introduced a couple of changes that will result into
results that differ from the results in the 1st edition.
E.g. we now fully rely on scikit's fetch_20newsgroups() instead of requiring
you to download the data manually from MLCOMP.
If you have any questions, please ask at http://www.twotoreal.com
""")

all_data = sklearn.datasets.fetch_20newsgroups(subset="all")
# total number of posts
print("Number of total posts: %i" % len(all_data.filenames))
# Number of total posts: 18846

groups = [
    'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware',
    'comp.sys.mac.hardware', 'comp.windows.x', 'sci.space']
train_data = sklearn.datasets.fetch_20newsgroups(subset="train",
                                                 categories=groups)
# number of training posts in the selected tech groups
print("Number of training posts in tech groups:", len(train_data.filenames))

Dear reader of the 1st edition of 'Building Machine Learning Systems with Python'!
For the 2nd edition we introduced a couple of changes that will result into
results that differ from the results in the 1st edition.
E.g. we now fully rely on scikit's fetch_20newsgroups() instead of requiring
you to download the data manually from MLCOMP.
If you have any questions, please ask at http://www.twotoreal.com
Number of total posts: 18846
Number of training posts in tech groups: 3529

# Preprocessing: the same vectorization steps as before (similarity measured as a similar number of common words)
labels = train_data.target
num_clusters = 50  # sp.unique(labels).shape[0]

import nltk.stem
english_stemmer = nltk.stem.SnowballStemmer('english')

from sklearn.feature_extraction.text import TfidfVectorizer


class StemmedTfidfVectorizer(TfidfVectorizer):

    def build_analyzer(self):
        analyzer = super(StemmedTfidfVectorizer, self).build_analyzer()
        return lambda doc: (english_stemmer.stem(w) for w in analyzer(doc))

vectorizer = StemmedTfidfVectorizer(min_df=10, max_df=0.5,
                                    stop_words='english', decode_error='ignore'
                                    )

vectorized = vectorizer.fit_transform(train_data.data)
num_samples, num_features = vectorized.shape
print("#samples: %d, #features: %d" % (num_samples, num_features))
# Run the k-means algorithm
from sklearn.cluster import KMeans
km = KMeans(n_clusters=num_clusters, n_init=1, verbose=1, random_state=3)
clustered = km.fit(vectorized)

print("km.labels_=%s" % km.labels_)
# km.labels_=[ 6 34 22 ...,  2 21 26]

print("km.labels_.shape=%s" % km.labels_.shape)

# Evaluate the k-means clustering against the true newsgroup labels
from sklearn import metrics
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels, km.labels_))
# Homogeneity: 0.400
print("Completeness: %0.3f" % metrics.completeness_score(labels, km.labels_))
# Completeness: 0.206
print("V-measure: %0.3f" % metrics.v_measure_score(labels, km.labels_))
# V-measure: 0.272
print("Adjusted Rand Index: %0.3f" %
      metrics.adjusted_rand_score(labels, km.labels_))
# Adjusted Rand Index: 0.064
print("Adjusted Mutual Information: %0.3f" %
      metrics.adjusted_mutual_info_score(labels, km.labels_))
# Adjusted Mutual Information: 0.197
print(("Silhouette Coefficient: %0.3f" %
       metrics.silhouette_score(vectorized, labels, sample_size=1000)))


Initialization complete
Iteration 0, inertia 5686.053
Iteration 1, inertia 3164.888
Iteration 2, inertia 3132.208
Iteration 3, inertia 3111.713
Iteration 4, inertia 3098.584
Iteration 5, inertia 3092.191
Iteration 6, inertia 3087.277
Iteration 7, inertia 3084.100
Iteration 8, inertia 3082.800
Iteration 9, inertia 3082.234
Iteration 10, inertia 3081.949
Iteration 11, inertia 3081.843
Iteration 12, inertia 3081.791
Iteration 13, inertia 3081.752
Iteration 14, inertia 3081.660
Iteration 15, inertia 3081.617
Iteration 16, inertia 3081.589
Iteration 17, inertia 3081.571
Converged at iteration 17: center shift 0.000000e+00 within tolerance 2.069005e-08
km.labels_=[48 23 31 ... 6 2 22]
km.labels_.shape=3529

# Find the posts in new_post's cluster that are closest to it
new_post_vec = vectorizer.transform([new_post])
new_post_label = km.predict(new_post_vec)[0]

similar_indices = (km.labels_ == new_post_label).nonzero()[0]

similar = []
for i in similar_indices:
    dist = sp.linalg.norm((new_post_vec - vectorized[i]).toarray())
    similar.append((dist, train_data.data[i]))

similar = sorted(similar)
print("Count similar: %i" % len(similar))   //打印同类中有多少相近post

show_at_1 = similar[0]
show_at_2 = similar[int(len(similar) / 10)]
show_at_3 = similar[int(len(similar) / 2)]

print("=== #1 ===")
print(show_at_1)
print()

print("=== #2 ===")
print(show_at_2)
print()

print("=== #3 ===")
print(show_at_3)

Count similar: 56
=== #1 ===
(1.0378441731334074, "From: Thomas Dachsel GERTHD@mvs.sas.com\nSubject: BOOT PROBLEM with IDE controller\nNntp-Posting-Host: sdcmvs.mvs.sas.com\nOrganization: SAS Institute Inc.\nLines: 25\n\nHi,\nI've got a Multi I/O card (IDE controller + serial/parallel\ninterface) and two floppy drives (5 1/4, 3 1/2) and a\nQuantum ProDrive 80AT connected to it.\nI was able to format the hard disk, but I could not boot from\nit. I can boot from drive A: (which disk drive does not matter)\nbut if I remove the disk from drive A and press the reset switch,\nthe LED of drive A: continues to glow, and the hard disk is\nnot accessed at all.\nI guess this must be a problem of either the Multi I/o card\nor floppy disk drive settings (jumper configuration?)\nDoes someone have any hint what could be the reason for it.\nPlease reply by email to GERTHD@MVS.SAS.COM\nThanks,\nThomas\n+-------------------------------------------------------------------+\n| Thomas Dachsel |\n| Internet: GERTHD@MVS.SAS.COM |\n| Fidonet: Thomas_Dachsel@camel.fido.de (2:247/40) |\n| Subnet: dachsel@rnivh.rni.sub.org (UUCP in Germany, now active) |\n| Phone: +49 6221 4150 (work), +49 6203 12274 (home) |\n| Fax: +49 6221 415101 |\n| Snail: SAS Institute GmbH, P.O.Box 105307, D-W-6900 Heidelberg |\n| Tagline: One bad sector can ruin a whole day... |\n+-------------------------------------------------------------------+\n")
=== #2 ===
(1.1503043264096682, 'From: rpao@mts.mivj.ca.us (Roger C. Pao)\nSubject: Re: Booting from B drive\nOrganization: MicroTech Software\nLines: 34\n\nglang@slee01.srl.ford.com (Gordon Lang) writes:\n\n>David Weisberger (djweisbe@unix.amherst.edu) wrote:\n>: I have a 5 1/4" drive as drive A. How can I make the system boot from\n>: my 3 1/2" B drive? (Optimally, the computer would be able to boot\n>: from either A or B, checking them in order for a bootable disk. But\n>: if I have to switch cables around and simply switch the drives so that\n>: it can't boot 5 1/4" disks, that's OK. Also, boot_b won't do the trick\n>: for me.)\n>: \n>: Thanks,\n>: Davebo\n>We had the same issue plague us for months on our Gateway. I finally\n>got tired of it so I permanently interchanged the drives. The only\n>reason I didn't do it in the first place was because I had several\n>bootable 5-1/4's and some 5-1/4 based install disks which expected\n>the A drive. I order all new software (and upgrades) to be 3-1/2 and\n>the number of "stupid" install programs that can't handle an alternate\n>drive are declining with time - the ones I had are now upgraded. And\n>as for the bootable 5-1/4's I just cut 3-1/2 replacements.\n\n>If switching the drives is not an option, you might be able to wire up\n>a drive switch to your computer chasis. I haven't tried it but I think\n>it would work as long as it is wired carefully.\n\nI did this. I use a relay (Radio Shack 4PDT) instead of a huge\nswitch. This way, if the relay breaks, my drives will still work.\n\nIt works fine, but you may still need to change the CMOS before the\ndrive switch will work correctly for some programs.\n\nrp93\n-- \nRoger C. Pao {gordius,bagdad}!mts!rpao, rpao@mts.mivj.ca.us\n')
=== #3 ===
(1.2793959084781283, 'From: vg@volkmar.Stollmann.DE (Volkmar Grote)\nSubject: IBM PS/1 vs TEAC FD\nDistribution: world\nOrganization: Me? Organized?\nLines: 21\n\nHello,\n\nI already tried our national news group without success.\n\nI tried to replace a friend's original IBM floppy disk in his PS/1-PC\nwith a normal TEAC drive.\nI already identified the power supply on pins 3 (5V) and 6 (12V), shorted\npin 6 (5.25"/3.5" switch) and inserted pullup resistors (2K2) on pins\n8, 26, 28, 30, and 34.\nThe computer doesn't complain about a missing FD, but the FD's light\nstays on all the time. The drive spins up o.k. when I insert a disk,\nbut I can't access it.\nThe TEAC works fine in a normal PC.\n\nAre there any points I missed?\n\nThank you.\n\tVolkmar\n\n---\nVolkmar.Grote@Stollmann.DE\n')
