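In a recent cloud computing course, the instructor asked us to build a small project with Spark Streaming. Since time was tight, our group took a dataset straight from Kaggle and reduced the goal to plotting points on a map, showing which areas are more prone to crime and giving the police a little help with deploying officers. To better convey the idea of stream processing, and given WebSocket's advantages over Ajax polling and long polling as well as my earlier experience with log pushing (for example, printing Tomcat logs on a web page), I decided to use WebSocket to carry data from the back end to the front end.
This post presents a simplified version in which the server keeps pushing new messages to the client; concerns such as thread safety and caching are left aside for now. The Redis client is redis-py and the Python version is 3.7.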
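Overall structure
Spark Streaming publishes the processed data to Redis. A reader thread subscribes to that data in Redis and, as soon as a message arrives, POSTs it to a Tornado RequestHandler. The handler passes the data on to the Register. The Register holds the data waiting to be sent and maintains the set of connected clients: when no client is connected it caches the data, and once a client is connected it pushes the data out.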
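The buffering behaviour described above can be sketched without Tornado or Redis. This is only an illustration under assumptions: `BufferingRegister`, `flush`, and the fake client (`frames.append`) are hypothetical names, and unlike the Register in server.py this sketch also skips sending when the cache is empty.

```python
import json


class BufferingRegister:
    """Hypothetical stand-in for the Register described above (no Tornado)."""

    def __init__(self):
        self.callbacks = []       # one send function per connected client
        self.messages_cache = []  # messages waiting to be delivered

    def login(self, callback):
        self.callbacks.append(callback)
        self.flush()

    def logout(self, callback):
        self.callbacks.remove(callback)

    def trigger(self, message):
        self.messages_cache.append(message)
        self.flush()

    def flush(self):
        # Deliver only when there is a client and something to send.
        if not self.callbacks or not self.messages_cache:
            return
        frame = json.dumps(self.messages_cache)
        for callback in self.callbacks:
            callback(frame)
        self.messages_cache.clear()


frames = []                            # what a fake client "receives"
register = BufferingRegister()
register.trigger({"data": "Hello:1"})  # no client yet: cached
register.trigger({"data": "Hello:2"})  # still cached
register.login(frames.append)          # first client: backlog sent as one frame
register.trigger({"data": "Hello:3"})  # client connected: pushed immediately
```

Each frame is a JSON array of message objects, so a client that connects late receives the whole backlog in one frame and later messages one at a time.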
Directory structure
- demo
  - static
  - templates
    - index.html
  - server.py
server.py
import json
import os
import time
from threading import Thread
import redis
import tornado
from tornado import ioloop, web, websocket, httpclient
from tornado.web import RequestHandler
# Register.callbacks holds the WebSocket callbacks. When a new message arrives,
# Register first caches it; if any client is connected (callbacks is not empty),
# it invokes every callback to push the cached messages to the clients.
class Register(object):
    def __init__(self):
        self.callbacks = []
        self.messages_cache = []

    def login(self, callback):
        self.callbacks.append(callback)
        self.notify_callbacks()

    def logout(self, callback):
        self.callbacks.remove(callback)

    def trigger(self, message):
        self.messages_cache.append(message)
        self.notify_callbacks()

    def notify_callbacks(self):
        print('notify callbacks')
        if self.callbacks:
            # Send the whole cache to every client, then empty it.
            for callback in self.callbacks:
                callback(json.dumps(self.messages_cache))
            self.messages_cache.clear()
        else:
            print('There is no client connected, save message in cache only')
# Home page route
class IndexHandler(RequestHandler):
    def get(self):
        self.render('index.html')
# MyWebSocketHandler registers its callback with the register when a connection
# opens and removes it again when the connection closes.
class MyWebSocketHandler(websocket.WebSocketHandler):
    def open(self):
        print(str(self) + " connection open")
        self.application.register.login(self.callback)

    def on_message(self, message):
        print(message)

    def on_close(self):
        self.application.register.logout(self.callback)
        print(str(self) + " connection closed")

    def callback(self, message):
        self.write_message(message)
# Receives a message and hands it to the register
class NewMessageHandler(RequestHandler):
    def post(self, *args, **kwargs):
        data = json.loads(self.request.body)
        print(data)
        self.application.register.trigger(data)
# Configure the Tornado web application
class Application(tornado.web.Application):
    def __init__(self):
        self.register = Register()
        handlers = [
            (r"/", IndexHandler),
            (r"/pigeon", MyWebSocketHandler),
            (r"/message", NewMessageHandler)
        ]
        settings = dict(
            template_path=os.path.join(
                os.path.dirname(__file__), "templates"),
            static_path=os.path.join(
                os.path.dirname(__file__), "static"),
            debug=False
        )
        super().__init__(handlers, **settings)
# Spark Streaming is not the focus here, so publisher() simulates the data being published
def publisher():
    r = redis.Redis(host='localhost', port=6379, decode_responses=True)
    a = 1
    while True:
        r.publish("my_channel", "Hello:" + str(a))
        a += 1
        time.sleep(1)
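# Subscribe to a specific Redis channel; on each message, data_handler POSTs it to /message
def subscriber():
    # Use a name other than "redis": assigning to "redis" here would shadow the
    # module inside this function and raise UnboundLocalError.
    r = redis.Redis(host='localhost', port=6379, decode_responses=True)
    p = r.pubsub()
    p.subscribe(**{'my_channel': data_handler})
    p.run_in_thread()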
def data_handler(message):
    url = "http://127.0.0.1:8090/message"
    data = {'data': message['data']}
    http_request = httpclient.HTTPRequest(url, method="POST",
                                          body=json.dumps(data))
    http_client = httpclient.HTTPClient()
    http_client.fetch(http_request)
if __name__ == "__main__":
    Thread(target=publisher).start()
    subscriber()
    app = Application()
    app.listen(8090)
    tornado.ioloop.IOLoop.current().start()
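The subscriber thread above reaches the Register by POSTing to the server's own /message endpoint, which keeps every Register call on the IOLoop thread. Another way to get the same effect is to schedule the call on the event loop directly from the reader thread; Tornado documents `IOLoop.add_callback` as safe to call from any thread for this purpose. The sketch below is not the code used in this project: it uses the standard library's `loop.call_soon_threadsafe` as the asyncio counterpart so it runs without Tornado or Redis, and `trigger`, `reader`, and the sample messages are hypothetical stand-ins.

```python
import asyncio
import threading

received = []             # stands in for Register.messages_cache
callback_threads = set()  # thread ids on which trigger() actually ran
loop_thread_id = None


def trigger(message):
    # Runs on the event-loop thread, where touching handler state is safe.
    callback_threads.add(threading.get_ident())
    received.append(message)


def reader(loop, done):
    # Hypothetical stand-in for the Redis subscriber thread.
    for i in range(1, 4):
        loop.call_soon_threadsafe(trigger, {"data": "Hello:" + str(i)})
    loop.call_soon_threadsafe(done.set)  # scheduled last, so it runs last


async def main():
    global loop_thread_id
    loop_thread_id = threading.get_ident()
    loop = asyncio.get_running_loop()
    done = asyncio.Event()
    worker = threading.Thread(target=reader, args=(loop, done))
    worker.start()
    await done.wait()  # keep the loop running until the reader has finished
    worker.join()


asyncio.run(main())
```

The reader thread only schedules work; every call to `trigger` runs on the loop thread, so no lock is needed around the shared list. The HTTP loop-back in server.py trades an extra request per message for the same guarantee.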
index.html
<script>
    var ws = new WebSocket("ws://localhost:8090/pigeon");
    ws.onmessage = function (evt) {
        console.log(evt.data);
    };
};
</script>