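I keep seeing discussions online comparing Python 2 and Python 3, and my company has recently been considering whether to upgrade our spark-python big-data development environment to Python 3. This post records a comparison of Python 2.7.13 and Python 3.5.3 from several angles.
Environment setup
I continue to use the environment configured in an earlier post.
Because the goal is to compare Python 2 and Python 3 side by side, simply upgrading the system Python is not enough. I used two tools, pyenv and virtualenv, to build isolated test environments.
References: "Building Python virtual environments with pyenv and virtualenv" and "Using pyenv to install multiple Python versions on one system".
The configuration steps are as follows:
- Start by installing Tkinter; otherwise you will have to redo the later steps. Don't ask me how I know...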
sudo yum install tkinter -y
sudo yum install tk-devel tcl-devel -y
- Install the packages that pyenv depends on
sudo yum install readline readline-devel readline-static -y
sudo yum install openssl openssl-devel openssl-static -y
sudo yum install sqlite-devel -y
sudo yum install bzip2-devel bzip2-libs -y
- Download and install pyenv, then install Python 2.7.13 and Python 3.5.3
git clone https://github.com/yyuu/pyenv.git ~/.pyenv
chmod -R 777 ~/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
exec $SHELL
source ~/.bash_profile
pyenv install --list
pyenv install -v 2.7.13
pyenv install -v 3.5.3
- Download and install pyenv-virtualenv, then create the two isolated environments
git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv
echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bash_profile
source ~/.bash_profile
pyenv virtualenv 2.7.13 py2
pyenv virtualenv 3.5.3 py3
That essentially completes the two isolated Python environments. The test below shows the active Python switching from the CentOS 7 default of 2.7.5 to 2.7.13 and then to 3.5.
[kejun@localhost ~]$ python -V
Python 2.7.5
[kejun@localhost ~]$ pyenv activate py2
(py2) [kejun@localhost ~]$ python -V
Python 2.7.13
(py2) [kejun@localhost ~]$ pyenv deactivate
[kejun@localhost ~]$ pyenv activate py3
(py3) [kejun@localhost ~]$ python -V
Python 3.5.
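The same check can be done from inside Python, which is handy when a script needs to know which environment it is running in. This is a minimal sketch; the helper name `describe_interpreter` is my own and does not appear in the test script.

```python
import sys


def describe_interpreter():
    """Return (label, version, executable) for the running interpreter."""
    # version_info is a structured tuple, so no string matching is needed.
    label = "Python%d" % sys.version_info[0]
    version = "%d.%d.%d" % tuple(sys.version_info[:3])
    return label, version, sys.executable


if __name__ == "__main__":
    label, version, path = describe_interpreter()
    print("%s %s at %s" % (label, version, path))
```

Run inside an activated pyenv virtualenv, the executable path should point under `~/.pyenv`, which confirms the switch actually took effect.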
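Detailed tests:
I installed the third-party packages commonly used for data analysis, then ran an installation test and a sample test for each. The sample test script is at the end of this post.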
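Category | Package | Purpose |
---|---|---|
Data collection | scrapy | Web scraping and crawling |
Data collection | scrapy-redis | Distributed crawling |
Data collection | selenium | Web testing, browser automation |
Data processing | beautifulsoup | HTML parsing library, with support for the lxml parser |
Data processing | lxml | XML parsing library |
Data processing | xlrd | Reading Excel files |
Data processing | xlwt | Writing Excel files |
Data processing | xlutils | Simple format changes to Excel files |
Data processing | pywin32 | Reading and writing Excel files with complex custom formatting |
Data processing | Python-docx | Reading and writing Word files |
Data analysis | numpy | Matrix-based numerical computing library |
Data analysis | pandas | Table-based statistical analysis library |
Data analysis | scipy | Scientific computing library with support for higher-level abstractions and complex models |
Data analysis | statsmodels | Statistical modeling and econometrics toolkit |
Data analysis | scikit-learn | Machine learning library |
Data analysis | gensim | Natural language processing library |
Data analysis | jieba | Chinese word segmentation library |
Data storage | MySQL-python | MySQL read/write interface |
Data storage | mysqlclient | MySQL read/write interface |
Data storage | SQLAlchemy | ORM layer for databases |
Data storage | pymssql | SQL Server read/write interface |
Data storage | redis | Redis read/write interface |
Data storage | PyMongo | MongoDB read/write interface |
Data presentation | matplotlib | Popular data visualization library |
Data presentation | seaborn | Attractive data visualization library built on matplotlib |
Tools | jupyter | Web-based Python IDE, commonly used for data analysis |
Tools | chardet | Character encoding detection |
Tools | ConfigParser | Reading and writing configuration files |
Tools | requests | HTTP library for network access |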
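Before running the full sample tests, a quick installation test can simply try to import each package. The sketch below is one way to do that, not the exact procedure I followed; `check_imports` and the `MODULES` list are names I made up for illustration, and the list covers only part of the table.

```python
import importlib


def check_imports(module_names):
    """Map each module name to True if it imports cleanly, else False."""
    status = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# Import names can differ from pip names (beautifulsoup -> bs4,
# scikit-learn -> sklearn), so the list uses the import names.
MODULES = ["scrapy", "bs4", "lxml", "numpy", "pandas", "sklearn", "jieba"]

if __name__ == "__main__":
    for name, ok in sorted(check_imports(MODULES).items()):
        print("%-10s %s" % (name, "ok" if ok else "missing"))
```

A successful import only shows that the package is installed for the active interpreter; it says nothing about whether the package behaves the same under Python 2 and Python 3, which is what the sample tests are for.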
# encoding=utf-8
import sys
import platform
import traceback
import gc
import ctypes
STD_OUTPUT_HANDLE= -11
FOREGROUND_BLACK = 0x0
FOREGROUND_BLUE = 0x01 # text color contains blue.
FOREGROUND_GREEN= 0x02 # text color contains green.
FOREGROUND_RED = 0x04 # text color contains red.
FOREGROUND_INTENSITY = 0x08 # text color is intensified.
class WinPrint:
    """
    Prints colored text on Windows.
    """

    def __init__(self):
        # ctypes.windll exists only on Windows, so look the handle up when
        # an instance is created rather than in the class body; otherwise
        # merely defining this class fails on CentOS.
        self.std_out_handle = ctypes.windll.kernel32.GetStdHandle(STD_OUTPUT_HANDLE)

    def set_cmd_color(self, color, handle=None):
        if handle is None:
            handle = self.std_out_handle
        return ctypes.windll.kernel32.SetConsoleTextAttribute(handle, color)

    def reset_color(self):
        self.set_cmd_color(FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE)

    def print_red_text(self, print_text):
        self.set_cmd_color(FOREGROUND_RED | FOREGROUND_INTENSITY)
        print(print_text)
        self.reset_color()

    def print_green_text(self, print_text):
        self.set_cmd_color(FOREGROUND_GREEN | FOREGROUND_INTENSITY)
        print(print_text)
        self.reset_color()

class UnixPrint:
    """
    Prints colored text on CentOS using ANSI escape codes.
    """

    def print_red_text(self, print_text):
        print('\033[1;31m%s\033[0m' % print_text)

    def print_green_text(self, print_text):
        print('\033[1;32m%s\033[0m' % print_text)

py_env = "Python2" if sys.version.find("2.7") > -1 else "Python3"
sys_ver = "Windows" if platform.system().find("indows") > -1 else "Centos"
my_print = WinPrint() if platform.system().find("indows") > -1 else UnixPrint()
def check(sys_ver, py_env):
    """
    Decorator that gives every test the same pass/fail output.
    It also exercises a decorator that takes arguments; the arguments
    are not strictly necessary.
    """
    def _check(func):
        def __check():
            try:
                func()
                my_print.print_green_text(
                    "[%s,%s]: %s pass." % (sys_ver, py_env, func.__name__))
            except Exception:
                traceback.print_exc()
                my_print.print_red_text(
                    "[%s,%s]: %s fail." % (sys_ver, py_env, func.__name__))
        return __check
    return _check

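def make_requirement(filepath, filename):
    """
    Strip the version pins from a pip requirements file.
    """
    import os  # os.path.join keeps the paths portable across Windows and CentOS
    result = []
    with open(os.path.join(filepath, filename), "r") as f:
        data = f.readlines()
    for line in data:
        # readlines() keeps the trailing newline; drop it so it is not doubled
        line = line.rstrip("\n")
        if line.find("==") > -1:
            result.append(line.split("==")[0] + "\n")
        else:
            result.append(line + "\n")
    clean_name = filename.split(".")[0] + "-clean.txt"
    with open(os.path.join(filepath, clean_name), "w") as f1:
        f1.writelines(result)
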
@check(sys_ver, py_env)
def test_scrapy():
    from scrapy import signals
    from selenium import webdriver
    from scrapy.http import HtmlResponse
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    from selenium.webdriver.common.keys import Keys
    from selenium.common.exceptions import NoSuchElementException
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait


@check(sys_ver, py_env)
def test_matplotlib():
    import matplotlib.pyplot as plt
    l = [1, 2, 3, 4, 5]
    h = [20, 14, 38, 27, 9]
    w = [0.1, 0.2, 0.3, 0.4, 0.5]
    b = [1, 2, 3, 4, 5]
    fig = plt.figure()
    ax = fig.add_subplot(111)
    rects = ax.bar(l, h, w, b)
    # plt.show()


@check(sys_ver, py_env)
def test_beautifulSoup():
    from bs4 import BeautifulSoup
    html_str = "<html><meta/><head><title>Hello</title></head><body onload=crash()>Hi all<p></html>"
    soup = BeautifulSoup(html_str, "lxml")
    # print(soup.get_text())


@check(sys_ver, py_env)
def test_lxml():
    from lxml import html
    html_str = "<html><meta/><head><title>Hello</title></head><body onload=crash()>Hi all<p></html>"
    html.fromstring(html_str)

@check(sys_ver, py_env)
def test_xls():
    import xlrd
    import xlwt
    from xlutils.copy import copy
    excel_book2 = xlwt.Workbook()
    del excel_book2
    excel_book1 = xlrd.open_workbook("1.xlsx")
    del excel_book1
    import docx
    doc = docx.Document("1.docx")
    # print(doc)
    del doc
    gc.collect()

@check(sys_ver, py_env)
def test_data_analysis():
    import pandas as pd
    import numpy as np
    data_list = np.array([x for x in range(100)])
    data_serial = pd.Series(data_list)
    # print(data_serial)
    # Import the function from scipy.fftpack: in newer SciPy releases
    # "scipy.fft" is a module, so "from scipy import fft" is not callable.
    from scipy.fftpack import fft
    b = fft(data_list)
    # print(b)


@check(sys_ver, py_env)
def test_statsmodels():
    import statsmodels.api as sm
    data = sm.datasets.spector.load()
    data.exog = sm.add_constant(data.exog, prepend=False)
    # print(data.exog)


@check(sys_ver, py_env)
def test_sklearn():
    from sklearn import datasets
    iris = datasets.load_iris()
    data = iris.data
    # print(data.shape)

@check(sys_ver, py_env)
def test_gensim():
    import warnings
    warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
    from gensim import corpora
    from collections import defaultdict
    documents = ["Human machine interface for lab abc computer applications",
                 "A survey of user opinion of computer system response time",
                 "The EPS user interface management system",
                 "System and human system engineering testing of EPS",
                 "Relation of user perceived response time to error measurement",
                 "The generation of random binary unordered trees",
                 "The intersection graph of paths in trees",
                 "Graph minors IV Widths of trees and well quasi ordering",
                 "Graph minors A survey"]
    stoplist = set('for a of the and to in'.split())
    texts = [[word for word in document.lower().split() if word not in stoplist]
             for document in documents]
    frequency = defaultdict(int)
    for text in texts:
        for token in text:
            frequency[token] += 1
    texts = [[token for token in text if frequency[token] > 1]
             for text in texts]
    dictionary = corpora.Dictionary(texts)
    dictionary.save('deerwester.dict')


@check(sys_ver, py_env)
def test_jieba():
    import jieba
    seg_list = jieba.cut("我来到了北京参观天安门。", cut_all=False)
    # print("Default Mode: " + "/ ".join(seg_list))  # accurate mode

@check(sys_ver, py_env)
def test_mysql():
    import MySQLdb as mysql
    # Test the pet_shop connection
    db = mysql.connect(host="xx", user="yy", passwd="12345678", db="zz")
    cur = db.cursor()
    sql = "select id from role;"
    cur.execute(sql)
    result = cur.fetchall()
    db.close()
    # print(result)


@check(sys_ver, py_env)
def test_SQLAlchemy():
    from sqlalchemy import Column, String, create_engine, Integer
    from sqlalchemy.orm import sessionmaker
    from sqlalchemy.ext.declarative import declarative_base
    engine = create_engine('mysql://xxx/yy', echo=False)
    DBSession = sessionmaker(bind=engine)
    Base = declarative_base()

    class rule(Base):
        __tablename__ = "role"
        id = Column(Integer, primary_key=True, autoincrement=True)
        role_name = Column(String(100))
        role_desc = Column(String(255))

    new_rule = rule(role_name="test_sqlalchemy", role_desc="forP2&P3")
    session = DBSession()
    session.add(new_rule)
    session.commit()
    session.close()

@check(sys_ver, py_env)
def test_redis():
    import redis
    pool = redis.Redis(host='127.0.0.1', port=6379)


@check(sys_ver, py_env)
def test_requests():
    import requests
    r = requests.get(url="http://www.cnblogs.com/kendrick/")
    # print(r.status_code)


@check(sys_ver, py_env)
def test_PyMongo():
    from pymongo import MongoClient
    conn = MongoClient("localhost", 27017)

if __name__ == "__main__":
    print("[%s,%s] start checking..." % (sys_ver, py_env))
    test_scrapy()
    test_beautifulSoup()
    test_lxml()
    test_matplotlib()
    test_xls()
    test_data_analysis()
    test_sklearn()
    test_mysql()
    test_SQLAlchemy()
    test_PyMongo()
    test_gensim()
    test_jieba()
    test_redis()
    test_requests()
    test_statsmodels()
    print("[%s,%s] finish checking." % (sys_ver, py_env))
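
The `check` decorator in the script above is the part that makes the output uniform across both interpreters, so it is worth seeing in isolation. The following is a stripped-down sketch rather than the script's exact code: it drops the colored printing, and returning `True`/`False` from the wrapper and using `functools.wraps` are my own additions so the result can be collected. The functions `test_ok` and `test_broken` are hypothetical stand-ins for the real tests.

```python
import functools


def check(sys_ver, py_env):
    """Decorator factory: run func, report pass/fail, never raise."""
    def decorator(func):
        @functools.wraps(func)  # keep func.__name__ on the wrapper
        def wrapper(*args, **kwargs):
            label = "[%s,%s]: %s" % (sys_ver, py_env, func.__name__)
            try:
                func(*args, **kwargs)
            except Exception as exc:
                print("%s fail. (%s)" % (label, exc))
                return False
            print("%s pass." % label)
            return True
        return wrapper
    return decorator


@check("Centos", "Python3")
def test_ok():
    pass


@check("Centos", "Python3")
def test_broken():
    raise ValueError("boom")


# One failing test does not stop the tests that follow it.
results = [test_ok(), test_broken()]
```

Because the outer function takes the platform and interpreter labels, the same test file can run unchanged under py2 and py3 and still produce lines that are easy to compare side by side.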