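Prerequisites: this assumes you already know the basics of Python and have Python 3 installed.
The programs in this article were run on macOS, but the code itself is not tied to any operating system; most of it was run in an IPython Notebook.
If you need a tool for solving data science problems, I recommend Pandas, a high-performance, efficient, high-level data analysis library. This article is only a set of study notes, so if you want a better picture of what Pandas is and why it is so appealing, I recommend the following two introductions (I can't say when the links will stop working ☺):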
1. 10 Minutes to Pandas (十分钟快速入门 Pandas)
2. Introduction to Data Science in Python: Pandas (Python 数据科学入门教程:Pandas)
Installing pandas with PyPI
Python versions officially supported by Pandas at the time of writing: 2.7, 3.5, 3.6, and 3.7.
pip install pandas
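A quick way to confirm that the install worked is to import the package and print its version. The snippet below is a minimal sketch rather than part of the original notes; `pandas_version` is a hypothetical helper name, and it returns `None` instead of raising when pandas cannot be imported.

```python
import importlib


def pandas_version():
    """Return the installed pandas version string, or None if the import fails."""
    try:
        pd = importlib.import_module("pandas")
    except ImportError:
        # pandas (or one of its required dependencies) is missing
        return None
    return pd.__version__


print(pandas_version())
```

If this prints `None`, the interpreter you are running is probably not the one that `pip` installed into.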
Installing using your Linux distribution’s package manager
Distribution | Status | Download / Repository Link | Install method |
---|---|---|---|
Debian | stable | official Debian repository | sudo apt-get install python3-pandas |
Debian & Ubuntu | unstable (latest packages) | NeuroDebian | sudo apt-get install python3-pandas |
Ubuntu | stable | official Ubuntu repository | sudo apt-get install python3-pandas |
openSUSE | stable | openSUSE Repository | zypper in python3-pandas |
Fedora | stable | official Fedora repository | dnf install python3-pandas |
CentOS/RHEL | stable | EPEL repository | yum install python3-pandas |
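Running the test suite
Pandas ships with an exhaustive set of unit tests that covered about 97% of the code base at the time of writing. To run them on your machine and verify that everything works (and that all dependencies, both required and optional, are installed), make sure you have pytest and then run:
pip install pytest # install pytest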
import pandas as pd
pd.test()
# output
running: pytest --skip-slow --skip-network /usr/local/lib/python3.6/site-packages/pandas
=================================================================== test session starts ===================================================================
platform darwin -- Python 3.6.5, pytest-3.9.2, py-1.7.0, pluggy-0.8.0
rootdir: /usr/local/lib/python3.6/site-packages/pandas, inifile:
collected 27342 items / 4 skipped
../../../../../../../usr/local/lib/python3.6/site-packages/pandas/tests/test_algos.py ..........................................................X.. [ 0%]
......................................................................................................................
(a big wave of unit tests is on its way)..............................................................................
......................................................................................................................
==================== 12130 passed, 12 skipped in 368.339 seconds =====================
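If you want to check the result in a script instead of reading the output by eye, the summary line can be parsed with a regular expression. This is only a sketch: `parse_summary` is a hypothetical helper, and the exact wording of the summary line can differ between pytest versions.

```python
import re

# Summary line copied from the run shown above.
SUMMARY = "==================== 12130 passed, 12 skipped in 368.339 seconds ====================="

# Outcome labels this sketch looks for; other labels are ignored.
PATTERN = re.compile(r"(\d+) (passed|failed|skipped|errors?|xfailed|xpassed)\b")


def parse_summary(line):
    """Return a dict mapping each outcome label found in the line to its count."""
    return {label: int(number) for number, label in PATTERN.findall(line)}


counts = parse_summary(SUMMARY)
print(counts)  # {'passed': 12130, 'skipped': 12}

# Treat the run as healthy when nothing failed or errored.
healthy = not any(counts.get(key, 0) for key in ("failed", "error", "errors"))
print("healthy" if healthy else "needs attention")
```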
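Dependencies
- setuptools: 24.2.0 or higher. An enhancement to Python's distutils that makes it easier to build and distribute Python packages, especially ones that depend on other packages.
- NumPy: 1.9.0 or higher. NumPy is an open-source numerical computing extension for Python. It can store and process large matrices far more efficiently than Python's own nested lists (which can also be used to represent a matrix).
- python-dateutil: 2.5.0 or higher. As the name suggests, a toolkit for working with dates.
- pytz: used together with dateutil to handle time zones.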
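The minimum versions above can be checked against what is actually installed. The sketch below assumes Python 3.8 or newer (for `importlib.metadata`); `as_tuple` and `meets_minimum` are hypothetical helpers, and the version comparison is deliberately simplified: it only looks at the leading numeric parts and ignores pre-release suffixes such as `rc1`.

```python
from importlib import metadata

# Minimum versions taken from the list above (pytz has no stated minimum).
MINIMUMS = {
    "setuptools": "24.2.0",
    "numpy": "1.9.0",
    "python-dateutil": "2.5.0",
}


def as_tuple(version):
    """Turn '1.15.4' into (1, 15, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad short versions so that '24.2' compares equal to '24.2.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return as_tuple(installed) >= as_tuple(minimum)


for name, minimum in MINIMUMS.items():
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed (needs >= {minimum})")
        continue
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    print(f"{name}: {installed} ({status}, needs >= {minimum})")
```

For anything beyond a quick sanity check, a real version parser is a better choice than this tuple comparison.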
Recommended Dependencies
- numexpr: for accelerating certain numerical operations. numexpr uses multiple cores as well as smart chunking and caching to achieve large speedups. If installed, must be Version 2.4.6 or higher. A tool for speeding up NumPy computations.
- bottleneck: for accelerating certain types of nan evaluations. bottleneck uses specialized cython routines to achieve large speedups. If installed, must be Version 1.0.0 or higher. A collection of NumPy array functions written in C.
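Because these two packages are optional, it can be useful to see whether they are present in the current environment. The following is a small sketch using only the standard library; `available_accelerators` is a hypothetical helper name, and it only checks that each module can be found, not that its version is recent enough.

```python
from importlib.util import find_spec

# Module names of the recommended (optional) dependencies listed above.
OPTIONAL = ("numexpr", "bottleneck")


def available_accelerators(names=OPTIONAL):
    """Map each module name to True if it can be found in this environment."""
    return {name: find_spec(name) is not None for name in names}


for name, present in available_accelerators().items():
    print(f"{name}: {'installed' if present else 'missing'}")
```

A missing package can be added with `pip install numexpr bottleneck`.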