The goal of this article:
Fetch basic stock information through Tushare and process the retrieved data further.
What is Tushare
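Tushare is a free, open-source Python package that provides an interface to financial data. It covers the pipeline for stocks and other financial data from collection, through cleaning and processing, to storage. It gives financial analysts fast, tidy, and varied data that is easy to analyze, greatly reducing the work of data acquisition so that they can concentrate on researching and implementing strategies and models. Given the strengths of the Python pandas package in quantitative financial analysis, most of the data Tushare returns is a pandas DataFrame, which makes analysis and visualization with pandas/NumPy/Matplotlib convenient. If you prefer to analyze data in Excel or a relational database, you can use Tushare's data storage features to save everything locally first. At the request of some users, Tushare has supported both Python 2.x and Python 3.x since version 0.2.5; part of the code was refactored and some algorithms were optimized to keep data retrieval efficient and stable.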
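To make the DataFrame point concrete, here is a minimal sketch of the kind of analysis and local storage described above. It does not call Tushare at all: the rows are made-up placeholder values, and the column names (`date`, `close`, `volume`) are assumptions chosen for illustration, not a guarantee of what any particular Tushare call returns.

```python
import io

import pandas as pd

# Placeholder rows standing in for data returned by a Tushare call.
# The values are invented for illustration only.
df = pd.DataFrame(
    {
        "date": ["2017-01-03", "2017-01-04", "2017-01-05"],
        "close": [10.0, 10.5, 10.2],
        "volume": [1000, 1200, 900],
    }
).set_index("date")

# Ordinary pandas analysis: day-over-day change of the closing price.
df["change"] = df["close"].diff()

# Store the table as CSV (an in-memory buffer stands in for a file here),
# then load it back for later analysis.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="date")
print(restored)
```

Replacing the buffer with a file path gives the save-locally-then-analyze workflow mentioned above.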
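Installing Tushare
Assuming a Windows platform, first install Python. I personally recommend using Cygwin; if you are not familiar with it, explore it on your own with the official installation tutorial. To make installing packages through Cygwin easier, I recommend installing apt-cyg.
- Install Python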
apt-cyg install python3 python3-pip
- Install Tushare
pip install pandas bs4 lxml tushare
- Check the installed version
import tushare as ts
print(ts.__version__)
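Before running anything else, it can help to confirm that the packages installed above are actually importable. The sketch below uses only the standard library; the helper name `missing_packages` is hypothetical and not part of Tushare.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Import names of the packages installed with pip above.
required = ["pandas", "bs4", "lxml", "tushare"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, rerun the `pip install` step for that package.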
Fetching A-share indices with Tushare
Storing the data in files
Here is the Python 3 script:
#!/usr/bin/python3
#-*- coding: utf-8 -*-
# FileName: GetIndexFromTushare.py
import sys,os
from glob import glob
from datetime import datetime
from pathlib import Path
import tushare as ts
# Get current date
current_date = datetime.now()
''' Function: years_before
    Para: current, i
        current: current date
        i: number of years to go back from the current date
    Explanation:
        We fetch data with tushare one year at a time, as suggested,
        instead of requesting the whole history at once, to avoid
        the 465 response from the data server
'''
def years_before(current, i):
    try:
        return current.replace(year=current.year-i).strftime("%Y-%m-%d")
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year, fall back to Feb 28
        return current.replace(year=current.year-i, day=28).strftime("%Y-%m-%d")
''' Function: combine_csvs
    Para: thedir, basename, pattern
        thedir: the root directory
        basename: the file name prefix shared by the files in the root directory
        pattern: the file name pattern to search for
    Explanation:
        The data of each year is saved to a file named `basename_i.csv`, where `basename` is the stock name and `i` is the index of the yearly range, counted back from the current year.
        This function then combines them into one csv file.
'''
def combine_csvs(thedir, basename, pattern):
    fname = thedir + basename + ".csv"
    # delete any existing combined file
    if Path(fname).exists():
        os.remove(fname)
    # collect all the `./basename_i.csv` files into a list, ordered by `i`
    # (glob returns files in arbitrary order; assumes every match ends in `_i.csv`)
    csv_arr = sorted(glob(thedir + basename + pattern),
                     key=lambda p: int(Path(p).stem.rsplit("_", 1)[1]))
    #print(csv_arr)
    # open fname for writing, a=append
    fout = open(fname, "a")
    for n, csv in enumerate(csv_arr):
        f = open(csv)
        # skip the header of every csv except the first
        if n != 0:
            next(f, None)
        for line in f:
            fout.write(line)
        f.close()
        os.remove(csv)
    fout.close()
''' Function: pairwise
    Para: arr
    Explanation:
        Given an array, pair each two adjacent elements.
        arr holds the yearly dates computed from the current date by the
        `years_before` function; each pair is the range between two consecutive years
'''
def pairwise(arr):
    if not arr: return
    for i in range(len(arr)-1):
        yield arr[i], arr[i+1]
# define the stock names, ids and initial dates
stocks = {
    #'SHS_index' : ["000001", "2016-12-19"],
    'AS_index' : ["000002", "1990-12-19"],
    #'SZ300_index' : ["000300", "2005-04-08"],
}
for name, id_date in stocks.items():
    id = id_date[0]
    date = id_date[1]
    #print(name, id, date)
    # construct the range of time, separated by year
    i = 0
    date_arr = []
    while date < years_before(current_date, i):
        date_arr.append(years_before(current_date, i))
        i = i + 1
    date_arr.append(date)
    #print(date_arr)
    # get the data
    for date_e, date_b in pairwise(date_arr):
        #print(date_b, date_e, "\n")
        data = ts.get_h_data(id, start=date_b, end=date_e, index=True, pause=10, retry_count=5)
        # save data to csv file
        data.to_csv(name + '_' + str(date_arr.index(date_b)) + '.csv', columns=['date', 'high', 'low', 'close', 'volume', 'amount'])
    # combine into one csv file
    combine_csvs('./', name, "*.csv")
#sys.exit()
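The year-by-year splitting used in the script can be tried without any network access. The following is a minimal sketch of that idea, not the script itself: `year_windows` is a hypothetical helper that combines the roles of `years_before` and `pairwise`, and it assumes dates are passed as `datetime.date` objects instead of strings. It also handles February 29, which has no counterpart in non-leap years.

```python
from datetime import date


def years_before(current, i):
    """Return the date i years before `current` (Feb 29 falls back to Feb 28)."""
    try:
        return current.replace(year=current.year - i)
    except ValueError:
        return current.replace(year=current.year - i, day=28)


def year_windows(start, today):
    """Yield (begin, end) ISO date strings covering start..today, newest first.

    Each window spans at most one year; consecutive windows share a boundary date.
    """
    points = []
    i = 0
    # Step back one year at a time until we reach or pass the start date.
    while start < years_before(today, i):
        points.append(years_before(today, i))
        i += 1
    points.append(start)
    # Pair adjacent points: (newer, older) becomes (begin=older, end=newer).
    for newer, older in zip(points, points[1:]):
        yield older.isoformat(), newer.isoformat()


for begin, end in year_windows(date(2015, 6, 1), date(2017, 3, 15)):
    print(begin, end)
```

Because adjacent windows share a boundary date, a row for that date may appear twice after the yearly files are combined; dropping duplicate dates afterwards is one possible way to handle that.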