HDF5 Usage Notes

Background

According to the benchmarks published at https://zhuanlan.zhihu.com/p/208842196, HDF5 has a performance advantage.

Installation

conda install h5py
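
A quick way to check that the installation worked is to import the package and print its version. This is only a small sketch; the version strings you see depend on your environment.

```python
import h5py

# Version of the h5py Python package that was installed.
h5py_version = h5py.__version__
# Version of the underlying HDF5 C library that h5py was built against.
hdf5_version = h5py.version.hdf5_version

print("h5py:", h5py_version)
print("HDF5:", hdf5_version)
```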

Writing data

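import numpy as np
import h5py

hf = h5py.File("test.hdf5", "w")  # "w" creates the file, overwriting any existing one
# Note: this array holds 10^9 float64 values, about 8 GB in memory
data = np.random.randn(10000, 100, 1000)
labels = np.random.randint(100, size=(1000))  # 1000 random integers in [0, 100)
hf.create_dataset("data", data=data)  # store the array under the name "data"
hf.create_dataset("labels", data=labels)
hf.close()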
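
The snippet above builds the whole array in memory before writing it (about 8 GB as float64). If that is too much, one possible approach is to allocate the dataset on disk first and fill it batch by batch. The sketch below assumes much smaller, made-up sizes and a temporary file path so that it runs quickly; `n_samples`, `batch_size` and the file name are illustrative choices, not values from the original notes.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical sizes, kept small so the sketch runs quickly.
n_samples, height, width = 1000, 10, 20
batch_size = 100

path = os.path.join(tempfile.mkdtemp(), "batched.hdf5")

with h5py.File(path, "w") as hf:
    # Allocate the dataset on disk without providing any data yet.
    dset = hf.create_dataset("data", shape=(n_samples, height, width), dtype="float64")
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)
        # Only one batch lives in memory at a time.
        batch = np.random.randn(stop - start, height, width)
        dset[start:stop] = batch

# Reopen the file and confirm the shape that ended up on disk.
with h5py.File(path, "r") as hf:
    written_shape = hf["data"].shape

print(written_shape)
```

The `with` block closes the file even if an exception is raised partway through the loop.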

Reading data

Step by step: h5py does not load data into memory until you explicitly index into a dataset.

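f = h5py.File("test.hdf5", "r")  # open the file read-only
db = f['data']  # get a handle to the dataset; nothing is loaded into memory
print(db.shape)  # reading the shape does not load the data either
xx = db[100, :, :]  # **this loads the selected slice into memory**
f.close()  # close the file explicitly; a with statement is recommended instead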
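
To see the difference between the handle and the loaded data, it can help to check the types. The sketch below uses a tiny made-up dataset in a temporary file (both are assumptions for illustration) and shows that `f["data"]` is an `h5py.Dataset`, while indexing it returns an ordinary NumPy array.

```python
import os
import tempfile

import h5py
import numpy as np

# Create a tiny file to read back (illustrative data only).
path = os.path.join(tempfile.mkdtemp(), "lazy.hdf5")
with h5py.File(path, "w") as hf:
    hf.create_dataset("data", data=np.arange(24, dtype="float64").reshape(4, 3, 2))

with h5py.File(path, "r") as f:
    db = f["data"]
    is_handle = isinstance(db, h5py.Dataset)      # a handle, not an array
    shape = db.shape                              # metadata only
    one_slice = db[1, :, :]                       # reads just this slice from disk
    is_array = isinstance(one_slice, np.ndarray)

# The slice is a plain NumPy array and stays usable after the file is closed.
print(is_handle, is_array, shape, one_slice.shape)
```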

with-statement version

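with h5py.File("test.hdf5", "r") as f:  # the file is closed automatically on exit
    for key in f.keys():
        print(key)  # names of the root-level objects; each can be a group or a dataset
        print(type(f[key]))
        # print(f[key][()])  # .value was removed in h5py 3.0; use [()] instead
        data = np.array(f[key])  # reads the whole dataset into memory
        print(data)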
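
Since a root-level key may refer to a group rather than a dataset, it can help to check the type before converting anything to an array. The following is a minimal sketch on a made-up file; the group name `train` and the dataset contents are assumptions for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small file with one dataset and one group at the root level.
path = os.path.join(tempfile.mkdtemp(), "nested.hdf5")
with h5py.File(path, "w") as hf:
    hf.create_dataset("labels", data=np.arange(5))
    grp = hf.create_group("train")  # hypothetical group name
    grp.create_dataset("data", data=np.zeros((2, 3)))

found = {}
with h5py.File(path, "r") as f:
    for key in f.keys():
        obj = f[key]
        if isinstance(obj, h5py.Dataset):
            # Record only metadata; the data itself is not read here.
            found[key] = ("dataset", obj.shape)
        elif isinstance(obj, h5py.Group):
            # A group behaves like a dict of further groups and datasets.
            found[key] = ("group", sorted(obj.keys()))

print(found)
```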

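Other notes

If your data is truly large (big-data scale) and a single dataset does not fit in memory, you can use PyTables. There are also some introductions to it on Chinese blogs, for example https://blog.csdn.net/q7w8e9r4/article/details/133855371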
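
Before switching libraries, it may be worth checking whether reading in batches is enough for your case. The sketch below is not PyTables; it uses plain h5py slicing, on the assumption that the computation can be done one batch at a time (here, a running mean). The file, the data and `batch_size` are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Small stand-in for a dataset that would not fit in memory.
path = os.path.join(tempfile.mkdtemp(), "big.hdf5")
with h5py.File(path, "w") as hf:
    hf.create_dataset("data", data=np.arange(1000, dtype="float64").reshape(100, 10))

batch_size = 32  # hypothetical batch size; tune it to the available memory
total = 0.0
count = 0
with h5py.File(path, "r") as f:
    dset = f["data"]
    n_rows = dset.shape[0]
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        batch = dset[start:stop]  # only this batch is read into memory
        total += float(batch.sum())
        count += batch.size

mean = total / count
print(mean)
```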

Last edited: 2024.02.25 14:12:34
