Python: Reading Specific Lines (Different Solutions for Small Files, Repeated Reads, and Large Files)

Problem description

When reading a file with a for loop, we sometimes want only specific lines, say lines 26 and 30. Python has three built-in features that cover this, each suited to a different case.

For reading small files

For small files, fileobject.readlines() or a plain for line in fileobject loop is a quick solution.

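with open('filename') as f:
    lines = f.readlines()  # loads the whole file into a list
print(lines[25], end='')  # line 26 (list indices start at 0)
print(lines[29], end='')  # line 30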

or:

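wanted = [25, 29]  # zero-based indices of lines 26 and 30
i = 0
with open('filename') as f:
    for line in f:
        if i in wanted:
            print(line, end='')  # print the line itself, not its index
        i += 1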
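
The readlines() approach can be wrapped in a small helper that takes human-friendly, 1-based line numbers. This is only a sketch: the function name read_lines_small and the throwaway demo file are assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile


def read_lines_small(path, line_numbers):
    """Return {line_number: text} for the requested 1-based line numbers.

    Loads the entire file into memory, so this only suits small files.
    """
    with open(path, encoding="utf-8") as f:
        all_lines = f.readlines()
    # Skip any requested number that falls outside the file.
    return {n: all_lines[n - 1] for n in line_numbers if 1 <= n <= len(all_lines)}


# Demo on a throwaway 40-line file (contents are made up for illustration).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.writelines(f"row {n}\n" for n in range(1, 41))
    demo_path = tmp.name

picked = read_lines_small(demo_path, [26, 30, 99])
print(picked)  # line 99 does not exist, so it is left out
os.remove(demo_path)
```

Returning a dictionary keeps each line paired with its number, which makes it easy to see which requested lines were missing.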

For reading many files, possibly repeatedly

A more elegant solution is the linecache module. It caches file contents, so it can quickly fetch lines from many files, even when the same file is read repeatedly.

>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\n'

Change the 4 to the line number you want. Note that linecache numbers lines starting from 1, so 4 returns the fourth line, not the fifth. If the line does not exist, getline() returns an empty string instead of raising an exception.
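
To check how linecache.getline() counts lines, here is a minimal sketch run against a temporary five-line file. The file contents and the variable names (sample_path, fourth, missing) are assumptions for illustration only.

```python
import linecache
import os
import tempfile

# Write a small sample file whose contents name their own line positions.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("first\nsecond\nthird\nfourth\nfifth\n")
    sample_path = tmp.name

fourth = linecache.getline(sample_path, 4)     # line numbers start at 1
missing = linecache.getline(sample_path, 100)  # out of range: empty string, no exception

print(repr(fourth))
print(repr(missing))

# Drop the cached copy of the file before deleting it.
linecache.clearcache()
os.remove(sample_path)
```

Because linecache keeps the file's lines in an in-memory cache, repeated lookups are cheap, but the cache should be cleared with linecache.clearcache() once the lines are no longer needed.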

For large files which won't fit into memory

If the file is too large to fit in memory, or you simply do not want to load it all at once, iterate over it with enumerate(). The file is still read sequentially from the beginning, so reaching a line deep in the file can be slow.

fp = open("file")
for i, line in enumerate(fp):
    if i == 25:
        print(line, end="")  # 26th line
    elif i == 29:
        print(line, end="")  # 30th line
    elif i > 29:
        break  # nothing left to find, stop reading
fp.close()

Note that i == n-1 for the nth line.


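With a with statement (available since Python 2.6; the print() calls below are Python 3 syntax), the file is closed automatically:

with open("file") as fp:
    for i, line in enumerate(fp):
        if i == 25:
            print(line, end="")  # 26th line
        elif i == 29:
            print(line, end="")  # 30th line
        elif i > 29:
            break  # nothing left to find, stop reading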
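
The enumerate() pattern generalizes to any set of line numbers. The generator below is a sketch under stated assumptions: the name iter_selected_lines and the demo file are hypothetical. It uses enumerate(fp, start=1) so callers can pass 1-based line numbers directly, and it stops reading once the largest requested line has been seen.

```python
import os
import tempfile


def iter_selected_lines(path, line_numbers):
    """Yield (line_number, text) for the requested 1-based line numbers.

    Reads the file sequentially, one line at a time, and stops as soon as
    the highest requested line has been reached.
    """
    wanted = set(line_numbers)
    if not wanted:
        return
    last = max(wanted)
    with open(path, encoding="utf-8") as fp:
        for number, line in enumerate(fp, start=1):
            if number in wanted:
                yield number, line
            if number >= last:
                break  # no need to read the rest of the file


# Demo on a throwaway file (contents are made up for illustration).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.writelines(f"row {n}\n" for n in range(1, 1001))
    big_path = tmp.name

found = list(iter_selected_lines(big_path, [30, 26]))
print(found)
os.remove(big_path)
```

Only one line is held in memory at a time, so the cost is the sequential read up to the last requested line rather than the size of the whole file.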

Compiled and translated from: Stack Overflow

https://stackoverflow.com/questions/2081836/reading-specific-lines-only?answertab=active#tab-top
