The log format is as follows:
[main] INFO com.jzdata.press.core.PressTest - select cs_bill_customer_sk,count(*) from catalog_sales where cs_item_sk =2 group by cs_bill_customer_sk order by cs_bill_customer_sk limit 100; true 2640
[main] INFO com.jzdata.press.core.PressTest - select cs_bill_customer_sk,count(*) from catalog_sales where cs_item_sk =16 group by cs_bill_customer_sk order by cs_bill_customer_sk limit 100; true 282
[main] INFO com.jzdata.press.core.PressTest - select cs_bill_customer_sk,count(*) from catalog_sales where cs_item_sk =13 group by cs_bill_customer_sk order by cs_bill_customer_sk limit 100; true 291
[main] INFO com.jzdata.press.core.PressTest - select cs_bill_customer_sk,count(*) from catalog_sales where cs_item_sk =11 group by cs_bill_customer_sk order by cs_bill_customer_sk limit 100; true 320
and so on.
The goal is to extract the entries whose cs_item_sk value ends in 1 and whose trailing flag is true.
The code is as follows:
import re

# Group 1: a cs_item_sk value ending in 1; group 2: the number after "true".
string = r'cs_item_sk[\s=]*(\d*?1+)\s+.+?true\s*(\d+)$'
pattern = re.compile(string)
with open('./src.txt', 'r') as f:
    # Iterate over the file directly instead of loading every line at once.
    for line in f:
        line = line.strip()
        m = pattern.search(line)
        if m is not None:
            print(m.groups())
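
To check the approach without needing ./src.txt, here is a minimal sketch that runs a regex over the four sample log lines shown above. The pattern is a slightly stricter variant of the original, not the original itself: it requires a literal "=" and uses word boundaries. The names PATTERN, extract, and sample are hypothetical helpers introduced only for this illustration.

```python
import re

# Assumed stricter variant of the pattern above (not the original):
#   group 1: the cs_item_sk value, which must end in the digit 1
#   group 2: the number that follows "true" at the end of the line
PATTERN = re.compile(r'cs_item_sk\s*=\s*(\d*1)\b.*?\btrue\s+(\d+)\s*$')


def extract(lines):
    """Return (cs_item_sk, trailing number) pairs for the matching lines."""
    results = []
    for line in lines:
        m = PATTERN.search(line.strip())
        if m is not None:
            results.append((m.group(1), m.group(2)))
    return results


# The four sample lines from the log excerpt above.
sample = [
    "[main] INFO com.jzdata.press.core.PressTest - select cs_bill_customer_sk,count(*) from catalog_sales where cs_item_sk =2 group by cs_bill_customer_sk order by cs_bill_customer_sk limit 100; true 2640",
    "[main] INFO com.jzdata.press.core.PressTest - select cs_bill_customer_sk,count(*) from catalog_sales where cs_item_sk =16 group by cs_bill_customer_sk order by cs_bill_customer_sk limit 100; true 282",
    "[main] INFO com.jzdata.press.core.PressTest - select cs_bill_customer_sk,count(*) from catalog_sales where cs_item_sk =13 group by cs_bill_customer_sk order by cs_bill_customer_sk limit 100; true 291",
    "[main] INFO com.jzdata.press.core.PressTest - select cs_bill_customer_sk,count(*) from catalog_sales where cs_item_sk =11 group by cs_bill_customer_sk order by cs_bill_customer_sk limit 100; true 320",
]

print(extract(sample))  # → [('11', '320')]
```

Only the last sample line matches: 2, 16, and 13 do not end in 1, while 11 does. The \b after the captured digits is what rejects values such as 16 and 13, where a 1 is followed by another digit.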