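I have a pandas column in which each row is a list of the consecutive log actions a user performed while posting a photo in a mobile app, covering one whole logging session. Suppose a single list looks like this: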
my_list = [ 'action_a','action_b','action_c','action_z','action_j','action_a','action_z']
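1) action_a: start of a photo upload
2) action_z: end of a photo upload
3) other action_i: any actions that can occur between action_a and action_z.
4) There may be errors such as 'action_j' that do not fall between 'action_a' and 'action_z'; these should be ignored.
5) A photo upload may not complete, so there can be paths such as 'action_a', 'action_b' with no closing 'action_z'.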
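GOAL: split my_list into sublists, one per action path, where each path starts with 'action_a' and ends either with 'action_z' or just before the next 'action_a'. The result should therefore look like this: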
['action_a','action_z'] ['action_a','action_b'] ['action_a','action_z']
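One way to read the GOAL literally is "cut the list at every 'action_a', then trim each piece right after its first 'action_z'". A minimal sketch of that idea using only the standard library (itertools.accumulate and itertools.groupby); the function name split_with_groupby and the example list are hypothetical, the list being chosen so that it contains a finished, an unfinished and another finished upload:

```python
from itertools import accumulate, groupby
from typing import List

def split_with_groupby(actions: List[str]) -> List[List[str]]:
    # Label every action with the number of 'action_a' seen so far;
    # label 0 means "before the first upload started".
    labels = accumulate(1 if action == "action_a" else 0 for action in actions)
    paths = []
    for label, group in groupby(zip(labels, actions), key=lambda pair: pair[0]):
        if label == 0:
            continue  # actions before the first 'action_a' are ignored
        segment = [action for _, action in group]
        # Keep everything up to and including the first 'action_z', if any;
        # anything after it (e.g. a stray 'action_j') lies outside a path.
        if "action_z" in segment:
            segment = segment[: segment.index("action_z") + 1]
        paths.append(segment)
    return paths

# Hypothetical input: finished upload, stray error, unfinished upload, finished upload
example = ["action_a", "action_z", "action_j", "action_a", "action_b", "action_a", "action_z"]
for path in split_with_groupby(example):
    print(path)
# ['action_a', 'action_z']
# ['action_a', 'action_b']
# ['action_a', 'action_z']
```

Because the label only increases at an 'action_a', each group is one contiguous run that starts with 'action_a', so groupby never merges separate paths.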
My current approach: first I dropped every my_list in which 'action_z' occurs more often than 'action_a', or which contains no 'action_a' at all. Then I did the following:
    indices_a = [i for i, x in enumerate(my_list) if x == "action_a"]
    indices_z = [i for i, x in enumerate(my_list) if x == "action_z"]
    if len(indices_z) < 1:
        for i_a, x_a in enumerate(indices_a):
            if i_a + 1 != len(indices_a):
                indices_z.append(indices_a[i_a + 1] - 1)
            else:
                indices_z.append(len(my_list) - 1)
    else:
        for i_a, x_a in enumerate(indices_a):
            if i_a + 1 != len(indices_a):
                # Also close the path if no 'action_z' is left to pair with it
                # (otherwise indices_z[i_a] raises IndexError)
                if i_a >= len(indices_z) or indices_z[i_a] > indices_a[i_a + 1]:
                    indices_z.insert(i_a, indices_a[i_a + 1] - 1)
            else:
                indices_z.append(len(my_list) - 1)
    res = []
    for i, j in zip(indices_a, indices_z):
        res.append(my_list[i:j + 1])
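The same index-based idea can be expressed more directly: for each 'action_a', the path ends at the first 'action_z' after it, unless the next 'action_a' (or the end of the list) comes first. A sketch under that reading, using bisect.bisect_right from the standard library to find the next 'action_z'; split_by_indices is a hypothetical name:

```python
from bisect import bisect_right
from typing import List

def split_by_indices(my_list: List[str]) -> List[List[str]]:
    starts = [i for i, x in enumerate(my_list) if x == "action_a"]
    ends = [i for i, x in enumerate(my_list) if x == "action_z"]  # already sorted
    res = []
    for k, start in enumerate(starts):
        # A path may not run past the next 'action_a' or the end of the list
        limit = starts[k + 1] - 1 if k + 1 < len(starts) else len(my_list) - 1
        # Position of the first 'action_z' strictly after this 'action_a'
        pos = bisect_right(ends, start)
        end = ends[pos] if pos < len(ends) and ends[pos] <= limit else limit
        res.append(my_list[start:end + 1])
    return res

my_list = ['action_a', 'action_b', 'action_c', 'action_z', 'action_j', 'action_a', 'action_z']
print(split_by_indices(my_list))
# [['action_a', 'action_b', 'action_c', 'action_z'], ['action_a', 'action_z']]
```

Since each 'action_a' looks up its own end, a stray 'action_z' is simply never matched, so this version should not need the count-based pre-filtering step.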
It seems to work. Is there a better way?
Solution
I tried to simplify things a bit and came up with this logic:
    result = []
    curr_list = None
    for item in my_list:
        if item == 'action_a':
            if curr_list is not None:
                # Only append if there is content
                result.append(curr_list)
            # Create a new list
            curr_list = []
        try:
            # Try to append the current item
            curr_list.append(item)
            if item == 'action_z':
                # Close the current list but don't initialize
                # a new one until we encounter action_a
                result.append(curr_list)
                curr_list = None
        except AttributeError:
            # curr_list is None: either no action_a has been seen yet,
            # or the last path was already closed by action_z.
            # Just ignore the item and move on
            pass
    if curr_list is not None:
        # Append an "open" list if there is one
        result.append(curr_list)
    for item in result:
        print(item)
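The try/except works because calling append on None raises AttributeError, but the same state machine can also be written with an explicit None check, which makes the "outside a path" state visible and avoids catching unrelated errors. A sketch as a generator; split_paths is a hypothetical name, and the example list is made up for illustration:

```python
from typing import Iterator, List, Optional

def split_paths(actions: List[str]) -> Iterator[List[str]]:
    current: Optional[List[str]] = None  # None means "not inside an upload path"
    for action in actions:
        if action == "action_a":
            if current is not None:
                yield current  # previous upload never reached 'action_z'
            current = [action]
        elif current is not None:
            current.append(action)
            if action == "action_z":
                yield current  # upload finished; wait for the next 'action_a'
                current = None
        # otherwise the action lies outside any path (e.g. 'action_j') and is skipped
    if current is not None:
        yield current  # the list ended while an upload was still open

example = ["action_a", "action_z", "action_j", "action_a", "action_b", "action_a", "action_z"]
print(list(split_paths(example)))
# [['action_a', 'action_z'], ['action_a', 'action_b'], ['action_a', 'action_z']]
```

For the pandas column mentioned in the question, each row could then be converted with Series.apply, for example df['actions'].apply(lambda acts: list(split_paths(acts))), where df and 'actions' are placeholder names.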
Result:
    ['action_a', 'action_z']
    ['action_a', 'action_z']