python – Unwrapping a nested dictionary containing lists into a pandas DataFrame

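I have a nested dictionary whose sub-dictionaries hold lists:

nested_dict = {'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]}, 'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]}, ...}

Each list in a sub-dictionary has at least two elements, but there may be more.

I want to "unwrap" this dictionary into a pandas DataFrame with one column for the first-level dictionary keys (e.g. 'string1', 'string2', ...), one column for the sub-dictionary keys, one column for the first item in the list, one column for the next item, and so on.

This is what the output should look like: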

col1       col2    col3     col4    col5    col6
string1    69      1231     232
string1    67      682      12
string1    65      1        1
string2    28672   82       23
string2    22736   82       93      1102    102
string2    19423   64       23

Naturally, I tried using pd.DataFrame.from_dict:

new_df = pd.DataFrame.from_dict({(i,j): nested_dict[i][j] 
                           for i in nested_dict.keys() 
                           for j in nested_dict[i].keys()
                           ...

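Now I'm stuck, and there are several problems:

>How do I parse the lists (i.e. nested_dict[i].values()) so that each element becomes a new pandas DataFrame column?
>The above doesn't actually create a column for each field.
>The above doesn't fill the columns with elements, e.g. string1 should appear in every row for its sub-dictionary key-value pairs. (For col5 and col6, I can fill the NAs with zeros.)
>I'm not sure how to name these columns correctly.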

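Best answer

This should give you the result you want, though it may not be the most elegant solution. There is probably a better (more pandas-idiomatic) way to do it.

I parsed your nested dictionary and built a list of dictionaries (one per row).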

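import pandas as pd

# some sample input
nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
    'string3': {28673: [83, 24], 22737: [83, 94, 1103, 103], 19424: [65, 24]}
}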

# new list is what we will use to hold each row
new_list = []
for k1 in nested_dict:
    curr_dict = nested_dict[k1]
    for k2 in curr_dict:
        new_dict = {'col1': k1,'col2': k2}
        new_dict.update({'col%d'%(i+3): curr_dict[k2][i] for i in range(len(curr_dict[k2]))})
        new_list.append(new_dict)

# create a DataFrame from new list
df = pd.DataFrame(new_list)

Output:

      col1   col2  col3  col4    col5   col6
0  string2  28672    82    23     NaN    NaN
1  string2  22736    82    93  1102.0  102.0
2  string2  19423    64    23     NaN    NaN
3  string3  19424    65    24     NaN    NaN
4  string3  28673    83    24     NaN    NaN
5  string3  22737    83    94  1103.0  103.0
6  string1     65     1     1     NaN    NaN
7  string1     67   682    12     NaN    NaN
8  string1     69  1231   232     NaN    NaN
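The NaN entries in col5 and col6 also turn those columns into floats (1102.0 rather than 1102). The question mentions that filling the gaps with zeros is acceptable, so a minimal sketch of that step might look like the following. The two sample rows and the name value_cols are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Two rows shaped like the ones built above; the first has no col5/col6.
rows = [
    {'col1': 'string1', 'col2': 69, 'col3': 1231, 'col4': 232},
    {'col1': 'string2', 'col2': 22736, 'col3': 82, 'col4': 93,
     'col5': 1102, 'col6': 102},
]
df = pd.DataFrame(rows)

# Every column except the two key columns holds list items.
value_cols = [c for c in df.columns if c not in ('col1', 'col2')]

# Replace missing values with 0, then cast the item columns back to int.
df = df.fillna(0).astype({c: int for c in value_cols})
```

Whether zero is a safe filler depends on the data: if 0 can also occur as a real list item, keeping NaN may be the better choice.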

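This assumes the input will always contain enough data to create col1 and col2.

I loop through nested_dict, assuming each of its elements is also a dictionary. We then loop through that dictionary (curr_dict) as well. The keys k1 and k2 are used to populate col1 and col2. For the remaining columns, we iterate over the list contents and add a column for each element.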
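The same two-level loop can also be written as a list comprehension that builds flat rows instead of dictionaries. This is only a sketch of one possible variant, not the answer's own code: the function name unwrap is hypothetical, and it assumes a non-empty input whose innermost values are all lists.

```python
import pandas as pd

def unwrap(nested):
    """Flatten {k1: {k2: [items...]}} into one row per (k1, k2) pair."""
    # One flat row per pair: [k1, k2, item0, item1, ...]
    rows = [[k1, k2, *items]
            for k1, inner in nested.items()
            for k2, items in inner.items()]
    # The widest row decides how many columns are needed.
    width = max(len(row) for row in rows)
    columns = ['col%d' % (i + 1) for i in range(width)]
    # Pad shorter rows with None so every row has the same length.
    padded = [row + [None] * (width - len(row)) for row in rows]
    return pd.DataFrame(padded, columns=columns)

nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
}
df = unwrap(nested_dict)
```

Generating the column names from the widest row means the code does not need to know in advance how long the longest list is.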
