python - Groupby on 2 columns and a filter on a string column

Sample DF:

ID   Name        Price     Condition   Fit_Test
1    Apple         10      Good        Super_Fit
2    Apple         10      OK          Super_Fit
3    Apple         10      Bad         Super_Fit
4    Orange        12      Good        Not_Fit
5    Orange        12      OK          Not_Fit
6    Banana        15      OK          Medium_Fit
7    Banana        15      Bad         Medium_Fit
8    Pineapple     25      OK          Medium_Fit
9    Pineapple     25      OK          Medium_Fit
10   Cherry        30      Bad         Medium_Fit
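For anyone who wants to reproduce the problem, the sample table can be built as a pandas DataFrame roughly like this. It is only a sketch: the values are copied from the table above, and the variable name `df` is simply the one the answer uses.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "ID": list(range(1, 11)),
    "Name": ["Apple"] * 3 + ["Orange"] * 2 + ["Banana"] * 2
            + ["Pineapple"] * 2 + ["Cherry"],
    "Price": [10] * 3 + [12] * 2 + [15] * 2 + [25] * 2 + [30],
    "Condition": ["Good", "OK", "Bad", "Good", "OK",
                  "OK", "Bad", "OK", "OK", "Bad"],
    "Fit_Test": ["Super_Fit"] * 3 + ["Not_Fit"] * 2 + ["Medium_Fit"] * 5,
})
print(df)
```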

Expected DF:

ID   Name        Price     Condition   Fit_Test
1    Apple         10      Good        Super_Fit
2    Apple         10      OK          Super_Fit
3    Apple         10      Bad         Super_Fit

4    Orange        12      Good        Not_Fit

6    Banana        15      OK          Medium_Fit

8    Pineapple     25      OK          Medium_Fit
9    Pineapple     25      OK          Medium_Fit
10   Cherry        30      Bad         Medium_Fit

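Problem statement:

I want to group by Name and Price and then filter the rows based on Condition.

>If a Name and Price group contains all three conditions Good, Bad and OK, keep only Good, provided Fit_Test is not Super_Fit
>If a Name and Price group contains the conditions Good and OK, keep only Good (of IDs 4 and 5, only ID 4 is expected), provided Fit_Test is not Super_Fit
>If a Name and Price group contains the conditions Bad and OK, keep only OK (of IDs 6 and 7, only ID 6 is expected), provided Fit_Test is not Super_Fit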
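>If a Name and Price group contains only one distinct condition (OK and OK, Good and Good, or only Bad), do nothing and keep the rows as they are (IDs 8, 9 and 10 all remain in the expected DF), provided Fit_Test is not Super_Fit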

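Update on the answers

>The first answer and its test edit work for any df that has no condition on the Fit_Test column. With that answer, the expected DF will not contain rows 2 and 3, as the answer's output also shows.
>The "Edit (update)" answer works when the additional Fit_Test column is needed; the filter is applied only to rows whose value is not Super_Fit.

In both solutions, the row filtering based on the Condition column and the grouping on 2 columns are the same.

I found material on groupby with a filter on numeric columns, but nothing for string columns.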

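Best answer
The idea is to build, for every row, the set of Condition values found in its group and then compare against those sets: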

a = df.join(df.groupby(['Price','Name'])['Condition'].apply(set).rename('m'),on=['Price','Name'])['m']
print (a)
0    {Bad, Good, OK}
1    {Bad, Good, OK}
2    {Bad, Good, OK}
3         {Good, OK}
4         {Good, OK}
5          {Bad, OK}
6          {Bad, OK}
7               {OK}
8               {OK}
9              {Bad}
Name: m, dtype: object
m1 = (a == set({'Bad','Good','OK'})) | (a == set({'Good','OK'}))
m2 = a == set({'Bad','OK'})
#check if unique value - length of set is 1
m3 = a.str.len() == 1
m4 = df['Condition'] == 'Good'
m5 = df['Condition'] == 'OK'

df = df[(m1 & m4) | (m2 & m5) | m3]
print (df)
   ID       Name  Price Condition
0   1      Apple     10      Good
3   4     Orange     12      Good
5   6     Banana     15        OK
7   8  Pineapple     25        OK
8   9  Pineapple     25        OK
9  10     Cherry     30       Bad
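The same masks can be wrapped in a small self-contained function. The sketch below is not the answerer's exact code: the function name `filter_by_condition_sets` and its `keys` parameter are made up for illustration, and the sets are compared element-wise with `map` so the result does not depend on how `==` treats a set operand.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": list(range(1, 11)),
    "Name": ["Apple"] * 3 + ["Orange"] * 2 + ["Banana"] * 2
            + ["Pineapple"] * 2 + ["Cherry"],
    "Price": [10] * 3 + [12] * 2 + [15] * 2 + [25] * 2 + [30],
    "Condition": ["Good", "OK", "Bad", "Good", "OK",
                  "OK", "Bad", "OK", "OK", "Bad"],
})


def filter_by_condition_sets(frame, keys=("Price", "Name")):
    """Hypothetical helper: filter rows by the set of Conditions in each group."""
    keys = list(keys)
    # One set of Condition values per group, broadcast back onto every row.
    per_group = frame.groupby(keys)["Condition"].apply(set).rename("m")
    a = frame.join(per_group, on=keys)["m"]

    def has(target):
        # Row-wise test: is this row's group set exactly `target`?
        return a.map(lambda s: s == target)

    m1 = has({"Bad", "Good", "OK"}) | has({"Good", "OK"})  # keep Good
    m2 = has({"Bad", "OK"})                                # keep OK
    m3 = a.map(len) == 1                                   # single condition: keep all
    m4 = frame["Condition"] == "Good"
    m5 = frame["Condition"] == "OK"
    return frame[(m1 & m4) | (m2 & m5) | m3]


print(filter_by_condition_sets(df))
```

Because the function returns a filtered copy instead of overwriting `df`, the masks can be recomputed later on the original frame.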

Edit (testing):

To inspect the masks, you can use assign:

# run on the original, unfiltered df
m = (m1 & m4) | (m2 & m5) | m3
print (df.assign(sets=a,m1 = m1,m2=m2,m3=m3,m4=m4,m5=m5,m=m))
   ID       Name  Price Condition             sets     m1     m2     m3  \
0   1      Apple     10      Good  {Bad, Good, OK}   True  False  False   
1   2      Apple     10        OK  {Bad, Good, OK}   True  False  False   
2   3      Apple     10       Bad  {Bad, Good, OK}   True  False  False   
3   4     Orange     12      Good       {Good, OK}   True  False  False   
4   5     Orange     12        OK       {Good, OK}   True  False  False   
5   6     Banana     15        OK        {Bad, OK}  False   True  False   
6   7     Banana     15       Bad        {Bad, OK}  False   True  False   
7   8  Pineapple     25        OK             {OK}  False  False   True   
8   9  Pineapple     25        OK             {OK}  False  False   True   
9  10     Cherry     30       Bad            {Bad}  False  False   True   

      m4     m5      m  
0   True  False   True  
1  False   True  False  
2  False  False  False  
3   True  False   True  
4  False   True  False  
5  False   True   True  
6  False  False  False  
7  False   True   True  
8  False   True   True  
9  False  False   True  

Edit (update):

For the new condition, use:

    # m1..m5 must be computed on the original, unfiltered df
    m6 = df['Fit_Test'] == 'Super_Fit'
    df = df[((m1 & m4) | (m2 & m5) | m3) | m6]
    print (df)
       ID       Name  Price Condition    Fit_Test
    0   1      Apple     10      Good   Super_Fit
    1   2      Apple     10        OK   Super_Fit
    2   3      Apple     10       Bad   Super_Fit
    3   4     Orange     12      Good     Not_Fit
    5   6     Banana     15        OK  Medium_Fit
    7   8  Pineapple     25        OK  Medium_Fit
    8   9  Pineapple     25        OK  Medium_Fit
    9  10     Cherry     30       Bad  Medium_Fit
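As an alternative to comparing sets, the stated rules can also be read as "keep the best Condition in each group". The sketch below swaps in that ranking technique. It is not the answer's method, and the preference order Good > OK > Bad is an assumption inferred from the rules in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": list(range(1, 11)),
    "Name": ["Apple"] * 3 + ["Orange"] * 2 + ["Banana"] * 2
            + ["Pineapple"] * 2 + ["Cherry"],
    "Price": [10] * 3 + [12] * 2 + [15] * 2 + [25] * 2 + [30],
    "Condition": ["Good", "OK", "Bad", "Good", "OK",
                  "OK", "Bad", "OK", "OK", "Bad"],
    "Fit_Test": ["Super_Fit"] * 3 + ["Not_Fit"] * 2 + ["Medium_Fit"] * 5,
})

# Assumed preference order (not stated explicitly in the question):
# a lower rank means a better condition.
rank = df["Condition"].map({"Good": 0, "OK": 1, "Bad": 2})

# Best (lowest) rank within each Price/Name group, aligned to every row.
best = rank.groupby([df["Price"], df["Name"]]).transform("min")

# Keep rows holding their group's best condition; Super_Fit rows are always kept.
keep = (rank == best) | (df["Fit_Test"] == "Super_Fit")
result = df[keep]
print(result)
```

On the sample data this keeps IDs 1, 2, 3, 4, 6, 8, 9 and 10, the same rows as the expected DF. The two approaches differ for a group containing only Good and Bad, a case the question does not cover: this sketch keeps Good, whereas the set-based masks would keep neither row unless it is Super_Fit.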
