Power BI - Finding and Filtering Duplicates Across Columns

I've got a set of data where four individual parts and part numbers (all text) are compiled to become combinations. They can combine in 2 different ways, either: - Column A then B then C then D, OR - Column A then C then B then D

I've got the combinations each in two separate columns.

Unfortunately, this leads to some duplicates being created, because A+B+C+D is functionally equivalent (for my purposes) to A+C+B+D. When the duplicates are in a single column, removing them is simple enough. Removing duplicates from different columns is a little bit trickier, and that's where I'm looking for your help.

Example

I've mocked up an example of how the items are concatinated. As you can see, there are no duplicates within the same column, but there are duplicates of the same combinations (colour-coded for simplicity) appearing in both columns, and often in a different row of data so it's not as simple as looking across the row. I'd love your help to identify these and filter (or remove) the duplicates.

Also, note that the relationship between Concat 1 and Concat 2 is pretty much meaningless (for these purposes) other than that I'd like to filter/remove any duplicate values at all, whether those duplicates are within the same column or across the two columns.

Thanks,

Dan

1 answer

  • answered 2018-10-12 13:04 horseyride

    Assuming you start with Table1 having six columns without a header row, where columns 5/6 have the two combinations, this removes duplicates across rows and columns. It does not remove duplicates on its own row (ABBA/ABBA)

    let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Column1", "Column2", "Column3", "Column4"}, "Attribute", "Value"),
    #"Removed Duplicates" = Table.Distinct(#"Unpivoted Other Columns", {"Value"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Attribute]), "Attribute", "Value")
    in #"Pivoted Column"
    

    if you also need to remove duplicates on its own row use this instead:

    let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Column1", "Column2", "Column3", "Column4"}, "Attribute", "Value"),
    #"Removed Duplicates" = Table.Distinct(#"Unpivoted Other Columns", {"Value"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Attribute]), "Attribute", "Value"),
    #"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each ([Column6] <> null) and ([Column5] <> null))
    in #"Filtered Rows"