Determine segregating SQL column in known sets

I have a known target set of records in a large Oracle DB table that I want to pull. Rather than querying them by ID though, I want to find a a column within the table that holds a value(s) that is only assigned to the known set and not assigned to any records that are not in the set, thereby giving me a value to key off of.

Example:

Target IDs: 1, 2, 3

ID  Color  Dir    Size
1   red    up     S
2   red    up     M
3   red    down   L
-----------------------
4   red    left   S
5   blue   left   S
6   red    left   M
7   red    right  M
8   blue   right  M

In this scenario, the solution I'm looking for is the "Dir" column, as values up and down are exclusive to records in the desirable set and cover the entire set.

The table I'm working with has 80,000+ records and 100+ columns, so I'm looking for a way to perform this investigation in an automated manner, whether it be by SQL script or with tools like SSIS/SSAS 2008, Excel, PowerShell, etc. What SQL functions and/or utilities can help in this process?

1 answer

  • answered 2018-01-12 02:42 Soukai

    I know this is a long shot, but I have a solution for you that should work in SQL Server. Of course, the caveat being that you would need to get your data from Oracle into SQL Server first.

    There are a couple methods listed here:

    https://dba.stackexchange.com/questions/23782/what-is-the-easiest-way-to-move-data-from-oracle-to-sql-server

    If you manage to do that, you can give this a try:

    IF OBJECT_ID('TestTable1', 'U') IS NOT NULL DROP TABLE TestTable1;
    
    CREATE TABLE TestTable1 (
        ID INT IDENTITY(1, 1) PRIMARY KEY,
        Color VARCHAR(10),
        Dir VARCHAR(10),
        Size VARCHAR(10)
    );
    
    INSERT INTO TestTable1 (Color, Dir, Size)
    VALUES  ('red', 'up', 'S'),
            ('red', 'up', 'M'),
            ('red', 'down', 'L'),
            ('red', 'left', 'S'),
            ('blue', 'left', 'S'),
            ('red', 'left', 'M'),
            ('red', 'right', 'M'),
            ('blue', 'right', 'M');
    
    
    DECLARE @tableName sysname = 'TestTable1';
    DECLARE @targetIDs VARCHAR(MAX) = '1, 2';
    
    -- Get all of the columns associated with the specified table:
    DECLARE @targetObjectID INT = OBJECT_ID(@tableName, 'U');
    DECLARE @ColumnNames TABLE (
        ID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
        ColumnName sysname NOT NULL,
        HasOverlap BIT NULL
    );
    
    INSERT INTO @ColumnNames (ColumnName)
        SELECT c.name
        FROM sys.columns c
        WHERE c.object_id = @targetObjectID
            AND c.name <> 'ID';
    
    -- Define a template to use for searching column values for overlap:
    DECLARE @columnTestSQL NVARCHAR(MAX) = '
        WITH TargetValues AS (
            -- This produces a list of values for the target IDs:
            SELECT DISTINCT <ColumnName> [Val]
            FROM <TableName>
            WHERE ID IN (<TargetIDList>)
        )
        SELECT @hasOverlap_OUT = 1
        FROM <TableName> t
        WHERE
            -- Here we check for overlap with other IDs:
            t.ID NOT IN (<TargetIDList>)
            AND EXISTS (
                SELECT 1
                FROM TargetValues tv
                WHERE tv.Val = t.<ColumnName>
            );
    ';
    SET @columnTestSQL = REPLACE(@columnTestSQL, '<TableName>', @tableName);
    SET @columnTestSQL = REPLACE(@columnTestSQL, '<TargetIDList>', @targetIDs);
    
    -- Set up for loop:
    DECLARE @curID INT = 1;
    DECLARE @maxID INT = (SELECT MAX(ID) FROM @ColumnNames);
    DECLARE @curColumnName sysname;
    DECLARE @curHasOverlap BIT;
    DECLARE @curSQL NVARCHAR(MAX);
    
    -- Go through each column:
    WHILE @curID <= @maxID BEGIN
    
        -- Initialize this iteration:
        SELECT
            @curColumnName = cn.ColumnName,
            @curHasOverlap = 0
        FROM @ColumnNames cn
        WHERE cn.ID = @curID;
    
        -- Use the template to generate a dynamic SQL statement specific to the current column:
        SET @curSQL = REPLACE(@columnTestSQL, '<ColumnName>', @curColumnName);
    
        -- Execute the dynamic SQL to check for overlap:
        EXEC sp_executesql
            @stmt = @curSQL,
            @params = N'@hasOverlap_OUT BIT OUTPUT',
            @hasOverlap_OUT = @curHasOverlap OUTPUT;
    
        -- Record the results:
        UPDATE @ColumnNames
        SET HasOverlap = @curHasOverlap 
        WHERE ID = @curID;
    
        SET @curID += 1;
    END
    
    -- Output a list of fields with no overlap:
    SELECT ColumnName
    FROM @ColumnNames
    WHERE HasOverlap = 0;
    

    I have to admit I was a little thrown off by the tags on this question at first. I thought I would go ahead and post this anyway in case it helps.