An impossible task? Smart views against sharded databases over multiple servers

Our IT department inherited an absolute nightmare of a database system. We have 5 database servers (3 of which are MSSQL 2008R2 and 2 are MSSQL 2016). Instead of being multi-tenant, they are single-tenant. This means that we have hundreds of shards across the 5 servers (we have a total of 505 database shards).

In order to query ALL data across all the shards, each server has a "parent" database composed of views which then UNION ALL the data from each shard on the server into a single view (we call them "Cross Server Views"). Then, on the "mother" server, the cross server view uses linked server connections to the 4 "child" servers to retrieve their data.

Basically, it looks like this: enter image description here

If we want to get ALL data from Customers, we execute a simple SELECT * FROM CUSTOMERS query against the "main"/"mother" database server, where CUSTOMERS is a view that gets data from its own shards, plus the other 4 "child" servers. Each "child" db server also has a Customers view that gets data from its own shards. This "mother" view is built like this:

--Its own shards, plus the rollup views from the child db servers
select * from shard1..Customers union all 
select * from shard2..Customers union all 
select * from shard3..Customers union all 
--etc... union all
select * from DB2..Customers union all 
select * from DB3..Customers union all 
select * from DB4..Customers union all 
select * from DB5..Customers

and each of the "child" views are built like this:

--ONLY its own shards
select * from shard4..Customers union all
select * from shard5..Customers union all

There are 505 total shards, so these views are much larger than I have described above. Basically, a big hairy mess!!!!

Because of this architecture, we have a problem in that if a single shard goes down, they all go down (schemas will ALWAYS be identical, so we're skipping that issue). We have been tasked with re-writing these views to be "smart", in that if a shard or child server is unavailable, it will skip over them and return the data for the shards/servers it can find.

I thought I had a solution that was working fine, until I got to the last two servers, which are the MSSQL 2016 servers.

In a nutshell, I had created a stored procedure that gathered database names via sys.databases, excluded the db's we don't care about, and generated a COALESCE'd dynamic sql statement which was then executed via sp_executesql (code below):



    SELECT @sql = COALESCE(@sql + ' UNION ALL ', '') + 'SELECT * FROM ' + name + '.dbo.' + @table + ' WITH (NOLOCK)'
    FROM sys.databases
    WHERE state_desc = 'ONLINE'
    AND name NOT IN ('master', 'model', 'msdb', 'tempdb', 'DBLOOKUP', 'MM_STATE_MT', 'kids', 'distribution', 'LiteSpeedLocal')

        DECLARE @available INT

        BEGIN TRY
            EXEC @available = sys.sp_testlinkedserver N'CHILD1'
            SET @sql = @sql + ' UNION ALL SELECT * FROM [CHILD1].viewdb.dbo.' + @table + ' WITH (NOLOCK)'
        END TRY
            SET @available = 0
        END CATCH

        BEGIN TRY
            EXEC @available = sys.sp_testlinkedserver N'CHILD2'
            SET @sql = @sql + ' UNION ALL SELECT * FROM [CHILD2].viewdb.dbo.' + @table + ' WITH (NOLOCK)'
        END TRY
            SET @available = 0
        END CATCH

        BEGIN TRY
            EXEC @available = sys.sp_testlinkedserver N'CHILD3'
            SET @sql = @sql + ' UNION ALL SELECT * FROM [CHILD3].viewdb.dbo.' + @table + ' WITH (NOLOCK)'
        END TRY
            SET @available = 0
        END CATCH

        BEGIN TRY
            EXEC @available = sys.sp_testlinkedserver N'CHILD4'
            SET @sql = @sql + ' UNION ALL SELECT * FROM [CHILD4].viewdb.dbo.' + @table + ' WITH (NOLOCK)'
        END TRY
            SET @available = 0
        END CATCH

    EXEC sp_executesql @sql;

Then in the viewdb views, I made use of a self-referencing linked server and OPENQUERY to execute the procedure, like so (removed the stored procedure name and table name):

SELECT * FROM OPENQUERY(local, 'SET FMTONLY OFF; SET NOCOUNT ON; EXEC [stored procedure] ''[table_we_want_to_query]''')

This worked fine on the MSSQL 2008 servers because of the way FMTONLY OFF works on that edition. However, starting with MSSQL 2012, they changed how FMTONLY works. So on the 2016 servers, if we try to query the views we get the following error:

The metadata could not be determined because statement 'EXEC sp_executesql @sql' in procedure '[stored procedure]'  contains dynamic SQL.  Consider using the WITH RESULT SETS clause to explicitly describe the result set.

I've tried WITH RESULT SETS UNDEFINED, which doesn't work. I can't specify a result set in the stored procedure because it is generic (takes a table name input param to determine which table to query).

I have tried every manner of dynamic sql and switch options I can possibly think of or find via Google. I'm dead in the water with this process on the 2016 servers. Does anyone have any ideas how to work around this or perhaps a different approach to solve the problem? Keep in mind, changing the entire architecture of the database is not available at this time. We have to work with what we have for now.

Please let me know if there is any additional information that could be beneficial. Thank you!