Computing optimum solution for arranging blocks with minimum moves

What started as a simple problem has turned into a challenge. And now I'm perilously close to being defeated by it. Help?

It starts off so simply. Picture a class like this:

class Unit
{
    // Where we are
    public int CurrentPos;

    // How long we are
    public int Length;

    // Where we belong
    public int TargetPos;
}

Now, assume you have a (static) collection of hundreds of thousands of these (like this). The goal is to move things from their CurrentPos, to their TargetPos. The catch is that sometimes there's already something at TargetPos (or partially overlapping it). In that case, the 'something' (or somethings) will need to get moved out of the way first.

Since 'moving' is an expensive operation, an optimum solution would only ever move a Unit once (from its current position to its target position). So I start by moving Units whose TargetPos is already free, then moving things into the space freed by the first moves, etc.

But eventually I run into the real challenge. At its simplest: I am trying to move A, but B is in the way, so I try to move B, but C is in the way, so I try to move C, but A is in the way. A->B->C->A.

For people who prefer numbers:

+------------+--------+-----------+
| CurrentPos | Length | TargetPos |
+------------+--------+-----------+
|        100 |     10 |       110 | A
|        110 |      5 |       120 | B
|        120 |     10 |       100 | C
+------------+--------+-----------+

I want to move the 10 at Pos 100 to 110, but there's already something there. So I try to move the 5 at 110 to 120, but there's already something there too. Finally I try to move the 10 at 120 to 100, but, wait, that's a loop!

To "break" the loop I could pick any of those entries and just move it out of the way. But picking the '5' allows me to minimize the size of "Move Out of the Way" moves (as contrasted with "Move Down to TargetPos" moves). Items that are "Moved out of the Way" will still have to be moved a second time to their own targets once the way is clear.

To be clear, what I'm trying to minimize isn't the number of "Move out of the ways," it's the size. Moving four Units of Length 2 is a better deal than moving one Unit of Length 10.

Logic tells me there's got to be 1 optimum solution for a given data set, no matter how large. One that is the absolute minimum "Lengths Moved" required to break all the loops. The trick is, how do you find it?

Rather than lay out all the reasons this is hard, let's head straight to the code. I've created a simple framework (written in c#) which enables me to try various strategies without having to code each one from scratch.

First there's my implementation of Unit:

class Unit : IComparable<int>
{
    /// <summary>
    /// Where we are
    /// </summary>
    public int CurrentPos;

    /// <summary>
    /// How long we are
    /// </summary>
    public readonly int Length;

    /// <summary>
    /// Where we belong
    /// </summary>
    public readonly int TargetPos;

    /// <summary>
    /// Units who are blocking me
    /// </summary>
    public List<Unit> WhoIsBlockingMe;

    public Unit(int c, int l, int t)
    {
        CurrentPos = c;
        Length = l;
        TargetPos = t;

        WhoIsBlockingMe = null;
    }

    /// <summary>
    /// Indicate that a child is no longer to be considered blocking.
    /// </summary>
    /// <param name="rb">The child to remove</param>
    /// <returns>How many Units are still blocking us.</returns>
    public int UnChild(Unit rb)
    {
        bool b = WhoIsBlockingMe.Remove(rb);
        Debug.Assert(b);

        return WhoIsBlockingMe.Count;
    }

    public override string ToString()
    {
        return string.Format("C:{0} L:{1} T:{2}", CurrentPos, Length, TargetPos);
    }

    public override int GetHashCode()
    {
        return TargetPos.GetHashCode();
    }

    /// <summary>
    /// Used by BinarySearch
    /// </summary>
    /// <param name="other">CurrentPos being sought.</param>
    /// <returns></returns>
    public int CompareTo(int other)
    {
        return CurrentPos.CompareTo(other);
    }
}

Mostly what you'd expect. Probably worth highlighting WhoIsBlockingMe. This is the list of other Units that are currently preventing this Unit from moving to its desired TargetPos. It is automatically populated and maintained by the framework.

And here's the framework:

abstract class FindOpt
{
    #region Members

    protected static readonly string DataDir = @"c:\vss\findopt2\data\"; // <--------- Adjust to suit!!!

    /// <summary>
    /// The Pos where I move Units "out of the way" (see MoveOut).
    /// </summary>
    private int m_RunningLast;

    /// <summary>
    /// Count of MoveOuts executed.
    /// </summary>
    private int m_Moves;

    /// <summary>
    /// The total size of MoveOuts executed.  This is what I'm trying to minimize.
    /// </summary>
    private int m_MoveSize;

    /// <summary>
    ///  The complete list of Units read from export.tab.
    /// </summary>
    protected readonly List<Unit> m_Units;

    /// <summary>
    /// A collection to keep track of who would get freed by moving a particular unit.
    /// </summary>
    protected readonly Dictionary<Unit, List<Unit>> m_Tree;

    /// <summary>
    /// Units freed (possibly due to cascading) waiting to be MoveDown.
    /// </summary>
    protected readonly Queue<Unit> m_ZeroChildren;

    /// <summary>
    /// Is m_Units currently sorted properly so BinarySearch will work?
    /// </summary>
    private bool UnitsOutOfDate;

    #endregion

    public FindOpt()
    {
        m_RunningLast = int.MaxValue;
        m_Moves = 0;
        m_MoveSize = 0;

        m_Units = new List<Unit>();
        m_Tree = new Dictionary<Unit, List<Unit>>();
        m_ZeroChildren = new Queue<Unit>();
        UnitsOutOfDate = true;

        // Load the Units
        using (StreamReader sr = new StreamReader(DataDir + @"export.tab"))
        {
            string s;

            while ((s = sr.ReadLine()) != null)
            {
                string[] sa = s.Split('\t');

                int c = int.Parse(sa[0]);
                int l = int.Parse(sa[1]);
                int t = int.Parse(sa[2]);

                Unit u = new Unit(c, l, t);
                m_Units.Add(u);
            }
        }
    }
    public int CalcBest()
    {
        // Build the dependency tree.
        BuildTree();

        // Process anything that got added to m_ZeroChildren Queue while 
        // building the tree.
        ProcessQueue();

        // Perform any one time initialization subclasses might require.
        Initialize();

        // Keep looping until no Units are blocking anything.
        while (m_Tree.Count > 0)
        {
            // Pick a Unit to MoveOut.  
            Unit rb = PickVictim();

            // Subclass gave up (or is broken)
            if (rb == null)
                return int.MaxValue;

            // When the Unit gets MoveOut, any items in
            // m_Tree that were (solely) blocked by it will get
            // added to the queue.
            WhackVictim(rb);

            // Process any additional Units freed by WhackVictim
            ProcessQueue();
        }

        Console.WriteLine("{0} Moves: {1}/{2}", this.GetType().Name, m_Moves, m_MoveSize);

        return m_MoveSize;
    }

    // Intended to be overridden by child class
    protected virtual void Initialize()
    {
    }
    // Intended to be overridden by child class
    protected abstract Unit PickVictim();

    // Called by BinarySearch to re-sort m_Units as 
    // needed.  Both MoveOut and MoveDown can trigger this.
    private void CheckUnits()
    {
        if (UnitsOutOfDate)
        {
            m_Units.Sort(delegate (Unit a, Unit b)
            {
                return a.CurrentPos.CompareTo(b.CurrentPos);
            });
            UnitsOutOfDate = false;
        }
    }
    protected int BinarySearch(int value)
    {
        CheckUnits();

        int lower = 0;
        int upper = m_Units.Count - 1;

        while (lower <= upper)
        {
            int adjustedIndex = lower + ((upper - lower) >> 1);

            Unit rb = m_Units[adjustedIndex];
            int comparison = rb.CompareTo(value);
            if (comparison == 0)
                return adjustedIndex;
            else if (comparison < 0)
                lower = adjustedIndex + 1;
            else
                upper = adjustedIndex - 1;
        }

        return ~lower;
    }
    // Figure out who all is blocking someone from moving to their
    // TargetPos.  Null means no one.
    protected List<Unit> WhoIsBlockingMe(int pos, int len)
    {
        List<Unit> ret = null;

        int a1 = BinarySearch(pos);

        if (a1 < 0)
        {
            a1 = ~a1;

            if (a1 > 0)
            {
                Unit prev = m_Units[a1 - 1];
                if (prev.CurrentPos + prev.Length > pos)
                {
                    ret = new List<Unit>(2);
                    ret.Add(prev);
                }
            }
        }

        int endpoint = pos + len;
        while (a1 < m_Units.Count)
        {
            Unit cur = m_Units[a1];

            if (cur.CurrentPos < endpoint)
            {
                if (ret == null)
                    ret = new List<Unit>(2);
                ret.Add(cur);
            }
            else
            {
                break;
            }

            a1++;
        }

        return ret;
    }
    // Move a Unit "Out of the way."  This is the one we are
    // trying to avoid.  And if we *must*, we still want to
    // pick the ones with the smallest rb.Length.
    protected void MoveOut(Unit rb)
    {
        // By definition: Units that have been "MovedOut" can't be blocking anyone.
        // Should never need to do this to a Unit more than once.
        Debug.Assert(rb.CurrentPos < m_RunningLast, "Calling MoveOut on something that was already moved out");

        // By definition: Something at its target can't be blocking anything and 
        // doesn't need to be "MovedOut."
        Debug.Assert(rb.CurrentPos != rb.TargetPos, "Moving from TargetPos to Out");

        m_Moves++;
        m_MoveSize += rb.Length;

        m_RunningLast -= rb.Length;
        rb.CurrentPos = m_RunningLast;
        UnitsOutOfDate = true;
    }
    // This is the "good" move that every Unit will eventually
    // execute, moving it from CurrentPos to TargetPos.  Units
    // that have been "MovedOut" will still need to be moved
    // again using this method to their final destination.
    protected void MoveDown(Unit rb)
    {
        rb.CurrentPos = rb.TargetPos;
        UnitsOutOfDate = true;
    }
    // child of rb has been moved, either out or down.  If
    // this was rb's last child, it's free to be MovedDown.
    protected void UnChild(Unit rb, Unit child)
    {
        if (rb.UnChild(child) == 0)
            m_ZeroChildren.Enqueue(rb);
    }
    // rb is being moved (either MoveOut or MoveDown).  This
    // means that all of the things that it was blocking now
    // have one fewer thing blocking them.
    protected void FreeParents(Unit rb)
    {
        List<Unit> list;

        // Note that a Unit might not be blocking anyone, and so
        // would not be in the tree.
        if (m_Tree.TryGetValue(rb, out list))
        {
            m_Tree.Remove(rb);

            foreach (Unit rb2 in list)
            {
                // Note that if rb was the last thing blocking rb2, rb2
                // will get added to the ZeroChildren queue for MoveDown.
                UnChild(rb2, rb);
            }
        }
    }
    protected void ProcessQueue()
    {
        // Note that FreeParents can add more entries to the queue.
        while (m_ZeroChildren.Count > 0)
        {
            Unit rb = m_ZeroChildren.Dequeue();

            FreeParents(rb);
            MoveDown(rb);
        }
    }
    protected bool IsMovedOut(Unit rb)
    {
        return (rb == null) || (rb.CurrentPos >= m_RunningLast) || (rb.CurrentPos == rb.TargetPos);
    }
    private void BuildTree()
    {
        // Builds m_Tree (Dictionary<Unit, List<Units>)
        // When the Unit in the Key in is moved (either MoveOut or MoveDown), each of 
        // the Values has one less thing blocking them.

        // Victims handles the special case of Units blocking themselves.  By definition,
        // no moving of other units can free this, so it must be a MoveOut.
        List<Unit> victims = new List<Unit>();

        foreach (Unit rb in m_Units)
        {
            rb.WhoIsBlockingMe = WhoIsBlockingMe(rb.TargetPos, rb.Length);
            if (rb.WhoIsBlockingMe == null)
            {
                m_ZeroChildren.Enqueue(rb);
            }
            else
            {
                // Is one of the things blocking me myself?
                if (rb.WhoIsBlockingMe.Contains(rb))
                {
                    victims.Add(rb);
                }

                // Add each of my children to the appropriate node in m_Tree, indicating
                // they are blocking me.
                foreach (Unit rb2 in rb.WhoIsBlockingMe)
                {
                    List<Unit> list;
                    if (!m_Tree.TryGetValue(rb2, out list))
                    {
                        // Node doesn't exist yet.
                        list = new List<Unit>(1);
                        m_Tree.Add(rb2, list);
                    }
                    list.Add(rb);
                }
            }
        }

        foreach (Unit rb in victims)
        {
            WhackVictim(rb);
        }
    }
    // Take the "Victim" proposed by a subclass's PickVictim
    // and MoveOut it.  This might cause other items to get added
    // to the ZeroChildren queue (generally a good thing).
    private void WhackVictim(Unit rb)
    {
        FreeParents(rb);
        MoveOut(rb);
    }
}

Things worth highlighting here:

  • The DataDir controls where to read data. Adjust this to point to where you've downloaded (and extracted) export.tab.
  • The expectation is that child classes will choose which Unit (aka Victim) to move out of the way. Once it does, the framework will move it, along with any Units that that move frees up.
  • You might also want to pay attention to m_Tree. If I have a Unit, I can use this Dictionary to find out who all is being blocked by it (the reverse of Unit.WhoIsBlockingMe).

And here is a simple class that uses the framework. Its purpose is to tell the framework which Unit it should MoveOut next. In this case it just offers up Victims starting from the largest length and working its way down. Eventually it's going to succeed, since it will just keep offering Units until there are none left.

class LargestSize : FindOpt
{
    /// <summary>
    /// The list of Units left that are blocking someone.
    /// </summary>
    protected readonly List<Unit> m_AltTree;
    private int m_Index;
    public LargestSize()
    {
        m_AltTree = new List<Unit>();
        m_Index = 0;
    }
    protected override void Initialize()
    {
        m_AltTree.Capacity = m_Tree.Keys.Count;

        // m_Tree.Keys is the complete list of Units that are blocking someone.
        foreach (Unit rb in m_Tree.Keys)
            m_AltTree.Add(rb);

        // Process the largest Units first.
        m_AltTree.Sort(delegate (Unit a, Unit b)
        {
            return b.Length.CompareTo(a.Length);
        });
    }
    protected override Unit PickVictim()
    {
        Unit rb = null;

        for (; m_Index < m_AltTree.Count; m_Index++)
        {
            rb = m_AltTree[m_Index];
            if (!IsMovedOut(rb))
            {
                m_Index++;
                break;
            }
        }

        return rb;
    }
}

Nothing too surprising. Perhaps worth noting is that moving one Unit will often allow other Units to be moved as well (that's kinda the point of breaking a loop). Such being the case, this code uses IsMovedOut to see if the next Victim it's planning to offer has already been moved to its TargetPos. If so, we skip that one and move on to the next.

As you might imagine, LargestSize does a pretty terrible job at minimizing the size of MoveOuts (moving a total of almost 12 million "Lengths"). Although it does a pretty good job at minimizing the number of moves (895), that's not what I'm after. It's also pleasantly fast (~1 second).

LargestSize Moves: 895 / 11,949,281

A similar routine can be used to start with the smallest and work its way up. That gives way more moves (which is interesting, but not really important), and a much smaller move size (which is a good thing):

SmallestSize Moves: 157013 / 2,987,687

As I've mentioned, I've got others, some better (with as few as 294 moves) and some worse. However, my very best move sizes so far is 1,974,831. Is that good? Bad? Well, I happen to know that there's a solution that requires less than 340,000, so... pretty bad.

For completeness, here's the code to call all this:

class Program
{
    static void Main(string[] args)
    {
        FindOpt f1 = new LargestSize();
        f1.CalcBest();
    }
}

Stitch those 4 pieces together and you've got the complete test harness. To test out your own approach, just modify PickVictim in the subclass to return whatever your best guess at which Unit should be MovedOut next.

So, what's my goal here?

I'm trying to find a way to compute the "optimal" set of MoveOuts to break every loop, where optimal means smallest total Lengths. And I'm not just interested in the answer for this sample set, I'm trying to create a way to find the optimal results for any data set in a reasonable amount of time (think: seconds, not days). Thus walking all possible permutations of a dataset with hundreds of thousands of records (can you say 206858!?) is probably not the solution I'm after.

What I can't quite wrap my head around is where to start? Should I pick this Unit? Or that one? Since virtually every Unit is in a loop, every one can be freed up by moving something else. So given 2 Units, how can you say with certainty which one is going to lead to the optimal solution?

  • Smallest and Largest clearly aren't going to get you there, at least not alone.
  • Looking to see which Units would free the most parents? Tried that.
  • Looking to see who frees the biggest parents? Tried that too.
  • How about making a list of all the loops? Then you could pick the smallest in the loop. You could even figure out which Units are involved in multiple loops. Moving a Unit with a length of 10 that breaks 10 different loops seems like a better deal than a Length of 5 that only breaks 1, yes? Turns out this is much harder than you might think. There are a LOT of loops (more than will fit in my 64gig of RAM). Still, my current 'best' lies down this path. Of course my current best stinks...

What else is worth mentioning?

  • This code is written in c#, but that's just because I find it easier to prototype there. Since what I'm after is the algorithm, feel free to write in whatever language you like. The sample data set is just tab-delimited text, and using the framework is entirely optional. If your solution is written in something I can't read, I'll ask questions.
  • Remember the goal isn't (just) to figure out the optimum solution for this dataset. I want an efficient way to compute the optimum result for any data set.
  • Don't use threading/GPU/etc to speed up processing. Again: looking for an efficient algorithm. Without that, it doesn't matter what else you do.
  • Assume all Lengths, CurrentPos and TargetPos are > 0.
  • Assume TargetPos + Length never overlap each other.
  • Splitting Units into smaller lengths is not permitted.
  • In case you missed the link above, the sample data set is here. Note that in order to keep the download size down, I have omitted the (~300,000) Units that aren't blocked by loops. The framework already handles them, so they're just distracting.
  • There is 1 Unit that is blocking itself (good old 900). I left it in the dataset, but the framework already handles it explicitly.
  • There are some Units that aren't blocking anything, but still can't be moved because someone is blocking them (ie they are in m_Units and have values in their WhoIsBlockingMe, but are not in m_Tree.Keys since moving them won't free up anything else). Not sure what to do with this information. Move them first? Last? Can't see how knowing this helps, but there it is.
  • Doing some analysis, I find that roughly 1/3 of the 206,858 Units in this dataset are of length 1. In fact, 2/3 are Length 8 or less. Only 3 of them are hugely big (ie bigger than the currently known Optimal solution). Move them first? Last? Not quite sure what to do with this info either.

Is StackOverflow the best place for this question? The code is 'broken' in that it doesn't give me the result I want. I've heard of CodeGolf, but never been there. Since this is just a test harness and not production code, CodeReview seemed like a poor fit.


Edit 1: In response to the comment by @user58697, here's a loop where a single Unit (A) is blocking 10 others. For good measure, I made it a loop:

+------------+--------+-----------+
| CurrentPos | Length | TargetPos |
+------------+--------+-----------+
|        100 |     10 |      1000 | A
|        120 |     20 |        81 | B
|        140 |      1 |       101 | C
|        141 |      1 |       102 | D
|        142 |      1 |       103 | E
|        143 |      1 |       104 | F
|        144 |      1 |       105 | G
|        145 |      1 |       106 | H
|        146 |      1 |       107 | I
|       1003 |      1 |       108 | J
|        148 |     50 |       109 | K
+------------+--------+-----------+

Here we see that B is blocked by A (the last bit of B overlaps the first bit of A). Likewise the last bit of A blocks the first bit of K. C-J are obviously blocked as well. So not only is A blocking multiple blocks, its blocking lengths totaling 78, even though it is only Length 10 itself. And of course A itself is blocked by J.