Most efficient way to query the Stack Overflow database for a question and its answers

I'm trying to query the Stack Overflow database for a question and its answers. So far I have come across two ways to do this:

SELECT questions.Id as [Post Link], questions.title, answers.body, questions.viewcount
FROM Posts answers
INNER JOIN Posts questions ON answers.parentid =

The second way is this:

SELECT * -- replace * with the actual fields
FROM posts 
WHERE (Id = {POST_ID}) OR (ParentId = {POST_ID})

Which approach is better, and why? Is there a different way to do this? Is there a term in SQL for this parent-child relationship? Are there any topics I could study to learn how to design efficient queries like this?

2 answers

  • answered 2019-11-14 05:48 Dmitrij Kultasev

    There is no such thing as "the best" here, as always with databases. Moreover, these two queries are completely different and return different results, so you can't compare them directly.

    The difference is:

    • The first query returns each question together with its answers, one row per answer, but only if the question has at least one answer. If you also want to return questions that have no answers, use an OUTER join.
    • The second query returns the question as one row and each of its answers as a separate row. For example, if the question has one answer, you get two rows instead of the one row the first query returns.

    So, there is no proper answer to your question, but I would stick with the 1st approach (just change the JOIN type there).
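To make the difference in row shapes concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database. The table layout, the sample rows, the `WHERE q.Id = ?` filter, and the completed join condition `a.ParentId = q.Id` are all assumptions made for illustration (the query in the question is cut off after `ON answers.parentid =`), and the names `build_demo_db` and `fetch` are hypothetical helpers.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory Posts table with a tiny, made-up data set."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE Posts (
               Id INTEGER PRIMARY KEY,
               ParentId INTEGER,   -- NULL for questions, question Id for answers
               Title TEXT,
               Body TEXT
           )"""
    )
    conn.executemany(
        "INSERT INTO Posts (Id, ParentId, Title, Body) VALUES (?, ?, ?, ?)",
        [
            (1, None, "Question with answers", "question 1 body"),
            (2, 1, None, "first answer"),
            (3, 1, None, "second answer"),
            (4, None, "Unanswered question", "question 4 body"),
        ],
    )
    return conn


# Approach 1 as described: self-join, one row per answer.
# The join condition is an assumed completion of the truncated query.
INNER_JOIN_SQL = """
    SELECT q.Id, q.Title, a.Body
    FROM Posts a
    INNER JOIN Posts q ON a.ParentId = q.Id
    WHERE q.Id = ?
    ORDER BY a.Id
"""

# Approach 1 with the suggested change of join type:
# the question row survives even when it has no answers.
LEFT_JOIN_SQL = """
    SELECT q.Id, q.Title, a.Body
    FROM Posts q
    LEFT OUTER JOIN Posts a ON a.ParentId = q.Id
    WHERE q.Id = ?
    ORDER BY a.Id
"""

# Approach 2: the question and its answers come back as separate rows.
SINGLE_TABLE_SQL = """
    SELECT Id, ParentId, Title, Body
    FROM Posts
    WHERE Id = ? OR ParentId = ?
    ORDER BY Id
"""


def fetch(conn, sql, params):
    """Run a parameterized query and return all rows as a list of tuples."""
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for post_id in (1, 4):
        print("post", post_id)
        print("  inner join:  ", fetch(conn, INNER_JOIN_SQL, (post_id,)))
        print("  left join:   ", fetch(conn, LEFT_JOIN_SQL, (post_id,)))
        print("  single table:", fetch(conn, SINGLE_TABLE_SQL, (post_id, post_id)))
```

With this sample data, the inner join returns nothing for the unanswered question, the outer join returns it once with an empty answer body, and the single-table query always returns the question row followed by one row per answer.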

  • answered 2019-11-14 06:29 Michał Turczyn

    If the two results are equally good for you, it all comes down to performance.

    In terms of performance, there are a couple of things you could study, such as indexes and how they are used by the SQL engine.

    So, in terms of performance, the second query is likely to be better, because it reads the posts table once instead of joining it to itself.

    Also, the WHERE clause (and the ON clause in the first query) depends heavily on indexes.

    Since Id columns are very often indexed, the second query is likely to be very efficient.
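One way to check how much the WHERE clause depends on indexes is to ask the engine for its query plan before and after adding an index. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module. SQLite is only a stand-in here: the real database may run on a different engine with a different planner and different plan output, and the table layout, the index name `idx_posts_parentid`, and the helper names are assumptions for illustration.

```python
import sqlite3

# The single-table query from the question, with bound parameters
# in place of the {POST_ID} placeholder.
QUERY = "SELECT Id, ParentId, Body FROM Posts WHERE Id = ? OR ParentId = ?"


def query_plan(conn, sql, params):
    """Return the human-readable detail strings of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each plan row has four columns; the last one is the description.
    return [row[3] for row in rows]


def plans_before_and_after_index():
    """Compare the plan for QUERY without and with an index on ParentId."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Posts (Id INTEGER PRIMARY KEY, ParentId INTEGER, Body TEXT)"
    )
    # Id is the primary key, so lookups by Id are already indexed.
    before = query_plan(conn, QUERY, (1, 1))
    # Hypothetical index that makes the ParentId half of the OR searchable.
    conn.execute("CREATE INDEX idx_posts_parentid ON Posts (ParentId)")
    after = query_plan(conn, QUERY, (1, 1))
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = plans_before_and_after_index()
    print("without ParentId index:", before)
    print("with ParentId index:   ", after)
```

The exact wording of the plan varies between SQLite versions, so treat the printed text as a diagnostic rather than a stable format. The point is only that an index on Id alone is not enough for this query: the ParentId condition needs its own index to avoid scanning the whole table.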