2

I have two tables that I'm attempting to do an inner join with.

One is a users table where the primary key is id.

Another table is bars where user_id is a foreign key. bars also has a column called foo_id where food_id is a foreign key to the foos table.

I am trying to put together an ActiveRecord query where I can select all users that were created on or before N days ago and do not have any foos where bars.foo_id equal to a particular id. I tried doing something like this:

users = User.where("users.created_at <= ?", 50.days.ago).joins(:bars).where("bars.foo_id != 5")

This query fields over 30,000 results, which is incorrect, cause the Users table only has 12,000 rows.

What exactly am I doing wrong here?

0

4 Answers 4

2

You're getting your join math wrong and it's having the effect of creating rows for each user + foo combination. This is how a full join works. The reason for this slip is because you haven't actually joined the bars table to the users table. Normally you have to join with a condition, like in your case bars.user_id=users.id would be a good idea.

That being said, what you want to do instead is determine which users qualify, then load those:

users = User.where('id IN (SELECT DISTINCT user_id FROM bars WHERE bars.foo_id!=?)', 5)

This sub-select, if run on its own, should return simply a list of users without that particular foo. Using this as a WHERE condition should load only those users.

1
  • It's not doing a full join - if there is an association called :bars then joins(:bars) joins that table with an appropriate on clause (and raises an error if there is no such association) Commented Nov 8, 2012 at 9:10
2

try using this

    User.includes(:bars).where("users.created_at <= ?", 50.days.ago).where("bars.foo_id != 5")
1

This should work -

User.joins(:bars).where("bars.foo_id != ? and users.created_at <= ?", 5, 50.days.ago).select("distinct users.*")

It'll generate following sql -

 select distinct users.* from users 
 INNER JOIN bars on bars.user_id = user.id
 WHERE bars.foo_id != 5 and users.created_at <= '2012-09-19 10:59:54'
0

As it is your join will cause one row in the result set for every user/bar condition satisfying the where clause: if a user has 5 bars it will appear 5 times.

You can't just fix this with a distinct as that would select all the rows where the join could be realised: all the users with at least 1 bar whose foo_id is not 5 instead of all the users with no such bars

You can do this with a left join:

User.joins("left join bars on bars.user_id = users.id and bars.foo_id = 5").where("users.created_at < ? AND bars.id is null", 50.days.ago")

This tries to join 'bad' bars (foo_id=5) to the users. Because it's a left join, if no such bar exists then a row will be returned in the result set with all the columns for the bars table set to null, which you can then filter on. It's important to have the condition on the bars (foo_id = 5) in the ON clause so that it contributes to how the join is done rather than filtering after the join.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.