From: Peter Williams

Problem:

As the comment above the calculation of max_pull in the function states,
there is a need to ensure that negative results of the subtractions do
not wrap around to large numbers. This has not been implemented for the
(max_load - busiest_load_per_task) expression, and the possible
consequence is undesirable movement of tasks from one group to another.
E.g. consider a NUMA system with two nodes, each node containing four
processors. If there are two processes in node-0 and node-1 is
completely idle, one of those processes will be moved to node-1, whereas
the desired behavior is to retain both processes in node-0.

Fix:

Make sure that max_load is greater than busiest_load_per_task before
making the calculation. If it isn't, max_pull will be zero and we skip
directly to out_balanced.

Signed-off-by: Peter Williams
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
---

 kernel/sched.c |    2 ++
 1 files changed, 2 insertions(+)

Index: linux-2.6.17-rc4-ck1/kernel/sched.c
===================================================================
--- linux-2.6.17-rc4-ck1.orig/kernel/sched.c	2006-05-15 21:51:38.000000000 +1000
+++ linux-2.6.17-rc4-ck1/kernel/sched.c	2006-05-15 21:51:39.000000000 +1000
@@ -2183,6 +2183,8 @@ find_busiest_group(struct sched_domain *
 	 * by pulling tasks to us.  Be careful of negative numbers as they'll
 	 * appear as very large values with unsigned longs.
 	 */
+	if (max_load <= busiest_load_per_task)
+		goto out_balanced;
 
 	/* Don't want to pull so many tasks that a group would go idle */
 	max_pull = min(max_load - avg_load, max_load - busiest_load_per_task);
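
The wraparound described above can be reproduced outside the kernel. The following is a minimal sketch, not kernel code: the helper names (min_ul, max_pull_unguarded, max_pull_guarded) are hypothetical, and the load values in the note below are made-up numbers chosen only so that max_load is smaller than busiest_load_per_task. It covers only the guard this patch adds.

```c
/* Illustrative sketch only: these helpers are hypothetical, not kernel code. */

/* Smaller of two unsigned values, standing in for the kernel's min(). */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Calculation without the guard. If max_load < busiest_load_per_task,
 * the second subtraction wraps to a very large unsigned value, so
 * min_ul() returns (max_load - avg_load) instead of zero.
 */
unsigned long max_pull_unguarded(unsigned long max_load,
				 unsigned long avg_load,
				 unsigned long busiest_load_per_task)
{
	return min_ul(max_load - avg_load, max_load - busiest_load_per_task);
}

/*
 * Calculation with the guard. Returning 0 here stands in for the
 * "goto out_balanced" in the patch (an assumption of this sketch).
 */
unsigned long max_pull_guarded(unsigned long max_load,
			       unsigned long avg_load,
			       unsigned long busiest_load_per_task)
{
	if (max_load <= busiest_load_per_task)
		return 0;

	return min_ul(max_load - avg_load, max_load - busiest_load_per_task);
}
```

With the assumed values max_load = 64, avg_load = 32 and busiest_load_per_task = 128, the unguarded version yields a non-zero max_pull (so a task would be pulled), while the guarded version reports nothing to pull.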