This is the mail archive of the
ecos-discuss@sources.redhat.com
mailing list for the eCos project.
Scheduler bug found and fixed
- To: ecos-discuss at sourceware dot cygnus dot com
- Subject: [ECOS] Scheduler bug found and fixed
- From: andrew dot lunn at ascom dot ch (Andrew Lunn)
- Date: Wed, 20 Sep 2000 14:02:17 +0200 (MET DST)
Hi Nick and others.
I found a bug in the mlqueue schedular. It goes something like this...
An alarm function calls cyg_thread_suspend() on the current thread, ie
the one that was running when the timer ISR went off.
cyg_thread_suspend correctly takes the thread off the run_queue[pri]
and clears the queue_map bit. The alarm func finishes, all the
remaining DSR are called.
After this it decided to do a timeslice on the clock tick. The
timeslice function then rotates the current run queue. Unfortunatly,
the run queue is now empty, so it follows a null pointer into random
memory and makes that the head of the queue. It then forces a
reschedule. This works becasue the queue_map bit is not set on the now
corrupt run_queue. The correct thread gets to run and the system seems
happy.
Then something causes an add_thread onto the now corrupt run_queue. In
my cause a call the cyg_thread_kill(). add_thread checks the
consitancy of the run_queue. It finds the queue is not empty, but the
queue_map bit is not set and throws a CYG_ASSERT.
The fix is to change the timeslice function to check the queue it
wants to rotate actually has at least one thread on it. If not ask for
a reschedule.
Here is a patch for the fix and to put an assert into the rotate
function to check for rotating and empty queue.
Andrew
Index: ChangeLog
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/ChangeLog,v
retrieving revision 1.43
diff -c -r1.43 ChangeLog
*** ChangeLog 2000/09/19 05:53:59 1.43
--- ChangeLog 2000/09/20 12:01:30
***************
*** 1,3 ****
--- 1,8 ----
+ 2000-09-20 Andrew Lunn <andrew.lunn@ascom.ch>
+
+ * src/sched/mlqueue.cxx (timeslice) don't rotate an empty queue.
+ * src/sched/mlqueue.cxx (rotate) Assert queue is not empty
+
2000-09-13 Jesper Skov <jskov@redhat.com>
* tests/kexcept1.c (cause_exception): Use separate cause_fpe function.
Index: src/sched/mlqueue.cxx
===================================================================
RCS file: /cvs/ecos/ecos/packages/kernel/current/src/sched/mlqueue.cxx,v
retrieving revision 1.7
diff -c -r1.7 mlqueue.cxx
*** mlqueue.cxx 2000/09/11 02:43:00 1.7
--- mlqueue.cxx 2000/09/20 12:01:30
***************
*** 283,297 ****
cyg_priority pri = thread->priority;
Cyg_SchedulerThreadQueue_Implementation *queue = &sched->run_queue[pri];
!
! queue->rotate();
!
! if( queue->highpri() != thread )
sched->need_reschedule = true;
!
timeslice_count = CYGNUM_KERNEL_SCHED_TIMESLICE_TICKS;
}
-
CYG_ASSERT( queue_map & (1<<CYG_THREAD_MIN_PRIORITY), "Idle thread vanished!!!");
CYG_ASSERT( !run_queue[CYG_THREAD_MIN_PRIORITY].empty(), "Idle thread vanished!!!");
--- 283,298 ----
cyg_priority pri = thread->priority;
Cyg_SchedulerThreadQueue_Implementation *queue = &sched->run_queue[pri];
! if ( !queue->empty() ) {
! queue->rotate();
!
! if( queue->highpri() != thread )
sched->need_reschedule = true;
! } else {
! sched->need_reschedule = true;
! }
timeslice_count = CYGNUM_KERNEL_SCHED_TIMESLICE_TICKS;
}
CYG_ASSERT( queue_map & (1<<CYG_THREAD_MIN_PRIORITY), "Idle thread vanished!!!");
CYG_ASSERT( !run_queue[CYG_THREAD_MIN_PRIORITY].empty(), "Idle thread vanished!!!");
***************
*** 656,662 ****
Cyg_ThreadQueue_Implementation::rotate(void)
{
CYG_REPORT_FUNCTION();
!
queue = queue->next;
CYG_REPORT_RETURN();
--- 657,664 ----
Cyg_ThreadQueue_Implementation::rotate(void)
{
CYG_REPORT_FUNCTION();
! CYG_ASSERT(queue != 0, "Rotating an empty queue");
!
queue = queue->next;
CYG_REPORT_RETURN();