This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: 64-bit emacs crashes a lot


On 16/08/2013 12:34 AM, Ryan Johnson wrote:
On 15/08/2013 10:35 PM, Ken Brown wrote:
On 8/15/2013 4:55 PM, Ryan Johnson wrote:
At this point I'm pretty confident it's memory corruption of some kind.
Consider the following semi-STC:
1. Invoke: emacs-nox -Q; echo -e "att $(jobs -p)\nc" > /dev/clipboard; fg
2. ^Z
3. (switch to window running gdb and hit [shift]+[insert] to paste from
clipboard)
5. (switch to window running emacs): M-x compile C-a C-k ls [ret]
6. C-x o (to switch to the compilation output window)
7. Hit 'g' to keep repeating the "compilation" until gdb picks up a crash.

I tried a simpler version of this (without gdb and without suspending/resuming):

1. Invoke 'emacs-nox -Q' in mintty.

2. M-x compile C-a C-k ls RET

3. C-x o

4. Hit 'g' repeatedly.

I got it to abort with Fatal error 6 after slightly over 100 repetitions.

I then tried the same thing with emacs-X11 (running under X, not in mintty). I hit 'g' 200 times without a problem. I repeated this with emacs-w32, again 200 times without a problem.

So there's a bug somewhere. But if it's an emacs bug, it's strange that it only occurs with emacs-nox and not with either of the GUI versions of emacs.
Well, at least I'm not (necessarily) crazy or BLODA-infested... out of curiosity, can you repro with 32-bit emacs-nox? I don't remember 32-bit being so crash-happy, which makes me wonder if something about 64-bit cygwin interacts poorly with emacs.

This is really weird... I got a crash in emacs compiled with `-g -O0', but it makes no sense:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 7160.0xf70]
0x0000000100535d0f in regex_compile (pattern=0x6000ac580 "\\(?:^\\|:: \\|\\S ( \\)\\(/[^ \n\t()]+\\)(\\([0-9]+\\))\\(?:: \\(warning:\\)?\\|$\\| ),\\)", size=75, syntax=3408388, bufp=0x10095dc30 <searchbufs+6512>) at regex.c:3627
3627                  || pending_exact + *pending_exact + 1 != b
bt
#0 0x0000000100535d0f in regex_compile (pattern=0x6000ac580 "\\(?:^\\|:: \\|\\S ( \\)\\(/[^ \n\t()]+\\)(\\([0-9]+\\))\\(?:: \\(warning:\\)?\\|$\\| ),\\)", size=75, syntax=3408388, bufp=0x10095dc30 <searchbufs+6512>) at regex.c:3627

The variable pending_exact has value 0x0, which would be a Bad Thing... except that the code looks like this:
          if (!pending_exact

              /* If last exactn not at current position.  */
=>            || pending_exact + *pending_exact + 1 != b

... with corresponding assembly code looking very reasonable:
   0x0000000100535cfa <regex_compile+34482>:    cmpq   $0x0,0x3f8(%rbp)
   0x0000000100535d02 <regex_compile+34490>:    je     0x100535eca <regex_compile+34946>
   0x0000000100535d08 <regex_compile+34496>:    mov    0x3f8(%rbp),%rax
=> 0x0000000100535d0f <regex_compile+34503>:    movzbl (%rax),%eax
   0x0000000100535d12 <regex_compile+34506>:    movzbl %al,%eax
   0x0000000100535d15 <regex_compile+34509>:    lea    0x1(%rax),%rdx
   0x0000000100535d19 <regex_compile+34513>:    mov    0x3f8(%rbp),%rax
   0x0000000100535d20 <regex_compile+34520>:    add    %rdx,%rax
   0x0000000100535d23 <regex_compile+34523>:    cmp    %rbx,%rax
   0x0000000100535d26 <regex_compile+34526>:    jne    0x100535eca <regex_compile+34946>

Something apparently set 0x3f8(%rbp) to NULL during the very small window between the cmpq and the mov two instructions later.
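
To spell out why this looks impossible, here is a minimal C sketch of the same test. The function name and signature are hypothetical stand-ins of mine, not the actual regex.c code; only the shape of the condition is taken from the snippet above. C's `||` short-circuits, so the dereference on the right-hand side should only ever run when the pointer tested non-NULL on the left.

```c
#include <stddef.h>

/* Hypothetical stand-in for the condition at regex.c:3627.
   Returns nonzero when the pending exactn run cannot be extended. */
int needs_new_exact(const unsigned char *pending_exact,
                    const unsigned char *b)
{
    /* `||` short-circuits: *pending_exact is read only when the
       left operand is false, i.e. when pending_exact is non-NULL. */
    return !pending_exact
           || pending_exact + *pending_exact + 1 != b;
}
```

Called with a NULL first argument, this returns 1 without touching memory, which matches the cmpq/je pair in the disassembly. A fault on the load therefore suggests the stack slot (or the reported register state) changed from outside the function.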

A second crash hit here:
#1 0x000000010052d589 in re_iswctype (ch=80, cc=RECC_ALPHA) at regex.c:2087

The default branch was taken even though cc should have matched the RECC_ALPHA case:
  switch (cc)
    {
    case RECC_ALNUM: return ISALNUM (ch) != 0;
    case RECC_ALPHA: return ISALPHA (ch) != 0;
    case RECC_BLANK: return ISBLANK (ch) != 0;
    ....
    case RECC_ERROR: return false;
    default:
=>    abort ();
    }

This time there's a jump table involved at machine code level, so I couldn't easily go deeper into why the wrong jump target was chosen.
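
For comparison, a cut-down sketch of the same pattern. The enum and function names here are hypothetical (shortened from the RECC_* cases quoted above), and the standard `<ctype.h>` functions stand in for the ISALPHA-style macros. In a healthy process the default branch is reachable only if cc holds a value that is none of the listed enumerators.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, shortened version of the character-class enum. */
typedef enum { CC_ERROR, CC_ALNUM, CC_ALPHA, CC_BLANK } char_class;

bool in_class(int ch, char_class cc)
{
    switch (cc) {
    case CC_ALNUM: return isalnum(ch) != 0;
    case CC_ALPHA: return isalpha(ch) != 0;
    case CC_BLANK: return isblank(ch) != 0;
    case CC_ERROR: return false;
    default:
        abort();  /* only reached when cc matches no case above */
    }
}
```

With ch=80 ('P') and the alpha class, this simply returns true, so landing in the default branch with cc=RECC_ALPHA points at a bad jump rather than a bad argument.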

A third crash:
#1 0x0000000100541930 in re_match_2_internal (bufp=0x10095ce20 <searchbufs+2912>, string1=0x0, size1=0, string2=0x6fffff00028 "-*- mode: compilation; default-directory: \"~/\" -*-\nCompilation started at Fri Aug 16 01:32:19\n\nls\n#message-20130808-090732#\t emacs-crash.txt\t\tmusic\n6b8ob06a.default.tar.xz\t\t emacs-nox.exe."..., size2=355, pos=254, regs=0x10095def0 <search_regs>, stop=317) at regex.c:6217
6217              abort ();
This time, p (the subject of the case statement) points to 0x76b3b6c7, which is the middle of a function (ntdll!RtlFillMemory, though the memory map places that address smack in the middle of kernel32.dll instead). This time it makes perfect sense that the switch statement should fail, but how did p go so wrong?

Even more strangely, it seems to be deterministic: a second crash there had exactly the same address as before.

The fifth crash was a repeat of the NULL pending_exact scenario that came first.

One last observation, or perhaps just superstition: if gdb reports a single thread being created at some point during the compile-fest, a crash usually follows soon after. If no threads are created after gdb attaches and continues, or if two threads are created in quick succession, the crash never comes (where "never" = 300+ successful compiles). I have no idea why that would mean anything, though...

I'm officially stumped at this point... any ideas?

Ryan





