This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: Bug in mbsrtowcs?


On Feb 13 15:54, Jeff Johnston wrote:
> Corinna Vinschen wrote:
>> On Feb 13 14:36, Jeff Johnston wrote:
>>   
>>> Corinna Vinschen wrote:
>>>     
>>>> Hang on.  If _mbrtowc_r encounters an incomplete MB char then it does
>>>> not form an invalid character so there's no reason to return with -1 and
>>>> set errno to EILSEQ.  However, it also doesn't form a *valid* character,
>>>> it's just incomplete.  Thus it must be the start of the last character
>>>> at the end of the input string.
>>>>
>>>>         
>>> This code is there because it means that the character has redundant 
>>> shift state.  From mbrtowc:
>>>
>>> (*size_t*)-2
>>>    If the next /n/ bytes contribute to an incomplete but potentially
>>>    valid character, and all /n/ bytes have been processed (no value is
>>>    stored). When /n/ has at least the value of the {MB_CUR_MAX} macro,
>>>    this case can only occur if /s/ points at a sequence of redundant
>>>    shift sequences (for implementations with state-dependent encodings).
>>>
>>> In our case, n is MB_CUR_MAX so it must be a redundant shift sequence.  
>>> The state is stored so if we increase the src pointer, it should 
>>> continue where it left off.
>>>     
>>
>> Uhh, that was what I was missing, now I understand.  However, it's still
>> at the end of the string so we can return from the function at this
>> point.
>>
>>   
> No, it isn't at the end of the string.  It has successfully processed all 
> MB_CUR_MAX bytes we asked it to and has not completed the character.  The 
> reason is that there are stupid shift sequences added (e.g. Out In Out), 
> which eat up extra bytes without doing anything, so if we only ask it to 
> process MB_CUR_MAX bytes we don't see the end of the last shift sequence.  
> End of string will cause a different error.  Does that explain it better?

But that's not the case for _mbsnrtowcs_r.  It always calls _mbrtowc_r
with the current value of nms.  Thus, if the return value is -2, we
can skip nms bytes *and* return, because nms is 0 afterwards anyway.
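
To make that concrete, here is a minimal sketch in C.  It is not the
actual newlib code: toy_mbrtowc and toy_mbsnrtowcs are hypothetical
stand-ins, the toy converter only understands 1- and 2-byte UTF-8, and
the "pending" byte plays the role of the mbstate_t.  It only illustrates
the control flow: the converter is always handed the whole remaining
count nms, so a -2 result means all nms bytes have been consumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for _mbrtowc_r (1- and 2-byte UTF-8 only).
   Returns the number of bytes consumed, 0 for a NUL character,
   (size_t)-1 for an invalid byte, or (size_t)-2 if all n bytes were
   consumed without completing a character.  A partial character is
   remembered in *pending (0 means "no partial character"). */
static size_t toy_mbrtowc(wchar_t *pwc, const char *s, size_t n,
                          unsigned char *pending)
{
    size_t used = 0;
    while (used < n) {
        unsigned char c = (unsigned char)s[used++];
        if (*pending) {                  /* expecting a continuation byte */
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;
            *pwc = (wchar_t)(((*pending & 0x1F) << 6) | (c & 0x3F));
            *pending = 0;
            return used;
        }
        if (c < 0x80) {                  /* plain ASCII, possibly NUL */
            *pwc = (wchar_t)c;
            return c ? used : 0;
        }
        if ((c & 0xE0) != 0xC0)          /* not a 2-byte lead byte */
            return (size_t)-1;
        *pending = c;                    /* lead byte: keep it as state */
    }
    return (size_t)-2;                   /* input exhausted mid-character */
}

/* Sketch of the loop shape under discussion (assumed, not newlib's). */
static size_t toy_mbsnrtowcs(wchar_t *dst, const char **src, size_t nms,
                             size_t len, unsigned char *pending)
{
    size_t count = 0;
    assert(dst != NULL && src != NULL && *src != NULL);
    while (count < len) {
        /* Always pass the full remaining byte count nms. */
        size_t r = toy_mbrtowc(&dst[count], *src, nms, pending);
        if (r == (size_t)-1)
            return (size_t)-1;
        if (r == (size_t)-2) {
            *src += nms;                 /* all nms bytes were consumed ... */
            return count;                /* ... so nothing is left to convert */
        }
        if (r == 0) {                    /* terminating NUL was stored */
            *src = NULL;
            return count;
        }
        *src += r;
        nms -= r;
        count++;
    }
    return count;
}
```

With the four input bytes 'a', 0xC3, 0xA9, 0xC3 and nms = 4, the loop
converts two characters, then the trailing lead byte yields -2 with one
byte remaining, so the source pointer ends up at the end of the input and
the partial character stays in the state.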


Corinna

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat

