Would it make sense to use __attribute__ ((noinline)) for functions like _M_realloc_insert?

Mon Aug 24 10:05:53 GMT 2020

On Mon, 24 Aug 2020, Groke, Paul wrote:

>> Could you share which version of gcc you are testing with, what flags you are using, and ideally even some code to reproduce this? With current gcc, I don't see that happening very often.
>
> I stumbled upon this while looking at the generated code with godbolt.org. I tried GCC 9.2 and 10.2. Flags were -std=c++17 -O2, architecture was amd64.
>
> The code where I saw this was
>
> https://godbolt.org/z/oG9EMM
> // -------------------------------------------------
>
> #include <vector>
>
> struct Foo {
>    Foo(int a, int b) : a(a), b(b) {}
>    int a;
>    int b;
> };
>
> class Bar {
> public:
>    void test(int a, int b);
>
> private:
>    std::vector<Foo> m_fooVec;
> };
>
> void Bar::test(int a, int b) {
>    m_fooVec.emplace_back(a, b);
> }

Thanks. I am a bit surprised it gets inlined even at -O2, while a similar 
example with just 1 int isn't inlined even at -O3.

>> Ideally, the inliner would already be clever enough not to inline the function except in special cases.
>
> After writing this mail, I found out that it only happens if the function is called just once in the translation unit.

That definitely affects inlining decisions, indeed.
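
For anyone who wants to experiment with the attribute from the subject line without touching libstdc++, here is a minimal sketch on user code. IntBuf, grow and push are hypothetical names for illustration, not library internals, and the attribute syntax is GCC/Clang-specific. The rarely taken growth path is marked noinline so that it stays a real call wherever push() gets inlined:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical toy buffer (not libstdc++ code): the cold growth path is
// forced out of line, the hot append path is left to the inliner.
struct IntBuf {
    int* data = nullptr;
    std::size_t size = 0;
    std::size_t cap = 0;

    IntBuf() = default;
    IntBuf(const IntBuf&) = delete;             // keep the sketch simple
    IntBuf& operator=(const IntBuf&) = delete;
    ~IntBuf() { std::free(data); }

    // GCC/Clang extension: never inline this function into its callers.
    __attribute__((noinline)) void grow() {
        std::size_t new_cap = cap ? cap * 2 : 4;
        int* p = static_cast<int*>(std::malloc(new_cap * sizeof(int)));
        if (!p) std::abort();
        if (size) std::memcpy(p, data, size * sizeof(int));
        std::free(data);
        data = p;
        cap = new_cap;
    }

    void push(int v) {
        if (size == cap) grow();  // cold path: stays a call
        data[size++] = v;         // hot path: small enough to inline
    }
};
```

Whether this actually helps depends on the call sites; as noted above, inlining the growth path can be exactly what you want for a freshly created container.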

> I wish we could use LTCG/LTO with our project. Maybe I should give that 
> a try, see how badly it will affect our build times. But since our link 
> times are already kind-of high, I don't have high hopes.

-flto=auto (or any variant that enables parallelism) doesn't necessarily 
increase the build time; sometimes it can even decrease it. It seems worth 
a try.

> Just out of curiosity, going back to...
>
>> If someone inserts an element in a newly created vector, it does make sense to inline _M_realloc_insert, especially if we want any hope of making some small local vectors use the stack, or removing some unused small vectors.
>
> Is that something that we can realistically expect in the next few 
> years? I've seen manual new/delete calls removed by the optimizer in 
> very simple examples, but I've never seen that happen with any kind of 
> container. Seems like as soon as std::allocator is involved, the calls 
> will not be optimized out.

   std::allocator<int> a;
   a.deallocate(a.allocate(42),42);

gets simplified to nothing.

An unused

   std::vector<int> a={42};

doesn't get optimized away; we are left with something like

   int* p = (int*)operator new(4);
   *p = 42;
   operator delete(p, 4);

and we don't remove the store to *p before the delete as dead (whether we 
are allowed to do that in general is not obvious).
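
To reproduce the two cases above, the snippets can be wrapped in complete functions and pasted into Compiler Explorer. This is only a sketch: the function names are made up for illustration, and the generated code will depend on the compiler version and flags (-O2 is assumed here).

```cpp
#include <memory>
#include <vector>

// Hypothetical wrapper: per the description above, the matched
// allocate/deallocate pair is expected to be simplified away.
int alloc_pair() {
    std::allocator<int> a;
    a.deallocate(a.allocate(42), 42);
    return 0;
}

// Hypothetical wrapper: per the description above, this one is expected
// to keep its operator new / store / operator delete sequence.
int unused_vector() {
    std::vector<int> a = {42};  // constructed but never read
    return 0;
}
```

Comparing the assembly of the two functions shows where the optimizer stops.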

To enable more simplifications, I often add in my projects

   inline void* operator new(std::size_t n){return malloc(n);}
   inline void operator delete(void*p)noexcept{free(p);}
   inline void operator delete(void*p,std::size_t)noexcept{free(p);}
etc.

(this is illegal, since replacement allocation functions must not be 
declared inline, but I don't care)
and then the vector disappears completely.

-- 
Marc Glisse
