This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512


On Wed, Jul 24, 2013 at 9:52 PM, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Wed, Jul 24, 2013 at 08:25:14AM -1000, Richard Henderson wrote:
>> On 07/24/2013 05:23 AM, Richard Biener wrote:
>> > "H.J. Lu" <hjl.tools@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Here is a patch to extend x86-64 psABI to support AVX-512:
>> >
>> > AFAIK AVX-512 doubles the number of xmm registers. Can we get them callee saved please?
>>
>> Having them callee saved pre-supposes that one knows the width of the register.
>>
>> There's room in the instruction set for avx1024.  Does anyone believe that is
>> not going to appear in the next few years?
>>
> It would be a mistake for Intel to focus on avx1024. You hit diminishing
> returns, and only a few workloads would benefit from loading 128 bytes at
> once. The problem with vectorization is that it becomes memory bound, so
> you will not gain much because performance is dominated by cache throughput.
>
> You would get bigger speedup from more effective pipelining, more
> fusion...

ISTR that one of the main reasons "long" vector ISAs did so well on
some workloads was not that the vector length was big, per se, but
rather that the scatter/gather instructions these ISAs typically have
allowed them to extract much more parallelism from the memory
subsystem. The typical example is sparse-matrix-style problems, but
I suppose other types of problems with indirect accesses could benefit
as well. Deeper OoO buffers would in principle allow the same
memory-level parallelism to be extracted, but those apparently have
quite steep power and silicon area cost scaling (O(n**2) or maybe even
O(n**3)), making really deep buffers impractical.
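
To illustrate the sparse-matrix case, here is a minimal sketch in plain C
of a matrix-vector product over a matrix stored in compressed sparse row
(CSR) form. The function name, parameter names, and the choice of CSR are
assumptions made for illustration and do not come from this thread. The
point is only the indirect load x[col_idx[j]], which is the access pattern
a gather instruction could issue several of at once.

```c
#include <stddef.h>

/* Hypothetical CSR sparse matrix-vector product, y = A * x.
   All names are illustrative, not taken from any library.
   row_ptr has nrows + 1 entries; the nonzeros of row i are stored in
   val[row_ptr[i]] .. val[row_ptr[i + 1] - 1], with their column
   numbers in the matching slots of col_idx. */
void csr_spmv(size_t nrows, const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; j++) {
            /* Indirect access: the address of the x element depends on
               data loaded from col_idx, so consecutive iterations touch
               unrelated memory locations. */
            sum += val[j] * x[col_idx[j]];
        }
        y[i] = sum;
    }
}
```

The loads from val and col_idx are contiguous, while the loads from x are
scattered across memory. Those are the loads that are independent of one
another and could be in flight at the same time.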

And, IIRC, scatter/gather instructions have been available since some
recent-ish AVX-something version. That being said, maybe current
cache-based memory subsystems are different enough from the vector
supercomputers of yore that the above doesn't hold to the same extent
anymore.


--
Janne Blomqvist

