...
Found the issue. It's not a polyvariant call - it's actually a polymorphic call. In richards, there is a call to the task run method that can go to one of 4 task run methods (IdleTask, DeviceTask, WorkerTask, or HandlerTask). DeviceTask ends up being the most commonly called, but it's the last of the four to be called. So with call edge profiling we detect that we should emit:

    if (callee = DeviceTask run)
        inline DeviceTask run
    else
        polymorphic call

This ends up producing a healthy speed-up. But with the new polymorphic call inline caching, we see the first three task types to be called (IdleTask, WorkerTask, and HandlerTask) and then we see a very high slow path count because DeviceTask is what we usually end up calling.

This *might* be an example of why we want per-callee frequencies. But it might also just be a good reason to increase the number of callees we allow in the poly call IC. I'll mess with that.
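For illustration only, here is a toy model of the effect described above. It is not JSC code: the call trace, the function names, and the "first N distinct callees win" fill policy are assumptions made up for this sketch. It shows how a cache limited to three callees ends up sending the hottest callee to the slow path when that callee is seen last, while either a per-callee frequency count or a fourth slot avoids it.

```python
from collections import Counter

def simulate_poly_ic(calls, max_callees):
    """Toy poly call IC: caches the first `max_callees` distinct callees seen.

    Returns (cached callees in first-seen order, slow-path count). A call
    counts as slow path only when its callee is not cached and the cache is
    already full; the call that adds a callee to the cache is not counted.
    """
    cached = []
    slow_path = 0
    for callee in calls:
        if callee in cached:
            continue  # fast path: callee already has a case in the IC
        if len(cached) < max_callees:
            cached.append(callee)  # room left: add a case for this callee
        else:
            slow_path += 1  # IC is full: fall back to the generic call
    return cached, slow_path

def hottest_callee(calls):
    """What a per-callee frequency profile would pick for inlining."""
    return Counter(calls).most_common(1)[0][0]

# Hypothetical trace: DeviceTask dominates but is the last callee to appear.
calls = ["IdleTask", "WorkerTask", "HandlerTask"] + ["DeviceTask"] * 97

cached3, slow3 = simulate_poly_ic(calls, max_callees=3)
cached4, slow4 = simulate_poly_ic(calls, max_callees=4)

print(cached3, slow3)
print(cached4, slow4)
print(hottest_callee(calls))
```

In this model, raising the limit from 3 to 4 is enough only because richards has exactly four task types; a frequency-based choice would not depend on the order in which callees are first seen.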
(In reply to comment #1)
> Found the issue. It's not a polyvariant call - it's actually a polymorphic
> call. In richards, there is a call to the task run method that can go to
> one of 4 task run methods (IdleTask, DeviceTask, WorkerTask, or
> HandlerTask). DeviceTask ends up being the most commonly called, but it's
> the last of the four to be called. So with call edge profiling we detect
> that we should emit:
>
>     if (callee = DeviceTask run)
>         inline DeviceTask run
>     else
>         polymorphic call
>
> This ends up producing a healthy speed-up. But with the new polymorphic
> call inline caching, we see the first three task types to be called
> (IdleTask, WorkerTask, and HandlerTask) and then we see a very high slow
> path count because DeviceTask is what we usually end up calling.
>
> This *might* be an example of why we want per-callee frequencies. But it
> might also just be a good reason to increase the number of callees we allow
> in the poly call IC. I'll mess with that.

WRONG BUG.