50k rule limit: as a technical limitation, it is mostly an inconvenience to developers. 50k rules are simply not enough to provide sufficient coverage, so in most cases developers just ship multiple lists of fewer than 50k rules each. AdGuard, for instance, uses 6 Safari extensions, and even this is not enough to include all block lists.
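The multiple-list workaround described above can be sketched roughly as follows. This is only an illustration, not AdGuard's actual code: `split_rules`, `make_rule`, and the sample `ads<N>.example.com` patterns are hypothetical, and the rule shape (`trigger.url-filter` plus a `block` action) follows the Safari content blocker JSON format.

```python
from typing import List, Sequence

# Per-extension rule limit discussed in this report.
RULE_LIMIT = 50_000


def split_rules(rules: Sequence[dict], limit: int = RULE_LIMIT) -> List[List[dict]]:
    """Split one large rule list into consecutive lists of at most `limit` rules."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [list(rules[i:i + limit]) for i in range(0, len(rules), limit)]


def make_rule(i: int) -> dict:
    """Build a placeholder block rule (hypothetical pattern, for illustration only)."""
    return {
        "trigger": {"url-filter": f"ads{i}\\.example\\.com"},
        "action": {"type": "block"},
    }


# 120,001 rules do not fit into two lists of 50,000, so a third list is needed.
rules = [make_rule(i) for i in range(120_001)]
chunks = split_rules(rules)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20001]
```

Each resulting list would then have to ship in its own content blocker extension, which is why the number of extensions grows with the size of the block lists.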
<rdar://problem/58312879>
The factors that affect the limit are compile speed for a rule list, memory use for a compiled rule list, and memory use while compiling (if too extreme; we don't want the compile step to result in a memory use kill). And I guess match speed, but that doesn't impose as strict a constraint. We could test on more recent hardware to find other reasonable values.
Assuming there is a limit, what limit do you think would be reasonable?
<rdar://problem/36115489>
Well, compilation speed and memory usage are indeed an issue; they are quite problematic even with the current limit. We may try looking into it, but with the current prefix-tree implementation I don't think this would be easy to solve.
Would you consider an alternative implementation of the `trigger.url-filter` syntax (maybe with a different name)? Instead of regular expressions, we could use a different syntax, the one that's supported by almost every content blocker: https://kb.adguard.com/en/general/how-to-create-your-own-ad-filters#examples
In Manifest V3 Google uses pretty much the same syntax. We have a C++ implementation that we can try adapting for WebKit if we agree on details.
> Assuming there is a limit, what limit do you think would be reasonable?
300k
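To make the syntax comparison concrete, here is a rough sketch of how a single `||domain^` pattern from the linked syntax might be expressed as a regex-based `trigger.url-filter` rule. Everything here is an assumption for illustration: `domain_rule_to_content_blocker` is a hypothetical helper, it handles only this one pattern shape, and the generated regex is checked with Python's `re` module, not against WebKit's more restricted regex dialect.

```python
import re


def domain_rule_to_content_blocker(pattern: str) -> dict:
    """Translate a '||domain^' pattern into a block rule (simplified sketch)."""
    if not (pattern.startswith("||") and pattern.endswith("^")) or len(pattern) < 4:
        raise ValueError("only '||domain^' patterns are handled in this sketch")
    domain = pattern[2:-1]
    # Any scheme, optional subdomains, the escaped domain,
    # then a separator character or the end of the URL.
    url_filter = r"^[a-z]+://([a-z0-9-]+\.)*" + re.escape(domain) + r"([/:?]|$)"
    return {"trigger": {"url-filter": url_filter}, "action": {"type": "block"}}


rule = domain_rule_to_content_blocker("||example.com^")
url_filter = rule["trigger"]["url-filter"]

# Matches the domain and its subdomains, but not look-alike hosts.
print(bool(re.search(url_filter, "https://ads.example.com/banner.js")))  # True
print(bool(re.search(url_filter, "https://notexample.com/")))            # False
```

The short pattern expands into a noticeably longer regular expression, which gives a sense of what a rule compiler has to process for each such entry.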
A request to support an alternate syntax probably needs to be a separate bug. The alternate syntax does not look like it would be easier to compile or evaluate; among other things, it includes regexes as a subset. So it's probably a separate topic.
(More efficient compilation and/or evaluation welcome of course, even if it uses different algorithms.)
(In reply to Maciej Stachowiak from comment #6)
> Request to support an alternate syntax probably needs to be a separate bug.
> The alternate syntax does not look like it would be easier to compile or
> evaluate, among other things it includes regexes as a subset. So it's
> probably a separate topic.
Got it, I'll file a feature request and try to explain all the pros and cons. We'll need some time to do that, as I'd like to prepare a dirty patch with the proposed syntax to demonstrate the difference.
The current shipping limit has been increased from 50k to 150k. Not quite 300k but a lot more than it used to be.
(In reply to Maciej Stachowiak from comment #9)
> The current shipping limit has been increased from 50k to 150k. Not quite
> 300k but a lot more than it used to be.
Awesome news, thank you!