Google's tcmalloc is fully threadsafe but just as fast as dlmalloc. This means we can avoid having two copies and won't need to follow weird threadsafety disciplines.
Created attachment 4057: Do it (also move a bit more stuff into kxmlcore)
Created attachment 4060: improved version. In the previous patch I forgot to fall back to normal malloc in debug builds; this version does. Also fixed a few things that were ifdef'd wrong.
Comment on attachment 4060: improved version. I talked with mjs about this on IRC.