I respect your craft, anon, for you have posted a siterip and not just requested one. Let me share with you what I know:
1) Porn siterips - lots on here, otherwise use bt4rg and just search for "siterip" and you'll get a lot, mostly pron.
2) Archive Team.
Definitely check out (2). Archive Team is a community of obsessive archivists that rips many sites (non-pron) and uploads them to archive.org. Most of their stuff is in WARC (the Web ARChive file format). Off the top of my head, pastebin is on there, plus a few others.
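Since most of that stuff is WARC, here's a rough sketch of what reading one looks like, stdlib only. It assumes the plain layout (a "WARC/x.y" line, headers, blank line, then Content-Length bytes of payload) and only handles uncompressed or gzipped files; anything compressed another way (e.g. zstd) is out of scope. The function names are mine, not from any library.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in a WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"not a WARC record start: {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", 0))
        yield headers, stream.read(length)


def open_warc(path):
    """Open .warc or .warc.gz; gzip.open reads multi-member gzip transparently."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


# Tiny in-memory demo with a made-up record (no real data).
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello\r\n\r\n"
)
for headers, payload in iter_warc_records(io.BytesIO(sample)):
    print(headers["warc-target-uri"], len(payload))
```

For a real file you'd do `with open_warc(path) as f:` and loop over `iter_warc_records(f)`. For serious work a dedicated WARC library is probably the better call; this is just to show the format isn't magic.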
You may also want to check out Common Crawl, which is fuel for LLM/AI stuff, but it's basically a siterip of the whole internet (a web crawl). There are several releases each year, and if you want to do it right, you download every single release and compact them into one file, which is a bit of a pain.
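If you want to see what "every single release" means in practice, here's a minimal sketch for listing the Common Crawl releases. It assumes the crawl list lives at index.commoncrawl.org/collinfo.json as a JSON array of objects with an "id" field, and that each crawl's WARC file list sits under data.commoncrawl.org/crawl-data/<id>/warc.paths.gz. Double-check both against the Common Crawl docs before relying on them; the helper names are made up.

```python
import json
import urllib.request

# Assumed endpoints; verify against the Common Crawl documentation.
COLLINFO_URL = "https://index.commoncrawl.org/collinfo.json"
DATA_PREFIX = "https://data.commoncrawl.org/"


def parse_crawl_ids(collinfo_json):
    """Return the crawl ids found in the collinfo.json text."""
    return [entry["id"] for entry in json.loads(collinfo_json)]


def warc_paths_url(crawl_id):
    """URL of the gzipped list of WARC file paths for one crawl."""
    return f"{DATA_PREFIX}crawl-data/{crawl_id}/warc.paths.gz"


def fetch_crawl_ids():
    """Download the live crawl list (needs network access)."""
    with urllib.request.urlopen(COLLINFO_URL, timeout=30) as resp:
        return parse_crawl_ids(resp.read().decode("utf-8"))
```

Calling `fetch_crawl_ids()` and mapping `warc_paths_url` over the result gives you one path list per release. Each list points at a huge number of WARC files, so check the sizes before you commit disk to it.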
You can use bt4rg to find whatever. For example, I have Stack Exchange, some Twitter, all of Reddit (pre-API change), etc.
But really, again, look up Archive Team. Those guys really go hard and have an insane amount of data. It's clunky and buried inside archive.org, so you need to read up on using that site and some Python download tools, but if you go down that rabbit hole you'll find tons and tons and tons of wild shit.
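To give an idea of the archive.org side, here's a rough stdlib-only sketch that builds a search query and a download link. It assumes the advancedsearch.php JSON endpoint and the /download/<identifier>/<filename> URL pattern, and that Archive Team uploads can be found with a query like "collection:archiveteam" (that collection name is my assumption, check it on the site). The `internetarchive` Python package and its `ia` command-line tool are the usual way to do this properly; the function names below are just for illustration.

```python
import json
import urllib.parse
import urllib.request


def search_url(query, rows=50, page=1):
    """Build an advancedsearch.php URL that returns item identifiers as JSON."""
    params = [
        ("q", query),
        ("fl[]", "identifier"),
        ("rows", str(rows)),
        ("page", str(page)),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urllib.parse.urlencode(params)


def parse_identifiers(search_json):
    """Pull the item identifiers out of a search response body."""
    docs = json.loads(search_json)["response"]["docs"]
    return [doc["identifier"] for doc in docs]


def download_url(identifier, filename):
    """Direct link to one file inside an item."""
    return "https://archive.org/download/{}/{}".format(
        urllib.parse.quote(identifier), urllib.parse.quote(filename)
    )


def search(query, rows=50):
    """Run the search against the live site (needs network access)."""
    with urllib.request.urlopen(search_url(query, rows), timeout=30) as resp:
        return parse_identifiers(resp.read().decode("utf-8"))
```

Something like `search("collection:archiveteam", rows=10)` should hand back a few item identifiers to poke at, assuming the collection name is right. From there you look up each item's file list on the site and feed the names to `download_url`.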