• 0 Posts
  • 6 Comments
Joined 2 months ago
Cake day: June 4th, 2025

  • From the original post:

    Sure, they have huge GPU clusters, but there must be more going on - model optimizations, sharding, custom hardware, clever load balancing, etc.

    What engineering tricks make this possible at such massive scale while keeping latency low?

    The amount of hardware they’re using is far more than you’re imagining; multiply whatever you have in mind by a thousand. To answer your question as simply as possible: money. Money to buy nuclear power plants to power this specific technology. Money to market and sell the idea. Money to lobby for legislation so it can keep sucking up copyrighted works. Money, money, and money are the reason these companies can run this stuff while OP cannot.

    Also, and this is mostly just my own paranoia - I don’t trust anything these LLM companies are saying. Their entire product is running on hype right now. Every one of these companies is dumping absurd amounts of money into something that barely works, and there’s no incentive for them to be honest about any of it. If the hype doesn’t hold up, they’ve wasted that money.
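
For context on the “sharding” the original post mentions: it usually means splitting a model’s weights across many accelerators so each one stores and computes only a slice. Below is a minimal, hypothetical NumPy sketch of column-wise tensor-parallel sharding of a single linear layer; the sizes and the simulated “devices” are made up for illustration, and production serving stacks do this across GPUs with collective communication rather than on one CPU.

```python
# Minimal sketch: column-wise "tensor parallel" sharding of one linear layer,
# simulated on CPU with NumPy. Toy sizes, purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff, n_devices = 512, 2048, 4          # arbitrary toy dimensions

x = rng.standard_normal((1, d_model))            # one token's activations
W = rng.standard_normal((d_model, d_ff))         # full weight matrix

# Shard the weight matrix column-wise: each "device" holds d_ff / n_devices columns.
shards = np.split(W, n_devices, axis=1)

# Each device computes a partial output from its own shard
# (these matmuls would run in parallel on real hardware).
partials = [x @ W_shard for W_shard in shards]

# Concatenating the partials (an all-gather in practice) recovers the full output.
y_sharded = np.concatenate(partials, axis=1)
y_full = x @ W

assert np.allclose(y_sharded, y_full)
print("sharded output matches full matmul:", y_sharded.shape)
```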