This isn’t open source versus closed source. Closed source software doesn’t try to take ownership of the data it touches. This is something more. OpenAI, for example, is clear(ish) that users own the outputs of their prompts, but that they can’t use those outputs to train a competing model; doing so would violate OpenAI’s terms and conditions. That isn’t so different from Meta’s Llama being open to use, unless you’re competing at scale.
And yet, it is different. OpenAI seems to be suggesting that the data it takes in for training should be open and unfettered, while the data it puts out (including output that competing LLMs have recycled from OpenAI) can be closed. This is muddy, murky new ground, and it doesn’t bode well for adoption if enterprise customers have to worry, even a little bit, about model vendors claiming ownership of their output data. The heart of the issue is trust and customer control, not open source versus closed source.
Exacerbating enterprise mistrust
RedMonk cofounder Steve O’Grady nicely sums up the enterprise concern with AI: “Enterprises recognize that to maximize the benefit from AI, they need to be able to grant access to their own internal data.” However, they’ve been “unwilling to do this at scale” because they don’t trust the LLM vendors with that data. OpenAI has exacerbated this mistrust. The vendors that end up winning will be those that earn customers’ trust. Open source can help, but ultimately enterprises don’t care about the license; they care about how the vendor handles their data. This is one of the reasons AWS and Microsoft were first to build booming cloud businesses: enterprises trusted them to take care of their sensitive data.