In the last few days of January 2025, Chinese GenAI startup DeepSeek skyrocketed to the top of app stores in the United Kingdom, the United States and across the globe, displacing ChatGPT and sending shockwaves through stock markets with its open source large language model, DeepSeek R1. For law firms and legal organisations more widely, its arrival raises geopolitical tensions, doubts around its provenance that could impact downstream adoption and, if you use the web-based consumer version, massive privacy issues.
However, the risks need to be contextualised and understood. There are also benefits, perhaps not yet fully comprehensible, that will flow downstream from this release in terms of development opportunity and cost.
Regardless of where you sit in the debate or what happens now, the GenAI floodgates are truly open.
Your privacy fears are, within reason, founded
If you, your colleagues or staff are using the web-based consumer version of DeepSeek, the first thing to note is that any data you put in can be used to improve the model and, at its bluntest, be read by the Chinese government.
The public app does indeed allow DeepSeek to collect all your personal information, IP address and the prompts you enter in order to improve its service. That includes monitoring interactions and usage across your devices and analysing how people are using it. DeepSeek retains the right to share your information with third parties. Oh, and there is no opt-out.
Isabella Bedoya, co-CEO of Infinite Artificial Intelligence, was one of the first to make people aware that DeepSeek tracks your keystroke patterns, commenting on LinkedIn: “Didn’t we just have a whole issue last week with TikTok because it was owned by China? My friend Tara Thompson made me aware of their terms & conditions and did you know they track your keystroke patterns? No one is talking about this.”
Given that just a minute ago TikTok was going to be banned in the US on privacy and national security grounds, it’s surprising indeed that there wasn’t an immediate outcry from the US, although on Tuesday 28 January, US officials said they are looking at the national security implications. Italian regulators were quick off the mark to demand DeepSeek inform them of what personal data is collected, for what purposes and on what legal basis. Other regulators will no doubt follow.
In the UK, law firms are looking at how to guide their staff on use of the public app, which within days of its launch had around three million downloads. Not only does it take us back to the early days of ChatGPT, with all the client confidentiality implications, but there are additional geopolitical concerns.
Speaking to Legal IT Insider, Elliot White, director of innovation and legal technology at UK top 50 firm Addleshaw Goddard, which is one of the most progressive law firms when it comes to GenAI exploration and adoption, said: “There are a few factors we have to look at that are the same as when ChatGPT first came out. So there’s the provenance of the model, so who built it and where it has come from? If we just take the consumer view for a second, there’s the normal things you’d have to consider on that anyway, such as where’s your data going? What are they doing with your data as you feed it into it, and that’s probably slightly more complex because now it’s feeding into a Chinese model, a Chinese app that is hosted out of China, so there’s probably even less control on what’s going on with your data in that sense. I’m not necessarily saying there is anything wrong with that but these factors should be considered.
“Then, if you think about it from a business perspective, the questions around provenance and open source means we have to scrutinise this even more and it potentially raises even more security considerations than we have had to deal with when considering the use of OpenAI.”
It’s important to be pragmatic, and Andrew Dunkley, director of data services at Consilio, who was previously a data scientist at UK law firm BLM, says: “Privacy concerns are legitimate but probably don’t change much for enterprise. If you are already nervous of your data being stored in China, you won’t like this either – in which case don’t use it and stop your employees doing so either. Accidental employee exposure of data is the big risk here – and it does mean that companies that have never really had to think about whether their data can go to China will now need a policy on it.”
Another concern if you’re using the public version of DeepSeek is state supervision. Dunkley says: “We know DeepSeek is at the very least anticipating this because the model won’t return results around Tiananmen Square. We simply don’t know what controls are built into the model to protect Chinese state interests. This is a paranoid take, but a worst case scenario would be the model unexpectedly failing in some circumstances when implemented for critical tasks and/or interference to bias results in favour of Chinese parties in legal type use cases.”
Provenance and distillation
In the past few days, the internet has, apparently, collectively become expert on highly technical questions such as GPU power and how much it costs to train a large language model.
In reality, of course, many of us don’t have a clue, and that’s neither here nor there, as long as we understand why those things matter: the cost of training a model has just fallen dramatically and this release is the start of an open playing field that will usher in new players from every quarter.
Speaking to people who do know what they’re talking about is key. White says: “I’ve seen different reports as to what it’s trained on, but it seems that it was trained on less powerful Nvidia GPU hardware because of the trade restrictions that the US put in place on those chips. It’s super interesting that they’ve done it on older tech but have been able to innovate within those confines and produce a model that is much cheaper. They’ve said it’s around $6m to train it, which is ridiculously impressive when you think that other models have cost from $100m to a billion dollars to train. DeepSeek have broken the paradigm in terms of the training side of things.”
The founder and CEO of LLM governance platform Lega, Christian Lang, adds: “We now know that you don’t have to have access to the best chips to build these amazing models; you can train a model for $5m as opposed to $500m. When you don’t need the absolute top-end chips or $500m to train a cutting-edge model, we’re about to have an open season with new, amazing capabilities coming to market every minute.”
There are questions over the veracity of DeepSeek’s claims, and also suggestions that DeepSeek in fact distilled its model from OpenAI’s, which would take less compute and be in breach of OpenAI’s terms of service. Why does that matter if you’re not America, China, Nvidia or OpenAI? Because if organisations are unsure of the provenance, it may have implications downstream. What’s slightly funny about all of this is that OpenAI arguably distilled its own large language model by eating the internet, with all the potential copyright implications of that.
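For readers unfamiliar with the term, distillation means training a smaller “student” model to mimic the output distribution of a larger “teacher” model, rather than learning from raw data alone. The sketch below is purely illustrative of the general technique, not a description of DeepSeek’s actual pipeline; the loss function follows the classic formulation from Hinton et al. (2015).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Push a student model to match a teacher's softened output distribution.

    Softening with a temperature > 1 exposes the teacher's "dark knowledge":
    the relative probabilities it assigns to wrong-but-plausible tokens.
    """
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the two distributions, scaled by T^2 so that
    # gradient magnitudes stay comparable across temperature settings.
    return F.kl_div(student_log_probs, soft_targets,
                    reduction="batchmean") * temperature ** 2
```

The relevance to the provenance debate is that a distillation loop only needs the teacher’s outputs, not its weights, which is why querying a hosted model at scale can transfer much of its capability, and why doing so against OpenAI would breach its terms of service.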
Open source: The game changer
The big game changer in all of this is that DeepSeek has open sourced its model, providing the model weights and training details for free on GitHub. Shawn Curran, founder of Jylo, which he spun out of UK law firm Travers Smith, told Legal IT Insider: “People will be able to look at the entire stack of the model and whether it passes muster in terms of being completely safe.”
The benefit of open source is that we can see if there are vulnerabilities. Curran points out that legal service providers and vendors will be able to take the code and run it on a UK server inside their own environment, where the privacy risks are nil, and build their own models; the ramifications are enormous. Curran says: “The genie is out of the bottle now and I don’t think it can be stopped.”
White agrees, commenting: “DeepSeek has changed the game even more in the sense of how much compute it takes to host this type of open source model. The big difference here is that you could go and buy the hardware and do it yourself now. None of us were capable of doing that before because of the huge compute requirements so we all essentially connect into OpenAI or Claude or Anthropic and use their services.”
White adds: “Let me caveat all of this by saying that we are not suddenly going to start using this model, because we haven’t got deep enough into it. But the fact that it’s open source should allow people to get under the skin of it and start to understand it. I think what we’ll see is lots of people copying this model, and it’s going to be good that you’ll potentially have open source models that companies can use that will be as powerful as the kind of chat function that OpenAI are currently running.”
Digital strategist Tara Waters, who is the former chief digital officer of UK top 20 law firm Ashurst, also says that caution and common sense are clearly required, but she notes: “In the meantime, exploring the underlying model (which has been open-sourced) in safe and ring-fenced environments seems to be a good way forward—and a path many have already started on fortunately!”
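What does exploring the model in a ring-fenced environment actually look like? The sketch below is one hedged illustration, assuming the Hugging Face transformers library and one of the smaller distilled R1 checkpoints DeepSeek has published; it is a starting point for experimentation, not a recommendation to deploy. Once the weights are downloaded, inference runs entirely on your own hardware, with no data sent back to DeepSeek.

```python
# Illustrative only: running an open-weights model inside your own
# environment. Assumes the Hugging Face `transformers` library (with
# `accelerate` installed) and a GPU large enough for a 7B-parameter
# checkpoint; verify the model ID below on the Hub before relying on it.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # pick a precision suitable for the hardware
    device_map="auto",    # spread layers across available GPUs/CPU
)

# A hypothetical legal-flavoured prompt; nothing here leaves the machine.
prompt = "List three clauses commonly found in a mutual NDA."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the checkpoint runs offline, the privacy analysis collapses to the same questions you would ask of any self-hosted software, which is precisely Curran’s point.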
Conclusion
The bigger picture of the release of R1 is profound. Dunkley says: “Regardless of the privacy position, the genie is out of the bottle re low compute cost for GenAI. DeepSeek has shown that costs can be radically lowered. Microsoft, Google and Amazon will respond by building new models that follow suit, and fast. When they do we can expect that AI will become even more viable for high volume use cases and AI will permeate even deeper into tools and our lives generally.”
For anyone hoping that the future of GenAI will be less complex, dream on. Lang says: “R1 is significant not just for what it is, but for what it signals: a future where the rapid evolution of new AI models—models that excel at different tasks, run in different environments, and operate at different cost points—creates both complexity and opportunity. To turn that opportunity into competitive advantage, forward-thinking law firms must seamlessly manage models from multiple providers, rapidly test and adopt new capabilities without sacrificing data control, and ensure compliance with internal, regulatory, and client requirements.”
R1 is just the beginning. White says: “It’s crazy how fast this is moving. I read an article the other day saying this is the equivalent of someone producing a $50 iPhone. If people want to understand the scale of what this means, that’s a really great way to describe it.”