Article: Harvey Trains Open Source Models To Encode Law Firm Workflows

Harvey CEO Winston Weinberg has confirmed to Artificial Lawyer that the company is now running proof-of-concept studies with law firms to train open source LLMs to ‘encode’ their way of working on certain complex matters.

Weinberg underlined that the value lies not just in encoding how complex work streams take place inside a law firm practice group, but also in capturing how that work is done for specific longstanding clients.

In other words, you encode the entire experience, from law firm to client, so that automation can be applied in a more precise and customised way to this recurring type of need.

The development comes as Kirkland & Ellis is working with Palantir, following its now famous ‘$500m AI investment’ announcement, in which the firm stressed that it was seeking to ‘bottle its secret sauce’, as it were, for certain areas of work. In fact, Kirkland made quite a lot of the idea that what it was doing would be ‘unique’ and would set it apart from what other law firms were capable of doing when working with legal AI platforms. Artificial Lawyer has some thoughts on the whole secret sauce idea; more on that below.

After that announcement, AL found that Kirkland is hiring AI infrastructure experts with experience of working with GPU clusters, which strongly hinted at an open source training strategy of its own that will, presumably, work alongside its partnership with Palantir. But we shall see.

Meanwhile, Thomson Reuters has also been working with open source LLMs, training them on the data giant’s huge store of legal information so that they can act as an additional AI-backed resource for its research offering.

In short, the idea of post-training open source models is coming back. Previously, the strategy had been rejected on the basis that general models alone would become so good that they would make such niche and customised approaches pointless. But it seems the market is now moving back to this idea.

There are several reasons for this. The first is a simple one: data security. The second is that legal AI experts clearly believe that you can now get better performance from specific training. And the third is that one of the triggers here may be agentic flows. That is, this isn’t just about tapping the general language understanding of an LLM, but about putting in place more narrowly customised training to support equally specific workflows. That means bringing together reference data and complex playbooks (i.e. digital twins of methodologies) with a fine-tuned open source LLM that is pointed at specific legal work product for certain clients.

Ultimately, this is all about customisation. Through that more holistic customisation you can achieve a real improvement over the general models.

The Harvey move is part of a broader project to expand what can be achieved with legal AI. It was outlined by co-founder Gabe Pereyra last night on X.
