AI2 is developing a large language model optimized for science

PaLM 2. GPT-4. The list of text-generating AI models grows practically by the day.

Most of these models are walled off behind APIs, making it impossible for researchers to see exactly what drives them. But increasingly, community efforts are yielding open source AI that is as advanced as, if not more advanced than, its commercial counterparts.

The latest of these efforts is the Open Language Model, a large language model due to be released sometime in 2024 by the nonprofit Allen Institute for AI (AI2). Open Language Model, or OLMo for short, is being developed in collaboration with AMD and the Large Unified Modern Infrastructure (LUMI) consortium, which provides supercomputing power for training and education, as well as with Surge AI and MosaicML, which are providing data and training code.

“The research and technology communities need access to open language models to advance this science,” Hanna Hajishirzi, a senior director of NLP research at AI2, told BlogRanking in an email interview. “With OLMo, we are working to close the gap between public and private research capacities and knowledge by building a competitive language model.”

You might wonder, as this reporter did, why AI2 felt the need to develop an open language model when there are already several to choose from (see BLOOM, Meta’s LLaMA, etc.). As Hajishirzi sees it, while the open source releases to date have been valuable and even groundbreaking, they have fallen short in several ways.

AI2 sees OLMo as a platform, not just a model: one that lets the research community use, or try to improve, any component AI2 creates. Everything AI2 builds for OLMo, including a public demo, the training dataset and an API, will be openly available and documented, with “very limited” exceptions and under “appropriate” licenses, Hajishirzi says.

“We are building OLMo to create more access for the AI research community to work directly on language models,” said Hajishirzi. “We believe the wide availability of all aspects of OLMo will enable the research community to take what we create and improve it. Our ultimate goal is to build together the best open language model in the world.”

Another differentiator of OLMo, according to Noah Smith, a senior director of NLP research at AI2, is a focus on enabling the model to better leverage and understand textbooks and academic papers, as opposed to, say, code. This has been attempted before, most notably with Meta’s infamous Galactica model. But Hajishirzi believes AI2’s work in academia and the tools it has developed for research, such as Semantic Scholar, will help make OLMo “ideally suited” for scientific and academic applications.

“We believe OLMo has the potential to become something very special in the field, especially in a landscape where many are rushing to monetize their interest in generative AI models,” said Smith. “AI2’s unique ability to act as outside experts gives us the opportunity not only to work with our own world-class expertise, but also to partner with the strongest minds in the industry. Therefore, we believe our rigorous, documented approach will lay the groundwork for building the next generation of safe, effective AI technologies.”

That’s a nice sentiment. But what about the thorny ethical and legal issues surrounding the training and release of generative AI? Debates are raging over the rights of content owners (among other affected stakeholders), and myriad nagging issues have yet to be settled by the courts.

To address these concerns, the OLMo team plans to work with AI2’s legal department and yet-to-be-determined outside experts, stopping at “checkpoints” in the model-building process to reassess privacy and intellectual property rights.

“We hope that through an open and transparent dialogue about the model and its intended uses, we can better understand how to reduce bias and toxicity and shed light on outstanding research questions within the community, ultimately resulting in one of the strongest available models,” said Smith.

What about the potential for abuse? Models, which are often toxic and biased to begin with, are ripe for exploitation by bad actors looking to spread disinformation and generate malicious code.

Hajishirzi said AI2 will use a combination of licensing, model design and selective access to the underlying components to “maximize scientific benefits while reducing the risk of harmful use.” To steer policy, OLMo has an ethics review committee with internal and external advisors (AI2 wouldn’t say who exactly) who will provide feedback during the modeling process.

We’ll see how much difference that makes. For now, a lot is up in the air, including most of the model’s technical specs. (AI2 has revealed it will have about 70 billion parameters. Parameters are the parts of the model learned from historical training data.) Training is set to begin in the coming months on LUMI’s supercomputer in Finland, which as of January was the fastest supercomputer in Europe.

AI2 invites collaborators to contribute to, and critique, the model development process. Those interested can contact the OLMo project organizers.
