Meta built a code-generating AI model similar to Copilot

Meta says it has created a generative coding AI tool similar to GitHub’s Copilot.

The company made the announcement at an event focused on its AI infrastructure efforts, including custom chips that Meta is building to accelerate the training of generative AI models. The coding tool, called CodeCompose, isn’t publicly available – at least not yet. But Meta says its teams use it internally to get code suggestions for Python and other languages as they type in IDEs like VS Code.

“The underlying model is built on public research from [Meta] that we tailored to our internal use cases and code bases,” said Michael Bolin, a software engineer at Meta, in a pre-recorded video. “On the product side, we can integrate CodeCompose into any surface where our developers or data scientists work with code.”

The largest of several CodeCompose models Meta has trained has 6.7 billion parameters, just over half the number of parameters in the model Copilot is based on. Parameters are the parts of the model learned from historical training data and essentially define the model’s skill on a problem, such as generating text.

CodeCompose was fine-tuned on Meta’s own code, including internal libraries and frameworks written in Hack, a Meta-developed programming language, so it can incorporate those into its programming suggestions. And its base training dataset was filtered for bad coding practices and errors, such as deprecated APIs, to reduce the chance that the model recommends a problematic piece of code.
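Meta has not described how that filtering works, so the following is only a minimal sketch of the general idea: dropping training samples that match a blocklist of outdated API patterns. The patterns, the function names and the sample corpus are all hypothetical and chosen for illustration; they are not Meta’s actual rules.

```python
import re

# Hypothetical blocklist (an assumption for illustration, not Meta's rules).
# Both patterns target Python APIs that were deprecated and later removed.
DEPRECATED_PATTERNS = [
    re.compile(r"\bassertEquals\("),                  # old unittest alias
    re.compile(r"^\s*import\s+imp\b", re.MULTILINE),  # legacy imp module
]


def is_clean(source: str) -> bool:
    """Return True if the snippet matches none of the blocklisted patterns."""
    return not any(p.search(source) for p in DEPRECATED_PATTERNS)


def filter_corpus(samples: list[str]) -> list[str]:
    """Keep only the samples that pass the blocklist check."""
    return [s for s in samples if is_clean(s)]


corpus = [
    "self.assertEqual(total, 3)",
    "self.assertEquals(total, 3)",
    "import imp\nmod = imp.load_source('m', 'm.py')",
]
kept = filter_corpus(corpus)
print(kept)
```

A production pipeline would more likely rely on linters or static analysis than on regular expressions, but the principle is the same: remove the bad examples before the model can learn from them.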

In practice, CodeCompose makes suggestions such as annotations and import statements as a user types. The system can complete single lines of code or multiple lines, optionally filling in very large chunks of code.

“CodeCompose can take advantage of the surrounding code to make better suggestions,” Bolin continued. “It can also use code comments as a signal when generating code.”
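One common way for a completion tool to use surrounding code is to send the model the text on both sides of the cursor, with a marker where the suggestion should go. The sketch below illustrates that idea only; the `<FILL>` marker, the `EditorState` type and the character budget are assumptions made for this example, not details of CodeCompose.

```python
from dataclasses import dataclass

# Hypothetical marker; real models define their own special tokens.
FILL_TOKEN = "<FILL>"


@dataclass
class EditorState:
    text: str    # full contents of the editor buffer
    cursor: int  # character offset of the caret


def build_infill_prompt(state: EditorState, max_chars: int = 2000) -> str:
    """Join the code before and after the cursor around a fill marker.

    Each side gets at most half of the character budget, so the model
    sees the nearest context, including any comments above the cursor.
    """
    half = max_chars // 2
    start = max(0, state.cursor - half)
    prefix = state.text[start:state.cursor]
    suffix = state.text[state.cursor:state.cursor + half]
    return prefix + FILL_TOKEN + suffix


buffer = (
    "# return the sum of a list\n"
    "def total(xs):\n"
    "    \n"
    "\n"
    "print(total([1, 2]))\n"
)
# Place the caret at the end of the indented blank line in the function body.
state = EditorState(text=buffer, cursor=buffer.index("    \n") + 4)
prompt = build_infill_prompt(state)
print(prompt)
```

Here the comment on the first line and the call on the last line both end up in the prompt, which is the kind of signal Bolin describes.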

Meta claims that thousands of employees accept suggestions from CodeCompose every week and the acceptance rate is over 20%.

However, the company did not address the controversies surrounding code-generating AI.

Microsoft, GitHub and OpenAI are being sued in a class action lawsuit accusing them of violating copyright law by allowing Copilot to regurgitate portions of licensed code without providing credit. Liability aside, some legal experts have suggested that AI like Copilot could put companies at risk if they unknowingly include copyrighted suggestions from the tool in their production software.

It’s unclear whether CodeCompose was also trained on licensed or copyrighted code – even by accident. When reached for comment, a Meta spokesperson had this to say:

“CodeCompose is fine-tuned from InCoder, which was released by Meta’s AI research division. In a paper describing InCoder, we note that to train InCoder, ‘we collect a corpus of (1) public code with permissive, non-copyleft, open source licenses from GitHub and GitLab and (2) StackOverflow questions, answers and comments.’ The only additional training we do for CodeCompose is on Meta’s internal code.”

Generative coding tools can also introduce insecure code. According to a recent Stanford study, software engineers using code-generating AI systems are more likely to introduce security vulnerabilities in the apps they develop. While the study didn’t specifically look at CodeCompose, it stands to reason that developers using it would fall victim to the same problem.

Bolin emphasized that developers don’t have to follow CodeCompose’s suggestions and that security was a “most important consideration” when creating the model. “We are extremely excited about our progress on CodeCompose to date, and believe our developers are best served by bringing this work in-house,” he added.
