SuperAnnotate wants to help companies manage their AI data sets

4 days ago 2

High-quality data may be the key to high-quality AI. With studies finding that data set curation, rather than size, is what really affects an AI model’s performance, it’s unsurprising that there’s a growing emphasis on data set management practices. According to some surveys, AI researchers today spend much of their time on data prep and organization tasks.

Brothers Vahan Petrosyan and Tigran Petrosyan felt the pain of having to manage lots of data while training algorithms in college. Vahan went so far as to create a data management tool during his Ph.D. research on image segmentation.

A few years later, Vahan realized that developers — and even corporations — would happily pay for similar tooling. So the brothers founded a company, SuperAnnotate, to build it.

“During the explosion of innovation in 2023 surrounding models and multimodal AI, the need for high-quality datasets became more stringent, with each organization having multiple use cases requiring specialized data,” Vahan said in a statement. “We saw an opportunity to build an easy-to-use, low-code platform, like a Swiss Army Knife for modern AI training data.”

SuperAnnotate, whose clients include Databricks and Canva, helps users create and keep track of large AI training data sets. The startup initially focused on labeling software, but now provides tools for fine-tuning, iterating and evaluating data sets.

With SuperAnnotate’s platform, users can connect data from local sources and the cloud to create data projects on which they can collaborate with teammates. From a dashboard, users can compare the performance of models by the data that was used to train them, and then deploy those models to various environments once they’re ready.

SuperAnnotate also provides companies access to a marketplace of crowd-sourced workers for data annotation tasks. Annotations are usually pieces of text labeling the meaning or parts of data that models train on, and serve as guideposts for models, “teaching” them to distinguish things, places and ideas.

To be frank, there are several Reddit threads about SuperAnnotate’s treatment of the data annotators it uses, and they aren’t flattering. Annotators complain about communication issues, unclear expectations, and low pay.

For its part, SuperAnnotate claims it pays fair market rates and that its demands on annotators aren’t outside the norm for the industry. We’ve asked the company to provide more detailed information about its practices and will update this piece if we hear back.

There are several competitors in the AI data management space, including startups like Scale AI, Weka and Dataloop. San Francisco-based SuperAnnotate has managed to hold its own, however, recently raising $36 million in a Series B round led by Socium Ventures, with participation from Nvidia, Databricks Ventures, Play Time Ventures and Defy.vc.

The fresh capital, which brings SuperAnnotate’s total raised to just over $53 million, will be used for augmenting its current team of around 100, for product R&D, and for growing SuperAnnotate’s customer base of roughly 100 companies.

“We aim to build a platform capable of fully adapting to enterprises’ evolving needs and offering extensive customization in data fine-tuning,” Vahan said.

Read Entire Article