Mistral launches a moderation API

AI startup Mistral has launched a new API for content moderation.

The API, the same one that powers moderation in Mistral’s Le Chat chatbot platform, can be tailored to specific applications and safety standards, Mistral says. It’s powered by a fine-tuned model (Ministral 8B) trained to classify text in a range of languages, including English, French, and German, into one of nine categories: sexual; hate and discrimination; violence and threats; dangerous and criminal content; self-harm; health; financial; law; and personally identifiable information (PII).
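For a concrete picture of how an API like this is typically called, here is a minimal sketch in Python using the requests library. The endpoint path, model alias, and response shape are assumptions for illustration based on common REST conventions, not details confirmed in this article; consult Mistral’s API documentation for the actual interface.

```python
# Hypothetical sketch of calling a moderation endpoint on raw text.
# The endpoint path, model alias, and response shape below are
# assumptions for illustration, not confirmed by this article.
import os

import requests

API_URL = "https://api.mistral.ai/v1/moderations"  # assumed endpoint


def moderate_text(text: str) -> dict:
    """Submit a raw text string for classification against the nine categories."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json={
            "model": "mistral-moderation-latest",  # assumed model alias
            "input": [text],
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


# Expected (assumed) shape: one result per input, with a flag or score per
# category, e.g. "hate_and_discrimination", "self_harm", "pii", and so on.
print(moderate_text("Some user-generated text to screen."))
```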

The moderation API can be applied to either raw or conversational text, Mistral says.
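The conversational case would presumably take role-tagged chat messages rather than a bare string, letting the classifier weigh context across turns. Again, the endpoint and payload shape in this sketch are illustrative assumptions rather than documented specifics.

```python
# Hypothetical sketch of moderating a conversation: the payload carries
# role-tagged messages instead of a bare string. Endpoint and field names
# are assumptions for illustration.
import os

import requests

conversation = [
    {"role": "user", "content": "How do I reset my account password?"},
    {"role": "assistant", "content": "You can reset it from the settings page."},
]

response = requests.post(
    "https://api.mistral.ai/v1/chat/moderations",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-moderation-latest",  # assumed model alias
        "input": [conversation],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```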

“Over the past few months, we’ve seen growing enthusiasm across the industry and research community for new AI-based moderation systems, which can help make moderation more scalable and robust across applications,” Mistral wrote in a blog post. “Our content moderation classifier leverages the most relevant policy categories for effective guardrails and introduces a pragmatic approach to model safety by addressing model-generated harms such as unqualified advice and PII.”

AI-powered moderation systems are useful in theory. But they’re also susceptible to the same biases and technical flaws that plague other AI systems.

For example, some models trained to detect toxicity see phrases in African-American Vernacular English (AAVE), the informal grammar used by some Black Americans, as disproportionately “toxic.” Posts on social media about people with disabilities are also often flagged as more negative or toxic by commonly used public sentiment and toxicity detection models, studies have found.

Mistral claims that its moderation model is highly accurate — but also admits it’s a work in progress. Notably, the company didn’t compare its API’s performance to other popular moderation APIs, like Jigsaw’s Perspective API and OpenAI’s moderation API.

“We’re working with our customers to build and share scalable, lightweight, and customizable moderation tooling,” the company said, “and will continue to engage with the research community to contribute safety advancements to the broader field.”

Kyle Wiggers is a senior reporter at TechCrunch with a special interest in artificial intelligence. His writing has appeared in VentureBeat and Digital Trends, as well as a range of gadget blogs including Android Police, Android Authority, Droid-Life, and XDA-Developers. He lives in Brooklyn with his partner, a piano educator, and dabbles in piano himself occasionally, if mostly unsuccessfully.
