Abstract

PlayStation 2's online network gives its users the opportunity to play against others from across the country and around the globe, all from the comfort of their own homes. To help protect its younger users by providing a family-friendly experience, SCEA needed a way to be able to filter out vulgarity in real time from all the instances where users are given the opportunity to submit text. To implement a family-friendly filter robust enough and capable of working in near-real time, SCEA turned to Teragram. When Teragram's software identifies a vulgar word or phrase, it can then remove the offensive term or the entire text thread. By all accounts, SCEA's implementation of Teragram's software has been an unqualified success.

Full text


A subsidiary of Sony Corporation of America, Sony Computer Entertainment America (SCEA) markets the PlayStation (PS) family of products and develops, publishes, markets, and distributes software for the PS 1 console and the PS 2 computer entertainment system. Additionally, SCEA has developed and currently manages a global online gaming network for its PlayStation 2 console. In the five years since the console's introduction, Sony has shipped more than 100 million PlayStation 2 units worldwide.

www.us.playstation.com

BUSINESS CHALLENGE

Every month, PlayStation 2's online network draws millions of users from across a wide range of demographics, bringing together players of varying ages and cultural backgrounds. Once in the system, users generate their own screen names, input titles for their games, and can communicate with fellow players via text messaging. All this text is potentially visible to the entire online community, creating the need for a way to monitor these lines of text and filter out anything vulgar or otherwise not family-friendly. "We want to work with teams that can help us protect our users from any of the normal Internet badness that you get on PCs," says Glen Van Datta, director of online technology, SCEA.

VENDOR OF CHOICE: TERAGRAM CORPORATION

Founded in 1997, Teragram Corporation has grown to be a leader in multilingual natural language processing technologies. "The name of the company reflects our mission," says Yves Schabes, president and co-founder of Teragram. "Gram reflects something written down. Tera refers to a large scale. Ergo, we are a provider of linguistic technology that works at an extremely large scale."

Teragram's customers include such Web giants as Yahoo!, AOL, and Ask.com; major publishing companies and news organizations like the New York Times, Elsevier, and Forbes.com; and major corporations like HP, Toshiba, and SCEA. Teragram's business is split into two separate but interrelated halves. "The company has two assets. One is our dictionaries that we write and maintain in more than 30 languages. The other half is writing software that uses those dictionaries at a very large scale, using a lot of pattern matching and trying to get at the meaning of the words," says Schabes.

www.teragram.com

THE PROBLEM IN DEPTH

PlayStation 2's online network gives its users the opportunity to play against others from across the country and around the globe, all from the comfort of their own homes. The reach of this network, combined with the console's broad and diverse customer base, brings together users of all ages and from many different cultures. To help protect its younger users by providing a family-friendly experience, SCEA needed a way to filter out vulgarity in real time wherever users can submit text. "It's something we believe is important to ensure our community is safe and enjoyable," says Van Datta.

Implementing such a filter is easier said than done, though, especially with PlayStation 2's global footprint. "The big players are the U.S., which has essentially one language we deployed in; then Japan and Korea, which also only have one language each; and then there's Europe, where we had to encompass 21 different languages," says Van Datta.

There are also the pitfalls inherent in trying to identify what is and is not family-friendly, considering the slipperiness of any individual language. "The question of classifying if some interaction is family-friendly or not can be pretty tricky sometimes. A lot of words can be ambiguous. A lot of things that don't look to be family-friendly at first can turn out to be fine, and vice versa. Additionally, meanings can change all the time," says Van Datta. "XXX, for example, used to not be family-friendly, until a movie came out called XXX by Sony."

Using this network adapter, the 100 million PS 2 owners worldwide are able to play against each other on SCEA's global online network.

And if that weren't enough, there's also the issue of having to deal with the linguistic creativity of users eager to route around anything they see as infringing on their ability to speak openly. "You'd be surprised how creative people can be in trying to circumvent certain words," says Van Datta. So it's not just a matter of filtering out certain words, but also all the various iterations of letters, numbers, and symbols that can be put together to refer to words that are decidedly not appropriate for the whole family.
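One way to picture the problem of disguised spellings is a normalization pass that runs before any dictionary lookup. The sketch below is only an illustration under stated assumptions: the substitution table, the placeholder word list, and the function names `normalize` and `contains_blocked` are all hypothetical, and nothing here reflects how Teragram's software actually works.

```python
import re

# Hypothetical table mapping common look-alike characters to letters.
LOOKALIKES = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

# Placeholder terms standing in for a real vulgarity dictionary (assumption).
BLOCKED = {"darn", "heck"}


def normalize(text: str) -> str:
    """Reduce text to a canonical lowercase, letters-only form."""
    lowered = text.lower().translate(LOOKALIKES)
    # Drop separators such as dots, dashes, and underscores inside words.
    letters_only = re.sub(r"[^a-z\s]", "", lowered)
    # Collapse any character repeated three or more times ("daaaarn" -> "darn").
    return re.sub(r"(.)\1{2,}", r"\1", letters_only)


def contains_blocked(text: str) -> bool:
    """Return True if any normalized word is in the placeholder dictionary."""
    return any(word in BLOCKED for word in normalize(text).split())
```

A simple pass like this would still miss letters separated by spaces, and mapping every digit to a letter would garble legitimate numbers, which hints at why a fixed word list alone tends to fall short.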

THE SOLUTION

To implement a family-friendly filter robust enough and capable of working in near-real time, SCEA turned to Teragram. "We gave them a basic classification engine and also some initial content and rules for some languages. Sony then took our toolkit and built on top of that," says Schabes. "What we provide is the technology. We don't really suggest any possible use for it. It's completely up to our customer to decide what they're going to do."

For SCEA, that meant taking Teragram's software and installing it across their network of servers. "We have game servers that are used for matching players between each other. Even if the game is itself a peer-to-peer game, you have to come to our servers to do some matching," says Van Datta. "In those servers we've integrated in the Teragram software. For every message that comes to the server that's in text form, we then look and see if there's something in there that's some kind of text that could be considered vulgar."

When Teragram's software identifies a vulgar word or phrase, it can then remove the offensive term or the entire text thread. "We can either reject the entire message or we can reject just the word and replace it with something else," says Van Datta. "We replace that text with asterisks." But the level of filtering that Teragram's software is capable of goes beyond simply identifying a single offensive word; it can even recognize how words are being used in context. "It's very sophisticated. You can't use the words girl and sex in the same sentence, but you can say girl and sex. You can't use 69 unless there's a reference to a car in there. The Teragram software allows you to set rules like that up," says Van Datta.
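The two responses Van Datta describes, rejecting a message outright or masking the offending word with asterisks, together with a context rule like the "69 unless there's a reference to a car" example, can be sketched in a few lines. This is a rough illustration only: the rule format, the placeholder term, the list of car words, and the function names are assumptions made for the example, not Teragram's rule language or API.

```python
import re

# Placeholder term standing in for a real vulgarity dictionary (assumption).
BLOCKED_TERMS = {"darn"}

# Hypothetical context rules: the key is flagged unless one of the
# listed words also appears somewhere in the same message.
CONTEXT_RULES = {"69": {"car", "camaro", "mustang"}}

WORD = re.compile(r"[A-Za-z0-9']+")


def violations(message: str) -> set:
    """Return the set of lowercase terms in the message that break a rule."""
    present = {w.lower() for w in WORD.findall(message)}
    flagged = present & BLOCKED_TERMS
    for term, allowed_with in CONTEXT_RULES.items():
        if term in present and not (present & allowed_with):
            flagged.add(term)
    return flagged


def accept(message: str) -> bool:
    """Whole-message policy: accept only if nothing is flagged."""
    return not violations(message)


def mask(message: str) -> str:
    """Word-level policy: replace each flagged word with asterisks."""
    flagged = violations(message)

    def replace(match):
        word = match.group()
        return "*" * len(word) if word.lower() in flagged else word

    return WORD.sub(replace, message)
```

Under these assumptions, `mask("well darn it")` yields `well **** it`, while a message that mentions a car alongside 69 passes through unchanged.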

Teragram's robust set of dictionaries allows SCEA to accommodate the cultural differences that can exist between how one country perceives a word versus another. "One of the beauties of Teragram's solution is that we can put any kind of language in the same dictionary, so if one word is vulgar in the UK but not in the U.S. we have a vulgarity dictionary that can distinguish between the two and react based on what country a particular user is playing in," says Van Datta.
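The single shared dictionary that reacts differently by country might be modeled as a mapping from each term to the set of countries where it is treated as vulgar. The following is a minimal sketch under that assumption; the sample entries are harmless placeholders, and the structure and the name `is_vulgar` are hypothetical and not drawn from Teragram's product.

```python
# Hypothetical shared dictionary: term -> country codes where the term
# is treated as vulgar. Entries are harmless placeholders.
VULGARITY = {
    "blimey": {"GB"},
    "darn": {"US", "GB"},
}


def is_vulgar(word: str, country: str) -> bool:
    """Return True if the word is flagged for the player's country."""
    return country in VULGARITY.get(word.lower(), set())
```

Keeping one dictionary with per-country flags, instead of a separate list per country, means a newly discovered term needs to be added only once.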

The game 'SOCOM 3: U.S. Navy Seals' is rated Mature 17+, but Sony still endeavors to keep the content of this Web-based game family-friendly by using extensive filtering to identify various permutations of objectionable language.

Teragram's software can also be set up so that it is adaptive, helping SCEA to identify new uses of formerly safe words in vulgar ways. "We basically have set some procedures that key us in on if somebody's using a new vulgarity in a new way. So once we've identified that, we can go in and change our rules to address that," says Van Datta.

THE OUTCOME

By all accounts, SCEA's implementation of Teragram's software has been an unqualified success. "We've done some real number crunching on the Teragram stuff to figure out if we should do it ourselves or buy third-party software. In this case, Teragram's software is literally taking nanoseconds to do this filtering in the servers," says Van Datta. "It's the fastest, most efficient thing we've found. They're definitely doing everything we want and more. They're actually the only third-party software in our network; we've designed everything else completely from the ground up."

As language is a constantly evolving thing, it's up to SCEA to keep on top of this evolution to ensure the protection of its users. "The ongoing part of this is, are we doing an effective job of matching what we're doing to keep up with how the communities and our users are trying to circumvent it?" says Van Datta. "We started off with about 20 rules. We're up to 45 now. I'm pretty sure we've probably tripled or quadrupled our actual words or combination of words in our dictionary. I'm sure we don't catch 100%, but it's a constant thing to try and keep up."

But while SCEA's filters are currently running 100% of the time, there still is a major loophole in their ability to maintain a family-friendly environment. "If you want to do something that circumvents the filter, it'd be doing something with voice," says Van Datta about its online network's voice chat capabilities. "If you want to speak vulgarities in voice, that's something we can't filter."

Teragram's plans for its software and dictionaries, while to date focused primarily on large companies with huge amounts of data and documents to sort through, do include expanding out to the masses. "That's the next step, having this technology trickle down to the consumer level, and before that medium-sized businesses," says Schabes. "There is a trend towards an exponential increase in information at all levels everywhere, so the need for technology such as ours to sort through and categorize that information will become acute in the coming years."

GEOFF DAILY ([email protected]) IS A WASHINGTON, D.C.-BASED FREELANCE WRITER.

COMMENTS? EMAIL LETTERS TO THE EDITOR TO [email protected].

Copyright Information Today, Inc. May 2006