Configuring speech

Rapid advances in voice technology have edged the application into the mainstream. Habib Talhami says the development will change the way businesses communicate.

By Caroline Denslow. Published July 3, 2005.

Image caption: Habib Talhami’s latest work as the head of the institute of informatics at the BuiD involved the indexing of Arabic speech databases using speech recognition.
The proliferation of personal computing, the internet and mobile phones has opened up application access to millions of users. However, while most applications are data-centric, there is a growing push from the likes of Microsoft and IBM to advance voice technology and integrate it with existing business applications. Habib Talhami, senior lecturer and acting head of the institute of informatics at the British University in Dubai (BuiD), claims speech technology is starting to become mainstream and is changing the way businesses communicate internally and externally. Talhami, whose latest work involved the indexing of Arabic speech databases using advanced speech recognition technology, talked to IT Weekly about the issues related to creating a bilingual voice portal in the Arab world, the business value of speech technology, and the lack of R&D initiatives in the region.
Is there a business value in using speech technology?
Speech recognition is now a mainstream technology. There are a lot of deployed applications in various places, mainly in the US and Europe. These are now called voice portals. What they do is try to minimise the cost of using operators for call centres. They add value and reduce costs for international call centres. The added value is important in addition to the cost reduction, which means having fewer operators and more automated services available 24x7. The added value comes from new services that do not require human intervention. For instance, if you are inquiring about car insurance, you don’t have to contact somebody for information. You simply get on and talk to a computer, and the computer has a database link that responds to your queries with the kind of information you need.
What business opportunities are there for speech technology?
The main one is in security: you’re trying to listen to different conversations and you want to pick out key words in a telephone conversation. In the US now, there’s major activity, especially when it comes to Arabic speech recognition, to try to eavesdrop on conversations. Normally these audio files are recorded and then you have people listening to them and sorting them out. The ones that need attention go to the appropriate people. But a technology that can do that for research, one that is always listening, indexing and automatically identifying the ones that need attention, can become the most powerful tool. Governments will pay big bucks for that. However, the technology has not been deployed yet. That’s where research comes in. That’s the objective: to listen to any conversation all the time and then flag up the ones that are suspicious.
What are the complexities involved in developing such applications?
Complexity used to be an issue until recently because of a lack of standards. The technology’s main players, including Microsoft, have now developed standards. Microsoft has developed a standard for developing speech recognition applications called SALT (Speech Application Language Tags) and is pushing hard for people to adopt it. There is another standard supported by other companies including IBM: VoiceXML. It’s really like HTML, but for voice recognition. It makes development of applications very easy. Also, in terms of the overhead you require, it really cuts down the complexity of the whole operation. Another factor is that hardware is so cheap these days. All you need for speech recognition applications is a decent server, and servers have gone down in price dramatically in the last two years. So it’s not an issue. You also have software that you can just install on the server, which can help you develop applications.
How is research on speech technology developing?
It is progressing. There’s always progress, and in terms of high-tech research we are moving into things like detecting emotions: when a voice sounds happy, angry, sad and so on. At the moment, computers cannot tell whether I’m happy or angry. But it would help if computers could detect emotions automatically and respond accordingly.
Are there any challenges or issues that confront the advancement of speech technology?
There are certainly issues that still need to be researched. One of them is natural language processing. When we communicate as human beings, we do not use short phrases. We communicate naturally using natural language. People are likely to adopt this technology if it is natural, if it is the natural way you communicate with the computer. I think that’s one of our challenges: developing what we call natural language interfaces. The other challenge of course is the environment. The noise environment, for example, is a big challenge for us in speech recognition. There are also linguistic issues like variations in accents. You know that English does not have just one form. It’s a mixed bag of accents. Even within Australian English, for example, you have regional variations in accents. That also remains a challenge. You need to capture all these in the speech recognition application.
How about developing a bilingual voice application, such as combining English and Arabic? Is that proving to be a challenge for developers?
Correct. If you use Microsoft Word and try to combine English and Arabic you’ll understand what I’m talking about. It’s the way it’s written. First of all, it’s the direction of writing. With regard to speech recognition, there are also challenges. One of them is switching from English to Arabic. The system, like a human being, should be able to listen to you, determine whether you are talking to it in English or Arabic, and respond accordingly. It’s called language identification. There is a challenge there. Another challenge for Arabic is that Arabic is normally written without vowels. It’s important for speech recognition and for text-to-speech, which is the speech synthesis part, to insert the vowels in the right place before doing anything else. The vowelisation technology is still not fully mature and needs further research. And the Arabic language in terms of structure is more complex than English. It needs special attention from a linguistic point of view.
How do you find the adoption, so far, of speech technology in the region?
It’s not widespread but it is developing. Unlike everything else in the Middle East, it could develop very quickly, all of a sudden. There’s no rationale behind how certain things develop. I think now there are a few deployed applications. The main one is the Al Jazeera voice portal where you get to read the news about the Middle East and the world through your mobile, and you interact with the computer system. It uses speech recognition as the interface.
Microsoft is pushing the SALT standard, or what’s called the multi-modal approach to the internet, meaning that you can browse your web page if you like. But if you would like to talk to your web page, interacting using voice, you can do so. You can develop an application whereby you click and talk at the same time. That’s what’s called multi-modal. In my opinion, the next generation of internet applications will depend heavily on voice interfaces. It’s a more natural way of communicating with the internet.
Do you see an interest from local developers to come up with applications based on speech recognition?
Yes, we do see that, especially in terms of adding value. There are two fronts. One is adding value to current technology. So you take current technology and then you shape it, refine it for Arabic. But there are also some basic gaps in the underlying Arabic technology. That needs to be developed.
There’s very little being done in that respect (developing tools, developing Arabic automatic speech recognition or Arabic audio indexing) in this region. In the US there are companies and university departments working on developing the technology. There are a lot of developers and small operators who would like to add value or develop the technology further.
Then how come there’s not a lot of progress when it comes to research and development (R&D)?
It’s mainly the funding, the support for research and development. That is missing. A lot of the developed countries invest a percentage of their budget in R&D. In the Arab world, there is very little investment. Normally, you have two sources of funding: one is the government and the other is investors or the local industry. We have a big problem in that the governments in the region are not investing a lot in R&D. That’s problem number one. The second is that the industry is so small, and the R&D industry is almost non-existent. There is very little going on and therefore they’re unlikely to invest in R&D. They’re small to survive. This is the case with most software companies in the region. It’s a catch-22. The government might say, “Get funding from the industry.” The industry says, “We do not have funding to give out for R&D, we need your support.” I think that’s the situation at the moment.
There are a number of call centres being developed in this region. Surely there are business opportunities out there?
Yes, the business case is there; there’s the Ajman call centre. There are lots of technologies being developed. Unfortunately they’re imported. Everything is imported. The technology itself, as well as the expertise to operate it, is imported, usually from the US and the West. That of course doesn’t help the R&D community here because you have to think of a way of transferring knowledge and keeping it here. And this is not happening at all.
In general, are you saying there’s a lack of R&D support in the region?
Yes, you can generalise that for the IT sector, and yes, it’s almost non-existent. It lacks support, especially from the multinational companies, the big companies that establish business or have their regional headquarters here. They do not see the business case for R&D at all. They do their R&D in China or Ireland but they do not do it in Dubai. And I don’t see why not, to be honest.
Is there an association or group that is advocating the development of an R&D industry in the region?
I’m on the committee called the Arabian Knowledge Economy Association (AKEA). Our objective is to develop what everyone wants to develop in R&D and to seek the support of governments and the private sector, financial or otherwise, but more importantly, to raise awareness. Everyone is moving towards the knowledge economy. We should really be very active at all levels in this region, especially in the UAE because it’s a rich country and it’s also developing very quickly.
Some programmes have been initiated by several IT companies here to develop the local IT sector. Are these not helping the community?
I’ve been working with various companies, and there’s no effort. There are so-called programmes and there’s a lot of talk but there isn’t anything happening on the ground in a serious way.
What about government initiatives?
The intention is there and certainly the slogans are there, but you have to translate that into mechanisms. You have to translate them into something that would happen on the ground: projects in universities, projects in companies, projects in Dubai government. The only one I’m aware of is the e-government. The reason it is actually being implemented is because it came from His Highness General Sheikh Mohammed bin Rashid Al Maktoum [Dubai Crown Prince and UAE Defence Minister] himself. He said: “It has to be implemented, you have no choice. And I give you until 2010 to put 90% of the services online.” We need more of that. We need the government championing the implementation of technology projects. You need a champion like HH Sheikh Mohammed, who is the champion of e-government. There should also be a champion of the knowledge economy, of using IT effectively, of innovation, of creativity and all that. HH Sheikh Mohammed is just one person. What we need is the government setting a clear policy on where we want to be by 2010.
