
    Speech recognition: don't forget it when structuring your portfolio!

    opened on 29.12.00 16:28:48 by
    latest post 29.12.00 22:44:28 by
    Posts: 3
    ID: 321.627
    Views today: 0
    Total: 179
    Active users: 0


      posted on 29.12.00 16:28:48
      Post No. 1
      Found in the depths of the Net!
      © 2000 National Court Reporters Association:


      Only a few years ago, speech recognition technology was in the research and development phase. Now, quite simply, it works, thanks to the efforts of such companies as Dragon Systems, IBM, Philips, Lernout & Hauspie and Microsoft. However, though the technology proves useful for such functions as automated operator and interactive voice response systems - current applications of this technology include voice-automated trading, Yellow Pages searches, airline seat reservations and Web browsing - the goal of continuous speech at high speeds remains elusive. As a result, most vendors are focusing on using speech recognition technology to control a computer and some ancillary functions.

      Both Forrester Research and the Gartner Group believe that speech-to-text and text-to-speech technology will drive the computer industry in the years ahead. In its 1999 report "Technology Radar Screen 1999: 10 Technologies to Watch in the New Year," Gartner included speech recognition.
      The creation of better speech recognition products and capabilities hinges on the development of faster and more powerful computer processors. The move to continuous speech clearly demonstrates this point. Earlier products were limited to discrete speech, in which the user had to pause between words. But with continued hardware improvements, continuous speech, in which the computer translates the speech into text almost as fast as it is spoken, has become the industry standard. For example, in 1999 Intel released the Katmai processor, which is based on the company's MMX technology. This particular chip enhanced the performance of speech recognition applications by speeding up the front-end audio processing and increasing the throughput of the search algorithms involved in pattern matching. This led to reduced error rates and quicker response times.



      These improvements are being hastened by alliances among the different companies. Sun Microsystems developed its Java speech application programming interface with the help of IBM, Dragon Systems, Philips, Novell, Texas Instruments, AT&T and 12 other companies. This particular speech API specifies a single interface for the development and distribution of speech technology applications on the desktop computer, in portable devices and in telephony servers. By setting standards, these companies can move forward with product innovations at a faster pace.

      Developments in the speech recognition field certainly bear watching with an eye toward how new capabilities will affect court reporters. Voice-based command line editing while writing realtime is certainly a legitimate possibility. The NCRA Board of Directors undertook research into the current status of speech recognition and its implications for the reporting profession. What follows is an excerpt from their findings.

      Primary Vendors/Products

      There are three major companies vying for dominance in the speech recognition market: IBM (ViaVoice Pro), Lernout & Hauspie (Voice Xpress and Dragon Systems' NaturallySpeaking), and Philips (FreeSpeech). In addition, there are dozens of smaller players offering products that range from headsets to specialized dictionaries. For example, Eloquently Stated is a product developed by Dr. Eric Fishman, a practicing orthopedic surgeon and president of 21st Century Eloquence. It offers a variety of insertable library modules based on the user's medical specialty that are integrated with speech recognition technology to help in the creation of medical reports and templates.

      The top-of-the-line speech recognition products sold by these companies tend to offer the same features and capabilities, though there are differences with respect to specific functions. For example, you can dictate into any Windows application, and they all support command macros and multiple users. However, all of the products except Philips' FreeSpeech can transcribe from hand-held recorders.

      The vendors differentiate between the various versions of their products by the intended audience: home, office or business professional. The promised accuracy rates range from 95 to 98 percent at 140 wpm, as long as you speak clearly, have sufficient PC speed, and use a good sound card and microphone. The prices for these standard systems in September 2000 range from $100 to $180, though the cost increases when specialized professional dictionaries are required. Lernout & Hauspie's Voice Xpress for Legal, which is a specific module for attorneys, costs $799. The company's general medicine software costs $499, while its specialty medicine modules, for emergency medicine, mental health, pathology, primary care, and radiology, each cost $799.

      Quality Assessment

      Speech recognition has come a long way in just the last few years. "Speech recognition is moving closer to the mainstream and is certainly finding its niche in the medical and legal communities, where specialized vocabularies are used," said Greg Alwang of PC Magazine. "But don't give up your keyboard and mouse just yet. These programs are meant to supplement traditional means of input, not replace them. They can provide a big productivity boost for users with limited typing skills. But for those with disabilities, repetitive stress injuries or who just always have their hands full, these products are a boon."

      Several systems now purport to require only a five-minute training session before you can get started. They all offer multiuser support and remote access through a hand-held unit. Advances with natural language allow the systems to distinguish between commands and words. For example, if you say "Open Microsoft Word," it will execute that command rather than typing the phrase. And, of course, these products can save time and money during transcription. Most provide an accuracy level of 95 to 98 percent at 130-140 wpm, with some users reportedly reaching 180-200 wpm after 15 to 30 hours of training. Still, there are several hurdles to be overcome.
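      The command-versus-dictation distinction described above can be pictured as matching each utterance against a known command grammar first and falling back to dictation for everything else. A toy sketch of that routing logic (the command table and action strings here are invented for illustration, not any vendor's actual grammar):

```python
# Toy command/dictation router: utterances matching a known command
# grammar are executed; everything else is typed as dictated text.
COMMANDS = {
    "open microsoft word": "launch:winword",
    "save document": "action:save",
}

def route(utterance):
    """Return ('command', action) for grammar matches, else ('dictate', text)."""
    key = utterance.lower().strip()
    if key in COMMANDS:
        return ("command", COMMANDS[key])
    return ("dictate", utterance)

print(route("Open Microsoft Word"))   # → ('command', 'launch:winword')
print(route("The witness arrived."))  # → ('dictate', 'The witness arrived.')
```

      Real engines match against far richer grammars with slot-filling, but the first-match-then-fallback structure is the same idea.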

      Accuracy



      As computer technology continues to advance, faster processors and more intelligent language models will lead to higher accuracy levels. As a special report in the February 1998 Business Week noted, because of the many challenges the English language presents to speech recognition systems, 95 percent accuracy is impressive, but this is only achieved under ideal circumstances. When dealing with spontaneous speech - natural speech among two or more people - the accuracy percentage plummets. D. Raj Reddy, dean of the school of computer science at Carnegie Mellon University, explained, "All of a sudden, error rates shoot from a respectable level of 10 percent all the way up to 50 percent. That means every other word is wrong." Also, a speech recognition system will always return a word, even if it's the wrong word, which makes proofreading a much more difficult task.






      In a March 2000 review of voice recognition software in the Washington Post, John Breeden offered this assessment: "Unlike humans, a computer does not have the ability to use context to fill in the blanks in conversations caused by misunderstood or half-heard words. Instead, it must rely on interpreting the speaker's exact sounds. This is complicated by the fact that no one pronounces a word exactly the same way every time. Plus, the English language has lots of words that sound the same but have different meanings - such as 'red' and 'read' or 'to' and 'too.'"
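      The context problem Breeden describes is, at bottom, a language-modeling problem: the engine scores each homophone candidate by how well it fits the surrounding words. A toy sketch of that idea (the bigram counts are invented for illustration, not any vendor's actual model):

```python
# Toy homophone disambiguation: score each candidate by how often
# it follows the previous word in a (tiny, invented) bigram table.
BIGRAM_COUNTS = {
    ("i", "read"): 40, ("i", "red"): 1,
    ("the", "red"): 30, ("the", "read"): 2,
    ("go", "to"): 50, ("go", "too"): 1,
    ("me", "too"): 20, ("me", "to"): 3,
}

HOMOPHONES = {
    "read": ["read", "red"], "red": ["red", "read"],
    "to": ["to", "too"], "too": ["too", "to"],
}

def disambiguate(prev_word, heard):
    """Pick the homophone that best fits the previous word."""
    candidates = HOMOPHONES.get(heard, [heard])
    return max(candidates,
               key=lambda w: BIGRAM_COUNTS.get((prev_word, w), 0))

print(disambiguate("i", "red"))     # → read  ("I read" outscores "I red")
print(disambiguate("the", "read"))  # → red   ("the red" outscores "the read")
```

      Production engines of the era used n-gram models over millions of words rather than a hand-built table, but the scoring principle is the same.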

      Noise levels

      The level of background noise present has a negative effect on speech recognition. The software engine can't tell the difference between a person speaking and music coming from a radio - it's all simply noise. Higher-quality sound cards and better microphone technology are required to enable the ease of use and robust capture of speech in noisy environments. As David Wold, Dragon Systems' senior technical advisor, noted on National Public Radio's "All Things Considered," "We are not at the point where we can handle perfect transcription of an arbitrary person speaking in an arbitrary fashion. We are not at the point where we can deal with arbitrary acoustic conditions. ... Speech recognition of general ambient background noise and everything else that can interfere with it is much harder. You can do it, but you will not get good transcription out of that."


      Open mike

      Allowing for speech recognition in an open mike situation is a difficult proposition. You have to establish what the software will recognize and how it will recognize it. For example, in an open classroom, will the speech engine only recognize the teacher or will it also recognize the students asking questions?


      Speaker dependency

      This aspect determines whether or not someone needs to train the computer to understand his or her voice in order to get the best results. Currently, continuous speech engines do not allow for independent speaker profiles, working effectively only with the voice that trained the system.

      Overlapping speech

      Obviously, with the current speech engines` continued dependence on speaker profiles, overlapping speech would greatly affect the accuracy of these systems. And a multiple user system is not to be confused with a concurrent multiple speaker system, which has not yet been developed. A multiple user system means the speech engine can recognize more than one speaker profile.

      Correcting mistakes

      This capability can still be a bit cumbersome with certain products. For example, IBM's ViaVoice suggests using a combination of the mouse, keyboard and voice to edit, rather than just using voice commands. However, as attorney Bruce Dorner noted in the February 2000 Law Technology News, "With voice, you need to correct the errors that will inevitably appear. You can't just type over them. The tools learn by training, or the machine will repeat your errors believing that the wrong word or phrase is what you really intended."


      Application to Reporting

      Breakthroughs in speech recognition technology will continue, driven in part by alliances among the major players. IBM, Intel, Philips, e.Digital, Norcom Electronics and Olympus have formed VoiceTIMES, whose goal is to coordinate the technical requirements needed for companies to build and deploy solutions using voice technologies and hand-held mobile devices. There's also the Voice XML Forum, which counts Novell, Qualcomm, Sprint and Sun among its members. This group envisions using Web site speech technology for customer service operations.

      Speech recognition technology has the potential to affect all areas of the reporting profession. For example, some doctors are using portable versions of this technology. Nevertheless, because of problems with respect to accuracy, the services of a medical transcriptionist are still required for review and cleanup. And, as demonstrated by voice writers (formerly called stenomaskers), speech recognition can work in the legal setting. However, this is clearly an example of people harnessing the technology rather than technology replacing people. The speaker-dependent system requires, and is trained by, an experienced, competent voice writer. Looking more directly at stenographic reporting, speech recognition may allow reporters to edit their transcripts while still taking down the testimony, control the basic functions of their software programs, and manage ancillary court reporter functions, such as using "voice annotations" to mark information useful for Reporter Electronic Data Interchange as it appears in the proceedings.






      In fact, Johnny Jackson, president of Stenovations, reported that the company holds a patent regarding a CAT system working in unison with a speech recognition system. The patent covers a reporter writing into a CAT system that translates the steno into a stream of text, and a speech recognition system on the same computer that translates the same dialogue into a stream of text within a certain time buffer. The two text streams are compared against each other, and where they are the same they are combined into one stream of text. Where they are different a conflict is created. The reporter can immediately resolve the conflict and thereby train the speech recognition system and correct the output, or do the editing later and update the speech recognition dictionary. The patent is intended to take advantage of the reporter's strengths and speech recognition's strengths as they complement each other, Jackson explained. The speech recognition should be better on polysyllabic words and numbers, and the reporter should be better at homonyms such as "no" and "know," word-boundary problems and the alphabet, such as "P" and "pee" and "B" and "be."
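      The comparison Jackson describes is essentially a diff-and-merge over two token streams. A minimal sketch of that idea using Python's standard difflib (the function name and the bracketed conflict format are illustrative, not Stenovations' actual implementation):

```python
import difflib

def merge_streams(steno_text, speech_text):
    """Compare the CAT (steno) stream against the speech recognition
    stream; emit agreed words directly and wrap disagreements as
    conflicts for the reporter to resolve."""
    a, b = steno_text.split(), speech_text.split()
    merged = []
    matcher = difflib.SequenceMatcher(a=a, b=b)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            merged.extend(a[i1:i2])          # both engines agree
        else:
            # disagreement: keep both readings as a conflict token
            merged.append("[%s|%s]" % (" ".join(a[i1:i2]),
                                       " ".join(b[j1:j2])))
    return " ".join(merged)

print(merge_streams("I know the answer", "I no the answer"))
# → I [know|no] the answer
```

      Resolving a conflict in favor of one reading, then feeding the choice back into the recognizer's dictionary, is the training loop the patent describes.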

      The captioning industry is also examining the application of speech recognition. In a July-August 2000 JCR interview, Judy Brentano, general manager for VITAC's MetroCaption division, noted, "[R]ealtime using voice recognition embraces many of the same principles used in realtime reporting on a steno machine. That is, you hear a word, process it mentally in context and then 'write' the homonym or 'speak' it differently for accurate translation into your realtime system. The use of voice recognition systems will most probably have application for multilingual translation as the software programs become more sophisticated. Voice recognition will be an alternative method for preparing program text for offline captioned work. At present producing realtime text with voice recognition is being tested on the Internet. As the processors and 'engines' become more powerful, it may well have application for on-air captioning. This should not cause great concern, as voice recognition will still require a human being to operate it and it will require like training and education to competently operate voice recognition systems in any of these scenarios."

      Conclusion

      At the 1995 international voice technologies conference sponsored by the American Voice Input/Output Society, Dr. Kai-Fu Lee, director of interactive media at Apple Computer, asked a question that is still valid today: What makes a successful product? He pointed out that voice applications, or any others, won`t become successful products unless they provide: 1) technical differentiation - they offer better technology than an existing product at the same cost, the same technology at a lower cost or entirely new technology; and 2) customer value - they solve a real problem better than before, are usable and have an attractive price.

      In the area of technical differentiation, history shows that improvements in technology occur incrementally, rather than as breakthroughs. And voice applications prove useful only in situations where other input modes are worse or nonexistent and will require reasonable expectations on the part of users; in other words, a tolerance for errors.

      So, until speech recognition can provide a better product than court reporting for the same cost or the same product at a lower cost, and is a usable system, it won't replace court reporters. To maintain the distinction between what a court reporter can offer a client vs. speech recognition technology, or any alternative technology for that matter, the profession needs to define "product" and "better product" more clearly in its customers' minds.

      An important area where the court reporting profession can have an influence in the development of such speech recognition applications involves driving the definition of what is acceptable nonverbatim output (if anything is) in this context, and, alternatively, educating the profession's customers (including the public) about the real problems of anything less. Keep in mind, court reporters have been ahead of the rest of the legal system in applying digital technology in the workplace. Computer-aided transcription, realtime translation and video-text integration - reporter-based technologies - have enhanced the functioning of the judicial system for several years in both headline trials and everyday cases.

      Speed and efficiency

      Reporters have always been known for their ability to write fast, with all certified reporters required to have a minimum writing speed of 225 wpm. That speed translates to other areas as well. For example, for a computerized court reporter, finding a section of the testimony for readback takes only a few seconds.

      With realtime, judges and attorneys have immediate access to the testimony. During the trial, the judge and attorneys can review and mark portions of testimony and make notes within their copies of the transcript on their computer screens without interrupting the proceedings. They can perform searches for specific words, phrases, roots of words and other more complicated information in one or more documents instantaneously. Searches through a file can be made forward or backward, or the search can be set to tag or highlight certain words as the trial proceeds. Also, some realtime-based programs allow participants to organize marked testimony. This feature enables them to get more accurate notes than a pad and pencil would allow. In addition, with a cut-and-paste feature, individual notes taken from the court reporter's transcript can be put together on screen or in a printed report.

      At the end of the proceeding, the reporter can give the attorneys an uncertified rough draft transcript, with the official copy following sometimes just a few hours later - a practice common with many high-profile trials.

      Versatility

      Court reporters can move from one courtroom to another with ease. This portability means they can provide their services whenever and wherever necessary. For example, a court reporter can turn a standard courtroom into a computer-integrated courtroom in a matter of minutes, requiring only realtime cables to hook up to the judge's and attorneys' computers.

      Communication access

      The Americans with Disabilities Act of 1990 mandates equal access to the courtroom for all Americans. Hearing-impaired judges, attorneys, litigants and jurors can read the reporter`s realtime text from a computer screen and monitor and participate fully in judicial proceedings. This method has proven particularly effective in the TV industry, as realtimers provide captions for the more than 28 million hearing-impaired people in the United States.

      Security

      As the guardians of the record, reporters are charged with ensuring the accuracy of the testimony. Certified court reporters pass challenging, stringent national exams to earn their designations, and NCRA's continuing-education requirements ensure that its members stay up-to-date on technology, ethics, professional practices and other critical areas. And with the application of new technology, reporters can maintain that role while improving the services offered to the courts and clients. For example, reporters can e-mail transcripts to attorneys or file them electronically with the court, protecting the record by applying a digital signature. This ensures the integrity of the record and that only the appropriate people have access to the transcript.


      Reliability

      The levels of redundancy in reporter technology guard against a failure. While taking the testimony, a realtime reporter has three copies of the record: the realtime text appearing on the screen and being saved to the computer hard disk; the same text being saved to the steno machine computer disk; and the reporter's stenographic paper notes as a final backup.

      Control

      Court reporters have control of the proceedings. If a witness mumbles, the reporter can ask the individual to speak up. If the attorneys are talking over one another, the reporter can stop the proceedings to ensure an accurate record is made.

      Ease of research

      In-court computerization opens a world of research capabilities. Attorneys and judges can call up depositions to compare with current testimony. CD-ROM technology enables attorneys to bring volumes of legal research into the courtroom on a thin disk. In addition, attorneys can send the trial proceedings off-site, access online legal research programs or communicate with co-counsel and consult with expert witnesses remotely.






      Flexibility

      Realtime technology can be used to synchronize video with computerized court reporting. Stenographic text of the proceedings taken down by the court reporter is translated by realtime and integrated with a simultaneously created videotape so that the text of the testimony appears on the screen with the video record of courtroom events. An internal clock in the video camera or VCR is synchronized with the court reporter`s computer to ensure that the video and text records of trial or deposition proceedings match. This allows a specific portion of the video record to be found by searching the text record, which is a much more efficient and thorough method than a video-only search.
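      The synchronization step above amounts to storing a shared-clock timestamp with each realtime text entry, so that a text search yields the matching point in the video. A simplified sketch (the transcript data and timecodes are invented for illustration):

```python
# Each transcript line carries the shared-clock timestamp recorded
# when the reporter wrote it; searching the text yields the video
# timecode where that testimony appears.
transcript = [
    ("00:14:02", "Please state your name for the record."),
    ("00:14:09", "My name is Jane Smith."),
    ("00:14:15", "Where were you on the night in question?"),
]

def find_timecode(query):
    """Return the video timecode of the first line containing query."""
    for timecode, line in transcript:
        if query.lower() in line.lower():
            return timecode
    return None

print(find_timecode("night in question"))  # → 00:14:15
```

      This is why a text-driven search is so much faster than scrubbing through the video itself: the lookup is over indexed text, and the video is only consulted once the timecode is known.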

      Teamwork

      The computer-integrated courtroom gives participants instant access to all court proceedings. Reporter Electronic Data Interchange is an enhancement of the CIC. REDI links the court reporter's and the clerk's systems, enabling both to enter data into the court's electronic database and reducing duplicate functions. Common information such as case number and name, presiding judge and attorneys present, witness examination, date and time is captured by the court reporter. At the end of the day, the court reporter converts the rough realtime transcript text to ASCII and sends it over the court network server. The clerk or other designated user can access the text via computer and download relevant information for inclusion on the docket sheet, minute order or calendar.

      Human presence

      As impartial guardians of the record, court reporters have custody of the record between the proceeding and when the transcript is filed.

      They certify the validity of the transcript, and they certify that copies obtained from them are true and accurate. But court reporters do more than safeguard, prepare and certify transcripts. During the trial, they make every effort to ensure the best possible record of judicial proceedings.

      These are just some of the ways court reporters have adapted to and made use of new technology to improve the functioning of the courts and the judicial system. Obviously, the true effect of speech recognition technology on court reporting cannot be determined yet. Nevertheless, it is something that should continue to be examined carefully as it develops, particularly in terms of its functionality in making the reporter`s job easier and more efficient.
      posted on 29.12.00 17:14:44
      Post No. 2
      you're probably right with that line of thinking
      this sector still has a future, even though at the moment
      it leads a shadow existence on the stock market,
      I've had my eye on a stock for a year and a half now, one I keep taking good profits with (unfortunately some fat losses now and then, too), but it's always good for a surprise,
      it has cooperations running with IBM and is a leader when it comes to speech software, I mean General Magic 894165 (GMGC)
      it has been on a real odyssey for four years; whenever it climbs back up from 1.50 USD, CAPITAL recommends it with
      a price target of €25 (okay, that was before the big correction); cut that in half and it's still 12.50
      and at the moment it has degenerated a bit into a gambler's stock, making daily jumps of up to 20%, but... long term I see it at €20 (current price €1.45 FFE)
      because apart from good news it hasn't had any this year, and the price has suffered badly in the wake of the ITs and other internet stocks
      just a thought
      good trades
      cdr74
      posted on 29.12.00 22:44:28
      Post No. 3
      and up ++

