IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 6, November 2010
ISSN (Online): 1694-0814
www.IJCSI.org
To Design Voice Control Keyboard System using Speech
Application Programming Interface
Md. Sipon Miah¹ and Tapan Kumar Godder²
¹ Department of Information and Communication Engineering, Islamic University, Kushtia, 7003, Bangladesh.
² Department of Information and Communication Engineering, Islamic University, Kushtia, 7003, Bangladesh.
Abstract
People are becoming increasingly dependent on electronic devices. The main objective of this project is to design and develop a voice-controlled keyboard system, fully controlled by a computer, that displays its output on the display device within a predefined time. The system can therefore serve as an aid for people who have little knowledge of computers; even people who are illiterate can operate a computer with it. The developed approach can also be applied to other systems, for example a voice-controlled car. In this project voice is the input: the Speech Application Programming Interface (SAPI 4.1) recognizes the voice and passes the corresponding command to the microprocessor according to the program code, and the display device shows the output.
Keywords: PC with Pentium microprocessor, microphone, HMM, Speech Application Programming Interface (SAPI).
1. Introduction
Day by day our lives become busier and we have to do a lot of work every day. For this purpose we use many kinds of computer-controlled systems. If we can control the computer using voice, we can save time for other, more demanding work, and we make our work easier and faster.
There are several different possibilities for
environment control for physically disabled persons. In
many cases speech is the most convenient and easy-to-learn alternative. In this study, we wanted to explore the
possibility of voice control without being involved in
major hardware system development.
Thus, our guiding principle was that the system
should connect easily to standard products for
environmental control, be expandable and use standard
equipment to the largest possible extent.
The person that volunteered to try out this
application already had a sip-and-puff operated system for
limited environmental control through infra-red (IR)
techniques. Rather than replacing this system, we decided
to expand it with voice control and new possibilities. In
this way, we could use the old system as back-up in case
of system break-downs and we also got an immediate
evaluation of relative drawbacks and merits of the two
control modes. In developed countries most sectors are computerized, but in Bangladesh computers are used mostly for education, office work and printing. If we can control our computers by voice, we can lead a smarter and faster life, so we should try to use this advantage.
Because the Voice-control is trained to recognize the
individual voice pattern, operation by a third person is
only possible by means of command keys. Speaking a
command is the same as selecting a command with the
scroll key and confirming it with the OK key on the pilot
or on an external keyboard. By connecting special operating peripherals (suck/blow switch, foot switch, head switch, ...) to the keyboard interface, the Voice-control can also be operated by people with a speech impediment. The Voice-control is delivered together with configuration software, allowing the Voice-control to be adjusted to suit the individual. The software can be run on a standard PC under MS-Windows (CPU Intel 80386 or higher, RAM 4 MB, MS-Windows V3.1 or higher). The
selected commands are trained on the PC with the help of
the Voice-control. Each word is spoken several times so
that a common voice pattern can be analyzed. The voice
pattern of the word is part of a neural network that allows
speech to be recognized in operation. The voice patterns
are stored in the pilot. Individual words can be retrained
on the Voice-control even without a PC and configuration
software.
2. Voice Control System
Voice-controlled systems have become more and
more popular in the last few years, but in many cases the
focus has been on what is technically possible to do, more
than what people really want from their systems. We will
focus on the human aspect, and try to figure out what the
most intuitive way of communicating with a voice-
controlled system is. We are also interested in finding out
how people adapt the way they talk when they are talking
to a computer. Perhaps people want a shorter reply from
the system than in other cases and also want to express
themselves in shorter sentences? In order to build human
friendly systems in the future we need to find out how
people want their systems to work and perform instead of
building more and more technically advanced systems that
nobody asked for. We want to explore how different
people talk when they talk to a computer compared with
when they talk to a human. How does the communication
differ with respect to syntax, pragmatics, phonetics and
semantics, depending on whether they talk to a computer
system or a human? Does a voice-controlled computer
system have to be able to handle everything that a human
can understand? Do people talk in longer sentences or do they choose to give short commands? What seems to be the intuitive way to speak? A useful focus for voice control would therefore be to find out how people really talk to the systems instead of how the systems work.
2.1 Dialogue Systems
A dialogue is a spoken, typed or written interaction in
natural language between two or more agents. It is divided
into units called turns, in which a single speaker has
temporary control of the dialogue and speaks/writes for
some period of time. Within a turn, the speaker may
produce several spoken or typed utterances. A turn can
consist of more than one move. A move is a speech act in
the sense that it is an act that a speaker performs when
making an utterance, such as questions, warnings,
statements etc. It has a functional relation to the
conversation of which it is a part. A dialogue system is a
system that allows a human, the user, to use natural
language in the interaction with the computer. In the same
way, the computer replies with natural language. The
natural language can be either spoken or written, either
complete sentences or fragments of sentences. An example
of a dialogue system is the SJ system.
2.2 Speech Recognition
Speech recognizers are computer systems that
process human speech into something a computer can
recognize and act on. There are several advantages in
using speech in an application. For example, you can enter
data when no keyboard or screen is available. It is also
very convenient to use speech when hands or eyes, or
both, are busy, and in difficult environments such as
darkness or cold. Some areas where speech recognition is
useful are: help for functionally disabled people who are not able to type with their hands, telephone services where only a very limited keypad is available, and situations where you need your hands free, for example talking on your cell phone while driving. Speech recognizers can be divided into different categories; a first division is into speaker-dependent and speaker-independent systems. Natural-language speech recognition refers to computer systems that recognize and act on unconstrained speech. That is, the user does not need to know a predefined set of command words in order to use the system successfully (Boyce, 2000). Good speech recognition can be quite hard to achieve. Continuous speech contains no clear pauses between words, which makes it difficult to find the word boundaries, compared to finding them in written text, and this in turn makes the words difficult to recognize. There is a great variability in speech between
speakers depending on age, sex and dialect and speech
within a speaker depending on mood, stress etc. External
conditions such as background and recording device also
make a difference.
2.3 Speech Synthesis
Speech synthesis is when the computer transforms a textual or phonetic description into speech. A Text-To-Speech (TTS) synthesizer is a computer-based system that
should be able to read any text aloud. There is a
fundamental difference between this kind of system and
any other talking machine (a cassette-player for example)
because of the automatic production of new sentences that
a TTS can perform.
2.4 Existing Voice-Controlled Systems
Most car companies do some kind of research on voice-controlled systems today. BMW, for example, has voice control in most of its vehicles. Volvo is another car
company that works with voice-controlled solutions in the
car.
2.5 Syntactic Analysis
We did these transcriptions in order to be able to
classify everything our test persons said into the different
functions they wanted performed with their utterances, or
the functions their utterances were related to. An utterance
is a string of speech found between breath and pauses.
Every test person's utterances were divided into stereo and address book functions. The utterances which were related to the stereo were divided into these categories:
Change the volume
Change tune/CD
Turn on/turn off
The utterances which were related to the address book were divided into these categories:
Missed calls
Check address book
Add to address book
Delete
Change
We also divided the utterances to the computer
system and the utterances to the human system into
different columns. We were mostly interested in the
utterances made to the computer system, since this is what
we believe to be what a voice controlled system has to
handle. This, we believe, is also very likely to be the way
people will interact with a voice-controlled computer
system. We were also interested in comparing the
utterances to the computer system with the utterances to
the human system.
As can be seen from the comments to Table 2, the utterances from the computer system and the test leader are very similar.
Table 1: An example of utterances from a test person divided into respective functions

2.6 Syntax of sentences uttered to computer system

Frequency  Syntactic pattern
46  NP ({PP, NP, FV NP})
39  FV PP ({NP, PP, AdvP, PP NP})
36  FV NP ({NP, AP, PP, AdvP, NP NP, NP PP})   121
11  FV (AdvP) NP
7   FV AdvP ({NP, NP PP})
5   FV NP PP (NP)
4   AdvP NFV NP FV (AdvP) PP
4   AdvP FV NP ({AdvP, NP AdvP})
3   AdvP NFV NP NFV FV {AdvP, PP, AdvP NP}
3   NP FV {AdvP, NP, AdvP NP}
2   NFV NP FV (AdvP)
2   InterjP
2   NP NFV NFV FV {NP PP, PP}
2   FV NP (NP) PP NP NP NFV NP
1   AdvP NFV NP FV NP NP NFV PP
1   AdvP FV NP AdvP NP PP NP NP NP NFV NP
1   AdvP FV NP PP
1   AdvP FV PP NP NP NFV NP PP
1   NFV FV AdvP NP PP
1   NFV FV NP
1   NP NFV AdvP NFV FV PP
1   NP NFV NFV FV AdvP NP NP
1   NP NFV FV PP
1   FV NP PP NP NP PP NP NP
1   FV PP NP NP NFV NP   56

Fig. 1 The individual difference in words/utterance between the two tests.

3. Speech Recognition
Speech recognition has a history of more than 50 years. With the emergence of powerful computers and advanced algorithms, speech recognition has undergone a great amount of progress over the last 25 years. The earliest attempts to build systems for automatic speech recognition (ASR) were made in the 1950s based on acoustic
phonetics. These systems relied on spectral measurements,
using spectrum analysis and pattern matching to make
recognition decisions, on tasks such as vowel recognition
[1]. Filter bank analysis was also utilized in some systems
to provide spectral information. In the 1960s, several basic
ideas in speech recognition emerged.
Zero-crossing analysis and speech segmentation were
used, and dynamic time aligning and tracking ideas were
proposed [2]. In the 1970s, speech recognition research
achieved major milestones. Tasks such as isolated word
recognition became possible using Dynamic Time
Warping (DTW). Linear Predictive Coding (LPC) was
extended from speech coding to speech recognition
systems based on LPC spectral parameters. IBM initiated
the effort of large vocabulary speech recognition in the
70s [3], which turned out to be highly successful and had a
great impact on speech recognition research. Also, AT&T
Bell Labs began making truly speaker-independent speech
recognition systems by studying clustering algorithms for
creating speaker-independent patterns [4]. In the 1980s,
connected word recognition systems were devised based
on algorithms that concatenated isolated words for
recognition. The most important direction was a transition
of approaches from template-based to statistical modeling
especially the Hidden Markov Model (HMM) approach
[5]. HMMs were not widely used in speech applications
until the mid-1980s. From then on, almost all speech
research has involved using the HMM technique. In the
late 1980s, neural networks were also introduced to
problems in speech recognition as a signal classification
technique. Recent focus is on large vocabulary, continuous
speech recognition systems.
3.1 Acoustic Model by Hidden Markov Model
Any acoustic unit, such as word, syllable, diphone,
triphone or phone can be modeled by Hidden Markov
Models (HMMs). An HMM is a finite state machine
which can be viewed as a generator of random observation
sequences according to probability density functions. The
model changes state once at each time step and at time t a
state j is entered, a speech vector ot is generated from the
probability density aij. The values of aij should satisfy
N
j 1
a
ij
1
(1)
where N is the number of states. They provide the
temporal information in the HMM [6].
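As a small illustration of constraint (1), the sketch below (not from the original paper; a minimal numpy example) builds a left-to-right transition matrix of the kind commonly used for phone HMMs and checks that every row of a_ij sums to one.

import numpy as np

# Minimal sketch (not from the paper): a 3-state left-to-right HMM
# transition matrix. Row i holds a_ij, the probability of moving from
# state i to state j, and each row must sum to 1 as required by Eq. (1).
A = np.array([
    [0.6, 0.4, 0.0],   # state 1: stay, or move on to state 2
    [0.0, 0.7, 0.3],   # state 2: stay, or move on to state 3
    [0.0, 0.0, 1.0],   # state 3: final state with a self-loop
])

assert np.allclose(A.sum(axis=1), 1.0)   # every row of a_ij sums to 1
print(A.sum(axis=1))                     # -> [1. 1. 1.]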
The quantity P(Y|W) is the probability of an acoustic vector sequence Y given a word sequence W, and the recognizer must use it to find the most probable word sequence. A simplistic approach to
achieve this would be to obtain several samples of each
possible word sequence, convert each sample to the
corresponding acoustic vector sequence and compute a
statistical similarity metric for the given acoustic vector
sequence Y to the set of known samples. For large
vocabulary speech recognition this is not feasible because
the set of possible word sequences is very large. Instead
words may be represented as sequences of basic sounds.
Knowing the statistical correspondence between the basic
sounds and acoustic vectors, the required probability can
be computed.
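The search for the most probable word sequence described above is usually written as the maximum a posteriori decision rule; the following formulation is the standard one in HMM-based ASR and is stated here for completeness rather than quoted from the original:

\hat{W} = \arg\max_{W} P(W \mid Y) = \arg\max_{W} \frac{P(Y \mid W)\,P(W)}{P(Y)} = \arg\max_{W} P(Y \mid W)\,P(W)

Here P(Y|W) is supplied by the acoustic (HMM) model, P(W) by the language model, and P(Y) can be dropped because it does not depend on W.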
Fig. 3 Triphone HMM
3.2 Decoder or Recognizer
Fig. 2 Message Encoding and Decoding in an ASR
A decoder is a searching algorithm which finds the word sequence W that maximizes the a posteriori probability P(W|O) for a spoken utterance O. In the HMM based
recognition system, decoding is controlled by a
recognition network. A recognition network consists of a
word-level network, a dictionary and a set of HMMs. A
word network describes the sequence of words that can be
recognized and, for the case of sub-word systems, a
dictionary describes the sequence of HMMs that constitute
each word. A word-level network will typically represent
either a finite-state Task Grammar which defines all of the
legal word sequences explicitly or a Word Loop which
simply puts all words of the vocabulary in a loop and
therefore allows any word to follow any other word.
Word-loop networks are often augmented by a stochastic
language model (LM). A recognition network ultimately
consists of HMM states connected by transitions.
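To make the decoding step concrete, the following minimal Viterbi sketch (illustrative only; it is not the recognizer used in this work, and it uses a toy two-state HMM with discrete observations rather than a full recognition network) finds the most probable state path through HMM states connected by transitions:

import numpy as np

# Toy HMM with assumed values, purely for illustration: 2 states, 3 symbols.
pi = np.array([0.7, 0.3])              # initial state probabilities
A  = np.array([[0.8, 0.2],             # a_ij: transition probabilities
               [0.3, 0.7]])
B  = np.array([[0.6, 0.3, 0.1],        # b_j(o): output probabilities
               [0.1, 0.4, 0.5]])

def viterbi(obs, pi, A, B):
    """Return the most probable state path for a list of discrete observations."""
    delta = pi * B[:, obs[0]]                 # best path score ending in each state
    back = []
    for o in obs[1:]:
        trans = delta[:, None] * A            # score of every (previous -> next) move
        back.append(trans.argmax(axis=0))     # best predecessor for each state
        delta = trans.max(axis=0) * B[:, o]
    path = [int(delta.argmax())]              # backtrack from the best final state
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return list(reversed(path))

print(viterbi([0, 1, 2, 2], pi, A, B))        # -> [0, 1, 1, 1]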
Fig. 5 Sequence of steps in converting a speech signal into a set of parameters suitable for ASR
Fig. 4 Recognition Network Model
3.3 Parameterization
For input into a digital computer the continuous
speech signal must first be converted into discrete samples
which are then converted into a set of representative
feature vectors. This parameterization process is often
called the front-end of the speech recognition system.
The steps involved in converting the speech signal into a set of
parameters are shown in Figure 5.
The main purpose of the digitization process is to
produce a sampled data representation of the speech signal
with as high a Signal to Noise ratio (SNR) as possible
[11].
Digitized speech is grouped into sets of samples, called frames, each typically representing between 20 and 30 ms of speech. The digitized speech signal is blocked into overlapping frames [1]-[20] as shown in Fig. 10. The overlap decreases problems that might otherwise occur due to signal data discontinuity.
A one-coefficient digital filter, known as a pre-emphasis filter [11], is then applied: this stage spectrally flattens the frame using a first-order filter. The transformation may be described as [12]:

Y[n] = X[n] - \alpha X[n-1]

where X[n] refers to the n-th speech sample in the frame. Sphinx uses \alpha = 0.97, and the sampling rate is typically 8K or 16K 16-bit samples per second.
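To make the framing and pre-emphasis steps concrete, the following minimal sketch (an illustration only, not the actual front-end used in this work; the 25 ms frame length, 10 ms frame shift and alpha = 0.97 follow the figures quoted in the text) blocks a pre-emphasized signal into overlapping frames:

import numpy as np

def preemphasize(x, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1]."""
    y = np.copy(x).astype(float)
    y[1:] -= alpha * x[:-1]
    return y

def frame_signal(x, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Block the signal into overlapping frames (e.g. 25 ms frames every 10 ms)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // shift)
    return np.stack([x[i * shift : i * shift + frame_len] for i in range(n_frames)])

# Example on one second of synthetic "speech" at 16 kHz
signal = np.random.randn(16000)
frames = frame_signal(preemphasize(signal))
print(frames.shape)   # (98, 400): 98 overlapping 25 ms frames of 400 samples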
Windows are functions defined across the time
record which are periodic in the time record. They start
and stop at zero and are smooth functions in between.
When the time record is windowed, its points are
multiplied by the window function, time bin by time
bin, and the resulting time record is by definition periodic.
It may not be identical from record to record, but it
will be periodic (zero at each end). In the frequency
domain, a window acts like a filter. The amplitude of
each frequency bin is determined by centering this filter
on each bin and measuring how much of the signal falls
within the filter. If the filter is narrow, then only
frequencies near the bin will contribute to the bin. A
narrow filter is called a selective window; it selects a
small range of frequencies around each bin. However,
since the filter is narrow, it falls off from center rapidly.
This means that even frequencies close to the bin may be
attenuated somewhat. If the filter is wide, then frequencies
far from the bin will contribute to the bin amplitude but
those close by will probably not be attenuated much.
The net result of windowing is to reduce the
amount of smearing in the spectrum from signals not
exactly periodic with the time record. The different
types of windows trade off selectivity, amplitude accuracy,
and noise floor.
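A small numerical illustration of this trade-off (not part of the original paper; it uses the Hamming window discussed in the next paragraph): windowing a tone that is not periodic in the analysis frame strongly reduces the energy smeared into frequency bins far from the tone.

import numpy as np

fs, n = 16000, 400                       # one 25 ms frame at 16 kHz
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 1234.0 * t)    # a tone not periodic in the frame

hamming = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(n) / (n - 1))

spec_rect = np.abs(np.fft.rfft(tone))            # no window (rectangular)
spec_hamm = np.abs(np.fft.rfft(tone * hamming))  # Hamming-windowed frame

# Energy far from the tone (above ~4 kHz here) is much lower with the window,
# i.e. there is less spectral smearing from the frame-edge discontinuity.
far = np.arange(len(spec_rect)) * fs / n > 4000
print(spec_rect[far].max(), spec_hamm[far].max())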
Today, in speech recognition, the Hamming window is almost exclusively used. The Hamming window is a specific case of the Hanning window [11]. In this stage a Hamming window is applied to the frame to minimize the effect of discontinuities at the edges of the frame during the FFT. The transformation is [12]:

W[n] = X[n] \, H[n]

The vector H[n] is computed using the following equation [12]:

H[n] = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right)

The constants used in the H[n] transform were obtained from the Sphinx source code. In practice, it is desirable to normalize the window so that the power in the signal after windowing is approximately equal to the power of the signal before windowing. The purpose of the window is to weight, or favor, samples towards the center of the window [11].

Speech coding is the compression of speech into a code using audio signal processing and speech processing techniques. To encode the speech signal into a suitable set of parameters, three basic classes of techniques are used:
Fourier transformations
Filtering through digital filter-banks
Linear prediction

Since the speech signal is not stationary, speech analysis for encoding must be performed on short-term windowed segments, usually with a duration of 20 to 30 ms and a frame period of 10 to 15 ms. For such a short period of time (~10 ms) the speech signal is quasi-stationary, and this allows us to represent the signal over this period by a single feature vector.

4. Keyboard Interface

The PS/2 interface is a bit-serial interface with two signals, Data and Clock. Both signals are bi-directional and data is sent bit-serially. The first bit is always a start bit, logic 0. Then 8 data bits are sent with the least significant bit first. The data is padded with a parity bit (odd parity). The parity bit is set if there is an even number of 1's in the data bits and reset (logic 0) if there is an odd number of 1's in the data bits. The number of 1's in the data bits plus the parity bit always adds up to an odd number (odd parity); this is used for error detection. A stop bit (logic 1) indicates the end of the data stream.

Electrically, logic 1 is represented by 5 V and logic 0 by 0 V (digital ground). Whenever the Data and Clock lines are not used, i.e. are idle, both lines are left floating, that is, the host and the device both set their outputs to high impedance. Externally, on the PCB, large (about 5 kΩ) pull-up resistors keep the idle lines at 5 V (logic 1).

The FPGA/keyboard interface is shown in Figure 7. When the FPGA "reads" the Data or Clock inputs, both PS2Data_out and PS2Clk_out are kept low, which puts the tri-state buffers in high-impedance mode. When the FPGA "writes" a logic 0 on an output, the corresponding x_out (x = PS2Data or PS2Clk) signal is set high, which pulls the line low. When "writing" a logic 1, the FPGA simply sets the x_out signal low.

Fig. 7 FPGA/Keyboard Interface

4.1 Protocol for receiving data from the keyboard

Data is received from the keyboard as illustrated in Figure 8.

Fig. 8 PS/2 protocol

Fig. 9 PS/2 Timing
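The frame format just described can be checked with a minimal host-side sketch (illustrative only; the design in this paper is the FPGA data path described below, and the helper below is hypothetical): it decodes one 11-bit PS/2 frame consisting of a start bit, eight data bits LSB first, an odd parity bit and a stop bit.

def decode_ps2_frame(bits):
    """bits: 11 sampled line levels (start, d0..d7 LSB first, parity, stop)."""
    assert len(bits) == 11
    start, data, parity, stop = bits[0], bits[1:9], bits[9], bits[10]
    if start != 0 or stop != 1:
        raise ValueError("framing error")
    # Odd parity: the eight data bits plus the parity bit hold an odd number of 1s.
    if (sum(data) + parity) % 2 != 1:
        raise ValueError("parity error")
    value = 0
    for i, bit in enumerate(data):        # LSB was transmitted first
        value |= bit << i
    return value

# Scan code 0x1C ('A' make code): 0001 1100 -> LSB first 0,0,1,1,1,0,0,0; parity 0
frame = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
print(hex(decode_ps2_frame(frame)))       # -> 0x1c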
4.2 The keyboard scan-codes
The keyboard sends packets of data, scan codes, to
the host indicating which key has been pressed. When a
key is pressed or held down a make code is transmitted.
When a key is released a break code is transmitted. Every
key is assigned a unique make and break code so that the
host can determine exactly what has happened. There are
three different scan code sets, but all PC keyboards use
Scan Code Set 2. A sample of this scan code set is listed in
table 2. Please refer to the lab homepage for the full scan
code set.
4.3 Displaying scan codes
The scan codes received from the keyboard are displayed as the corresponding code in hexadecimal format on the XSV board digit LEDs, and the LEDs are updated at a rate of approximately 1 Hz. In the following, we partition this problem into more manageable pieces: a data path and a control path. A block diagram of the complete design is shown in Figure 10.
Table 2: Scan Code Set 2 (sample)
KEY   MAKE   BREAK
A     1C     F0, 1C
B     32     F0, 32
C     21     F0, 21
D     23     F0, 23
E     24     F0, 24
F     2B     F0, 2B
G     34     F0, 34
H     33     F0, 33
I     43     F0, 43
J     3B     F0, 3B
K     42     F0, 42
L     4B     F0, 4B
M     3A     F0, 3A
N     31     F0, 31
O     44     F0, 44
P     4D     F0, 4D
Q     15     F0, 15
R     2D     F0, 2D
S     1B     F0, 1B
T     2C     F0, 2C
U     3C     F0, 3C
V     2A     F0, 2A
W     1D     F0, 1D
X     22     F0, 22
Y     35     F0, 35
Z     1A     F0, 1A
Fig. 10 A block diagram of a PS/2 Keyboard Interface
All flip-flops in the design are clocked with a 20 MHz clock and the rising edge is the active edge. Four types of positive edge-triggered D flip-flops are used:
FD - no reset/set
FDC - asynchronous reset
FDP - asynchronous set
FDCE - clock enable and asynchronous reset
The PS2Data and PS2Clk signals are sampled with FDP flip-flops. The PS2Data signal is fed into a shift register. When 8 data bits (the scan code) and one parity bit have been shifted into the shift register, the scan code is written to a synchronous FIFO. When the FIFO is not empty, the output is read at intervals of approximately 1 second. The FIFO output is connected to two identical ROMs. The ROMs decode the scan code data so that a character "0"-"F" is displayed on each digit LED. A short description of each module in the data and control path follows. The keyboard scanning process is illustrated in Figure 8. First the column result is scanned: the three lower bits (the column part) of the scan code are incremented until the low column line is found. Then the port directions are inverted and the row result is scanned: the row part of the scan code is incremented until the low row bit is found. Finally, the scan code processing function is called.
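To illustrate how a host might interpret the make/break codes of Table 2 (a minimal sketch, not the FPGA data path described above; the small MAKE_CODES dictionary is a hypothetical subset), the following routine turns a stream of Scan Code Set 2 bytes into key-press and key-release events, using the 0xF0 prefix that marks break codes:

# Subset of Scan Code Set 2 (make code -> key), taken from Table 2.
MAKE_CODES = {0x1C: 'A', 0x32: 'B', 0x21: 'C', 0x23: 'D', 0x24: 'E'}
BREAK_PREFIX = 0xF0

def decode_scan_codes(byte_stream):
    """Yield ('press' | 'release', key) events from a stream of scan code bytes."""
    release = False
    for b in byte_stream:
        if b == BREAK_PREFIX:          # the next byte is a break (key release) code
            release = True
            continue
        key = MAKE_CODES.get(b, '?')
        yield ('release' if release else 'press', key)
        release = False

# 'A' pressed and held (repeated make code), then released (F0 1C)
print(list(decode_scan_codes([0x1C, 0x1C, 0xF0, 0x1C])))
# -> [('press', 'A'), ('press', 'A'), ('release', 'A')]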
5. Speech Application Programming Interface
The Speech Application Programming Interface
(SAPI) is defined by Microsoft. Version 4 was published
in the beginning of 1998 and its successor, version 5, was
published in the beginning of 2001. Maybe the biggest
weakness in SAPI 4 is the fact that there is no centralized
control panel for speech synthesizer parameters. This
means that the user is required to select his or her favorite voice and adjust it individually for every speaking application. SAPI 5 adds a Speech item to the control panel; this control panel item is used by the user to define his or her favourite voice and other speech parameters. SAPI was first introduced with Windows 95. This API provides a unified interface for dynamic speech synthesis and recognition. Over the years new versions were developed, and Windows XP ships with version 5.1. Unfortunately the early API wasn't really mature and supported only C++ (later Visual Basic and COM), so it was not widely used. Microsoft redesigned version 5.0 from scratch and changed critical parts of the interface. The latest stable version, 5.1, is still a native-code DLL, but with the next one, which is considered to be part of Windows Vista (a complete redesign again), full support for managed .NET code is expected [7]. Right now it is only possible to take advantage of the current SAPI interface from C# by using COM Interop, which is a .NET technique for using native COM objects. Programming interfaces expose
the full power of complex software engines. SAPI 5 is
Microsoft’s fifth iteration towards a speech application
programming interface to enable programmers to build
speech-enabled applications for Microsoft Windows
platforms. It incorporates lessons that Microsoft has
learned over many years and many APIs, in an effort to
make the best comprehensive API possible. Even so, most
programmers should do at least their prototyping and in
most cases their product using a tool suite that does most
of the SAPI 5 programming for them. Why? As always, to
improve value and reduce costs. Microsoft SAPI is a little-known speech recognition and synthesis engine that is supported in all versions of Windows after Windows 95. The Microsoft Speech SDK (Software Development Kit) version 5.1 is available for free download from the Microsoft Speech Technologies website. Speech SDK 5.1
is compatible with a number of programming languages,
but most of the documentation focuses on examples
written in C++. In general, any programming environment
that supports OLE automation will work for writing SAPI
applications.
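Since SAPI can be driven from any OLE-automation-capable environment, a minimal text-to-speech call looks roughly like the sketch below. It assumes Windows with SAPI 5 installed and the pywin32 package for COM access from Python; it is not the SAPI 4.1 implementation used in this project and is shown only to illustrate the automation route.

# Minimal sketch under the stated assumptions (Windows, SAPI 5, pywin32).
import win32com.client

voice = win32com.client.Dispatch("SAPI.SpVoice")     # SAPI 5 TTS automation object
voice.Speak("Voice control keyboard system ready")   # synchronous speech output

voices = voice.GetVoices()                            # enumerate installed voices
for i in range(voices.Count):
    print(voices.Item(i).GetDescription())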
6. Results and Discussion
In our project work, an attempt has been made to develop a voice-controlled computer system. Here the computer is the central device, the microphone is the input device, and voice is the input. For this system a microphone is connected to the PC via the sound card, and software was developed to accept the input processed by the Speech Application Programming Interface (SAPI). The input was processed by the microprocessor and the result displayed.
We do believe that a voice-controlled system must be
able to handle some kinds of disturbances in the utterance.
The system must not interpret them as speech or as the
ending of an utterance. This is especially important when
the system is going to be used in a car since this is an
environment where these disturbances can often occur
since your attention is on the traffic and everything around
you. When something happens that catches your attention you might, for example, make "unmotivated" pauses in your utterance to the computer; the computer system has no clue why you are doing this and will probably interpret it as if you have finished your utterance. To actually build a voice-controlled system on
the basis of our communication model and evaluate it
would also be a great challenge. Thereafter, to do a study
with a large group of test persons to find out how our
system would work, would have been challenging. To do a
study with a large group of test persons would also be
interesting since we could see if the results could be
statistically secure. When we created our communication
model we worked on the basis of the specific environment
of the car. Therefore our solution is custom-made for the
car. However, it might be possible to use it for other areas
where you use service gateways. For example when you
want to use voice-controlled computers in your home
environment.
System performance totally depends on the output of
the system. The percentage of success rate and failure rate
has been calculated using the following equations:
\text{Success rate} = \frac{\text{Total number of successes}}{\text{Total number of tests}} \times 100\%

\text{Failure rate} = \frac{\text{Total number of failures}}{\text{Total number of tests}} \times 100\%
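As an illustration with hypothetical numbers (not measured results): if 45 of 50 spoken test commands were executed correctly, the success rate would be (45/50) × 100% = 90% and the failure rate (5/50) × 100% = 10%.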
The performance is related to the success rate and the failure rate. If the success rate is high, the performance of the system is good. Success rate and failure rate are complementary: when the success rate is high, the failure rate is low. The performance of the system depends on these two quantities.
7. Conclusions
We have demonstrated a voice control system, for
physically disabled persons, that operates with a minimum
of specialized hardware. Some of the chosen solutions are
dictated by the wish to build on the equipment already
used by our test subject. For a more mobile person, the
same solution could be used in connection with a portable
computer. The telephone capability could be given a less
expensive and more flexible solution through a standard
modem. The use of a standard PC gives potential access to
a wide variety of programs. Future expansions include
improved software for text processing.
Acknowledgments
The authors would like to thank student Pulack
Chakma [M.Sc.(Tech.)] for her assistance during the
computer simulations and Md. Farzan Ali [M.Sc. (Mat.)]
for the language checking.
References
[1] Mahabubur Rahaman, "Visual Basic v6 (Part 1: Database and Multimedia Programming)".
[2] Mahabubur Rahaman, "Visual Basic v6 (Part 2: Database and Multimedia Programming)".
[3] Michael J. Young, "Mastering Visual Basic 6".
[4] Evangelos Petroutsos, "Mastering Visual Basic 6".
[5] Douglas Hall, "Microprocessors and Interfacing: Programming and Hardware", Tata McGraw-Hill Edition.
[6] M. Rafiquzzaman, "Microprocessors and Microprocessor-Based System Design".
[7] C. Bateman et al., "Spectral contrast normalization and other techniques for speech recognition in noise".
[8] "Speech enhancement based on masking properties of the auditory system".
[9] H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech", The Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738-1752, 1990.
[10] B. Atal and M. Schroeder, "Predictive coding of speech signals", Proc. of the 6th International Congress on Acoustics, Tokyo, pp. 21-28, 1968.
[11] H. Matsumoto and M. Moroto, "Evaluation of Mel-LPC cepstrum in a large vocabulary continuous speech recognition", Proc. ICASSP, pp. 117-120, 2001.
[12] P. H. Lindsay and D. A. Norman, "Human Information Processing: An Introduction to Psychology", 2nd Ed., p. 163, Academic Press, 1977.
[13] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction", IEEE Trans. Acoust., Speech and Signal Processing, vol. 27, no. 2, pp. 113-120.
[14] H. W. Strube, "Linear prediction on a warped frequency scale", J. Acoust. Soc. Am., vol. 68, no. 4, pp. 1071-1076, 1980.
[15] H. Matsumoto, Y. Nakatoh and Y. Furuhata, "An efficient Mel-LPC analysis method for speech recognition", Proc. ICSLP '98, pp. 1051-1054, 1998.
[16] H. Matsumoto and M. Moroto, "Evaluation of Mel-LPC cepstrum in a large vocabulary continuous speech recognition", Proc. ICASSP, pp. 117-120, 2001.
[17] Kim Binsted and Charles Jorgensen, "Sub Auditory Speech Recognition".
[18] Chuck Jorgensen, Diana D. Lee, and Shane Agabon, "Sub Auditory Speech Recognition Based on EMG/EPG Signals".
[19] S. Itahashi and S. Yokoyama, "A formant extraction method utilizing mel scale and equal loudness contour", Speech Transmission Lab. Quarterly Progress and Status Report (Stockholm), no. 4, pp. 17-29, 1987.
[20] Various websites on the Internet.
Md. Sipon Miah received the Bachelor's and Master's degrees in Information and Communication Engineering from Islamic University, Kushtia, in 2006 and 2007, respectively. He is currently a Lecturer in the Department of Information and Communication Engineering, Islamic University, Kushtia, Bangladesh. Since 2003, he has been a Research Scientist at the Communication Research Laboratory, Department of ICE, Islamic University, Kushtia, where he belongs to the spread-spectrum research group. He is pursuing research in the area of internetworking in wireless communication. He has three published papers in international and national journals in the same area and has some more papers in process. His areas of interest include database systems, interfacing, programming, optical fiber communication and wireless communications.
Tapan Kumar Godder received the Bachelor's, Master's and M.Phil. degrees in Applied Physics & Electronics from Rajshahi University, Rajshahi, in 1994, 1995 and 2007, respectively. He is currently an Associate Professor in the Department of ICE, Islamic University, Kushtia-7003, Bangladesh. He has seventeen published papers in international and national journals. His areas of interest include internetworking, AI and mobile communication.