VOICE TEXTING

Lou Marvin T. Bautista, Sherwin Ian L. Bedia, Kelwin G. Chua
Shei Audrey R. Fuertez, Jaimee F. Marquez, Bonn Jonel P. Pua
Tristan H. Calasanz and Carlos M. Oppus

 

Abstract—This project involves the design and implementation of a voice-texting system that allows the sending of SMS messages by way of voice recognition and computer interfacing, and supports user-notification of incoming messages. This is intended to facilitate the easier transmission of commonly sent messages saved as templates simply through voice identification rather than typing the whole message in the mobile phone’s keypad, which takes more time and energy, and involves greater susceptibility to committing typographical errors.

In this system, the user speaks into a microphone connected to a personal computer. A program written in Visual Basic captures the voice input from the sound card and saves it as a wav file. The input is then compared with pre-recorded voice samples stored in the computer, based on the amplitude of the frequencies. When a match occurs, the program opens the corresponding text file, or template, containing the message to be sent by SMS. The message is then passed to the mobile phone, which is interfaced with the computer through the serial port. A hardware circuit connected to and controlled by the parallel port indicates whether the message has been sent, as well as the number of messages sent and received by the mobile phone. The program also alerts the user when a message has been received. The computer then reads the message aloud, displays it on the screen, and adds it to a log file of all messages received by the mobile phone.

Index Terms—Visual Basic, Short Messaging Service, Voice Recognition, Dynamic Link Library

  1. INTRODUCTION
    The mobile phone is one of the fastest-growing technologies today. The invention of mobile phones paved the way for better and faster means of communication. The application of mobile phone technology, however, is not limited to the field of communications. It spans other fields such as entertainment, government, and media, and newer technologies relating to mobile phones are continuously being developed. Examples include the use of mobile phones in the control of remote systems.

    Communication through Short Messaging Service (SMS), more commonly known as texting, is at its peak nowadays because it is more affordable than placing a voice call on a mobile phone. In the Philippines, millions of text messages are sent daily from one mobile phone to another, earning the country the distinction of being the texting capital of the world. Research on enhancing the usage of mobile phones is therefore currently in progress.

    Most mobile phones in circulation today can place a call through what is referred to as voice dialing. This feature enables the user to record voice tags for the people he or she calls most often, and to dial a number simply by speaking the name of the person to be called, thus saving time and energy. The Subscriber Identity Module (SIM) card, which contains information about the phone such as the phone number and phone book entries, is in turn automatically searched for the mobile phone number of that person. This project was developed with the goal of extending the same flexibility to the sending of SMS messages, thereby extending the use of mobile phones beyond their intended purpose.

  2. Statement of the Problem
    One of the core features of mobile phones nowadays is their SMS or texting capability. Through this capability, messages are composed and relayed to their destinations with a few key presses. For relatively long messages, however, texting can be inconvenient, consuming considerable time and energy. Moreover, longer messages are more prone to typographical errors.

    There are also cases when one's hands are not free for texting, such as when driving or doing tasks that require the use of both hands. This project hopes to address this problem by offering the same functionality with greater convenience.

    1. Objectives of the Study
      The main task of this study is to create software that controls an interfaced mobile phone through a personal computer. The computer program must be able to perform voice recognition and, consequently, communicate with the mobile phone. As such, it can invoke several functions of the mobile phone, including the sending and receiving of SMS messages.

      This study aims to accomplish the following:

      1) To record several voice samples and compare them based on the amplitude of the frequencies;

      2) To interface a mobile phone with a personal computer;

      3) To construct a hardware circuit that will display the status and number of sent and received messages; and,

      4) To send and receive messages via the mobile phone connected to a computer.

    2. Methodology
      To address the issues discussed above, the study proposed to create the software using Microsoft Visual Basic 6.0. Microsoft Visual Basic is an easy-to-learn programming language. It is flexible, can be integrated with other programming languages, and allows the simple construction of a user-interface front panel.

      The project was divided into two modules. The first module is in charge of the following:

      (1) It should be able to access the voice data that the user inputs through the sound card;

      (2) It should be able to compare this voice data with the voice samples pre-stored in the database; and,

      (3) When a match is found, it should be able to determine and access the template corresponding to the voice input, and save this template to a default text file (text message.txt) that is to be accessed by the second module.

      The second module, on the other hand, is in charge of the following:

      (1) It should be able to interface the mobile phone with the personal computer;

      (2) It should be able to access the text file (text message.txt) and send the contents of this file to a mobile phone number;

      (3) It should be able to determine whether an SMS message has been received by the mobile phone. The received message must be simultaneously displayed on the screen and spoken aloud;

      (4) It should be able to determine the status and number of SMS messages sent and received by the mobile phone.

      The basic flow of the actual voice texting procedure is illustrated in the flowchart shown in Figure 1. The purpose of this procedure is for the user to send commonly used SMS messages saved in the computer as templates by virtue of voice recognition of keywords pre-recorded by the user.

      The user first speaks out a keyword through the microphone. This sound data input is then accessed from the sound card and compared with existing voice samples stored in the database. Aside from the voice samples, templates containing commonly used messages sent by the user are also stored in the computer.

       


      Fig. 1. Flow chart of the Voice Texting Procedure

      A one-to-one correspondence exists between each voice sample (pre-recorded keyword) and each template. Comparison of the voice samples and the sound data input is based on the amplitude of the frequencies. When the amplitude of the frequencies of the sound data input lies within 10% of that of an existing voice sample, a match is said to be found. The template corresponding to this keyword is then accessed and its contents are sent. A hardware circuit indicates whether the message was successfully sent and displays the total number of sent messages.

    3. Scope and Limitations

    The project mainly focused on the comparison of voice samples and the accessing of the mobile phone connected to the personal computer. The program was implemented using the Visual Basic software. Further implementations of the project in other programming languages are recommended.

    The basis for the comparison of the voice samples was limited to the amplitude of the frequencies only. Other properties of the voice samples were not explored. Moreover, the magnitude of the frequencies was not normalized. The processing of the voice samples was limited to the determination of the three peaks of the voice samples by Discrete Fourier Transform (DFT).

    It must also be noted that in the interfacing of the mobile phone with the personal computer, a Nokia mobile phone and a compatible standard Nokia GSM data cable were used. Should other phones be used, a compatible data cable must be employed.

    This project exploits the SMS or texting capability of mobile phones. The handling of incoming and outgoing calls was not implemented. Furthermore, when an SMS message is received, only the sender's mobile phone number is displayed. With regard to the Microsoft speech object library, which supplies the text-to-speech capability included in the program, only words written in formal English are spoken clearly.

    Lastly, the hardware circuit was limited to displaying the number of sent and received messages to one digit each.

  3. HARDWARE SETUP AND CONFIGURATION
    The study was divided into two components: hardware and software development. The hardware component is a circuit that indicates the status and number of messages sent and received.

    The hardware is mainly composed of light-emitting diodes (LEDs) and seven-segment LED displays. The LEDs exhibit a particular pattern when a message has been successfully sent or received by the mobile phone, and a different pattern when message sending has failed. The hardware also involves interfacing the circuit with the computer’s parallel port, through which signals from the computer reach the circuit.

     

    Fig. 2. Hardware Circuit

    Figure 2 above shows the circuit constructed for this project. The objectives of the circuit include displaying the number of messages successfully sent and received, and indicating through LED patterns if a message has been successfully sent or not, or if a message was received.

    The components of the circuit include four light-emitting diodes (LEDs), two seven-segment LEDs, two MN4511 ICs (BCD to 7-segment latch/decoder) and inputs from the output of the parallel port of the computer. The ICs require an external power supply of +5V (Vcc).

    The parallel port of the computer has three parts: the data port, the status port, and the control port. The circuit uses eight pins from the data port (pins 2 to 9) and four pins from the control port: the active-low STROBE (pin 1), the active-low Autofeed (pin 14), INIT (pin 16), and the active-low Select input (pin 17). Pin 25 of the parallel port was used as ground.

    The data port output of the PC was divided into two sets of four pins. Pins 2 to 5 carry the number of messages sent, while pins 6 to 9 carry the number of messages received. Pins 2 to 5 were used as inputs to one MN4511 chip, whose seven segment outputs were connected to a seven-segment LED that displays the number of messages sent. Pins 6 to 9 were used as inputs to the other MN4511 chip, whose outputs were connected to the other seven-segment LED to display the number of messages received.

    Pins 1, 14, 16, and 17 were each connected to a red LED. These four LEDs indicate whether message sending was successful. If the message is sent successfully, the four LEDs light up one after another from the leftmost to the rightmost, and this pattern repeats four times. All LEDs then light up for some time before being turned off. If the sending of the message fails, all LEDs blink five times and are then turned off.
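    The pin allocation and LED patterns described above can be sketched as follows. This is a minimal Python illustration, not the project's Visual Basic code. The function names, the wrap of each counter at 10, and the order of the LEDs within each frame are assumptions made for the example.

```python
def pack_counts(sent, received):
    """Pack two one-digit counters into one data-port byte.

    The low nibble (pins 2-5) carries the sent count and the high nibble
    (pins 6-9) carries the received count, both as BCD digits.
    Wrapping at 10 is an assumption; the paper only states that one
    digit is displayed for each counter.
    """
    return ((received % 10) << 4) | (sent % 10)


def success_frames(repeats=4, leds=4):
    """LED states for a successful send: a left-to-right chase repeated
    `repeats` times, then all LEDs on, then all LEDs off."""
    chase = [tuple(i == lit for i in range(leds)) for lit in range(leds)]
    return chase * repeats + [(True,) * leds, (False,) * leds]


def failure_frames(blinks=5, leds=4):
    """LED states for a failed send: all LEDs blink `blinks` times."""
    return [(True,) * leds, (False,) * leds] * blinks


print(hex(pack_counts(3, 7)))                     # sent = 3, received = 7
print(len(success_frames()), len(failure_frames()))
```

    Each frame is a tuple of four on/off states, one per LED. How long each frame is held is not stated in the paper and is left out of the sketch.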

    Visual Basic was used to program the parallel port outputs of the PC. A DLL file (Win95io.dll) provides the call "vbout portnumber,data", which writes a data value to the given port. The port number of the data port is 888 in decimal, while that of the control port is 890 in decimal. The data argument is the value whose binary word produces the pin states needed by the circuit.
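    To illustrate how pin states translate into port writes, the sketch below builds the (port, data) pairs that would be handed to a vbout-style call. It is written in Python and does not touch real hardware. The inversion mask reflects the standard PC parallel port, in which the STROBE, Autofeed, and Select input lines are inverted by the port hardware while INIT is not. The helper names, and the assumption that an LED lights when its pin is driven high, are illustrative only.

```python
DATA_PORT = 888      # 0x378, data register (pins 2-9)
CONTROL_PORT = 890   # 0x37A, control register (pins 1, 14, 16, 17)

# Control register bits 0, 1 and 3 (STROBE, Autofeed, Select input) are
# inverted by the port hardware; bit 2 (INIT) is not.
CONTROL_INVERT_MASK = 0b1011


def control_value(pin_levels):
    """Return the register value that drives pins 1, 14, 16 and 17 to the
    requested levels, given as four booleans in that order."""
    wanted = sum(1 << bit for bit, high in enumerate(pin_levels) if high)
    return wanted ^ CONTROL_INVERT_MASK


def port_writes(frames):
    """Translate LED frames into (port, data) pairs for a vbout-style call."""
    return [(CONTROL_PORT, control_value(frame)) for frame in frames]


all_on = (True, True, True, True)
all_off = (False, False, False, False)
print(port_writes([all_on, all_off]))
```

    Under these assumptions, writing 4 to port 890 drives all four control pins high and writing 11 drives them all low. The counters on the data port need no such correction, since the data lines are not inverted.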

  4. SOFTWARE SETUP AND CONFIGURATION
    To speed up and simplify development, the program was divided into subprograms (forms in Visual Basic), each handling particular functions. These forms were then integrated into one final executable program. This section discusses the subprograms and menus used in the design of the software: (1) Main Menu, (2) Template Menu, (3) Training and Record Menu, (4) Voice Texting Menu, (5) Sending an SMS Message, and (6) Receiving an SMS Message.

      1. Main Menu

        Fig 3. The Main Menu Front Panel.

        The main menu contains several submenus corresponding to the different functions available to the user. These functions include the customization of templates, the training and actual recording of voice samples that shall serve as the basis for voice recognition, and the voice texting function which handles voice recognition and the sending of SMS messages. A button is also provided allowing the user to view the Receive Panel which handles incoming SMS messages. Figure 3 shows the Main Menu’s front panel.

      2. Template Menu
        This menu allows the user to customize the templates that the program accesses when sending a message. The Template Menu front panel displays the existing templates in the database along with options for creating a new template, editing an existing template, and deleting an existing template, as shown in Figure 4 below.

        Fig 4. The Template Menu Front Panel.

        When creating a new template, the user inputs the message and the name for the template. The program then creates a text file containing the message supplied by the user and adds this file to the master list of existing templates. In the case that a file of the same name already exists, the user will be prompted accordingly, and has the option of overwriting the existing file or not.

        When editing an existing template, the user supplies the name of the template whose message he wants to edit. The program opens the corresponding text file and displays its contents on the screen. The user can then make the necessary revisions and save them. Should the user specify a file that does not exist in the database, the program notifies the user and provides him with the option of creating such a template.

        Finally, when deleting an existing template, the user inputs the name of the template to be deleted. If such a file exists, the template is deleted from the master list of existing templates. Otherwise, the user is notified that no such template exists.
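        The create, edit, and delete behaviour described above can be summarized in a short sketch. It is written in Python for illustration rather than in the project's Visual Basic, and the class name, method names, and the .txt file extension are assumptions. Prompting the user is left to the caller, which is why the methods return a flag instead of showing a dialog.

```python
import os
import tempfile


class TemplateStore:
    """Keeps each template as a text file and tracks names in a master list."""

    def __init__(self, folder):
        self.folder = folder
        self.names = []          # master list of existing templates

    def _path(self, name):
        return os.path.join(self.folder, name + ".txt")

    def create(self, name, message, overwrite=False):
        """Create a template; refuse to replace one unless overwrite=True."""
        if name in self.names and not overwrite:
            return False         # caller should ask the user what to do
        with open(self._path(name), "w", encoding="utf-8") as handle:
            handle.write(message)
        if name not in self.names:
            self.names.append(name)
        return True

    def read(self, name):
        """Return the message for editing, or None if no such template."""
        if name not in self.names:
            return None
        with open(self._path(name), encoding="utf-8") as handle:
            return handle.read()

    def delete(self, name):
        """Delete a template; return False if it does not exist."""
        if name not in self.names:
            return False
        os.remove(self._path(name))
        self.names.remove(name)
        return True


with tempfile.TemporaryDirectory() as folder:
    store = TemplateStore(folder)
    store.create("late", "Running late, be there in 10 minutes.")
    print(store.create("late", "New text"))   # name exists, not overwritten
    print(store.read("late"))
    print(store.delete("late"), store.delete("late"))
```

        Editing a template corresponds to calling read, changing the text, and calling create again with overwrite=True.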

      3. Training and Record Menu
        A Visual Basic program was developed to access the voice data in the sound card. The program uses DirectX7 to retrieve the audio data stored in the sound card in wav format. The wav file has several fixed properties set by the DirectX7 program. Of particular interest are the sample format, 8-bit mono, and the sampling rate, 22050 Hz. The voice input also has a fixed time limit.
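        As an illustration of the wav properties mentioned above, the following sketch writes and reads an 8-bit mono file at 22050 Hz using Python's standard wave module. It is not the DirectX7 capture code used in the project. The test tone, its duration, and the function names are assumptions chosen so that the example runs without a microphone.

```python
import io
import math
import wave

SAMPLE_RATE = 22050   # Hz, as stated in the text
SAMPLE_WIDTH = 1      # bytes per sample (8-bit)
CHANNELS = 1          # mono


def write_tone(target, freq_hz, seconds):
    """Write a sine tone as 8-bit mono PCM; 8-bit wav samples are unsigned."""
    count = int(SAMPLE_RATE * seconds)
    samples = bytes(
        int(round(127.5 + 127.5 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)))
        for n in range(count)
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(samples)


def read_samples(source):
    """Read a wav file and return its samples centred on zero (-128..127)."""
    with wave.open(source, "rb") as wav:
        assert wav.getnchannels() == CHANNELS
        assert wav.getsampwidth() == SAMPLE_WIDTH
        assert wav.getframerate() == SAMPLE_RATE
        raw = wav.readframes(wav.getnframes())
    return [value - 128 for value in raw]


buffer = io.BytesIO()                 # in-memory stand-in for a file on disk
write_tone(buffer, freq_hz=440, seconds=0.1)
buffer.seek(0)
samples = read_samples(buffer)
print(len(samples), min(samples), max(samples))
```

        Subtracting 128 removes the offset of the unsigned 8-bit format, so that silence maps to zero before any frequency analysis.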

        Before an input wav file from the sound card can be compared, a database of wav files must first be created. To create a wav file and store it in the database, the user goes through the Training Menu. The Training Menu panel is shown in Figure 5.

        Fig 5. The Training and Record Menu Front Panel.

        The user first inputs the filename of the wav file to be recorded. This name is added to the list of wav files in the database. The user then records 10 wav file samples for that keyword. Each recorded voice sample is subjected to the Discrete Fourier Transform (DFT), which yields the sine and cosine coefficients of the component waves of the input voice. The magnitudes corresponding to these coefficients are then computed. It was observed that the three highest magnitudes were consistently found in three regions. The highest value in each region is referred to as a peak, and the peaks are the important values for each sample. Should two or more different wav files have peaks that are within 10% of each other, these are deleted from the database so that the program cannot find more than one match for a specific sound data input. After the samples have been recorded, the program is ready to compare the input voice with the wav files in the database.
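        The peak extraction described above can be sketched as follows. This is a minimal Python illustration using a direct DFT, not the project's implementation. The paper does not state where the three regions lie, so splitting the spectrum into three equal bands is an assumption, as are the function names and the synthetic three-tone test signal.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 1 up to the Nyquist bin (direct O(N^2) DFT).

    The magnitudes are left unnormalized, as in the paper.
    """
    n = len(samples)
    mags = []
    for k in range(1, n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def three_peaks(samples, sample_rate):
    """Return (frequency_hz, magnitude) of the strongest bin in each of
    three regions. Equal-width regions are an assumption of this sketch."""
    mags = dft_magnitudes(samples)
    n = len(samples)
    size = len(mags) // 3
    peaks = []
    for region in range(3):
        start = region * size
        stop = len(mags) if region == 2 else start + size
        best = max(range(start, stop), key=lambda idx: mags[idx])
        # mags[idx] holds bin k = idx + 1, whose frequency is k * rate / n
        peaks.append(((best + 1) * sample_rate / n, mags[best]))
    return peaks


# Synthetic signal with one tone per region (200, 600 and 1000 Hz).
rate, n = 2400, 240
signal = [math.sin(2 * math.pi * 200 * i / rate)
          + 0.5 * math.sin(2 * math.pi * 600 * i / rate)
          + 0.25 * math.sin(2 * math.pi * 1000 * i / rate)
          for i in range(n)]
for freq, mag in three_peaks(signal, rate):
    print(round(freq), round(mag))
```

        A direct DFT is slow for full-length recordings, so a fast Fourier transform would normally be preferred in practice. The direct form is used here only because it mirrors the sine and cosine coefficients mentioned in the text.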

      4. Voice Texting Menu

    The Voice Texting Menu is in charge of voice recognition and the sending of SMS messages when a match is found between the sound data input and one of the existing voice samples in the database.

    Since each wav file in the database is characterized by its set of peaks, the peaks serve as the means for comparing it with the input wav. If at least two of the three peaks of the input wav are within 10% of the corresponding peaks of a wav file in the database, the input wav is identified as a match for that wav file.
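    The matching rule can be expressed compactly, as in the Python sketch below. The function names and the peak values in the sample database are hypothetical. The sketch also assumes that one set of three peak magnitudes is stored per keyword, whereas the project records several samples for each keyword.

```python
def within_tolerance(a, b, tolerance=0.10):
    """True when a differs from b by at most `tolerance` of b."""
    return abs(a - b) <= tolerance * abs(b)


def is_match(input_peaks, stored_peaks, needed=2):
    """At least `needed` of the three peak pairs must agree within 10%."""
    hits = sum(within_tolerance(a, b) for a, b in zip(input_peaks, stored_peaks))
    return hits >= needed


def find_match(input_peaks, database):
    """Return the first keyword whose stored peaks match, else None."""
    for keyword, stored_peaks in database.items():
        if is_match(input_peaks, stored_peaks):
            return keyword
    return None


database = {                      # hypothetical peak magnitudes
    "hello": (120.0, 60.0, 30.0),
    "late": (80.0, 95.0, 40.0),
}
print(find_match((115.0, 63.0, 10.0), database))   # two of three peaks agree
print(find_match((10.0, 10.0, 10.0), database))    # no peaks agree
```

    Because the rule accepts two agreeing peaks out of three, two keywords with similar peaks could both match the same input. This is why near-duplicate samples are removed from the database during training.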

    Figure 6 below shows the panel that indicates whether a match is found between the input wav and the wav files in the database. The panel lists the existing wav files in the database, the three peaks of the input wav, the name of the wav file that the input wav matched, and the name of the template corresponding to the matched wav file. When a match is found, the user has the option of sending the template by pressing the Send button. The template accessed is the one corresponding to the matched wav file in the database. The contents of the template are written to a text file named text message.txt. The next module is then notified that the SMS message to be sent is contained in this text file.

    Fig 6. Panel that indicates whether a match is found between the input wav and the existing wav files in the database. This panel contains the relevant information pertaining to the results of the comparison between the wav files.

    Should the program find a wrong match, the user also has the option of inputting a new keyword by pressing the Try Again button. The user supplies a keyword which will be compared to the wav files as before. On the other hand, when a match is not found, the textboxes pertaining to the matched wav file and the corresponding template would be empty. The user has no option other than trying again or exiting the Voice Texting Menu.

      5. Sending an SMS Message

    Sending an SMS message through the mobile phone is another feature of this software. The user has the option of using the templates stored in the program or writing a message directly in a dialog box. To use a template, the user must go through the voice recognition process in the Voice Texting Menu and obtain a match, which grants access to the template. The Write Message dialog box, on the other hand, is available at any time from the user interface of the program. A sample Write Message dialog box is shown below in Figure 7.

    Fig 7. Write Message Dialog Box.

    The process of sending a message starts from the Receive Message user-interface. Several buttons are located on the Receive Message form. Two of these can lead to the Send Message form. First, using the Main Menu button, the user can access the templates by speaking the keyword through the microphone. If the program recognizes the keyword, access to the corresponding template is granted to the user. The Send Message form then appears on the screen. The other way to send a message would be by using the Send button on the Receive Message form. This button immediately pops up the Send Message form. From this form, the user can write his own message to send. Figure 8 shows the Send Message form.

    Before the user can send a message, he must provide a destination number for the message. The destination number can be directly inputted through a dialog box. However, for frequently used mobile phone numbers, the user can store the numbers in the program directory. The directory has features such as searching, adding and deleting mobile phone numbers. The directory has a capacity of 1000 entries.
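    A directory with the features listed above could look like the following Python sketch. The class and method names are assumptions, entries are keyed by a contact name for simplicity, and the phone number shown is made up. Only the 1000-entry capacity comes from the text.

```python
class PhoneDirectory:
    """Name-to-number store with the 1000-entry capacity stated in the text."""

    CAPACITY = 1000

    def __init__(self):
        self.entries = {}

    def add(self, name, number):
        """Add or update an entry; return False when the directory is full."""
        if name not in self.entries and len(self.entries) >= self.CAPACITY:
            return False
        self.entries[name] = number
        return True

    def search(self, name):
        """Return the stored number, or None if the name is unknown."""
        return self.entries.get(name)

    def delete(self, name):
        """Remove an entry; return False if it was not present."""
        return self.entries.pop(name, None) is not None


directory = PhoneDirectory()
directory.add("Ana", "+639170000000")   # made-up number
print(directory.search("Ana"))
print(directory.delete("Ana"), directory.search("Ana"))
```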

    With the send message feature, the user can reply to an SMS message with a pre-constructed message (template) or with a customized message. With this software, sending a message is made quick and easy.

    Fig 8. The Send Message Front Panel

      6. Receiving an SMS Message

    The first thing the program does is connect the computer to the mobile phone. This is achieved by using a dynamic-link library (DLL) file included in a cellular data suite. The DLL supplies the program with a component that has three functions, one of which, shortsmsreceived, is useful in this project. This function is executed whenever an incoming SMS message is received and is therefore treated as an event.

    Fig 9. Log File Dialog Box.

    An initial window similar to the one shown in Figure 9 above prompts the user to specify a file path and filename for the log file of incoming messages. Received messages are saved one by one in this log file. This enables the user to review any received message at any time, and ensures that received messages are not lost if the user accidentally deletes them from the cellular phone. If no filename is specified and the user presses the OK button, the program displays an error message and exits immediately. The program continues if the filename to be created for output is valid.

    Once the computer has been connected to the cellular phone, the program continuously polls for an incoming SMS message. This is achieved by initializing the component to start listening to events. The window displayed right after initialization is shown in Figure 10.

    The upper box contains the most recently received message and the lower box contains all the messages received since the program was started. Clicking on the Send command button takes the user to the Send Message panel.

    Once a message is received, the shortsmsreceived function is called. The program retrieves the received message from the cellular phone memory and displays it in the received message text box. The sender’s mobile phone number is also displayed near the received message text box. The program also activates the hardware visual indicators, which blink to let the user know that a message has been received. This is useful when the user is far from the setup. The total number of messages received since the program was opened is also displayed on the hardware circuit.
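    The sequence of actions taken on each incoming message can be sketched as an event handler. The Python class below is a stand-in written for illustration and does not use the actual DLL component. The class name, the handler signature, the log line format, and the wrap of the displayed count at 10 are all assumptions.

```python
import os
import tempfile


class ReceivePanel:
    """Stand-in for the event handling of the Receive Message form."""

    def __init__(self, log_path):
        if not log_path:
            raise ValueError("a log file name is required")
        self.log_path = log_path
        self.received_count = 0
        self.current_message = ""
        self.current_sender = ""

    def on_sms_received(self, sender, text):
        """Called once per incoming message, like the shortsmsreceived event."""
        self.current_sender = sender       # shown next to the message box
        self.current_message = text        # shown in the message box
        self.received_count += 1
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(f"{sender}: {text}\n")
        return self.received_count % 10    # digit for the one-digit display


log_path = os.path.join(tempfile.mkdtemp(), "received.log")
panel = ReceivePanel(log_path)
panel.on_sms_received("+639170000000", "See you at 5.")   # made-up numbers
panel.on_sms_received("+639170000001", "On my way.")
print(panel.received_count, panel.current_message)
with open(log_path, encoding="utf-8") as log:
    print(log.read(), end="")
```

    Opening the log in append mode keeps earlier messages intact, which matches the purpose of the log file as a record that outlives the messages stored on the phone.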

    Fig 10. Receive Message Front Panel.

    If the received message box is not empty, the message is read aloud to the user. This is done using the Microsoft speech object library, a DLL included in the project as a reference, which gives the program text-to-speech capabilities. The dictionary of the speech object is limited to formal English words, so it cannot clearly pronounce words in Filipino or any language other than English. Moreover, the use of abbreviations and compressed words is not recommended. The user also has the option of having any message in the message text box read aloud by clicking the Speak It command button, which calls the speak function on the displayed text.

    Once the displaying and voicing-out of the message is done, the program once again continuously polls for another incoming SMS message. Clicking on the Exit button terminates the connection of the program to the mobile phone, thus stopping it from listening to events. The whole program is then closed.

  5. CONCLUSION

Short Messaging Service proves to be one of the core features of mobile phones, and being the more common form of communication nowadays, efforts at improving this feature are currently in progress.

This project provides users with the same SMS or texting capability, only with greater ease and speed. It offers flexibility in sending commonly used messages simply by speaking a keyword. This minimizes the effort of texting these messages and helps secure their accuracy.

In this project, we were able to record voice data in the form of a wav file. A program written in Visual Basic is able to perform voice recognition based on the amplitude of the frequencies of the wav data. Interfacing of the mobile phone with the personal computer was implemented thereby making it possible to control the sending of SMS messages through the computer. At the same time, SMS messages can be received with the added feature of hearing the messages actually spoken out and a record of all the messages received kept intact.
