Voice User Interfaces (VUIs) are rapidly gaining popularity, revolutionizing how users interact with technology. Thanks to significant advances in voice recognition and processing, VUIs are now widely deployed in desktop computers, smartphones, and smart home assistants. Over a hundred million users interact with these devices daily, and smart home assistants have sold in massive numbers owing to the ease and convenience of controlling a diverse range of smart devices in the home IoT environment by voice, such as lights, heating systems, timers, and alarms. VUIs let users interact with IoT technology and issue a wide range of commands across various services using their voice, bypassing traditional input methods such as keyboards or touchscreens. Users can inquire in natural language about the weather, the stock market, and online shopping, and access many other kinds of general information. However, as VUIs become more integrated into our daily lives, they bring issues of security, privacy, and usability to the forefront. Concerns such as the unauthorized collection of user data, the potential recording of private conversations, and difficulty in accurately recognizing and executing commands across diverse accents, which leads to misinterpretations and unintended actions, underscore the need for more robust methods to test and evaluate VUI services. In this dissertation, we study voice interface testing, the evaluation of privacy and security in VUI applications, the proficiency of VUIs in handling diverse accents, and access control in multi-user environments.
We first study privacy violations in the VUI ecosystem. We introduce a definition of the VUI ecosystem in which users must connect voice apps to corresponding services and mobile apps for them to function properly; the ecosystem can also involve multiple voice apps developed by the same third-party developer. We explore the prevalence of voice apps with corresponding services, assessing the landscape of privacy compliance among Alexa voice apps and their companion services, and we develop a testing framework for this ecosystem. We present the first study of the Alexa ecosystem focused specifically on voice apps with account linking. Our framework analyzes the privacy policies of these voice apps together with those of their companion services, or the privacy policies of multiple voice apps published by the same developer. Using machine learning techniques, the framework automatically extracts the data types involved in data collection and sharing from these privacy policies, enabling a comprehensive comparison.
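The core comparison step can be illustrated with a minimal sketch: once data types have been extracted from each policy, flagging inconsistencies reduces to set operations. The function and data-type names below are hypothetical illustrations, not the dissertation's actual framework code.

```python
def find_undisclosed(skill_policy_types: set, companion_policy_types: set) -> set:
    """Data types the companion service's policy discloses collecting
    but the voice app's own privacy policy omits -- a potential
    compliance inconsistency."""
    return companion_policy_types - skill_policy_types

# Illustrative extracted data types (not real policies).
skill_types = {"email address", "device location"}
companion_types = {"email address", "device location", "contact list"}

print(find_undisclosed(skill_types, companion_types))
# {'contact list'}
```

A symmetric check (skill discloses more than the companion) follows by swapping the arguments.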
Next, we turn to analyses of voice apps' runtime behavior for privacy violations. Such analyses require an interaction approach: pre-defined utterances are fed into a simulator to mimic user interaction, with the utterances extracted from the skill's web page on the skill store. The accuracy of the analysis therefore depends on the quality of the extracted utterances; any utterance or interaction missed by the extraction process goes untested, leading to an inaccurate privacy assessment. We revisit the utterance extraction techniques used by prior work to study skill behavior for privacy violations, analyzing their effectiveness and limitations. We then propose a new technique that improves on prior extraction techniques by taking their union and incorporating human interaction: a small set of human interactions is used to record the missing utterances, and the result is then expanded to test a more extensive set of voice apps.
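The union step above can be sketched as follows; this is a simplified illustration under the assumption that each extraction technique and the human-interaction recordings each yield a set of utterance strings (the names and sample utterances are invented for the example).

```python
def merge_utterances(*sources):
    """Union the utterance sets produced by different extraction
    techniques, normalizing case and whitespace so near-duplicates
    collapse into a single test input."""
    merged = set()
    for source in sources:
        merged |= {u.strip().lower() for u in source}
    return merged

# Hypothetical outputs of three sources for one skill.
page_utterances = {"Alexa, open sleep sounds", "play rain sounds"}
model_utterances = {"play rain sounds", "stop"}
human_utterances = {"louder", "loop the sound"}

all_utterances = merge_utterances(page_utterances, model_utterances, human_utterances)
```

In this sketch, the human-recorded utterances contribute exactly the interactions the automated sources missed, which is the gap the proposed technique targets.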
We also test VUIs on various accents by designing a framework that evaluates how well the VUIs implemented in smart speakers cater to a diverse population. Recruiting individuals with different accents and instructing them to interact with a smart speaker while adhering to specific scripts is difficult. We therefore propose AudioAcc, a framework that facilitates evaluating VUI performance across diverse accents using YouTube videos. AudioAcc uses a filtering algorithm to ensure that the extracted spoken words used to construct composite commands closely resemble natural speech patterns. Because the framework is scalable, we conduct an extensive examination of VUI performance across a wide range of accents, encompassing both professional and amateur speakers. Additionally, we introduce a new metric, Consistency of Results (COR), to complement the standard Word Error Rate (WER) metric employed for assessing ASR systems. COR enables developers to investigate and rewrite skill code based on the consistency of results, enhancing overall WER performance.
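To make the two metrics concrete: WER is the standard edit-distance-based error rate over words, and COR can be read as agreement across repeated trials of the same command. The WER code below is the standard textbook computation; the `cor` function is an illustrative interpretation of consistency (share of trials matching the most common response), not necessarily the dissertation's exact formula.

```python
from collections import Counter

def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance between the
    reference transcript and the ASR hypothesis, divided by the
    reference length."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i          # deletions
    for j in range(m + 1):
        d[0][j] = j          # insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[n][m] / max(n, 1)

def cor(responses):
    """Illustrative consistency-of-results: fraction of repeated trials
    that agree with the most common response for the same command."""
    top_count = Counter(responses).most_common(1)[0][1]
    return top_count / len(responses)
```

For example, `wer("turn on the light".split(), "turn off the light".split())` is 0.25 (one substitution in four reference words), while a command answered the same way in three of four trials has a COR of 0.75.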
Finally, we examine a special case: access control for VUIs in multi-user environments. We propose an automated testing framework that probes access control weaknesses and determines whether the accessible data is of consequence, and we use it to assess the effectiveness of voice access control mechanisms within multi-user environments. We show that the convenience of voice systems poses privacy risks, as users' sensitive data becomes accessible to others. We identify two significant flaws in the access control mechanisms offered by the voice system that can be exploited to reach users' private data. These findings underscore the need for enhanced privacy safeguards and improved access control, particularly for online shopping. We also offer recommendations to mitigate the risks of unauthorized access, shedding light on how to secure users' private data within voice systems. / Computer and Information Science
Identifier | oai:union.ndltd.org:TEMPLE/oai:scholarshare.temple.edu:20.500.12613/10286 |
Date | 04 1900 |
Creators | Shafei, Hassan, 0000-0001-6844-5100 |
Contributors | Tan, Chiu C., Gao, Hongchang, Wang, Yu, Hiremath, Shivayogi |
Publisher | Temple University. Libraries |
Source Sets | Temple University |
Language | English |
Detected Language | English |
Type | Thesis/Dissertation, Text |
Format | 329 pages |
Rights | IN COPYRIGHT- This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available., http://rightsstatements.org/vocab/InC/1.0/ |
Relation | http://dx.doi.org/10.34944/dspace/10248, Theses and Dissertations |