S Sidney Fels

Professor

Research Interests

Human Computer Interaction
biomechanical modeling of human anatomy
Machine Learning
new interfaces for musical expression
speech synthesis

Relevant Thesis-Based Degree Programs

Research Options

I am available and interested in collaborations (e.g. clusters, grants).
I am interested in and conduct interdisciplinary research.
I am interested in working with undergraduate students on research projects.
 
 

Research Methodology

3D modeling and simulation
ArtiSynth 3D modeling and simulation environment

Recruitment

Master's students
Doctoral students
Postdoctoral Fellows
2025
2026
2027

See my lab website: hct.ece.ubc.ca

I support public scholarship, e.g. through the Public Scholars Initiative, and am available to supervise students and Postdocs interested in collaborating with external partners as part of their research.
I support experiential learning, such as internships and work placements, for my graduate students and Postdocs.
I am open to hosting Visiting International Research Students (non-degree, up to 12 months).
I am interested in supervising students to conduct interdisciplinary research.

Complete these steps before you reach out to a faculty member!

Check requirements
  • Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
  • Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Admission Information & Requirements" - "Prepare Application" - "Supervision" or on the program website.
Focus your search
  • Identify specific faculty members who are conducting research in your specific area of interest.
  • Establish that your research interests align with the faculty member’s research interests.
    • Read up on the faculty members in the program and the research being conducted in the department.
    • Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
Make a good impression
  • Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
    • Do not send non-specific, mass emails to everyone in the department hoping for a match.
    • Address the faculty members by name. Your contact should be genuine rather than generic.
  • Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Be sure to craft compelling answers to these questions.
  • Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
  • Demonstrate that you are familiar with their research:
    • Convey the specific ways you are a good fit for the program.
    • Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
  • Be enthusiastic, but don’t overdo it.
Attend an information session

G+PS regularly provides virtual sessions that focus on admission requirements and procedures, and offer tips on how to improve your application.

 

ADVICE AND INSIGHTS FROM UBC FACULTY ON REACHING OUT TO SUPERVISORS

These videos contain some general advice from faculty across UBC on finding and reaching out to a potential thesis supervisor.

Graduate Student Supervision

Doctoral Student Supervision

Dissertations completed in 2010 or later are listed below. Please note that there is a 6–12 month delay in adding the latest dissertations.

Instructors' and students' needs for next generation video in education (2022)

As education moves towards a more digital experience, teachers and students are increasingly using video technology. This dissertation is composed of three studies that explored the use of video from both sides of the teaching and learning paradigm: an Instructor study, a Student study, and a Video Highlighting study.

In the Instructor study, 16 instructors who teach with video were interviewed. Instructors use video because students are more likely to watch videos before class than to read textbooks. Further, using a flipped classroom model and moving lectures into pre-class video enables active learning during class time. However, creating videos is not a trivial task, and instructors have limited ways to assess whether their students have watched and/or understood the videos. Instructors are eager to leverage digital data from students' video use to generate both aggregate and individual-level data about how students are using video for learning.

In the Student study, we deployed a custom video player to five cohorts of an undergraduate chemistry class across three years. Students (n=248) used the video player to view nine videos per semester. Data were collected through activity traces generated from logs, and a subset of students were interviewed. Students familiarised themselves with the content by watching sequentially and clarified their knowledge by re-watching. When students reviewed a video in preparation for a test, they searched through the video to find what they needed. Students optimised their use of video by spending more time on the parts of the videos that were tied to their grades.

Finally, in the Video Highlighting study, we introduced a method for highlighting both a video's transcript and a filmstrip series of its thumbnails. A controlled laboratory study with 11 students revealed that, for search tasks in video, users were able to quickly find previously highlighted parts of the video, but transcript highlighting was preferred over filmstrip highlighting.
The use of video in education is continuing to grow. Instructors use video to promote student engagement, yet future work is needed to make video easier to produce, evaluate, search and organise.

Machine learnt treatment: machine learning and registration techniques for digitally planned jaw reconstructive surgery (2021)

The continuous advent of novel imaging technologies over the past two decades has created new avenues for biomechanical modeling, biomedical image analysis, and machine learning. Although these biomedical tools still have a long way to go before they are integrated into conventional clinical practice, biomechanical modeling and machine learning have shown noticeable potential to change the future of treatment planning. In this work, we focus on some of the challenges in modeling the masticatory (chewing) system for the treatment planning of jaw reconstructive surgeries. Here, we discuss novel methods to capture the kinematics of the human jaw, fuse information between imaging modalities, estimate the missing parts of 3D structures (bones), and solve the inverse dynamics problem to estimate muscular forces. This research is centered around the human masticatory system and its core component, the mandible (jaw), with a focus on treatment planning for cancer patients. We investigate jaw tracking and develop an optical tracking system using subject-specific dental attachments and infrared markers; to support this, a fiducial localization method was developed to increase tracking accuracy. In data fusion, we propose a method to register 3D dental meshes onto the MRI of the maxillofacial structures. We use fatty ellipsoidal objects, which resonate in MRI, as fiducial landmarks to automate the entire data fusion workflow. In shape completion, we investigate the feasibility of generating a 3D anatomy from a given dense representation using deep neural architectures. We then extend our deep method to train a probabilistic shape completion model, which takes a variational approach to filling in the missing pieces of a given anatomy. Lastly, we tackle the challenge of inverse dynamics and motor control for biomechanical systems, investigating the applicability of reinforcement learning (RL) to muscular force estimation.
With the mentioned portfolio of methods, we try to make biomechanical modeling more accessible for clinicians, either via automating known manual processes or introducing new perspectives.

Reconciling pixels and percept: improving spatial visual fidelity with a fishbowl virtual reality display (2020)

Virtual Reality (VR) has fundamentally changed how we perceive three-dimensional (3D) objects in a virtual world by providing pictorial representations as 3D digital percepts rather than traditional 2D digital percepts. However, the way we perceive virtual objects is fundamentally different from the way we perceive the real objects that surround us every day, so a perceptual gap exists between the virtual and real worlds. The research described in this dissertation is driven by a desire to provide consistent perception between the two worlds. Bridging the perceptual gap between the virtual and physical worlds is challenging because it requires understanding not only technical problems such as modeling, rendering, calibration, and sensing, but also how humans perceive 3D space. We focus on a Fishbowl VR display to investigate the perceptual gap, introducing new techniques and conducting empirical studies to improve the visual fidelity of digital 3D displays.

To create a seamless high-resolution spherical display, we develop an automatic calibration approach that eliminates artifacts and blends multiple projections with sub-millimeter accuracy for a multiple-projector spherical display. We also perform an end-to-end error analysis of the 3D visualization, which provides guidelines and requirements for system components. To understand human perception with the Fishbowl VR display, we conduct a user experiment (N=16) comparing spatial perception on the Fishbowl VR display with a traditional flat VR display. Results show that the spherical screen provides better depth and size perception, closer to that of the real world. Because virtual objects are depicted by pixels on 2D screens, a perceptual duality exists between the on-screen imagery and the 3D percept, which potentially impairs perceptual consistency. We conduct two studies (N=29) and show that the on-screen imagery causes perceptual bias in size perception.
We show that adding stereopsis and using weak perspective projection can alleviate perceptual bias. The explorations from this dissertation lay the groundwork for reconciling pixels with percept and pave the way for future studies, interactions and applications.

Design principles for homecare documentation based on classifying and modeling workarounds (2019)

Computer systems are being used in healthcare at an increasing rate, especially in homecare nursing. However, a mismatch between the technology and clinical work has been a concern for clinicians and system designers. This mismatch is a barrier to nurses’ work and results in the need to work around the technology. The purpose of this dissertation is to identify design principles for interactive computer systems that reduce this mismatch. The study had four phases: 1) identification and classification of workarounds; 2) modeling and mapping of workarounds to design features; 3) creation of the mapped design features; and 4) refinement and evaluation of the design features. An ethnographic study of homecare nurses who care for patients with wounds in Vancouver (n=33, 120 hours) indicated that they create and use workarounds, possibly a manifestation of unsuccessful adoption of an implemented wound documentation system. A user-centred design process was created to identify design principles for such interactive systems. A model from the literature, the work situation model, was adapted to identify and describe the most common workaround situations and their attributes, such as tasks and resources. The results were validated using a questionnaire (n=58). The identified workarounds were then mapped to design principles from the literature using the attributes of the workaround situation model. This mapping used measures developed for applications of the technology acceptance model in healthcare to map each workaround situation to a dimension of usefulness or ease of use. These dimensions include items such as increased productivity and lowered mental/physical effort. The mapped design principles were evaluated and refined in iterations of exploratory prototyping (n=15) and experimental prototyping (n=12). A set of 9 design principles was used to create features for a prototype.
This prototype used features such as speech recognition, wearable technology, and smart mobile devices. Results of qualitative data analysis (n=27) and questionnaires (n=11) indicated that the prototype was perceived to be useful, easy to use, and a good task-technology fit. This showed that a design informed by homecare nurses’ workarounds addresses key aspects of technology acceptance.

Overcoming obstacles in biomechanical modelling: methods for dealing with discretization, data fusion, and detail (2019)

Biomechanical modelling has the potential to start the next revolution in medicine, just as imaging has done in decades past. Current technology can now capture extremely detailed information about the structure of the human body; the next step is to consider function. Unfortunately, although there have been recent advances in creating useful anatomical models, significant barriers still prevent their widespread use.

In this work, we aim to address some of the major challenges in biomechanical model construction. We examine issues of discretization: methods for representing complex soft tissue structures; issues of data consolidation: how to register information from multiple sources, particularly when some aspects are unreliable; and issues of detail: how to incorporate the information necessary for reproducing function while balancing computational efficiency.

To tackle discretization, we develop a novel hex-dominant meshing approach that allows for quality control. Our pattern-based tetrahedral recombination algorithm is extremely simple and has tight computational bounds. We also compare a set of non-traditional alternatives in the context of muscle simulation to determine when each might be appropriate for a given application.

For the fusion of data, we introduce a dynamics-driven registration technique that is robust to noise and unreliable information. It allows us to encode both physical and statistical priors, which we show can reduce error compared to existing methods. We apply this to image registration for prostate interventions, where only parts of the anatomy are visible in images, and to creating a subject-specific model of the arm, where we need to adjust for changes in both shape and pose.

Finally, we examine the importance of, and methods to include, architectural details in a model, such as muscle fibre distribution, the stiffness of thin tendinous structures, and missing surface information. We examine the simulation of muscle contractions in the forearm, force transmission in the masseter, and dynamic motion in the upper airway to support swallowing and speech simulations.

By overcoming some of these obstacles in biomechanical modelling, we hope to make it more accessible and practical for both research and clinical use.

A novel SPH method for investigating the role of saliva in swallowing using 4D CT images (2017)

The thesis presents novel computer methods towards the simulation of oropharyngeal swallowing. The anatomy and motion of the human upper airway were extracted from dynamic Computed Tomography (CT) data using a novel tool and workflow. A state-of-the-art Smoothed Particle Hydrodynamics (SPH) method is extended to accommodate non-Newtonian materials in the extracted geometries. A preliminary numerical experiment of six human oropharyngeal swallows using SPH demonstrates that the methods are robust and useful for simulating oropharyngeal swallowing.

The presence of saliva is well known to be important for mastication, swallowing, and overall oral health. However, clinical studies of patients with hyposalivation are unable to isolate the effect of saliva from other confounding factors. The simulation presented in this thesis examines fluid boluses under lubricated and non-lubricated boundary conditions. Upon comparison with medical image data, the experiments suggest that saliva does not provide a significant lubricative effect on bolus transit times, but it may serve to reduce residue and therefore improve overall swallowing efficacy. Our findings, while preliminary, are consistent with existing clinical research finding that groups with hyposalivation do not have significantly different transit times from control groups, but that residue may be increased in the hyposalivation group.

Previous studies using computer simulation of fluid flow in the oropharynx typically make use of simplified geometries. Our work uses dynamic 320-row Area Detector Computed Tomography (ADCT) images as the basis for the simulations and therefore does not require simplifying geometric assumptions. Since the data are dynamic, all motion trajectories are supplied by the ADCT data, and extrapolation from 2D sources such as biplane videofluoroscopy is not required. Processing the image data required the development of a novel workflow based on a new tool, which we call BlendSeg. We utilize and extend Unified Semi-Analytic Wall (USAW) SPH methods so that oropharyngeal swallowing simulations can be performed. These extensions include the simulation of non-Newtonian boluses and moving 3D boundaries. Partial validation of the extended USAW SPH method is performed using canonical flows.

3D Subject-Specific Biomechanical Modeling and Simulation of the Oral Region and Airway with Application to Speech Production (2016)

The oropharynx is involved in a number of complex neurological functions, such as chewing, swallowing, and speech. Disorders associated with these functions, if not treated properly, can dramatically reduce the sufferer's quality of life. When tailored to individual patients, biomechanical models can augment imaging data to enable computer-assisted diagnosis and treatment planning. This dissertation develops a framework for 3D, subject-specific biomechanical modeling and simulation of the oropharynx. The underlying data consist of magnetic resonance (MR) images, as well as audio signals, recorded while healthy speakers repeated specific phonetic utterances in time with a metronome. Based on these data, we perform simulations that demonstrate commonalities and variations in the motor control of the /s/ sound across speakers, in front and back vowel contexts. The results compare well with theories of speech motor control in predicting the primary muscles responsible for tongue protrusion/retraction, jaw advancement, and hyoid positioning, and in suggesting independent activation units along the genioglossus muscle. We augment the simulations with real-time acoustic synthesis to generate sound. Spectral analysis of the resulting sounds vis-à-vis the recorded audio signals reveals discrepancies between their formant frequencies. Experiments using 1D and 3D acoustical models demonstrate that these discrepancies arise from the low resolution of the MR images, generic parameter tuning in the acoustical models, and ambiguity in the 1D vocal tract representation. Our models prove beneficial for vowel synthesis based on biomechanics derived from image data. Our modeling approach is designed for time-efficient creation of subject-specific models. We develop methods that streamline the delineation of articulators from MR images and significantly reduce expert interaction time (≈5 min per image volume for the tongue).
Our approach also exploits muscular and joint information embedded in state-of-the-art generic models, while providing consistent mesh quality, and the affordances to adjust mesh resolution and muscle definitions.

Third-placeness: supporting the experience of third place with interactive public displays (2016)

In contemporary western cities, socialization often occurs in locations with a mix of public and private characteristics. Oldenburg defined these settings as “Third Places” because they provide a space of conviviality between the privacy of home and the rigidity of work. Coffee shops and pubs are prototypical Third Places, providing a welcoming and neutral atmosphere for the conversation that is essential to community development. Consumer computing and telecommunications have changed how we socialize with each other and use Third Places. This raises the question of how technology can support Third Places, or whether technology has a role in these settings at all.

We propose an alternative paradigm called “Third-placeness”, defined as a state of socialization of which a Third Place is a physical embodiment. Third-placeness arises when information is uncensored, which minimizes inequalities and differences, and is characterized by low barriers to information access, regularity, lightheartedness, and comfort. We identify aspects of Third-placeness and study how a particular type of technology, interactive public displays, could affect these aspects. Through our observations and lessons learned, we identify social, public, and physical characteristics of interactive public displays that could support aspects of Third-placeness. Our research contributes a framework, the Sociality, Publicity and Physicality Framework, that organizes aspects and requirements of designing interactive public displays for Third-placeness. It also describes a way to communicate about these designs and a way such designs can be approached.

Shaping Video Experiences with New Interface Affordances (2015)

Watching and creating videos have become predominant parts of our daily lives. Video is becoming the norm for a wide range of purposes, from entertainment to training and education, marketing, and communication. Users go beyond just watching videos: they want to experience and interact with content across different types of videos. As they do so, significant digital traces accumulate on the viewed videos, which provide an important source of information for designing and developing tools for video viewing interfaces.

This dissertation proposes a next-generation video management interface that creates video experiences going beyond just pushing the play button. It uses how people view and interact with contemporary video to design strategies for future video interfaces. This has allowed the development of new tools for navigating and managing videos that can be easily integrated into existing systems.

To help define design guidelines for the video interface, a behavioural analysis of users’ video viewing actions (n = 19) was performed. The results demonstrate that participants actively watch videos, and most participants tend to skip parts of videos and re-watch specific portions of a video multiple times. Based on these findings, new fast navigation and management strategies are developed and validated in search tasks using a single-video history (n = 12), a video viewing summary (n = 10), and a multiple-videos history (n = 10). Evaluations of the proposed tools show significant performance improvements over state-of-the-practice methods, which indicates the value of users’ video viewing actions.

Navigating other forms of video, such as interactive videos, introduces another issue: selecting the interactive objects within videos that direct users to different portions of the video. Because of the time-based nature of video, these interactive objects are only visible for a certain duration, which makes them difficult to activate. To alleviate this problem, a novel acquisition technique (Hold) is created, which temporarily pauses the objects while the user interacts with the target. This technique has been integrated into a rich media interface (MediaDiver), which made such interaction possible for users.

Modeling the Fluid-Structure Interaction of the Upper Airway: Towards Simulation of Obstructive Sleep Apnea (2014)

Obstructive Sleep Apnea (OSA) is a syndrome in which the human Upper Airway (UA) collapses during sleep, leading to frequent sleep disruption and inadequate air supply to the lungs. OSA involves Fluid-Structure Interaction (FSI) between a complex airflow regime and the intricate mechanics of soft and hard tissue, causing large deformation of the complicated UA geometry. Numerical simulations provide a means of understanding this complex system; therefore, we develop a validated FSI simulation, composed of a 1D fluid model coupled with a 3D FEM solid solver (ArtiSynth), that is applied to a parameterized airway model, providing a fast and versatile system for researching FSI in the UA.

The 1D fluid model implements the limited pressure recovery model of Cancelli and Pedley [28] using a dynamic pressure recovery term, area function corrections allowing complete closure and reopening of fluid geometries, and discretization schemes providing robust behavior in highly uneven geometries. The fluid model is validated against 3D fluid simulations in static geometries and simple dynamic geometries, and proves reliable for predicting bulk flow pressure. The simulation methods in ArtiSynth are validated by simulating the buckling, complete collapse, and reopening of elastic tubes under static pressure; the results compare well with experimental results.

The FSI simulation is validated against experiments performed on a collapsible channel (a "2D" Starling resistor) designed to have geometry and characteristics similar to the UA. The observed FSI behaviors are described and compared for both experiment and simulation, providing a quantitative validation of the FSI simulation. The simulations and experiments agree quite well, exhibiting the same major FSI behaviors, a similar progression from one behavior to another, and a similar dynamic range.

A parameterized UA model is designed for fast and consistent creation of geometries. Uniform pressure and dynamic flow FSI simulations are performed with this model for numerous parameters associated with OSA. Uniform pressure simulations compare well with clinical data. Dynamic flow results demonstrate airflow limitation and snoring oscillations. The simulations are fast, simulating 1 s of FSI in 30 minutes. This model is a powerful tool for understanding the complex mechanics of OSA.

Designing online social networks to motivate health behaviour change (2013)

Eating nutritious foods and being more physically active help prevent significant illnesses such as cardiac disease, stroke, and diabetes. However, a healthy lifestyle remains elusive for many, and obesity continues to increase in North America. We investigate how online social networks (OSNs) can change health behaviour by blending theories of health behaviour and of participation in OSNs, which allows us to design and evaluate an OSN through a user-centred design (UCD) process.

We begin this research by reviewing existing theoretical models to obtain the determining factors for participation in OSNs and for changing personal health behaviour. Through this review, we develop a conceptual framework, the Appeal Belonging Commitment (ABC) Framework, which provides individual determinants (Appeal), social determinants (Belonging), and temporal considerations (Commitment) for participation in OSNs for health behaviour change.

The ABC Framework is used in a UCD process to develop an OSN called VivoSpace. The framework is then used to evaluate each design to determine whether VivoSpace is able to change the determinants of health behaviour change. The UCD process begins with an initial user inquiry using questionnaires to validate the determinants from the framework (n=104). These results are used to develop a paper prototype of VivoSpace, which is evaluated through interviews (n=11). These results are used to design a medium-fidelity prototype of VivoSpace, which is tested in a laboratory through both direct and indirect methods (n=36). The final iteration of VivoSpace is a high-fidelity prototype, which is evaluated in a field experiment with clinical and non-clinical participants from Canada and the USA (n=32). The results reveal positive changes for the participants associated with a clinic in self-efficacy for eating healthy food and leading an active lifestyle, in attitudes towards healthy behaviour, and in the stages of change for health behaviour. These results are further validated by evaluating changes in health behaviour, which reveal a positive change in physical activity and an increase in patient activation for the clinical group. The evaluation of the high-fidelity prototype allowed for a final iteration of the ABC Framework and the development of design principles for an OSN for positive health behaviour change.

Byte your tongue : a computational model of human mandibular-lingual biomechanics for biomedical applications (2011)

Biomechanical models provide a means to analyze movement and forces in highly complex anatomical systems. Models can be used to explain cause and effect in normal body function, as well as in abnormal cases where underlying causes of dysfunction can be clarified. In addition, computer models can be used to simulate surgical changes to bone and muscle structure, allowing functional and aesthetic outcomes to be predicted. This dissertation proposes a state-of-the-art model of coupled jaw-tongue-hyoid biomechanics for simulating combined jaw and tongue motor tasks, such as chewing, swallowing, and speaking. Simulation results demonstrate that the mechanical coupling of tongue muscles acting on the jaw and of jaw muscles acting on the tongue is significant and should be considered in orofacial modeling studies. Towards validation of the model, simulated tongue velocity and tongue-palate pressure are consistent with published measurements.

Inverse simulation methods are also discussed, along with the implementation of a technique to automatically compute muscle activations for tracking a target kinematic trajectory in coupled skeletal and soft-tissue models. Additional target parameters, such as dynamic constraint forces and stiffness, are included in the inverse formulation to control muscle activation predictions in redundant models. Simulation results for moving and deforming muscular-hydrostat models are consistent with published theoretical proposals. Also, muscle activations predicted for lateral jaw movement are consistent with the published literature on jaw physiology.

As an illustrative case study, models of segmental jaw surgery with and without reconstruction are developed. The models are used to simulate clinically observed functional deficits in movement and bite force production. The inverse simulation tools are used to predict muscle forces that a patient could theoretically use to compensate for functional deficits following jaw surgery.
The modeling tools developed and demonstrated in this dissertation provide a foundation for future studies of orofacial function and biomedical applications in oral and maxillofacial surgery and treatment.

Understanding Image Registration: Towards a Prescriptive Language of Computer Vision (2011)

Vision researchers have created an incredible range of algorithms and systems to detect, track, recognize, and contextualize objects in a scene, using a myriad of internal models to represent their problem and solution. However, making effective use of these algorithms requires sophisticated expert knowledge to understand and properly utilize the internal models involved. Researchers must understand the vision task and the conditions surrounding their problem, and select an appropriate algorithm that will solve the problem most effectively under these constraints. Within this thesis, we present a new taxonomy for the computer vision problem of image registration, which organizes the field based on the conditions surrounding the problem. From this taxonomy, we derive a model that can be used to describe both the conditions surrounding the problem and the range of acceptable solutions. We then use this model to create testbenches that can directly compare image registration algorithms under specific conditions. A direct evaluation of the problem space allows us to interpret models, automatically selecting appropriate algorithms based on how well they perform on similar problems. This selection of an algorithm based on the conditions of the problem mimics the expert knowledge of vision researchers without requiring any knowledge of image registration algorithms. Further, the model identifies the dimensions of the problem space, allowing us to automatically detect different conditions. Extending beyond image registration, we propose a general framework of vision designed to make all vision tasks more accessible by providing a model of vision that allows users to describe what to do without specifying how the problem is solved.
The description of the vision problem itself is represented in such a way that even non-vision experts can understand it, making the algorithms much more accessible and usable outside the vision research community.

Understanding and supporting transitions with large display applications (2010)

Interactive large displays offer exciting new opportunities for collaboration and work. Yet their size will fundamentally change how users expect to use and engage with computer applications: a likely reality is that such displays will be used by multiple users for multiple simultaneous tasks. These expectations demand a new approach to application design beyond the conventional desktop application model, in which applications are single-user and intended to support a subset of user tasks. In this research, we develop such a framework based on the premise that large display applications should support transitions: users’ desires to shift between multiple tasks and activities. We build this framework from models of how traditional large surfaces such as whiteboards are used to facilitate multiple tasks, often simultaneously. Based on studies of users’ whiteboard use, we construct a classification scheme of users’ activities with whiteboards, and the role of whiteboards in supporting the transitions between these activities. From a study of meeting room activity, we then develop a classification for collocated activity around traditional surfaces. We further develop models of how users’ needs change during their use of large display applications, exploring two contexts: a digital tabletop application for focused collaboration, and a public large display. These studies reveal how users engage and disengage with one another during collaborative work, and the dynamic needs of bystanders. Next, we design and evaluate a prototype that supports transitions between tasks in a scheduling activity using viewing changes. The results demonstrate that users transition between related tasks during such activities, and that viewing changes can support these transitions. Finally, we describe a design space for supporting transitions in large display applications.
Taken together, the findings of this research illustrate the fundamental need to develop a new framework for designing large display applications. This work provides a step in this direction by providing rationale and empirical evidence for supporting transitions in this framework. In so doing, it suggests that designers’ efforts be realigned away from the predominant desktop-centric model of application development and toward a model that engenders smooth transitions between multiple, related activities.

Master's Student Supervision

Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.

Can you see me : how good is good enough in 3D teleconferencing? (2023)

The sudden pivot to remote communication triggered by COVID-19 has intensified the need for advanced teleconferencing technologies that closely mimic physical presence. Achieving direct eye contact, understanding physical context, and enabling coordinated non-verbal cues are essential for enriching virtual interactions. These prerequisites have led to increased interest in 3D teleconferencing tools that offer an immersive persona projection, enhancing the overall communication experience. Current 3D experiences primarily rely on Virtual Reality (VR) and Augmented Reality (AR) technologies, utilizing Head-Mounted Displays (HMDs), semi-transparent 2D displays, and standalone 3D devices. However, evaluating the Quality of User Experience (QoE) in these immersive environments poses significant challenges, as traditional subjective feedback and objective video quality evaluation methods fall short in addressing VR’s interactive nature. Our study addresses this gap by developing new QoE metrics for 3D teleconferencing, leveraging Fish Tank Virtual Reality (FTVR) to simulate presence using stereo and motion parallax depth cues. We report the Just Noticeable Difference (JND) for resolution, latency, jitter, and framerate as 50 dpi, 50 ms, 0.06 mm, and 18 FPS, respectively, providing tangible thresholds for evaluating the performance and user experience of 3D teleconferencing tools. Moreover, we acknowledge the lack of accessible platforms for comprehensive, end-to-end tests in 3D teleconferencing. While proprietary systems such as Google’s Starline and Microsoft’s Holoportation show promise, their limited accessibility and high costs hinder broader research endeavors. To rectify this, we have developed an open-source end-to-end platform, enabling researchers to perform comprehensive tests and integrate enhancements into the 3D teleconferencing pipeline.

Into the TongueVerse : unraveling speech motor strategies via inverse atlas modeling (2023)

The intricate and interdigitated musculature of the human tongue presents a formidable challenge in quantifying functional traits for lingual behaviors, notably speech articulation. To advance speech research and address treatment strategies, a harmonized approach blending quantitative and qualitative analyses of muscle function within specific tongue movements is essential. A novel biomechanical "atlas" model, incorporating morphological features from a diverse range of speakers, has the potential to predict and analyze muscle behavior across distinct subject-specific speech kinematics. This work centers on a Finite Element Model (FEM) constructed from multi-subject atlas MRI data, capturing the biomechanical intricacies of tongue movements during speech tasks such as "a souk" and "a geese." The tongue atlas model is used to inversely predict the muscle activations of eight native English speakers. To illuminate articulation differences in producing the same consonant in distinct vowel contexts, we perform a Wilcoxon signed-rank test on the estimated muscle activation patterns for each speaker. Our findings reveal that nearly all muscles engaged in producing the same /s/ sound exhibit different activation patterns in "a souk" vs. "a geese" across all speakers. Furthermore, we temporally align the muscle activation patterns for each speaker using dynamic time warping (DTW) and define a similarity index to measure resemblances in the employed motor strategies. Our results suggest that some speakers are more likely than others to employ similar motor strategies when uttering words.
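The abstract aligns activation patterns with DTW before computing a similarity index. As an illustration only, the sketch below implements the classic DTW recurrence in Python with NumPy for sequences of activation vectors (time × muscles). The Euclidean frame cost and the `similarity_index` mapping 1/(1 + d) are hypothetical choices; the thesis's actual similarity index is not specified here.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences.

    Each sequence is either 1D (time,) or 2D (time, channels), e.g. one
    activation value per muscle per frame. Frame-to-frame cost is Euclidean.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # acc[i, j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            acc[i, j] = d + min(acc[i - 1, j],      # reuse frame b[j-1]
                                acc[i, j - 1],      # reuse frame a[i-1]
                                acc[i - 1, j - 1])  # advance both sequences
    return float(acc[n, m])


def similarity_index(a, b):
    """Hypothetical similarity score in (0, 1]; 1.0 means identical after warping."""
    return 1.0 / (1.0 + dtw_distance(a, b))
```

For example, `dtw_distance([0, 0, 1, 2, 3, 2, 1, 0], [0, 1, 2, 3, 2, 1, 0])` is 0.0: the two sequences have the same shape but different timing, which is exactly the variation DTW is meant to factor out before strategies are compared.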

Supporting multitasking in online synchronous classes (2023)

In-class multitasking is a very common practice among university students as a strategy to tackle their many goals and commitments in life. However, research shows that in-class multitasking negatively affects students' learning outcomes and academic performance. Even though many students are aware of these potential drawbacks, in-class multitasking persists in in-person classes and even more so in online classes. To learn more about students' motivations, perceptions, and challenges regarding in-class multitasking in online classes, and to propose solutions to these challenges, we conducted a small formative study (N=10) in which students kept multitasking diaries in their online classes, followed by semi-structured interviews. Through this study, we categorized students' challenges with in-class multitasking into (1) cognitive consequences, consisting of content loss, context loss, and the overhead of time spent after class to compensate for multitasking, and (2) metacognitive consequences, consisting of impaired time and attention management due to multitasking. Our insights from the preliminary study led to an overarching design requirement: to support in-class multitasking by reducing its adverse effects on learning. Based on the cognitive consequences, our first design requirement is to enable immediate recovery of missed content within the class time frame. To satisfy this requirement, we propose a bichronous learning environment that enables immediate, accelerated, guided viewing of past or missed content. In an experimental evaluation (N=20) of these design elements, our prototype significantly improved students' learning outcomes when they multitasked, compared to a baseline system. Based on the metacognitive consequences, we define our second design requirement as assisting metacognitive monitoring during in-class multitasking by enabling cognitive offloading.
We propose topic, activity, and time-based offloading to help students manage their time and attention when multitasking under different circumstances. Our qualitative evaluation (N=9) of these design elements shows promise in their effectiveness in assisting students' self-regulation in in-class multitasking.

It's over there: can intelligent virtual agents point as accurately as humans? (2021)

To support effective pointing interactions with an intelligent virtual agent (IVA), the first question to answer is how accurately users can interpret the direction of an IVA's pointing. In this thesis, we designed an IVA and investigated whether it can point to the real world as accurately as a real person. We used a spherical Fish Tank Virtual Reality (FTVR) display, as it provides effective 3D depth cues and is situated in real-world coordinates, allowing IVAs to point to the real world. We first conducted an experiment to determine the pointing cue, a fundamental design factor of our IVA. Specifically, we evaluated the effect of head and hand cues on users' perception of the IVA's pointing. The findings provide design guidelines for selecting pointing cues in virtual environments. Following these guidelines, we determined our IVA's other design factors, including its appearance and the subtleties of how it points, and we explain the rationale for each. Using our IVA, we conducted an experiment to investigate the difference between the IVA's pointing and natural human pointing, measured by users' accuracy in interpreting pointing to a physical location. Results show that participants interpreted the IVA's pointing to a physical location more accurately than the real person's pointing. Specifically, the IVA outperformed the real person in the vertical dimension (5.2% less error) and yielded the same level of accuracy horizontally. Our IVA design mitigated the pointing ambiguity caused by the eye-fingertip alignment commonly found in human pointing, which may account for the IVA's higher pointing accuracy. Thus, our findings provide design guidelines for visual representations of IVAs with pointing gestures.

Leveraging students' handwritten notes to link to watched instructional videos (2021)

Handwritten note-taking with pen and paper is still the preferred medium for information seeking and comprehension across diverse learning objects. But students, especially in video-based learning settings, resort to laborious practices to re-find the corresponding video context when reviewing notes. We propose using students' handwritten notebook content as interoperable links to retrieve previously watched instructional videos. This work addresses the research objectives in two phases. In phase 1, we analyzed the characteristic features of notebook content from watched videos. In phase 2, we investigated student expectations and requirements for the proposed video retrieval system. Analysis of quality handwritten notebook samples and the related video materials in a lab study with ten engineering students revealed distinctive representations of note content, such as text, formulas, figures, and, chiefly, a hybrid of all three. A box plot interpretation of notes and the watched video content confirmed that at least 75% of the identified note samples had a verbatim overlap of 50% or more with the related video content, hinting at their potential use as query artifacts. Additionally, the video references for the collected note samples exhibited referencing at three temporal levels: point, interval, and whole video. A 12-student lab study indicated higher satisfaction with video matches returned at the 'interval' level and showcased students' existing workarounds for linking back to videos. Overall, students gave the system a positive mean usability rating for re-finding note-specific video context. A medium-fidelity prototype was built using off-the-shelf computer vision algorithms to deduce the technology requirements of the proposed approach. When tested on the 181 identified note samples, the prototype system matched 77.5% of the samples to the corresponding watched videos.
The proposed method performed especially well at finding suitable videos for textual notes, yielding 98% accuracy. The overlap between note content and the video results further highlights the fragmented nature of the evaluated accuracy across all three temporal levels. Overall, the presented work establishes the prospect of augmenting prevalent personalized learning (PL) strategies, such as handwriting notes for future reference, to easily re-find and connect to watched videos.

Moves made easy - deep learning-based reduction of human motor control efforts leveraging categorical perceptual constraint (2021)

The human speech motor control system takes advantage of the constraints in categorical speech perception space to reduce the index of difficulty of articulatory tasks. Taking inspiration from this, we introduce a perceptual mapping from speech-like, complex, multiple degree-of-freedom (DOF) hand movement to a controllable formant space, which allows us to leverage categorical perceptual constraints to reduce the difficulty of hand motor control tasks. The perceptual network is modeled using long short-term memory (LSTM) networks trained to optimize a connectionist temporal classification (CTC) loss function. Our motor control mapping network consists of a graph convolutional neural network (GCNN) combined with an LSTM encoder-decoder network, which is regularized with the help of the trained perception model. The mapping allows the user's hand to generate continuous kinematic trajectories with reduced effort by altering the complexity of the task space. This is a human-in-the-loop system in which the user acts as an expert assessor, evaluating the degree to which the network is capable of reducing the complexity of the task space. Further, we quantitatively formulate the index of difficulty of the task space and the throughput of the user, and demonstrate that our model consistently generates trajectories with considerably reduced effort using mouse and data glove input devices.

Reinforcement learning of a feedforward controller with soft actor-critic for a reaching task (2021)

Learning to control is a complicated process, yet humans seamlessly control various complex movements. Motor theory suggests that humans begin motor learning by learning to act in a feedforward manner. However, it is still unclear how humans learn feedforward control strategies. We hypothesize that this mechanism is governed by the criterion of success (reinforcement) or failure (penalty) of the task. Taking inspiration from this, we investigate how a feedforward controller can be learned using reinforcement learning (RL). Additionally, we investigate how factors such as the difficulty of the task and the noise present in the motor system relate to human motor control. To this end, a one-dimensional muscle-based biomechanical model is built to create a reaching task setup. The model contains an actuator controlled by an agonist-antagonist muscle pair and a goal or target to reach. Then, an end-to-end RL-based feedforward controller is learned to estimate control signals while taking the difficulty level of the reaching task and the noise level into account. To design the learning-based controller, we adapted the model-free RL algorithm Soft Actor-Critic (SAC). During training, we observed that the SAC-based feedforward controller learned to prepare co-activation to reach a target in the kinematic space using a minimum number of controller predictions. Moreover, we found that the controller learned to estimate high-amplitude muscle activations as a way to adapt to the noise levels in the motor system. Finally, we conducted an information analysis similar to Fitts' analysis to determine how the difficulty of the task and the noise affected the controller, by finding the relationship between the number of controller predictions, task difficulty, and the amount of noise.
Our analysis demonstrates that the number of controller predictions increases exponentially with the difficulty of the task when the amount of noise is kept constant, and that a linear relationship exists between the number of controller predictions and the amount of noise when the index of difficulty is kept constant. Additionally, we found that the effect of target width is more dominant than that of distance, which confirms Welford's observation.
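The Fitts-style analysis above can be illustrated with a small sketch. It assumes the Shannon formulation of the index of difficulty, ID = log2(D/W + 1), and Welford's two-part model, cost = a + b1·log2(D) − b2·log2(W), fitted by ordinary least squares with NumPy. The thesis's exact formulation is not given here. The function names are hypothetical, and `cost` is a stand-in for a performance measure such as the number of controller predictions (possibly log-transformed, given the exponential trend reported above).

```python
import numpy as np


def shannon_id(distance, width):
    """Fitts' index of difficulty (Shannon formulation), in bits."""
    d = np.asarray(distance, dtype=float)
    w = np.asarray(width, dtype=float)
    return np.log2(d / w + 1.0)


def fit_welford_two_part(distance, width, cost):
    """Ordinary least-squares fit of cost = a + b1*log2(D) - b2*log2(W).

    Returns (a, b1, b2). Under this model, b2 > b1 means that target
    width influences the cost more strongly than distance does.
    """
    D = np.asarray(distance, dtype=float)
    W = np.asarray(width, dtype=float)
    y = np.asarray(cost, dtype=float)
    # Design matrix: intercept, log2(D), and -log2(W) as separate regressors
    X = np.column_stack([np.ones_like(D), np.log2(D), -np.log2(W)])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    a, b1, b2 = coeffs
    return float(a), float(b1), float(b2)
```

For instance, a target at distance 3 with width 1 has `shannon_id(3, 1)` equal to 2 bits. Fitting D and W as separate terms, rather than only their ratio, is what lets the relative influence of width and distance be compared.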

Snake-based tool for supporting interactive tooth segmentation from 3D mandibular meshes (2021)

Mandibular meshes segmented from computerized tomography (CT) images contain rich information about the dentition. Inconsistent dentition conditions in healthy mandible data sets can impair data-driven premorbid shape prediction for diseased mandibles. We developed a mesh segmentation method that includes a preprocessing algorithm using an off-the-shelf non-rigid registration, a surface mesh feature function, and an active contour model using geodesic distance. Constructive Solid Geometry (CSG) operations are employed to separate the dentition area from the mandibular mesh. An easy-to-use interactive tool was implemented, allowing users to adjust the contour position. We evaluated our method (preprocessing algorithm and user intervention) by comparing it with the traditional method of manual removal using 3D Slicer. The results indicated that our method reduced manual processing time by 40%, substantially improving efficiency. From a shape completion test based on statistical shape modeling, we concluded that an edentulous mandibular data set could help make significantly better premorbid shape predictions (Z = -2.484, p = 0.013) than a data set with mixed dentition conditions. Besides tooth segmentation from 3D meshes, our research can assist virtual planning for bone graft placement in the defective mandible and implant placement in the bone graft. This work forms the underlying basis of a useful tool for coupling jaw reconstruction and restorative dentition for patient treatment planning.

Talking tube: a novel approach for vocal tract acoustic modelling using the finite-difference time-domain method (2021)

The human voice is the product of a complex and unique physiological process. It involves the neuromuscular control of articulators to form an intricate upper vocal tract geometry, which yields different speech sounds. Existing computational vocal tract models have significant limitations in acoustic precision and simulation performance: high-dimensional vocal tract models can compute precise acoustic wave propagation, but at the expense of simulation run-time. This thesis aims to fill these gaps through two major contributions. Firstly, we introduce a novel vocal tract model that extends the existing two-dimensional (2D) vocal tract modelling approach while exhibiting three-dimensional behaviour. The proposed model (2.5D FDTD) employs the Finite-Difference Time-Domain numerical scheme to discretize and compute acoustic components on a staggered grid. The simulated acoustic outputs of our new model are shown to match those of the 2D FDTD vocal tract model at a low spatiotemporal resolution for open static vocal tract shapes. Unlike 2D FDTD, the model adds tube depth as an additional impedance parameter to the acoustic wave solver by lumping off-plane waves. This technique offers an excellent balance between computational cost and acoustic precision while promising better geometrical flexibility for vocal tract modelling. Secondly, we tested the model's basic capabilities through the acoustic simulation of cardinal English vowel sounds. For realistic modelling of vowel sounds, we built a vocal tract radiation model. We also coupled the 2.5D vocal tract with a self-oscillatory lumped-element vocal fold model to illustrate a fully connected articulatory speech synthesizer. This study offers a speech synthesis tool that can generate static vowel sounds and sets up a new pathway for lightweight vocal tract modelling and other computational acoustic research.
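To make the staggered-grid FDTD idea concrete, here is a minimal 1D sketch in Python with NumPy. It is not the thesis's 2.5D model. Pressure is stored at cell centres and particle velocity on cell faces, and the two fields are updated in leapfrog fashion. Assumptions: a straight, lossless tube with rigid walls at both ends, no glottal source and no radiation model, and illustrative defaults (17 cm tube, c = 343 m/s, ρ = 1.2 kg/m³). The function names are hypothetical.

```python
import numpy as np


def fdtd_tube_1d(length=0.17, n_cells=34, c=343.0, rho=1.2,
                 courant=0.5, duration=0.1):
    """Lossless 1D acoustic FDTD in a tube with rigid walls at both ends.

    Staggered grid: pressure p at the n_cells cell centres, particle
    velocity v at the n_cells + 1 cell faces. Returns the pressure
    recorded at the first cell and the time step dt.
    """
    dx = length / n_cells
    dt = courant * dx / c                         # stable for c*dt/dx <= 1
    n_steps = int(round(duration / dt))
    x = (np.arange(n_cells) + 0.5) * dx           # cell-centre positions
    p = np.exp(-((x - 2 * dx) / (2 * dx)) ** 2)   # initial pressure pulse
    v = np.zeros(n_cells + 1)                     # v[0] = v[-1] = 0: rigid ends
    trace = np.empty(n_steps)
    for n in range(n_steps):
        # Momentum equation, dv/dt = -(1/rho) dp/dx, on interior faces
        v[1:-1] -= dt / (rho * dx) * (p[1:] - p[:-1])
        # Continuity equation, dp/dt = -rho c^2 dv/dx, at cell centres
        p -= rho * c ** 2 * dt / dx * (v[1:] - v[:-1])
        trace[n] = p[0]
    return trace, dt


def peak_frequency(trace, dt, f_lo, f_hi):
    """Frequency (Hz) of the largest spectral peak within [f_lo, f_hi]."""
    spectrum = np.abs(np.fft.rfft(trace - trace.mean()))
    freqs = np.fft.rfftfreq(len(trace), dt)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(freqs[band][np.argmax(spectrum[band])])
```

For a tube closed at both ends, `peak_frequency(*fdtd_tube_1d(), 500, 1500)` should land near the first resonance c/(2L), about 1009 Hz, to within the 10 Hz bin spacing of a 0.1 s record. As described above, a 2.5D scheme would additionally account for tube depth as a lumped impedance term, and a vocal tract model would replace the rigid ends with a glottal source and a radiation boundary.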

A comparison between XFEM and SPH in solving two-dimensional fracture mechanics problems, with applications in food breakdown modeling (2020)

Masticatory performance and occlusal force are two of the main clinical metrics used to evaluate masticatory function objectively. A comprehensive evaluation of masticatory function requires a correlative inspection of these two metrics. The complex, multivariate nature of human mastication and the limitations of visualization and clinical measurement techniques complicate the clinical investigation of masticatory function. A biomechanical model of oral food breakdown can bypass these difficulties. Currently available food breakdown models are either highly dependent on experimental data or focused on food engineering applications. In this thesis, we attempted to address these issues by building a two-dimensional fracture mechanics model to simulate oral food breakdown. The computational methods available for solving fracture mechanics problems each have strengths and limitations that affect the accuracy of their solutions. The Extended Finite Element Method (XFEM) and Smoothed Particle Hydrodynamics (SPH) take two very distinct approaches to fracture mechanics problems, so comparing the effectiveness of these two methods can provide valuable insight into the computational possibilities. Because the classical SPH formulation for solid mechanics suffers from numerical deficiencies, we first made a set of modifications to build a corrected SPH model for solid and fracture mechanics. We solved fracture mechanics benchmark tests using XFEM and the modified SPH, and investigated their strengths and weaknesses thoroughly. SPH was eventually selected to model the food breakdown process. We simulated food breakdown following one chewing stroke using our two-dimensional SPH fracture model and measured the corresponding occlusal force and masticatory performance for a range of food properties.
The food breakdown model was able to simulate the experimental correlation between masticatory performance and the food properties. Although the simulated measurements for occlusal force were in accordance with the previous experimental studies, further detailed clinical investigation is required to validate the force pattern during chewing. The simplified biomechanical model of oral food comminution described in this work can be regarded as the first step toward building a patient-specific model to pre-assess the patient’s masticatory function following a maxillofacial reconstructive surgical plan.

Predicting occlusal force and area through a biomechanical simulation of mastication and controlled study (2020)

Currently, most evaluations of patient outcomes following mandibular reconstructive surgery rely on a combination of qualitative analyses consisting of patient-reported functional ability and masticatory performance. Metrics such as occlusal pressure and jaw kinematics provide quantitative assessments of masticatory function, facilitating a more comprehensive evaluation of patient outcomes. This thesis proposes a novel virtual mastication framework for evaluating occlusal force and area based on measurements of masticatory force and kinematics taken in a clinical setting. Statistical shape modeling was used to develop a mandible atlas based on morphological averages, which supports both universal model creation and prediction of missing anatomy. The simulation was able to predict clinically verified maximum occlusal forces and contact areas from intraoral dentition scans and jaw constraints obtained in a controlled study of healthy volunteers. In conjunction with this framework, a validation study of an occlusal force and contact area measurement system (Dental Prescale II) was performed to obtain key masticatory function measurements with known accuracy. This work serves as a foundation for implementing virtual tools within the maxillofacial reconstructive surgery clinical workflow.

Video annotations in helping locate in-video information for revisitation (2019)

Rewatching video segments is common in video-based learning, and video segments of interest need to be located first for this rewatching. However, learners are not well supported in the process of locating in-video information. To fill this gap, the presented work explores whether video annotations are effective in helping learners locate previously seen in-video information. A novel interface design consisting of two components for learning with videos is proposed and tested in the task of locating in-video information: an annotating mechanism based on an integration of text with video, and an annotation manager which enables the learner to see all annotations he/she has made on a video and provides quick access to video segments. A controlled lab experiment with 16 undergraduate students as subjects was carried out. Experiment results suggested that the use of video annotations significantly reduced time spent on searching for previously seen video segments by about 5 seconds (p
Development and application of a description-based interface for 3D reconstruction (2018)

Advancements in state-of-the-art 3D reconstruction algorithms have outpaced the development of interfaces and application programming interfaces (APIs) for developers, especially for those who are not experts in computer vision. In this thesis, we have designed a novel interface, specifically for 3D reconstruction techniques, that uses a description (covering the conditions of the problem) to allow a user to reconstruct the shape of an object without knowledge of 3D vision algorithms. The interface hides the details of the algorithms by using a description of the visual and geometric properties of the object. Our interface interprets the description and chooses, from a set of algorithms, those that satisfy it. We show that this description can be mapped to an appropriate algorithm that gives a successful reconstruction result. We evaluate the interface through a proof-of-concept interpreter, which interprets the description and invokes one of three underlying reconstruction algorithms. We demonstrate the link between the description set by the user and the result returned using synthetic and real-world datasets in which each object has been imaged with the appropriate setup.

Investigation of a quick tagging mechanism to help enhance the video learning experience (2018)

Video continues to be used extensively as an instructional aid in modern educational contexts, such as blended (flipped) courses, self-learning with MOOCs (Massive Open Online Courses), and informal learning through online tutorials. One challenge is providing mechanisms for students to efficiently bookmark video content and quickly recall and review their video collection. We ran a background study to understand how students annotate video content, focusing especially on the words they would most often use to bookmark it. Based on this study, we proposed a quick tagging mechanism in an educational video interface comprising a video filmstrip and a transcript, both presented adjacent to the video player. 'Quick' tagging is defined as an easy and fast way to mark parts of a course video with predefined semantic tags. We use the metaphor of marking and highlighting a textbook to achieve our quick tagging interaction. This mechanism was evaluated in a controlled study against handwritten note-taking. We found that participants using our quick tagging interface spent around 10% longer watching and learning from video on average than when taking notes on paper. Participants also reported that tagging is a useful addition to instructional videos that helps them recall video content and finish learning tasks.

The effects of immersion and increased cognitive load on time estimation in a virtual reality environment (2018)

The perceived duration of a time interval can seem shorter or longer relative to real time (i.e., solar time or clock time) depending on what fills that interval. Research has suggested that increased immersion alters a user's ability to reproduce a given duration while doing a simple task or playing a game in an Immersive Virtual Environment (IVE). Virtual Reality (VR) allows users to experience virtual environments similar to the real world. The contribution of this experimental research is to explore how undertaking a spatial cognitive task and being immersed in a VR environment affect a person's perception of time. A VR experience involving a cognitive task (maze navigation) was compared with a non-VR (control) experience of the same task to explore whether these effects exist and whether they are more pronounced in an IVE than in a simple screen-based multimedia experience. In addition, a VR experience of the environment without any task was compared with the same environment with the cognitive task to establish the effect of a spatial cognitive task on temporal perception. More specifically, this study measured how much temporal distortion can be achieved using cognitive tasks in a VR experience. The independent variables are the use of cognitive tasks and of VR, and the dependent variable is the perceived duration of the experiment. The data suggest that being immersed in a VR experience results in a 16.10% underestimation of time, while a non-VR experience results in a 7.5% overestimation of time. Moreover, navigating mazes that involve a high cognitive load results in a 6.45% underestimation of time. Finally, the combination of VR and high cognitive load (navigating the mazes without guiding lines in a VR experience) results in a 22.18% underestimation of time. The implications of this research are discussed at the end of the thesis.

A feasibility study of template-based subject-specific modelling and simulation of upper-airway complex (2017)

The upper-airway complex is involved in a number of life-sustaining functions, such as swallowing, speech, breathing and chewing. Disorders associated with these functions can dramatically reduce the quality of life of those who suffer from them. Biomechanical modelling is a useful tool that can bridge the gap between human knowledge and medical data. When tailored to individual patients, biomechanical models can augment imaging data to enable computer-assisted diagnosis and treatment planning. This thesis introduces a model-registration framework for creating subject-specific models of the upper-airway complex based on 3D medical images. Our framework adapts a state-of-the-art comprehensive biomechanical model of the head and neck, which represents generic upper-airway anatomy and function. By morphing this functional template to subject-specific data, we create upper-airway models for particular individuals. To preserve the functionality of the comprehensive model, we introduce a multi-structure registration technique that maintains the spatial relationships between the template components and preserves the regularity of the underlying mesh structures. Functional information, such as muscle attachment positions, joint positions and biomechanical properties, is updated to remain consistent with the subject-specific model geometry. We demonstrate the functionality of our subject-specific models in biomechanical simulations. Two illustrative case studies are presented. First, we apply our modelling methods to simulating the normal swallowing motion of a particular subject based on the kinematics (of the airway boundary, jaw and hyoid) extracted from dynamic 3D CT images. The results suggest that our model tracks the oropharyngeal motion well but has limited ability to reproduce the hyolaryngeal movements of normal swallowing. Second, we create two speaker-specific models based on 3D MR images and perform personalized speech simulations of the utterance "a geese". The models reproduce the speech motion of the tongue and jaw recorded in tagged and cine MRI data with sub-voxel tracking error, and predict the muscle coordination patterns of the speech motion. This study demonstrates the feasibility of using template-based subject-specific modelling methods to facilitate personalized analysis of upper-airway functions. The proposed model-registration framework provides a foundation for developing a systematic and advanced subject-specific modelling platform.

The voice box: a fast coupled vocal fold model for articulatory speech synthesis (2017)

Speech is unique to human beings as a means of communication, and many efforts have been made towards understanding and characterizing it. In particular, articulatory speech synthesis is a critical field of study, as it works towards simulating the fundamental physical phenomena that underlie speech. Of the various components that constitute an articulatory speech synthesizer, vocal fold models play an important role as the source of the acoustic simulation. Time-efficient, high-quality speech synthesis requires a balance between the simplicity and speed of lumped-element vocal fold models and the completeness and complexity of continuum models. In addition, most vocal fold models are studied in isolation, without any coupling to a vocal tract model. This thesis aims to fill these gaps in the field through two major contributions. First, we develop and implement a novel self-oscillating vocal fold model, composed of a 1D unsteady fluid model loosely coupled with a 2D finite-element structural model. The flow model is capable of handling irregular geometries, different boundary conditions, closure of the glottis and unsteady flow states. A method for a fast, decoupled solution of the flow equations that does not require computing the Jacobian matrix is provided. The simulation results are shown to agree with existing data in the literature and give realistic glottal pressure-velocity distributions, glottal width and glottal flow values. In addition, the model is more than an order of magnitude faster than comparable 2D Navier-Stokes fluid solvers, while capturing transitional flow better than simple Bernoulli-based flow models. Second, as an illustrative case study, we implement a complete articulatory speech synthesizer using our vocal fold model. This includes both lumped-element and continuum vocal fold models, a 2D finite-difference time-domain solver of the vocal tract, and a 1D tracheal model. A clear workflow is established to derive model components from experimental data or user-specified meshes and to run fully coupled acoustic simulations. The result is one of the few complete articulatory speech synthesizers in the literature and a valuable tool for speech research, allowing time-efficient speech simulations and thorough study of the acoustic outcomes of different model formulations.

An assessment of lattice Boltzmann method for swallowing simulations (2016)

Lattice Boltzmann is a fixed-grid, particle-based method originating from molecular dynamics that uses a kinetic-based approach to simulate fluid flows. The fixed-grid nature and simplicity of the lattice Boltzmann algorithm make it an appealing approach for preliminary swallowing simulations. However, compressibility effects and the implementation of boundary and initial conditions can be sources of instability and inaccuracy, especially in high-Reynolds-number simulations. The current work is an assessment of the lattice Boltzmann method (LBM) with respect to high-Reynolds-number flow simulations, the compressibility effect of the method, and the implementation of boundary and initial conditions. Here we investigate the stability range of the lattice Boltzmann single-relaxation-time (SRT) and multiple-relaxation-time (MRT) models, as well as the consistent implementation of boundary and initial conditions. The superior stability of the MRT model is shown on the lid-driven cavity flow benchmark as a function of Reynolds number. The computational time required for the SRT model to simulate the lid-driven cavity flow at Re=3200 is about 14 times higher than for the MRT model, and the computational time is shown to scale with the third power of the lattice resolution. It is suggested that the SRT model is inefficient for simulations with moderately high Reynolds numbers (Re>1000), where the use of the MRT model becomes necessary. The compressibility effect is studied next, and the incompressible lattice Boltzmann method is introduced. The compressibility error of the method surpasses the spatial discretization error and becomes the dominant source of error as the flow Reynolds number increases. It is shown on a 2D Womersley flow benchmark that the physical time step required for LBM is about 300 times larger than the physical time step of the finite volume implicit solver while generating results with the same order of accuracy at Re=2000. Due to the compressibility error inherent to the method, lattice Boltzmann is not recommended for preliminary swallowing simulations with high Reynolds numbers, since implicit time advancement methods can generate results with the same order of accuracy in noticeably less computational time.

Towards understanding how Touch ID impacts users' authentication secrets selection for iPhone lock (2015)

Smartphones today store large amounts of data that can be confidential, private or sensitive. To protect such data, all mobile OSs have a phone lock mechanism that requires user authentication to access applications or data on the phone, while also allowing data at rest to be encrypted with an encryption key that depends on the authentication secret. Apple recently introduced Touch ID, a feature that allows users to unlock an iPhone with fingerprint-based authentication. The intuition behind this technology was that its usability would motivate users to use stronger passwords for locking their devices without substantially sacrificing usability. To date, however, it is not clear whether users take advantage of Touch ID and whether they indeed employ stronger authentication secrets. The main objective and contribution of this work is to fill this knowledge gap. To answer this question, we conducted three user studies: (a) an in-person survey with 90 subjects, (b) an interview study with 21 participants, and (c) an online survey with 374 subjects. Overall, we found that users do not take advantage of Touch ID and use weak authentication secrets, mainly PIN codes, similar to users who do not have a Touch ID sensor on their devices. To our surprise, we found that more than 30% of subjects in each group did not know that they could use alphanumeric passwords instead of four-digit PIN codes. Others stated that they adopted PIN codes because of their better usability compared to passwords. Most of the subjects agreed that Touch ID indeed offers usability benefits such as convenience, speed and ease of use. Finally, we found a disconnect between the security users expect their passcodes to offer and the reality. In particular, only 12% of participants correctly estimated the security that PIN codes provide, while the rest had unjustified expectations.

Design of a Casual Video Authoring Interface Based on Navigation Behavior (2014)

We propose the use of a personal video navigation history, which records a user's viewing behaviour, as a basis for casual video editing and sharing. Our novel interaction supports users' navigation of previously viewed intervals to construct new videos via simple playlists. The intervals in the history can be individually previewed and searched, filtered to identify frequently viewed sections, and added to a playlist, where they can be refined and re-ordered to create new videos. Interval selection and playlist creation using a history-based interaction are compared to a more conventional filmstrip-based technique. We performed several user studies to evaluate the usability and performance of this method and found significant improvements in video interval search and selection.

A pursuit method for video annotation (2013)

Video annotation is the process of describing or elaborating on objects or events represented in video. Part of this process involves time-consuming manual interactions to define spatio-temporal entities, such as a region of interest within the video. This dissertation proposes a pursuit method for video annotation to quickly define a particular type of spatio-temporal entity known as a point-based path. A pursuit method is particularly suited to annotation contexts where a precise bounding region is not needed, such as when annotators draw attention to objects in consumer video. We demonstrate the validity of the pursuit method with measurements of both accuracy and annotation time when annotators create point-based paths. Annotation tool designers can now choose a pursuit method for suitable annotation contexts.

Investigation of gesture control for articulatory speech synthesis with a bio-mechanical mapping layer (2012)

In the process of working with a real-time, gesture-controlled speech and singing synthesizer used for musical performance, we have documented performer-related issues and provided suggestions that will serve to improve future work in the field from an engineering and technician's perspective. One particularly significant detrimental factor in the existing system is its sound quality, which is limited by the one-to-one kinematic mapping between the gesture input and output. To address this, a force-activated bio-mechanical mapping layer was implemented to drive an articulatory synthesizer, and the results were compared with the existing mapping system for the same task from both the performer and listener perspectives. The results show that adding the complex, dynamic bio-mechanical mapping layer introduces more difficulty but allows the performer a greater degree of expression, which is consistent with existing work in the literature. However, to the novice listener, there is no significant difference in the intelligibility of the sound or its perceived quality. The results suggest that, for browsing through a vowel space, force and position input are comparable when considering output intelligibility alone, but for expressivity a complex input may be more suitable.

The effects of muscle aging on hyoid motion during swallowing: a study using a 3D biomechanical model (2012)

The ability to swallow is crucial in maintaining adequate nutrition. However, there is a high prevalence of dysphagia among the elderly and a high associated mortality rate. To study the various causes of the associated physiological changes, one must first understand the biomechanics of normal swallowing. However, functional studies of the anatomically complex head and neck region can be difficult for both technical and ethical reasons. To overcome the limitations of clinical studies, this thesis proposes the use of a 3D computer model for performing dynamic simulations. A state-of-the-art model of the hyolaryngeal complex was created for simulating swallowing-related motor tasks, with a special focus on hyoid excursion, since reduced hyoid motion is a major indicator of oropharyngeal dysphagia. The model was constructed using anatomical data for a male cadaver from the Visible Human Project and an open-source dynamic simulation platform, ArtiSynth. Hyoid motion data obtained from videofluoroscopy of subjects performing normal swallowing were applied to the model to inversely simulate the potential activities of the extrinsic laryngeal muscles during hyoid excursion. Within a specific range, the model demonstrated the ability to reproduce realistic hyoid motion for swallowing. Selective use of the suprahyoid muscles was also examined and was found to be capable of achieving adequate hyoid excursion for successful swallows. Finally, this study investigated the relationship between muscle weakening and hyoid range of motion using the hyolaryngeal model. Loss of muscle strength is characteristic of the aging process. Simulation of the maximum hyoid displacement under various muscle conditions confirmed a nonlinear reduction in the hyoid range of motion under a linear decline in muscle strength. With an assumed rate of muscle weakening, the proportion of hyoid range reduction was estimated for a person at various ages. The results suggest that severe muscle weakening might be required to reduce hyoid excursion sufficiently to impair swallowing to a significant degree.

Unifying the social landscape with OpenMe (2012)

With the rapid rise in the popularity of online social networks (OSNs) in recent years, we have seen tremendous growth in the number of available OSNs. As newer OSNs attempt to draw users in by focussing on specific services or themes, it is becoming clearer that OSNs do not compete on the quality of their technology but rather on the number of active users. This leads to vendor lock-in, which creates problems for users managing multiple OSNs or wanting to switch OSNs. Third-party applications are often written to alleviate these problems but often find it difficult to deal with the differences between OSNs. These problems are made worse because, we argue, a user will inevitably switch between many OSNs in his or her lifetime, since OSNs are highly fashionable and their lifespan depends on social trends. Thus, these applications often support only a limited number of OSNs. This thesis examines how developers can be helped to write apps that run against multiple OSNs. It describes the need for, and presents, a novel set of abstractions that apps can use to interface with OSNs. These abstractions are highly expressive and future-proof, and they remove the need for an app to know which OSNs it is running against. Two evaluations were conducted to determine the strength of these abstractions: the first analyzed their expressiveness, while the second analyzed their feasibility. The contributions of this thesis are a first step towards better understanding how OSNs can be described at a high level.

"pCubee: Evaluation of a Tangible Outward-facing Geometric Display" RT118576 (2011)

This thesis describes the evaluation of pCubee, a handheld outward-facing geometric display that supports high-quality visualization and tangible interaction with 3D content. Through a review of the existing literature on 3D display technologies, we identified and examined important areas that have yet to be fully understood for outward-facing geometric displays. We investigated the performance of a dynamic visual calibration technique to compensate for tracking errors, and we demonstrated four novel interaction schemes afforded by tangible outward-facing geometric displays: static content visualization, dynamic interaction with reactive virtual objects, scene navigation through display movements, and bimanual interaction. Two experiments were conducted to evaluate, respectively, the impact of display seams and pCubee's potential in spatial reasoning tasks. Two stimuli were used in the experiments: a path-tracing visualization task and a 3D cube comparison task similar to a mental rotation task. In the first experiment, we discovered a significant effect of seam thickness on user performance in path tracing. As seam size increased beyond a thickness threshold, subjects relied less on multiple screens and took longer to trace paths. In the second experiment, we found that subjects had a significant preference for using the pCubee display over a desktop display setup when solving our cube comparison problem. Both time and accuracy using pCubee were as good as using a much larger, more familiar desktop display. This demonstrated the utility of outward-facing geometric displays for spatial reasoning tasks. Our analysis and evaluation identified both promising potential and current limitations of pCubee. The outcomes of our analysis can help facilitate the development and more systematic evaluation of similar displays in the future.

Moving Target Selection in Interactive Video (2010)

In this thesis, we present the results of a user study that compares three different selection methods for moving targets in 1D and 2D space. The standard Chase-and-Click method involves pursuing an onscreen target with the mouse pointer and clicking once the pointer is directly over it. The novel Click-to-Pause method involves first depressing the mouse button to pause all onscreen action, moving the cursor over the target, and releasing the mouse button to select it. The Hybrid method combines the initial pursuit with the ability to pause the action by depressing the mouse button, affording an optimization of the point of interception. Our results show that the Click-to-Pause and Hybrid methods result in lower selection times than the Chase-and-Click method for small or fast targets, while the Click-to-Pause technique yields the lowest selection times overall for small, fast targets. We integrate the more practical Hybrid method into a multi-view video browser to enable the selection of hockey players in a pre-recorded hockey game. We demonstrate that the majority of correct player selections were performed while the video was paused and that our display method for extraneous information has no effect on selection task performance. We develop a kinematic model, based on movement speed and direction in 1D, that adjusts the effective width and distance of a target. Our studies show that target speed assists users when a target is approaching, up to a critical velocity at which direction becomes irrelevant and speed is entirely responsible for the index of difficulty. In addition, we suggest that existing linear and discrete models of human motor control are inadequate for modeling the selection of a moving target, and we recommend the minimum jerk law as a guide for measuring human motor acceleration. By combining our empirical results from moving target selection tasks in 1D with our theoretical model for motor control, we propose an extension to Fitts’ Law for moving targets in 2D polar space.

Read tips on applying, reference letters, statement of interest, reaching out to prospective supervisors, interviews and more in our Application Guide!