Enhancing worker management and supporting external tasks in crowdsourced data labeling
dc.contributor.advisor | Lease, Matthew A. | |
dc.creator | Thapa, Sukanya | |
dc.date.accessioned | 2024-04-19T16:10:27Z | |
dc.date.available | 2024-04-19T16:10:27Z | |
dc.date.issued | 2023-12 | |
dc.date.submitted | December 2023 | |
dc.date.updated | 2024-04-19T16:10:27Z | |
dc.description.abstract | Human data labeling is key to training supervised machine learning (ML) models. We propose a new software infrastructure layer to augment the capabilities of Amazon’s SageMaker Ground Truth (GT) data labeling platform. Whereas crowdsourced annotation via Amazon Mechanical Turk (MTurk) is well-established, Amazon’s more recent GT platform is less well known but is specifically designed to support ML annotation. Differentiating features include a curated “public crowd” sourced from MTurk and the integration of human labeling into Amazon’s broader SageMaker ML tool suite, which provides an end-to-end pipeline for training and deploying ML services. Key features of our software layer include: 1) continuous monitoring of worker performance with respect to Requester gold labels; 2) automatic restriction of task access when performance standards are not met; 3) geography-based restriction of task access to US-based workers; and 4) the ability to conduct external tasks off-platform while sourcing workers from GT and continuing to use GT’s payment system. Our design seeks to streamline the Requester experience with minimal changes and to use a sustainable software design that eases long-term management, extension, and maintenance. More generally, design goals center on promoting efficient, user-friendly, and quality-focused data labeling with crowdsourced annotators. | |
dc.description.department | Information | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | https://hdl.handle.net/2152/124875 | |
dc.identifier.uri | https://doi.org/10.26153/tsw/51477 | |
dc.language.iso | en | |
dc.subject | Crowd worker management | |
dc.subject | Amazon Mechanical Turk | |
dc.subject | Crowdsourced data labeling | |
dc.subject | Amazon SageMaker Ground Truth | |
dc.title | Enhancing worker management and supporting external tasks in crowdsourced data labeling | |
dc.type | Thesis | |
dc.type.material | text | |
thesis.degree.department | Information | |
thesis.degree.grantor | The University of Texas at Austin | |
thesis.degree.name | Master of Science in Information Studies | |