Enhancing worker management and supporting external tasks in crowdsourced data labeling

dc.contributor.advisor: Lease, Matthew A.
dc.creator: Thapa, Sukanya
dc.date.accessioned: 2024-04-19T16:10:27Z
dc.date.available: 2024-04-19T16:10:27Z
dc.date.issued: 2023-12
dc.date.submitted: December 2023
dc.date.updated: 2024-04-19T16:10:27Z
dc.description.abstract: Human data labeling is key to training supervised machine learning (ML) models. We propose a new software infrastructure layer to augment the capabilities of Amazon’s SageMaker Ground Truth (GT) data labeling platform. Whereas crowdsourced annotation via Amazon Mechanical Turk (MTurk) is well-established, Amazon’s more recent GT platform is less well known but specifically designed to support ML annotation. Differentiating features include a curated “public crowd” sourced from MTurk and the integration of human labeling into Amazon’s broader SageMaker ML tool suite, which provides an end-to-end pipeline for training and deploying ML services. Key features of our software layer include: 1) continuous worker performance monitoring with respect to Requester gold labels; 2) automatic restriction of task access when performance standards are not met; 3) geography-based restriction of task access to US-based workers; and 4) the ability to conduct external tasks off-platform while sourcing workers from GT and continuing to use GT’s payment system. Our design seeks to streamline the Requester experience with minimal changes, and to use a sustainable software design that eases long-term management, extension, and maintenance. More generally, design goals center on promoting efficient, user-friendly, and quality-focused data labeling with crowdsourced annotators.
dc.description.department: Information
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/124875
dc.identifier.uri: https://doi.org/10.26153/tsw/51477
dc.language.iso: en
dc.subject: Crowd worker management
dc.subject: Amazon Mechanical Turk
dc.subject: Crowdsourced data labeling
dc.subject: Amazon SageMaker Ground Truth
dc.title: Enhancing worker management and supporting external tasks in crowdsourced data labeling
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Information
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.name: Master of Science in Information Studies

Access full-text files

Original bundle

Name: THAPA-PRIMARY-2024-1.pdf
Size: 738.39 KB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 1.84 KB
Format: Plain Text

Name: PROQUEST_LICENSE.txt
Size: 4.45 KB
Format: Plain Text