Characterizing the relationship in social media between language and perspective on science-based reasoning as justification for belief
Beliefs that are not the result of science-based interpretation of evidence (e.g., belief in ghosts or belief that prayer is effective) are extremely common. Science enthusiasts have expressed interest in automatic detection of non-science-based claims. This thesis takes some first steps toward a solution, specifically aimed at detecting Twitter users who are likely or unlikely to take a science-based perspective on all topics. As part of this thesis, a set of Twitter users was labeled as being either "pro-science" (i.e., as having the view that beliefs are rational if and only if they are in accord with science-based reasoning) or "non-pro-science" (i.e., as having the view that beliefs may be reasonable even if they are not in accord with science-based reasoning). Word frequency ratios relative to a neutral dataset, together with a simple topic alignment technique, suggest considerable linguistic divergence between the pro-science and non-pro-science users. High-accuracy logistic regression classification using linguistic features of users' recent tweets supports that idea. Supervised classification experiments suggest that the pro-science and non-pro-science perspectives are not only detectable from linguistic features but can also be abstracted away from particular topics (i.e., that the two perspectives are not inherently topic-specific). Results from distantly supervised classification suggest that, for some applications, using easily acquired, weakly labeled data may be preferable to the much slower process of individually labeling data, despite its markedly lower accuracy than the fully supervised approach. The best classifier obtained in this thesis has an accuracy of 93.9%.
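The abstract does not describe how the word frequency ratios were computed, so the following is only a minimal sketch of one common way to do it, not the thesis's own procedure. It assumes a simple regular-expression tokenizer and add-one smoothing; the function names `tokenize` and `frequency_ratios` and the toy texts are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_ratios(group_texts, neutral_texts, smoothing=1.0):
    """Return, for each word, its relative frequency in the group's texts
    divided by its relative frequency in the neutral texts.

    Additive smoothing (an assumption of this sketch) keeps the ratio
    defined for words that are missing from one of the two collections.
    """
    group_counts = Counter(tok for t in group_texts for tok in tokenize(t))
    neutral_counts = Counter(tok for t in neutral_texts for tok in tokenize(t))
    vocab = set(group_counts) | set(neutral_counts)

    # Smoothed totals: every vocabulary word gets `smoothing` extra counts.
    group_total = sum(group_counts.values()) + smoothing * len(vocab)
    neutral_total = sum(neutral_counts.values()) + smoothing * len(vocab)

    ratios = {}
    for word in vocab:
        group_freq = (group_counts[word] + smoothing) / group_total
        neutral_freq = (neutral_counts[word] + smoothing) / neutral_total
        ratios[word] = group_freq / neutral_freq
    return ratios


# Toy data, invented purely for illustration.
group = ["evidence evidence study"]
neutral = ["lunch study"]
ratios = frequency_ratios(group, neutral)

# Words with a ratio above 1 are over-represented in the group's texts.
for word, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {ratio:.2f}")
```

Ratios well above 1 mark words a group uses more often than the neutral baseline, and ratios well below 1 mark words it uses less often. Comparing the two groups' highest-ratio words is one way linguistic divergence of the kind described above could be inspected.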