Browsing by Subject "Feature engineering"
Item
Predicting rental listing popularity : 2 Sigma connect Renthop (2017-05-15)
Cai, Shiyao; Keitt, Timothy H.; Robbins, Paul
Renting a perfect apartment can be a hassle. There are plenty of features people care about when looking for an apartment, such as price, hardwood floors, a dog park, a laundry room, etc. Being able to predict people’s interest level in an apartment will help the rental agency better handle fraud control, identify potential listing quality issues, and allow owners and agents to understand renters’ needs and preferences. RentHop, an apartment search engine, along with 2 Sigma, introduced this multi-class classification problem in the Kaggle community. It provides the opportunity to use owners’ data to predict the interest level of their apartments on its website. This report attempts to find a pattern in people’s interest level toward rental listings on the website using the dataset from the Kaggle competition. Multiple features are derived from the original dataset. Several common data mining and machine learning techniques are used to improve the accuracy of the predictive model. The final result is evaluated using the log loss function.

Item
Theory-guided data science : combining machine learning with domain expertise to predict springflow (2020-05-06)
Pease, Emily Camille; Pierce, Suzanne Alise, 1969-
Traditionally, science follows a theory-based approach through which physical equations are used to model natural phenomena. In this recent era of artificial intelligence and "big data", there is a shift into a new paradigm of scientific discovery. The paradigm of theory-guided data science (TGDS) enables scientists to perform data science modeling while retaining their domain expertise to produce informed results consistent with the physical system. Predicting springflow discharge from Comal Springs using machine learning was determined to be an appropriate case study.
The Edwards Aquifer in central Texas serves as the primary water supply for over 1.5 million Texans, providing water for recreational activities, businesses, and downstream users. Additionally, these waters serve as a home to many aquatic species, eight of which are endangered or threatened. Quantifying springflow is essential in regulating groundwater resources in the Edwards Aquifer, especially during drought conditions. Here, a theory-guided predictive machine learning model for springflow estimation at Comal Springs is developed. First, feature engineering is performed to discover relations between data available in the Edwards Aquifer region, selected through theory-guided parameter initialization. Next, multiple machine learning models are explored and tested for their ability to model a complex springs system. Finally, theory-guided refinement of data science outputs is performed to make the model results consistent with what is possible in nature.
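
The abstract does not say how the theory-guided refinement step was carried out. As a rough illustration only, one simple form such a step could take is clamping raw model outputs to a physically plausible range, since a spring cannot discharge a negative amount of water. The function name `refine_predictions`, the bounds, and the sample values below are hypothetical and are not taken from the thesis.

```python
def refine_predictions(predicted_flow, lower_bound=0.0, upper_bound=None):
    """Clamp raw model outputs to a physically plausible range.

    Assumption for illustration: springflow cannot be negative, and an
    optional upper bound may be supplied from domain knowledge.
    """
    refined = []
    for value in predicted_flow:
        # Enforce the lower physical limit (no negative discharge).
        value = max(value, lower_bound)
        # Enforce the upper limit only if one was supplied.
        if upper_bound is not None:
            value = min(value, upper_bound)
        refined.append(value)
    return refined


# Made-up predictions, for illustration only (not real springflow data).
raw = [312.5, -4.2, 0.0, 510.0]
print(refine_predictions(raw, upper_bound=500.0))
```

Clamping is only the simplest possible post-processing constraint. A refinement grounded in aquifer physics would likely draw on additional domain rules that the abstract does not describe.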