Analysing Housing Price in Australia with Data Science Methods

KOU_Jiaying-Thesis_nosignature.pdf - Submitted Version (3MB)

Kou, Jiaying (2022) Analysing Housing Price in Australia with Data Science Methods. PhD thesis, Victoria University.


Housing market price prediction is a major challenge in economics. Since the 2008 global financial crisis, researchers, economists, and politicians around the world have increasingly drawn attention to the need to better understand housing market behaviour, because the failure to predict the housing market crisis ahead of time led to catastrophic global damage. The last two decades have also seen a revolution in information technology and artificial intelligence. The advent of powerful cloud and high-performance computing systems, big data, and advanced machine learning algorithms has opened up new applications and advantages in cutting-edge research and technology areas such as pattern recognition, bioinformatics, natural language processing, and product recommendation systems. Can we improve our understanding of housing market behaviour by leveraging these recent advances in artificial intelligence and newly available big data? This is the main theme of the thesis. There is strong motivation to explore the application of data science methods, including new large datasets and advanced machine learning algorithms, to accelerate our understanding of housing market problems for the common good.

To understand housing market behaviour, we divide the problem into two major steps: first, to improve understanding of housing appraisal (at the micro level), which is to predict the housing price at the point level within a fixed timeframe; second, to improve understanding of trend prediction (at the macro level), which is to predict the housing price trend for a specific place over a time interval.
For these two major steps, we improve upon traditional economic modelling by:
• Adding new, non-traditional variables/features to our models, such as location-based Points of Interest, regional economic clusters, qualitative index, search index, and newspaper articles
• Applying machine learning algorithms for data analysis, such as non-linear algorithms, K-Nearest-Neighbour, Support Vector Machine, Gradient Boosting, and sentiment analysis

Specifically, in Chapter 3, we focus on the development of Location-Based Social Network (LBSN) data for our micro-level housing appraisal modelling. A good location goes beyond the direct benefits of its neighbourhood. By leveraging housing data, neighbourhood data, regional economic cluster data and demographic data, we build a housing appraisal model named HNED. Most previous statistical and machine learning based housing appraisal research limits its investigation to neighbourhoods within a 1 km radius of the house. We expand the investigation beyond the local neighbourhood to the whole metropolitan area by introducing connections to significant, influential economic nodes, which we term Regional Economic Clusters. These include the connection to the CBD and workplaces, and the convenience and quality of big shopping malls and university clusters. When used with the gradient boosting algorithm XGBoost to perform housing price appraisal, HNED reached an R² of 0.88. In addition, we found that the feature vector from Regional Economic Clusters alone reached an R² of 0.63, significantly higher than that of any of the traditional features. Chapter 3 focuses on the exploration and validation of HNED modelling.

In Chapter 4 and Chapter 5, we focus on macro-level housing price trend prediction.
We fill the gap between traditional macro-level housing market modelling and new developments in the concept of irrationality in microeconomic theories by collecting and analysing economic behavioural data, such as real estate opinions in local newspaper articles and people’s web searching behaviour as captured by the Google Trend Index.

In Chapter 4, we discuss the use of micro-level behavioural data for understanding macro-level housing market behaviour. We use sentiment analysis to examine local newspaper articles discussing real estate at the suburb level in inner-west Sydney, Australia. We then calculate a media sentiment index using two different methods, and compare the two indices with each other and with the housing price index. The media sentiment index can serve as a finer-grained guiding tool to facilitate decision-making for home buyers, investors, researchers and policy makers.

In Chapter 5, we discuss how new developments in behavioural economic theory indicate that information from decision-making at the micro level can bring a new solution to the age-old problem of economic forecasting. This provides the theoretical link between irrationality and big data methods. Specifically, the Google Trend Index is included as a new variable in a time series auto-regression model to forecast housing market cycles.

To summarise the contributions of the thesis, we conclude that this is a successful early attempt to study housing price problems using data science methods, by leveraging newly available data sets and applying novel machine learning methods. Specifically, location-based social data improves housing appraisal modelling, and human behaviour in the housing market is analysed by introducing local newspaper articles and the Google Trend Index into the modelling and analysis.
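
As an illustration of the Chapter 5 idea of adding a search index to a time series auto-regression model, the following is a minimal sketch, not the thesis implementation. It assumes a first-order model with a one-period lag on the search index, y[t] = c + phi * y[t-1] + beta * x[t-1], fitted by ordinary least squares. The model order, the lag, the function names `fit_arx1` and `forecast_next`, and the synthetic series are all hypothetical choices made for the example.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_arx1(prices, search_index):
    """Least-squares fit of y[t] = c + phi * y[t-1] + beta * x[t-1].

    Assumed model form (hypothetical): order 1, search index lagged by one period.
    Returns the tuple (c, phi, beta).
    """
    if len(prices) != len(search_index) or len(prices) < 4:
        raise ValueError("need two equally long series with at least 4 points")
    rows = [[1.0, prices[t - 1], search_index[t - 1]]
            for t in range(1, len(prices))]
    targets = prices[1:]
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


def forecast_next(prices, search_index, coeffs):
    """One-step-ahead forecast from the last observed values."""
    c, phi, beta = coeffs
    return c + phi * prices[-1] + beta * search_index[-1]


# Synthetic, noise-free series (not thesis data) built from known coefficients,
# so the fit can be checked against the values used to generate it.
search = [math.sin(0.7 * t) for t in range(40)]
prices = [1.0]
for t in range(1, 40):
    prices.append(1.0 + 0.5 * prices[t - 1] + 2.0 * search[t - 1])

coeffs = fit_arx1(prices, search)
next_price = forecast_next(prices, search, coeffs)
```

On real data, one would compare the forecast error of this model against the same auto-regression without the search index term, to judge whether the behavioural variable adds predictive value.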

Item type Thesis (PhD thesis)
Subjects Current > FOR (2020) Classification > 3801 Applied economics
Current > FOR (2020) Classification > 4605 Data management and data science
Current > FOR (2020) Classification > 4611 Machine learning
Current > Division/Research > Institute for Sustainable Industries and Liveable Cities
Keywords housing price, housing economics, data science, housing appraisal modelling, housing market, economic, computer science, machine learning, regional economic clusters, forecasting