The UBC Social Landslides team comprises recent graduates Shengjie Zhang, Mariia Shubina, Badr Jaidi and Yiting Zhou, who completed the project as their capstone for their master of data science degrees.
In collaboration with Vancouver-based geosciences consulting firm BGC Engineering and landslide scientists at NASA, the students used natural language processing to create a process for surveying news articles about landslides posted on Reddit and automatically cataloguing them in a database.
The project is a part of NASA’s Cooperative Open Online Landslide Repository (COOLR), which works to collect data on landslides for research, scientific modelling and development of emergency response protocols.
Landslides are severely underreported, which makes it difficult to predict and prepare for future ones. COOLR aims to address this gap.
Since 2007, COOLR has been used to shape projects like NASA’s Landslide Hazard Assessment for Situational Awareness, which provides a real-time map of potential landslide hazards based on past data.
However, building COOLR’s database has been a time-consuming process. NASA has recruited volunteer “citizen scientists” to log landslide records for over a decade. The records have traditionally been checked manually by staff members at NASA Goddard Space Flight Center.
“They got more than 10,000 examples of landslides, but it’s still a tedious process,” said Jaidi. NASA’s landslides team has previously reported that the dataset has taken over a year and a half of pure cataloguing work to compile.
The Social Landslides project automates this laborious process: the team’s system can run through a month’s worth of articles on landslides in 15 minutes.
During that time, the processing model filters out irrelevant articles and parses out relevant information, such as the location and date of the detected landslides, to add to the COOLR database.
This step is particularly important. Without specialized language processing, articles referring to everything from decisive election wins (often called ‘landslides’) to Pokémon moves (‘rock slides’) might be accidentally incorporated into the database.
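To illustrate the filtering problem, here is a minimal sketch in Python that compares a headline with a few known landslide reports and rejects it when the wording has little in common. It is not the team’s code: the example sentences, the stop-word list, the function names and the 0.3 threshold are all assumptions made up for this illustration.

```python
import math
import re
from collections import Counter

# Assumed stop-word list; small on purpose, for illustration only.
STOP_WORDS = {"a", "an", "the", "in", "on", "and", "that", "for", "after", "of", "to"}

# Made-up sentences standing in for validated landslide reports.
VALIDATED_EXAMPLES = [
    "Heavy rain triggered a landslide that buried a road and several homes",
    "Rescue crews search for survivors after a mudslide hit the village",
]


def bag_of_words(text):
    """Count lowercase words, ignoring the stop words above."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def looks_like_real_landslide(text, threshold=0.3):
    """Keep an article only if it resembles at least one validated example."""
    candidate = bag_of_words(text)
    best = max(cosine_similarity(candidate, bag_of_words(e)) for e in VALIDATED_EXAMPLES)
    return best >= threshold
```

With these assumptions, “A landslide buried a road after heavy rain” is kept, while “Candidate wins election in a landslide” is rejected, because the election headline shares only a single word with the known reports.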
The Social Landslides group used natural language processing techniques called named-entity recognition and “context matching” to train the model. Using previously collected COOLR data, they match the language and situations found in validated data sets.
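Named-entity recognition means picking out items such as places and dates from running text. The sketch below shows the idea in Python, with two simple text patterns standing in for a trained recognizer. The record fields, pattern rules and function name are hypothetical, and the team’s actual system is not described in enough detail here to reproduce.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical record shape; the real COOLR schema is not shown in this article.
@dataclass
class LandslideRecord:
    title: str
    location: Optional[str]
    date: Optional[str]


# Toy pattern: dates written like "November 15, 2021".
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)

# Toy pattern: capitalized words following "in" or "near".
LOCATION_PATTERN = re.compile(r"\b(?:in|near) ([A-Z][a-z]+(?:,? [A-Z][a-z]+)*)")


def extract_record(title, body):
    """Pull the first date and location found in an article, if any."""
    text = f"{title}. {body}"
    date = DATE_PATTERN.search(text)
    location = LOCATION_PATTERN.search(text)
    return LandslideRecord(
        title=title,
        location=location.group(1) if location else None,
        date=date.group(0) if date else None,
    )
```

Fixed patterns like these miss most real-world phrasing, which is one reason a trained recognizer is a better fit for news text.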
“It’s a very unique and very specific challenge,” Jaidi said. “All the techniques we had to use were custom-made.”
The team won the Overall Best Project and Faculty Choice Award for their capstone. Currently, the NASA team is reviewing the Social Landslides project to see how it might be incorporated into COOLR’s workflow.
In the meantime, the team’s project is publicly housed in a repository on the code hosting platform GitHub, which means the source code is free for anyone to use and improve.
“It’s an open source project, a proof of concept … it was made to be built on top of in the future,” said Jaidi.