Data Cleaning from Global Terrorism Database (GTD)
by Maria Jane Poncardas, 06/14/19
Data preprocessing:
The data were collected from the Global Terrorism Database (GTD), an open-source database (https://www.start.umd.edu/gtd/) that is updated annually and covers worldwide terrorist events from 1970 to 2017. The data file was obtained from the website by hovering over the “USING GTD” option and then selecting “Download GTD”. On that page, the “Action” field has a drop-down menu that covers general inquiries and acquisition of the GTD file. The file contains comprehensive information on each event, including news sources, among other details. The researchers selected only the information most relevant to the study:
Date
City and Province
Latitude and longitude
Attack type:
    Bombing/Explosion
    Armed Assault
    Hostage Taking (Kidnapping)
    Assassination
    Facility/Infrastructure Attack
    Hijacking
    Hostage Taking (Barricade Incident)
    Unarmed Assault
Target type:
    Military
    Private Citizens & Property
    Government
    Business
    Police
    Transportation
    Utilities
    Educational Institution
    Religious Figures/Institutions
    Journalists & Media
    Maritime
    Telecommunication
    Terrorists/Non-State Militia
    NGO
    Food or Water Supply
    Tourists
    Airports and Aircraft
Weapon details:
    Firearms
    Explosives
    Incendiary
    Melee
    Chemical
    Sabotage equipment
Doubted terrorism
Suicide
Infiltrator group name
Number of dead victims
Number of injured
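Selecting these fields can be sketched in Python as follows. This is a minimal sketch, not the researchers' actual code. It assumes the GTD file has been exported to CSV and that the columns carry the variable names used in the GTD codebook (iyear, provstate, attacktype1_txt, and so on). The helper select_columns and the sample row are hypothetical.

```python
import csv
import io

# Column names assumed from the GTD codebook; adjust to the actual file.
RELEVANT_COLUMNS = [
    "iyear", "imonth", "iday",      # date
    "city", "provstate",            # city and province
    "latitude", "longitude",
    "attacktype1_txt",              # attack type
    "targtype1_txt",                # target type
    "weaptype1_txt",                # weapon details
    "doubtterr",                    # doubted terrorism
    "suicide",
    "gname",                        # group name
    "nkill", "nwound",              # number of dead and injured
]


def select_columns(csv_file, columns=RELEVANT_COLUMNS):
    """Read a CSV file object and keep only the requested columns."""
    reader = csv.DictReader(csv_file)
    # Columns missing from the file are filled with an empty string.
    return [{col: row.get(col, "") for col in columns} for row in reader]


# Tiny in-memory stand-in for the real file (illustrative row, not GTD data).
sample = io.StringIO(
    "eventid,iyear,imonth,iday,city,provstate,latitude,longitude,summary\n"
    "1,2000,1,1,Davao City,Davao del Sur,7.07,125.61,placeholder text\n"
)
records = select_columns(sample)
print(records[0]["city"], records[0]["iyear"])
```

With the real file, the StringIO object would be replaced by an open file handle.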
DATA CLEANING:
The researchers filtered out records located outside Mindanao, i.e., those in Luzon, the Visayas, and foreign localities.
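A name-based filter of this kind could look like the sketch below. It assumes each record is a dictionary with a provstate field, as in the GTD codebook. The province set is only an illustrative subset, and the helpers normalize and keep_mindanao are hypothetical names.

```python
# Illustrative subset of Mindanao provinces; a real filter needs the full list.
MINDANAO_PROVINCES = {
    "Basilan", "Bukidnon", "Davao del Sur", "Lanao del Sur",
    "Maguindanao", "Sulu", "Zamboanga del Sur",
}


def normalize(name):
    """Return a case- and whitespace-insensitive form of a province name."""
    return " ".join(name.split()).casefold()


_MINDANAO_KEYS = {normalize(p) for p in MINDANAO_PROVINCES}


def keep_mindanao(records, province_field="provstate"):
    """Return only the records whose province is in the Mindanao set."""
    return [
        r for r in records
        if normalize(r.get(province_field, "")) in _MINDANAO_KEYS
    ]


records = [
    {"city": "Davao City", "provstate": "Davao del Sur"},
    {"city": "Manila", "provstate": "Metropolitan Manila"},  # Luzon
    {"city": "Cebu City", "provstate": "Cebu"},              # Visayas
    {"city": "Jolo", "provstate": "sulu "},                  # messy spelling
]
mindanao = keep_mindanao(records)
print([r["city"] for r in mindanao])
```

Normalizing the names before comparison keeps records whose province differs only in case or stray spaces.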
Using the reverse geocoder module in Python, the latitude and longitude of each record were used to augment the information about its locality, i.e., the city/municipality name and the province.
The augmented cities and municipalities, held in a data frame, were then concatenated to the GTD file so that the information on each terrorist activity was retained.
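The two steps above can be sketched as follows. The sketch assumes the module in question is the reverse_geocoder package, whose search function takes a list of (latitude, longitude) tuples and returns one dictionary per coordinate with keys such as name and admin1. A stub stands in for it here so that the example runs without the package. The output column names rg_city and rg_province are hypothetical, and plain dictionaries are used in place of a data frame.

```python
def augment_with_localities(records, search):
    """Attach the reverse-geocoded city and province to each record.

    `search` takes a list of (lat, lon) tuples and returns one dict per
    coordinate, in the same order, with at least `name` and `admin1`.
    """
    coords = [(float(r["latitude"]), float(r["longitude"])) for r in records]
    results = search(coords)
    augmented = []
    for record, hit in zip(records, results):
        row = dict(record)                  # keep all original GTD fields
        row["rg_city"] = hit["name"]        # hypothetical new column names
        row["rg_province"] = hit["admin1"]
        augmented.append(row)
    return augmented


def stub_search(coords):
    """Stand-in for a real reverse geocoder; returns placeholder names."""
    return [{"name": "Example City", "admin1": "Example Province"}
            for _ in coords]


records = [{"gname": "Unknown", "latitude": "7.07", "longitude": "125.61"}]
augmented = augment_with_localities(records, stub_search)
print(augmented[0])
```

With the package installed, the stub would be replaced by passing its search function instead. Records with missing coordinates would need to be dropped first, because float() fails on empty values.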
The Philippine shapefile was used as a reference to detect capitalization errors, misspelled words, and typos in the city and municipality names produced by the reverse geocoder.
(A city that is not in the shapefile indicates erroneous information generated by the reverse geocoder.)
The for loop returns the incorrect cities/municipalities and their coordinates.
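A loop of this kind might look like the sketch below. It is an illustration under stated assumptions, not the researchers' code: the reference names are a hypothetical handful that would in practice be read from the shapefile's attribute table, the field name rg_city is hypothetical, and the coordinates are approximate.

```python
# Hypothetical reference names; in practice these would be read from the
# attribute table of the Philippine shapefile.
SHAPEFILE_CITIES = {"Davao City", "Cotabato City", "Jolo"}


def find_unmatched(records, reference, city_field="rg_city"):
    """Return (city, latitude, longitude) for each city not in the reference."""
    unmatched = []
    for r in records:
        city = r[city_field]
        if city not in reference:   # exact match, so case errors are caught
            unmatched.append((city, r["latitude"], r["longitude"]))
    return unmatched


records = [
    {"rg_city": "Davao City", "latitude": 7.07, "longitude": 125.61},
    {"rg_city": "cotabato city", "latitude": 7.22, "longitude": 124.25},
    {"rg_city": "Jollo", "latitude": 6.05, "longitude": 121.00},
]
for city, lat, lon in find_unmatched(records, SHAPEFILE_CITIES):
    print(city, lat, lon)
```

Returning the coordinates alongside each name makes the manual lookup on a map straightforward.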
Each incorrect city/municipality was manually verified through Google Maps and Wikipedia, since most of these municipalities and cities had been renamed or carved out of another city/municipality.
Once verified, all records that share the same erroneous city/municipality name are corrected automatically through this code:
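A correction step of this kind can be sketched as a lookup table applied to every record. This is an illustrative sketch rather than the researchers' code: the mapping entries, the field name rg_city, and the helper apply_corrections are all hypothetical.

```python
# Hypothetical mapping built from the manual checks:
# wrong name -> verified name.
CORRECTIONS = {
    "cotabato city": "Cotabato City",
    "Jollo": "Jolo",
}


def apply_corrections(records, corrections, city_field="rg_city"):
    """Return copies of the records with known-wrong city names replaced."""
    fixed = []
    for r in records:
        row = dict(r)
        # Names without an entry in the mapping are left unchanged.
        row[city_field] = corrections.get(row[city_field], row[city_field])
        fixed.append(row)
    return fixed


records = [{"rg_city": "Jollo"}, {"rg_city": "Davao City"}, {"rg_city": "Jollo"}]
corrected = apply_corrections(records, CORRECTIONS)
print([r["rg_city"] for r in corrected])
```

Because the mapping is keyed on the erroneous name, one manual verification fixes every record that shares that name.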