Debnath, et al.
Post-disaster Situational Analysis from WhatsApp Group
Chats
Post-disaster Situational Analysis from
WhatsApp Group Chats of
Emergency Response Providers
Pragna Debnath
IIEST, Shibpur
pragna.madhatter@gmail.com

Saniul Haque
IIEST, Shibpur
saniul.haque@outlook.com

Somprakash Bandyopadhyay
Indian Institute of Management Calcutta
somprakashb@gmail.com

Siuli Roy
Heritage Institute of Technology, Kolkata
siuli.roy@heritageit.edu
ABSTRACT
Social media has established itself as one of the important information carriers in the field of disaster
management. However, the use of Twitter and Facebook by victims, first responders and others generates
information that is varied, unstructured and unreliable. On the other hand, NGOs operating in the disaster area
are often involved in intra-organizational communication using messaging apps like WhatsApp, and their group
interactions can help in gathering meaningful data for situational analysis and need assessment. Our focus is to
automate the filtering of relevant information and the query-based clustering of pertinent information from the
WhatsApp group conversation of a specific volunteer group, so that situation analysis and need assessment can
be done more rapidly. We have evaluated our scheme using the WhatsApp chat log of a medical volunteer group in
two post-disaster scenarios and concluded that it can provide valuable insights about region-specific resource
requirements and allocation for effective decision making.
INTRODUCTION
In today’s world, social media has established itself as one of the most important information carriers, with a
trend of diversification in the form of wikis, microblogging, Facebook, Twitter, and so on. It has often been
observed in the recent past that the very first indication of the occurrence of a disaster is found on social
media platforms such as Twitter and Facebook. Disaster responders as well as researchers have now started
realizing the value of using knowledge from social media for effective disaster response. More and more
organizations are opting for it as a faster alternative to preliminary manual assessment of disaster
situations. This has resulted in the widespread application of data mining tools in this field for extracting
meaningful information from the huge amount of data generated on social media in post-disaster
scenarios. However, data mining tools do not always prove effective in practical situations, as was the
case with Ushahidi’s disaster relief campaign during the Haiti Earthquake of 2010 (Antoniou and Ciaramicoli, 2013). It is often
extremely difficult to extract meaningful, consistent and pertinent information from unstructured and often
unreliable social media posts. As a result, in spite of having access to a huge volume of crowd-sourced information on the
event, disaster management authorities could not make use of it for effective subsequent decision making
(White, 2014; Li and Goodchild, 2010).
Usually, Facebook and Twitter are considered the social media tools that could be used for post-disaster
situational analysis. Due to Facebook’s privacy settings, real-time information during the disaster is not always
accessible. Twitter, on the other hand, is a more readily available stream of continuous information, especially when
tweets pertaining to a set of hashtags (e.g. #NepalEarthquake) are looked for. But this huge data set suffers from
the problems of both non-relevance and inaccuracy. It has been observed that only 5% of relevant tweets/posts
are useful; the others are mostly retweets of news articles or opinionated posts rather than facts.
On the other hand, NGOs operating in the disaster area have an organised approach to post-disaster work, and
an analysis of their intra-organizational interactions through messaging apps such as WhatsApp can provide us
Short Paper – Social Media Studies
Proceedings of the ISCRAM 2016 Conference – Rio de Janeiro, Brazil, May 2016
Tapia, Antunes, Bañuls, Moore and Porto de Albuquerque, eds.
with a detailed overview of their situation and movement patterns, and also help us keep a log of their
requirements and problems. Messaging apps like Line and WeChat (Chan, 2015; Liu et al., 2015) have been used
experimentally in south-eastern Asian countries. However, their applicability has been limited to facilitating
emergency communications between the victims and the authorities. The contribution of our work is to
solve a part of a larger domain problem through the simple observation that textual intra-group conversations
within disaster management organizations are unutilized powerhouses of information. If analyzed in a structured
way, they could answer pertinent questions about a real disaster. Our aim is to devise an automated process that
analyzes the WhatsApp chat-log for situational analysis. As an example, we have used the WhatsApp chat-log of a
medical NGO called “Doctors For You” during two post-disaster scenarios: (i) the Nepal Earthquake and (ii) the
Chennai Flood. Some members of this team were working in the field while the other members remained at
their home station, planning for resource deployment and other related work. Our analysis reveals that
the WhatsApp chat-log generated in the field can give valuable insight into region-specific resource needs and
gaps in resource distribution, so that volunteers and other stakeholders at the home station can take important
decisions in real time.
COLLECTION OF DATA AND INITIAL ANALYSIS
We have analyzed a log of WhatsApp messages exchanged among members of an India-based NGO named
Doctors For You (DFY), which provides medical support to disaster victims. We have considered two cases: the period after the Nepal earthquake in April 2015 and the Chennai Flood in December 2015. Group members include both volunteers
placed in the field during the chat period and volunteers at the home station coordinating the response. For the initial
analysis, we first took the entire conversation from 27th April 2015 to 12th May 2015 during the
Nepal earthquake. For simplicity, the images/videos shared during the chat were ignored, and the conversation was converted into a text
file (sample shown in Figure 1) for further analysis.
6:51PM, 29 Apr - Rajat: What is the status?A donor is ready to sponsor all expenses of the volunteers
6:58PM, 29 Apr - Dr. Ravikant: Mridul plz update
8:02PM, 29 Apr - G. Shandeepan: Hi everyone . I had a discussion with health department. They want the
medicines coming from india to be pre approved
8:03PM, 29 Apr - Dr. Ravikant: Ok
Figure 1. Sample Chat on WhatsApp
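As a minimal sketch of how a log in the format of Figure 1 could be turned into structured records, the following Python fragment splits each line into time, date, sender and text. The regular expression and the names ChatMessage and parse_chat are our own illustrative assumptions based only on the four messages shown in Figure 1; an actual WhatsApp export may use a different header layout.

```python
import re
from typing import NamedTuple


class ChatMessage(NamedTuple):
    time: str
    date: str
    sender: str
    text: str


# Assumed header layout, inferred from Figure 1: "6:51PM, 29 Apr - Rajat: message"
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}\s?[AP]M), (?P<date>\d{1,2} [A-Za-z]{3}) - "
    r"(?P<sender>[^:]+): (?P<text>.*)$"
)


def parse_chat(lines):
    """Group raw lines into messages.

    A line without a recognisable header is treated as the continuation
    of the previous message (as with the wrapped third message in Figure 1).
    """
    messages = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            messages.append(ChatMessage(**match.groupdict()))
        elif messages and line.strip():
            last = messages[-1]
            messages[-1] = last._replace(text=last.text + " " + line.strip())
    return messages


# The sample of Figure 1, one list entry per line of the text file.
SAMPLE = [
    "6:51PM, 29 Apr - Rajat: What is the status?A donor is ready to sponsor all expenses of the volunteers",
    "6:58PM, 29 Apr - Dr. Ravikant: Mridul plz update",
    "8:02PM, 29 Apr - G. Shandeepan: Hi everyone . I had a discussion with health department. They want the",
    "medicines coming from india to be pre approved",
    "8:03PM, 29 Apr - Dr. Ravikant: Ok",
]

for message in parse_chat(SAMPLE):
    print(message.date, message.time, message.sender, "->", message.text)
```

Keeping the sender and timestamp as separate fields is what later makes retrieval by participant possible.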
The following preliminary observations are made based on the above data set:
• The problem of inaccuracy was largely eliminated, considering the authenticity and credibility of the
NGO group and its volunteers.
• Since the data is in the text-messaging format, it is highly unstructured.
• There is no clear categorization of the topics being discussed. For example, the topic of conversation shifts from
medicines to be sent to partnership troubles within a time gap of a few minutes.
• Agents regularly updated their locations.
• First-hand information about various other organizations involved in disaster management is available,
which has not been covered by other media.
• Requirements of materials and equipment and problems faced by the agents at a specific location were
shared and updated regularly.
• A message is not always a reply to the one immediately before it; it can start a new
discussion or reply to an earlier question.
Through the analysis of WhatsApp chat-log, we wanted to find out the answers to the following set of questions.
It is to be noted that the answers to these questions were not only relevant for that NGO but also for other
organizations and the community at large:
1. What are the places that the volunteers were visiting?
2. What was the status of medicine and medical infrastructure?
3. What are the common grievances the agents were reporting?
4. What is the status of relief and rescue at specific places?
Initially, a manual analysis was done to see if the chat-log could answer the aforementioned questions. For question #1,
manual analysis suggested that the DFY volunteers were visiting places like Nuwakot,
Sindhupalchowk, Lalitpur, Dhadhing, Pitampura, Kathmandu and Gorkha districts in Nepal. For question #2,
manual analysis revealed that DFY created a prioritized list of medicines, which mostly consisted of vitamins
and prenatal care medicines. Medical apparatus such as ventilators and C-arms was ordered in due course. A tent
shortage for the field OPD was solved locally by borrowing some from a fellow organization, e.g. Red Cross India.
The cases they were addressing were mostly care of pregnant women, children with diarrhea, fever
and headaches, and illnesses caused by contamination of water.
A manual analysis of question #3 revealed a few inefficiencies in management as well as in
government policies. There were problems with the registration of external NGOs that resulted in a delay. Another
source of problems was a government protocol that allowed only generic medicines to be used. In terms of
management, a group of orthopedic doctors coming from India could not be contacted for a long time after their arrival
in Nepal.
A manual analysis of question #4 revealed the existence of inaccessible areas, for example Gorkha. News updates
about the relief movements to such inaccessible places were discussed in the early days of the
organization’s involvement in Nepal Earthquake management. News about hubs set up by ADRRN (Asian
Disaster Reduction and Response Network) as well as the setting up of a telemedicine unit was also shared.
The same analysis was done based on the DFY group WhatsApp conversations during the Chennai Floods from
12th December, 2015 to 13th January, 2016 to test whether the same set of questions can be answered in a
different case as well. Manual analysis of the chats on Chennai Floods revealed that the same set of 4 questions
can be answered based on this dataset as well.
In Chennai, the volunteers worked mainly in the Cuddalore district of Tiruvannamalai region of Tamil Nadu,
which included villages like Thirivanur and Kulamjavadi. Medical infrastructure analysis revealed that they
worked closely with Cipla and received funds from it for their health camps. Medicines
required on the scene were prenatal vitamins like folic acid and protein supplements for pregnant women, ORS,
Vitamin A and de-worming medicines. There was also a pressing need for a swift method of vector disease
control in those regions. Hygiene and dignity kits were a requirement as well. There was no mention of a
requirement for infrastructural equipment like tents or tarpaulin sheets. The common grievance recorded
was that political unrest in the region created problems in setting up camps. Other relief information on
the disaster was not available, as the team was fully engaged in its specific relief activities by that
time.
Thus, the manual analysis suggested that the WhatsApp chat-log was indeed helpful in providing
relevant information to the government authorities and NGOs working in disaster response. However, due to the large
volume of the entire chat-log (more than 10,000 lines of text in the Nepal case), manually looking for answers to
queries is not practical unless some sort of automation is employed.
For question #1, all the chat lines that were geo-tagged were searched for, and once they were found, we
looked for keywords that would suggest ‘arrival’ and ‘departure’. Similarly, for questions #2 and #4,
we looked at medicine- and relief-related words in the specific text portions and the locations next to them, and
made a list of those as well. To answer question #3, we had to do sentiment analysis. It was also found that the
entire chat contained a lot of unnecessary talk and, at times, spam. So, the goal here is to find a suitable
algorithm to filter these chats, remove the spam, and divide the remainder into clusters, each of which would
answer one of the aforementioned questions.
DESIGN APPROACH FOR AN AUTOMATED CHAT ANALYSIS SYSTEM
Tools Used
Simple Natural Language Processing techniques have been employed to extract information from the raw
body of text. The idea is to run a keyword/phrase search that flags a line of text as relevant to a particular
query and includes it in the final output. Provision has been made for the user to zoom out to the full
body of text to see the context in which the string has been used.
The entire package has been built using Python’s Natural Language Toolkit, NLTK (Bird
and Loper, 2006), and its corpora, notable among which is WordNet (Fellbaum and Miller, 1998). Apart from
that, TextBlob’s sentiment analysis tool has been used.
Framework
The working principle of our system is as follows:
First, with the help of a semantic similarity function (Meng et al., 2013), a dictionary of keywords/phrases
pertaining to the general topic of a query was created. WordNet’s linked structure was exploited to derive a
collection of closely related words, all being hypernyms or hyponyms of some core words that can
intuitively be deemed the root words for that topic.
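The dictionary-building step can be illustrated with a small sketch. To keep the example self-contained, it walks a hand-made toy graph of related terms rather than WordNet itself; the graph RELATED, the function build_dictionary and the depth limit are all our own assumptions for illustration. With NLTK, the neighbours of a word would instead be gathered from the hypernyms() and hyponyms() of its WordNet synsets.

```python
from collections import deque

# Hypothetical miniature hypernym/hyponym graph standing in for WordNet.
RELATED = {
    "medicine": ["drug", "antibiotic", "vitamin"],
    "drug": ["medicine", "painkiller"],
    "vitamin": ["medicine", "folic acid"],
    "antibiotic": ["medicine"],
    "painkiller": ["drug"],
    "folic acid": ["vitamin"],
}


def build_dictionary(roots, related, max_depth=2):
    """Collect every term reachable within max_depth links of a root word.

    A breadth-first walk records the depth at which each term is first
    seen, so terms further than max_depth from every root are left out.
    """
    depth_of = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        word = queue.popleft()
        depth = depth_of[word]
        if depth == max_depth:
            continue
        for neighbour in related.get(word, []):
            if neighbour not in depth_of:
                depth_of[neighbour] = depth + 1
                queue.append(neighbour)
    return set(depth_of)


medical_terms = build_dictionary(["medicine"], RELATED, max_depth=2)
print(sorted(medical_terms))
```

The depth limit is the main tuning knob in such a scheme: a small value keeps the dictionary close to the root words, while a larger one admits more distant and potentially irrelevant terms.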
After that, a string-wise keyword matching algorithm picks out the lines that contain one or more of
these keywords. Provision is made to check surrounding, closely related strings, as some information may not
make sense without context. That control is given completely to the user. A similar dictionary paired with the
same string-picking algorithm has been created for the rescue and relief module.
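A keyword matcher with a user-controlled context window might look like the sketch below. The function names and the context parameter are hypothetical and only illustrate the idea of returning the surrounding lines together with each hit; they are not taken from the actual package.

```python
import re


def compile_keywords(keywords):
    """Build one case-insensitive pattern matching any keyword on word boundaries."""
    # Longer phrases first, so that a phrase wins over a shorter keyword it contains.
    alternatives = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def match_lines(lines, keywords, context=0):
    """Return (index, line, window) for every line containing a keyword.

    window holds the matched line plus up to `context` lines before and
    after it, which lets the user read a hit in its conversational context.
    """
    pattern = compile_keywords(keywords)
    hits = []
    for i, line in enumerate(lines):
        if pattern.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i, line, lines[start:end]))
    return hits


chat = [
    "Mridul plz update",
    "They want the medicines coming from india to be pre approved",
    "Ok",
]
for index, line, window in match_lines(chat, {"medicine", "medicines"}, context=1):
    print(index, line, window)
```

Matching on word boundaries avoids hits inside unrelated longer words, at the cost of having to list inflected forms such as plurals in the dictionary.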
The location finder is a simple text-based tool that brings out the parts of the text that denote displacement or
transportation of participating volunteers. The common phrases found in such
texts that clearly denote the displacement of a person from one place to another were charted manually. These
phrases are matched against lines of text using the same keyword matching algorithm, and the relevant strings are
picked out. Sentiment analysis of the text reveals the difficulties faced by the volunteers working in the field in
general. For example, a lack of local support, unavailability of doctors and a lack of important resources
are picked up by sentiment analysis. Reports of further damage as well as after-effects or a
continuation of the same disaster can also be found with this algorithm, and hence it can also be used as an
assessment tool.
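The location finder can be sketched as a phrase matcher that also captures the word following the movement phrase. The phrase list below is hypothetical, since the manually charted list is not reproduced in this paper, and the rule that a place name starts with a capital letter is a simplifying assumption. The two example messages are invented for illustration; only the place names come from the data set.

```python
import re

# Hypothetical movement phrases; the manually charted list is not given here.
MOVEMENT_PHRASES = [
    r"reached",
    r"arrived (?:at|in)",
    r"leaving for",
    r"left for",
    r"on (?:the|our) way to",
    r"heading to",
]

# The phrase is matched case-insensitively, while the place must be capitalised.
MOVEMENT_RE = re.compile(
    r"\b(?i:(?P<phrase>" + "|".join(MOVEMENT_PHRASES) + r"))\s+(?P<place>[A-Z][\w-]*)"
)


def find_movements(messages):
    """Yield (sender, phrase, place) for each movement phrase followed by a place name."""
    for sender, text in messages:
        for match in MOVEMENT_RE.finditer(text):
            yield sender, match.group("phrase").lower(), match.group("place")


# Invented example messages (sender, text).
examples = [
    ("Volunteer A", "Reached Gorkha at noon"),
    ("Volunteer B", "We are leaving for Kathmandu tomorrow"),
    ("Volunteer C", "Ok"),
]
for sender, phrase, place in find_movements(examples):
    print(sender, phrase, place)
```

A capitalisation rule of this kind would miss place names typed in lower case, which are common in chat text, so a gazetteer lookup may be a more robust choice in practice.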
Experimental Results
The conversations include meta-data in the form of the date, time and user-names of the people conversing. An
exhaustive list of the participants is made for ease of information retrieval by participant ID.
A comparative study of the effectiveness of manual and automated text retrieval for
a specific set of queries is presented in Tables 1, 2 and 3. We have assumed that manual
analysis gives accurate results. Here, the ultimate goal of the system is to retrieve a set of lines automatically
based on topic-specific keyword search and to show that it is almost equal to the set of lines retrieved through
manual analysis. We have used a parameter called “precision” to evaluate the effectiveness of the automated
system, as described below.
Precision: In any NLP-based automated system, there is a possibility of underfitting and overfitting.
Therefore, such a system might pick up a few irrelevant texts along with the relevant ones, or might miss a few relevant
texts (the total picked up is shown as column B in Tables 1-3). Irrelevant texts erroneously increase the total retrieved text count of
the automated system, while missed relevant texts decrease it
(please refer to A and B in Tables 1-3). As a result, the values in the 4th column of Tables
1-3 (calculated as C = B - A) may be either positive or negative. We define the precision (P) of the
automated system as the percentage of relevant texts picked up by the automated system out of the total number of texts
retrieved by the automated system. Precision is calculated as follows.
P = 100 - (|C|/B*100)
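As a worked check of this formula, the short sketch below recomputes P from the A and B values reported for the location finder (Table 3). The function name and the choice to return None when B is zero are our own assumptions; the formula leaves that case undefined.

```python
def precision(a, b):
    """Return P = 100 - (|C| / B * 100) with C = B - A.

    a: relevant texts found by manual analysis (A)
    b: texts picked up by the automated system (B)
    Returns None when b is zero, where the formula is undefined.
    """
    if b == 0:
        return None
    c = b - a
    return 100 - abs(c) / b * 100


# (A, B) pairs for the five samples of Table 3 (Location Finder).
TABLE_3 = [(24, 24), (18, 14), (24, 23), (19, 20), (18, 23)]

rounded = [round(precision(a, b)) for a, b in TABLE_3]
print(rounded)  # [100, 71, 96, 95, 78], the last column of Table 3
```

Because the formula uses the absolute value of C, a sample in which the system misses four relevant texts and one in which it picks up four irrelevant texts receive the same penalty for the same B.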
To retrieve medical information from the bulk conversation, we created a dictionary of medical terms and words
semantically related to medical science. A simple text search algorithm goes through the text and picks out all
medically relevant lines of conversation. Hence, it is possible to condense a conversation into a smaller, more
relevant chunk of text.
Sample# (lines of text)   A    B    C = B - A   P (approx. %)
1 (402)                   38   27   -11         59
2 (634)                   27   14   -13         7
3 (557)                   13   12   -1          92
4 (333)                   14   9    -5          44
5 (392)                   31   22   -9          59

A: number of relevant texts picked up by manual analysis
B: total number of texts picked up by the automated system
C: number of irrelevant texts picked up [1] / relevant texts missed [2] by the automated system
P: precision of the automated system, P = 100 - (|C|/B*100)

Table 1 – Comparison between manual and automated analysis for Medical Infrastructure Information
Similarly, to retrieve information about the relief and rescue situation, a dictionary is created consisting of rescue
and relief mission terms and phrases that are seen to occur in the sample text. When the program is run
over the entire batch of messages, it returns the relevant information efficiently. A similar tally
of manual versus automated filtering is given in Table 2.
Sample# (lines of text)   A    B    C = B - A   P (approx. %)
1 (402)                   10   11   1           91
2 (634)                   2    3    1           67
3 (557)                   2    3    1           67
4 (333)                   0    0    0           -
5 (392)                   2    2    0           100

A: number of relevant texts picked up by manual analysis
B: total number of texts picked up by the automated system
C: number of irrelevant texts picked up [1] / relevant texts missed [2] by the automated system
P: precision of the automated system, P = 100 - (|C|/B*100)

Table 2 – Comparison between manual and automated analysis for Relief and Rescue Information
Next, using the raw text as input together with a list of English phrases and clauses that indicate
movement from one place to another, transportation, translation, location etc., the location and movement of
volunteers can be retrieved. The analysed data-set creates a repository of tracking information, which has wide
usage in disaster management situations (Table 3).
We then categorise the entire chat corpus using a second filter (by collaborator) in addition to the
first one (type of information). The distribution of each collaborator's "useful texts" (both in terms of the total
quantity of texts contributed by that person compared to others and the specific category in which the majority
of that person's texts fall) gives an idea of the field of work of a collaborator and of the other members
associated with them.
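The per-collaborator view could be computed as in the sketch below, assuming that the earlier filters have already tagged each useful text with a category. The category labels and the input pairs are invented for illustration (the sender names are borrowed from Figure 1), and contribution_profile is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def contribution_profile(tagged_messages):
    """Summarise each sender's useful texts.

    tagged_messages: iterable of (sender, category) pairs, one per useful text.
    Returns, per sender, the total number of useful texts and the category
    in which most of them fall.
    """
    per_sender = defaultdict(Counter)
    for sender, category in tagged_messages:
        per_sender[sender][category] += 1
    return {
        sender: {
            "total": sum(counts.values()),
            "main_category": counts.most_common(1)[0][0],
        }
        for sender, counts in per_sender.items()
    }


# Invented tags for illustration only.
tagged = [
    ("Rajat", "relief"),
    ("Dr. Ravikant", "medical"),
    ("Dr. Ravikant", "medical"),
    ("Dr. Ravikant", "location"),
]
profile = contribution_profile(tagged)
for sender, summary in profile.items():
    print(sender, summary)
```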
Finally, we have used TextBlob's sentiment analysis tool, where the sentiment property acts on a sentence and
returns two values: polarity and subjectivity. The polarity score is a float within the range [-1.0, 1.0], where
anything less than 0 suggests negative sentiment, while anything greater than 0 suggests positive sentiment. The
subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective. The
polarity value has been used to retrieve sentences that mostly express grievances of the DFY agents about
the general working conditions or the general problems they have faced in their day-to-day activities. The
polarity checker has further been used to form a repository of words common in those sentences that express
negativity.
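The grievance filter can be sketched as a function that keeps only sentences with negative polarity. To stay self-contained, the example takes the polarity scorer as a parameter and ships a crude word-counting stand-in, toy_polarity, which is purely illustrative and is not TextBlob's model. With TextBlob installed, the scorer would be `lambda s: TextBlob(s).sentiment.polarity`. The example sentences are invented.

```python
def negative_sentences(sentences, polarity_of, threshold=0.0):
    """Return (polarity, sentence) pairs with polarity below threshold, most negative first."""
    scored = [(polarity_of(sentence), sentence) for sentence in sentences]
    return sorted((p, s) for p, s in scored if p < threshold)


def toy_polarity(sentence):
    """Illustrative stand-in scorer: counts words from two tiny hand-made lists.

    Returns a float in [-1.0, 1.0], mimicking the range of a polarity score.
    """
    negative = {"no", "delay", "shortage", "problem", "unavailable"}
    positive = {"good", "thanks", "ready", "ok"}
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    if not words:
        return 0.0
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return max(-1.0, min(1.0, score / len(words)))


# Invented sentences for illustration only.
sentences = [
    "Shortage of tents and delay in registration",
    "Thanks, the donor is ready",
    "Ok",
]
for polarity, sentence in negative_sentences(sentences, toy_polarity):
    print(round(polarity, 2), sentence)
```

Passing the scorer in as an argument keeps the filtering logic independent of the sentiment library, so a different model can be substituted without touching the rest of the pipeline.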
[1] Number of irrelevant texts picked up will be positive assuming A is actual number of relevant texts
[2] Number of relevant texts missed will be negative assuming A is actual number of relevant texts
Sample# (lines of text)   A    B    C = B - A   P (approx. %)
1 (402)                   24   24   0           100
2 (634)                   18   14   -4          71
3 (557)                   24   23   -1          96
4 (333)                   19   20   1           95
5 (392)                   18   23   5           78

A: number of relevant texts picked up by manual analysis
B: total number of texts picked up by the automated system
C: number of irrelevant texts picked up [3] / relevant texts missed [4] by the automated system
P: precision of the automated system, P = 100 - (|C|/B*100)

Table 3 – Comparison between manual and automated analysis for Location Finder
CONCLUSION
Our preliminary investigation indicates the usefulness of WhatsApp chat-logs in analyzing the situation and
assessing needs during a disaster. If WhatsApp chat-logs can be collected and analyzed from a large number
of disaster management organizations working at the same place at the same time, it will help
government agencies and NGOs monitor the requirement as well as the availability of different resources in real time.
Other advantages include faster assessment due to unified analysis of every organization’s activities, emergency
collaborations, mutual sharing of resources, and knowledge about the sector-wise leaders of every organization.
The next step in the creation of a complete and functional package for disaster management organisations is to
create a semi-supervised text analytics system through the use of a suitable machine learning algorithm. Semi-supervised learning is suitable because we have already created a labelled data-set of satisfactory accuracy. The
principal goal here is to achieve a state of optimum collaboration between multiple parties that facilitates an all-round improvement in the present-day disaster management paradigm.
REFERENCES
1. Antoniou, N. and Ciaramicoli, M. (2013) - Social Media in Disaster Cycle - Useful Tools or Mass Distraction?, presented at the 64th International Astronautical Congress, Beijing, China.
2. Bird, S. and Loper, E. (2006) - NLTK: The Natural Language Toolkit, ACL 2006, 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sydney, Australia, 17-21 July 2006.
3. Chan, J.C. (2015) - The Role of Social Media in Crisis Preparedness, Response and Recovery, VANGUARD Report. http://www.oecd.org/governance/risk/The%20role%20of%20Social%20media%20in%20crisis%20preparedness,%20response%20and%20recovery.pdf
4. Fellbaum, C. and Miller, G. (1998) - WordNet: An Electronic Lexical Database (Language, Speech and Communication), MIT Press.
5. Li, L. and Goodchild, M.F. (2010) - The Role of Social Networks in Emergency Management: A Research Agenda, International Journal of Information Systems for Crisis Response and Management, 2(4), 49-59, October-December 2010.
6. Liu, C., Chen, T. and Wei, T. (2015) - Research of the Trait and Quality on Emergency WeChat Platform Based on Service Framework and Quality Gap, The Open Cybernetics & Systemics Journal, 2015, 9, 1002-1007.
7. Meng, L., Huang, R. and Gu, J. (2013) - A Review of Semantic Similarity Measures in WordNet, International Journal of Hybrid Information Technology, Vol. 6, No. 1, January 2013.
8. White, E.T. (2014) - The Application of Social Media in Disasters, International Institute of Global Resilience, August 14, 2014.
[3] Number of irrelevant texts picked up will be positive assuming A is actual number of relevant texts
[4] Number of relevant texts missed will be negative assuming A is actual number of relevant texts