Post-disaster Situational Analysis from WhatsApp Group Chats of Emergency Response Providers

Pragna Debnath, IIEST, Shibpur, pragna.madhatter@gmail.com
Saniul Haque, IIEST, Shibpur, saniul.haque@outlook.com
Somprakash Bandyopadhyay, Indian Institute of Management Calcutta, somprakashb@gmail.com
Siuli Roy, Heritage Institute of Technology, Kolkata, siuli.roy@heritageit.edu

ABSTRACT

Social media has established itself as one of the important information carriers in the field of disaster management. However, the use of Twitter and Facebook by victims, first responders and others generates information that is varied, unstructured and unreliable. NGOs operating in the disaster area, on the other hand, often communicate internally through messaging apps like WhatsApp, and their group interactions can help in gathering meaningful data for situational analysis and need assessment. Our focus is to automate the filtering and query-based clustering of pertinent information from the WhatsApp group conversation of a specific volunteer group, so that situation analysis and need assessment can be done more rapidly. We have evaluated our scheme using the WhatsApp chat log of a medical volunteer group in two post-disaster scenarios and concluded that it can provide valuable insights about region-specific resource requirements and allocation for effective decision making.

INTRODUCTION

In today's world, social media has established itself as one of the most important information carriers, diversifying into wikis, microblogging, Facebook, Twitter, and so on. In the recent past, the very first indication of the occurrence of a disaster has often been found on social media platforms such as Twitter and Facebook.
Disaster responders as well as researchers have now started to realize the value of social media for effective disaster response. More and more organizations are opting for it as a faster alternative to preliminary manual assessment of disaster situations. This has led to the widespread use of data mining tools for extracting meaningful information from the huge amount of data generated on social media in post-disaster scenarios. However, data mining tools have not always proved effective in practice, as in the case of Ushahidi's disaster relief campaign during the Haiti earthquake of 2010 (Antoniou, 2013). It is often extremely difficult to extract meaningful, consistent and pertinent information from unstructured and often unreliable social media posts. As a result, in spite of having access to a huge volume of crowd-sourced information on the event, disaster management authorities could not make use of it for subsequent decision making (White, 2014; Li and Goodchild, 2010).

Facebook and Twitter are usually the social media tools considered for post-disaster situational analysis. Due to Facebook's privacy settings, real-time information during a disaster is not always accessible. Twitter, on the other hand, offers a more readily available stream of continuous information, especially when tweets pertaining to a set of hashtags (e.g. #NepalEarthquake) are searched for. But this huge data set suffers from both non-relevance and inaccuracy. It has been observed that only 5% of relevant tweets/posts are useful; the rest are mostly retweets of news articles or opinionated posts rather than facts.
On the other hand, NGOs operating in the disaster area have an organised approach to post-disaster work, and an analysis of their intra-organizational interactions through messaging apps such as WhatsApp can provide us with a detailed overview of their situation and movement patterns, and also help us keep a log of their requirements and problems. Messaging apps like Line and WeChat (Chan 2015; Liu et al, 2015) have been used experimentally in south-eastern Asian countries. However, their use has been limited to facilitating emergency communications between victims and the authorities.

The contribution of our work is to solve a part of a larger domain problem through the simple observation that textual intra-group conversations within disaster management organizations are an unutilized source of information. If analyzed in a structured way, they could answer pertinent questions about a real disaster. Our aim is to devise an automated process that analyzes the WhatsApp chat-log for situational analysis. As an example, we have used the WhatsApp chat-log of a medical NGO called "Doctors For You" during two post-disaster scenarios: (i) the Nepal Earthquake and (ii) the Chennai Flood. Some members of this team were working in the field while the other members remained at their home station, planning resource deployment and other related work. Our analysis reveals that the WhatsApp chat-log generated in the field can give valuable insight into region-specific resource needs and gaps in resource distribution, so that volunteers and other stakeholders at the home station can take important decisions in real time.

Short Paper – Social Media Studies. Proceedings of the ISCRAM 2016 Conference – Rio de Janeiro, Brazil, May 2016. Tapia, Antunes, Bañuls, Moore and Porto de Albuquerque, eds.
COLLECTION OF DATA AND INITIAL ANALYSIS

We have analyzed a log of WhatsApp messages exchanged among members of an India-based NGO named Doctors For You (DFY), which provides medical support to disaster victims. We have considered two cases: the Nepal earthquake of April 2015 and the Chennai Flood of December 2015. Group members include both volunteers placed in the field during the chat period and volunteers at the home station coordinating the event. For the initial analysis, we first took the entire conversation from 27 April 2015 to 12 May 2015, during the Nepal earthquake. For simplicity, the images and videos shared during the chat were ignored, and the conversation was converted into a text file (sample shown in Figure 1) for further analysis.

6:51PM, 29 Apr - Rajat: What is the status?A donor is ready to sponsor all expenses of the volunteers
6:58PM, 29 Apr - Dr. Ravikant: Mridul plz update
8:02PM, 29 Apr - G. Shandeepan: Hi everyone . I had a discussion with health department. They want the medicines coming from india to be pre approved
8:03PM, 29 Apr - Dr. Ravikant: Ok

Figure 1. Sample Chat on WhatsApp

The following preliminary observations were made on this data set:

• The problem of inaccuracy was largely eliminated, considering the authenticity and credibility of the NGO group and its volunteers.
• Since the data is in text-messaging format, it is highly unstructured.
• There is no clear categorization of the topics being discussed. For example, the topic of conversation shifts from medicines to be sent to partnership troubles within a few minutes.
• Agents regularly updated their locations.
• First-hand information about various other organizations involved in disaster management is available that has not been covered by other media.
• Requirements of materials and equipment, and problems faced by the agents at a specific location, were shared and updated regularly.
• A message is not always a reply to the one immediately before it; it can start a new discussion or answer an earlier question.

Through the analysis of the WhatsApp chat-log, we wanted to find answers to the following set of questions. Note that the answers to these questions are relevant not only to that NGO but also to other organizations and the community at large:

1. What are the places that the volunteers were visiting?
2. What was the status of medicine and medical infrastructure?
3. What are the common grievances the agents were reporting?
4. What is the status of relief and rescue at specific places?

Initially, a manual analysis was done to see whether the chat-log could answer these questions. A manual analysis for question #1 suggested that the DFY volunteers were visiting places like Nuwakot, Sindhupalchowk, Lalitpur, Dhadhing, Pitampura, Kathmandu and Gorkha districts in Nepal. For question #2, manual analysis revealed that DFY created a prioritized list of medicines, which mostly consisted of vitamins and prenatal care medicines. Medical apparatus such as ventilators and C-arms were all ordered in due course. A shortage of tents for the field OPD was solved locally by borrowing some from a fellow organization, e.g. Red Cross India. The illnesses being addressed were mostly the care of pregnant women, children with diarrhea, fever and headaches, and illnesses caused by contamination of water. A manual analysis for question #3 revealed a few inefficiencies in management as well as in government policies. There were problems with the registration of external NGOs that resulted in a delay.
Another source of problems was a government protocol that allowed only generic medicines to be used. In terms of management, a group of orthopedists coming from India could not be contacted for a long time after their arrival in Nepal. A manual analysis for question #4 reveals the existence of inaccessible areas, for example Gorkha. News updates about relief movements to such inaccessible places were discussed in the early days of the organization's involvement in Nepal Earthquake management. News about hubs set up by ADRRN (Asian Disaster Reduction and Response Network), as well as the setting up of a telemedicine unit, was also shared.

The same analysis was carried out on the DFY group WhatsApp conversations during the Chennai Floods, from 12 December 2015 to 13 January 2016, to test whether the same set of questions can be answered in a different case. Manual analysis of the Chennai chats revealed that the same four questions can be answered from this dataset as well. In Chennai, the volunteers had worked mainly in the Cuddalore district of Tiruvannamalai region of Tamil Nadu, which included villages like Thirivanur, Kulamjavadi etc. Medical infrastructure analysis revealed that they had worked closely with Cipla and had received funds from Cipla for their health camps. Medicines required on the scene were prenatal vitamins like folic acid and protein supplements for pregnant women, ORS, Vitamin A and de-worming medicines. There was also a pressing need for a swift method of vector disease control in those regions. Hygiene and dignity kits were a requirement as well. There was no mention of a requirement for infrastructural equipment like tents or tarpaulin sheets. The common grievance recorded was political unrest in the region, which created some problems in setting up camps.
Other relief information on the disaster was not available, as the team was fully occupied with its own relief activities by that time.

Thus, the manual analysis suggested that the WhatsApp chat-log was indeed helpful in providing relevant information to government authorities and NGOs working on the disaster. However, due to the large volume of the chat-log (more than 10,000 lines of text in the Nepal case), manually looking for answers to queries is not practical unless some sort of automation is employed. For question #1, all the chat lines that were geo-tagged were searched for, and once they were found, we looked for keywords that would suggest 'arrival' and 'departure'. Similarly, for questions #2 and #4, we looked at medicine- and relief-related words in the specific text portions, together with the locations next to them, and made a list of these as well. To answer question #3, we had to do sentiment analysis. It was also found that the chat contained a lot of unnecessary talk and, at times, spam. So, the goal here is to find an optimum algorithm to filter these chats, remove the unnecessary spam, and divide the rest into clusters, each of which would answer one of the aforementioned questions.

DESIGN APPROACH FOR AN AUTOMATED CHAT ANALYSIS SYSTEM

Tools Used

Simple Natural Language Processing techniques have been employed to skim information from the raw body of text. The idea is to run a keyword/phrase search that flags a line of text as relevant to a particular query and includes it in the final output. The user is also allowed to zoom out into the full body of text to see the context in which the matched string was used.
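The keyword search with user-controlled context described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the line pattern is inferred from the Figure 1 sample only, and the names `Message`, `parse_chat`, `search` and the keyword list `MEDICAL_KEYWORDS` are hypothetical.

```python
import re
from dataclasses import dataclass

# Line layout inferred from the Figure 1 sample, e.g.
# "6:51PM, 29 Apr - Rajat: What is the status?"
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}\s?[AP]M), (?P<date>\d{1,2} \w{3}) - "
    r"(?P<sender>[^:]+): (?P<text>.*)$"
)


@dataclass
class Message:
    time: str
    date: str
    sender: str
    text: str


def parse_chat(lines):
    """Turn raw chat lines into Message records.

    A line that does not match the pattern is treated as the continuation
    of the previous message (an assumption about multi-line messages).
    """
    messages = []
    for raw in lines:
        line = raw.strip()
        match = LINE_RE.match(line)
        if match:
            messages.append(Message(**match.groupdict()))
        elif messages and line:
            messages[-1].text += " " + line
    return messages


def search(messages, keywords, context=0):
    """Return (index, window) pairs for messages containing any keyword.

    `context` is the number of neighbouring messages shown on each side,
    which mimics the "zoom out" option given to the user.
    """
    patterns = [
        re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in keywords
    ]
    hits = []
    for i, msg in enumerate(messages):
        if any(p.search(msg.text) for p in patterns):
            start = max(0, i - context)
            stop = min(len(messages), i + context + 1)
            hits.append((i, messages[start:stop]))
    return hits


SAMPLE = [
    "6:51PM, 29 Apr - Rajat: What is the status?A donor is ready to sponsor all expenses of the volunteers",
    "6:58PM, 29 Apr - Dr. Ravikant: Mridul plz update",
    "8:02PM, 29 Apr - G. Shandeepan: Hi everyone . I had a discussion with health department. They want the medicines coming from india to be pre approved",
    "8:03PM, 29 Apr - Dr. Ravikant: Ok",
]

# Hypothetical keyword list; the paper derives its dictionary from WordNet.
MEDICAL_KEYWORDS = ["medicine", "medicines", "health"]

messages = parse_chat(SAMPLE)
hits = search(messages, MEDICAL_KEYWORDS, context=1)
for index, window in hits:
    for msg in window:
        print(f"{msg.date} {msg.time} {msg.sender}: {msg.text}")
```

Keeping the context width as a parameter leaves the decision about how much surrounding conversation to read with the user, as the text describes.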
The entire package has been built with Python's Natural Language Toolkit, NLTK (Bird and Loper, 2006), and its corpora, notable among which is WordNet (Fellbaum and Miller, 1998). In addition, TextBlob's sentiment analysis tool has been used.

Framework

The working principle of our system is as follows. First, with the help of a semantic similarity function (Meng et al. 2013), a dictionary of keywords/phrases pertaining to the general topic of a query is created. WordNet's linked structure is exploited to derive a collection of closely related words, all of them hypernyms or hyponyms of a few core words that can intuitively be taken as the root words for the topic of the question. After that, a string-wise keyword matching algorithm picks out the lines that contain one or more of these keywords. Provision is made to check the surrounding, closely related strings, since some information may not make sense without context. That control is given entirely to the user. A similar dictionary, paired with the same string-picking algorithm, has been created for the rescue and relief module.

The location finder is a simple text-based tool that brings out the parts of the text that denote displacement or transportation of participating volunteers. The common phrases found in such texts that clearly denote the displacement of a person from one place to another were charted manually. These phrases are matched against lines of text using the same keyword matching algorithm, and the relevant strings are picked out.

Sentiment analysis of the text reveals the difficulties faced by the volunteers working in the field in general. For example, a lack of support from local aid, unavailability of doctors and a lack of important resources are picked up by sentiment analysis.
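As a rough illustration of how a polarity score can single out grievance messages, the sketch below keeps only sentences that score below zero. The scorer `toy_polarity`, its word lists and the example sentences are hypothetical stand-ins written for this example; they are neither the TextBlob scorer nor lines from the DFY chat-log.

```python
import re

# Hypothetical stand-in lexicon: word -> polarity in [-1.0, 1.0].
NEGATIVE = {
    "lack": -0.5,
    "unavailable": -0.6,
    "delay": -0.4,
    "problem": -0.5,
    "shortage": -0.6,
}
POSITIVE = {"ready": 0.4, "good": 0.6, "thanks": 0.5}


def toy_polarity(sentence):
    """Average the polarity of the known words; 0.0 if none are known."""
    words = re.findall(r"[a-z]+", sentence.lower())
    scored = [
        NEGATIVE.get(w, POSITIVE.get(w, 0.0))
        for w in words
        if w in NEGATIVE or w in POSITIVE
    ]
    return sum(scored) / len(scored) if scored else 0.0


def grievances(sentences, polarity=toy_polarity, threshold=0.0):
    """Keep the sentences whose polarity falls below the threshold."""
    return [s for s in sentences if polarity(s) < threshold]


# Invented sentences, modelled on the kinds of difficulty the text lists.
EXAMPLES = [
    "There is a lack of support from local aid",
    "Doctors are unavailable at the camp",
    "A donor is ready to sponsor all expenses",
    "Ok",
]

negative_lines = grievances(EXAMPLES)
for line in negative_lines:
    print(line)
```

Because the scorer is passed in as a parameter, a real one can be swapped in without changing the filter; with TextBlob installed, `lambda s: TextBlob(s).sentiment.polarity` would serve as the `polarity` argument.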
Reports of further damage, of after-effects, or of a continuation of the same disaster can also be easily found with this algorithm, which can therefore serve as an assessment tool as well.

Experimental Results

The conversations carry meta-data in the form of the date, the time and the user-names of the people conversing. An exhaustive list of the participants is made for ease of information retrieval by participant ID. A comparative study of the effectiveness of the manual and automated text retrieval processes, for a specific set of queries, is presented in Tables 1, 2 and 3. We have assumed that manual analysis gives the accurate results. The ultimate goal of the system is to retrieve a set of lines automatically, based on topic-specific keyword search, and to show that it is almost equal to the set of lines retrieved through manual analysis. We have used a parameter called "precision" to evaluate the effectiveness of the automated system, as explained below.

Precision: Any NLP-based automated system may underfit or overfit. Such a system might therefore pick up a few irrelevant texts in addition to the relevant ones, or might miss a few relevant texts (shown in column B of Tables 1-3). Irrelevant texts erroneously increase the total retrieved text count of the automated system, while missed relevant texts decrease it (please refer to A and B in Tables 1-3). As a result, the values in the 4th column of Tables 1-3 (calculated as C = B - A) may be either positive or negative. We define the precision (P) of the automated system as the percentage of relevant texts picked up by the automated system out of the total number of texts it retrieved. Precision is calculated as follows.
P = 100 - (|C| / B) * 100

To retrieve medical information from the bulk conversation, we created a dictionary of medical terms and of words semantically similar to medical science. A simple text search algorithm goes through the text and picks out all medically relevant lines of conversation. Hence, it is possible to shorten a conversation into a chunk of text of greater relevance.

Columns: A = number of relevant texts picked up by manual analysis; B = total number of texts picked up by the automated system; C = B - A = number of irrelevant texts [1] picked up / relevant texts [2] missed by the automated system; P = 100 - (|C| / B) * 100 = precision of the automated system (approx. %).

Sample (lines of text)    A     B     C      P
1 (402)                   38    27    -11    59
2 (634)                   27    14    -13    7
3 (557)                   13    12    -1     92
4 (333)                   14    9     -5     44
5 (392)                   31    22    -9     59

Table 1 – Comparison between manual and automated analysis for Medical Infrastructure Information

Similarly, to retrieve information about the relief and rescue situation, a dictionary is created from rescue and relief mission terms and phrases that occur in the sample text. When the program is run over the entire batch of messages, it returns relevant information very efficiently. A similar tally of manual against automated filtering is given in Table 2, with the same columns as Table 1.

Sample (lines of text)    A     B     C      P
1 (402)                   10    11    1      91
2 (634)                   2     3     1      67
3 (557)                   2     3     1      67
4 (333)                   0     0
5 (392)                   2     2     0      100

Table 2 – Comparison between manual and automated analysis for Relief and Rescue Information

Next, using the raw text as input, together with a list of English phrases and clauses that indicate movement from one place to another, transportation, translation, location etc., the location and movement of volunteers can be retrieved. This analysed data-set creates a repository of tracking information, which has wide usage in disaster management situations (Table 3).

We then categorise the entire chat corpus using a second filter (the collaborators) in addition to the first one (type of information). The distribution of each collaborator's "useful texts" (both the total quantity of texts contributed by that person compared to others, and the specific category in which the majority of that person's texts fall) gives an idea of the collaborator's field of work and of the other members associated with him.

Finally, we have used TextBlob's sentiment analysis tool, in which the sentiment property acts on a sentence and returns two values: polarity and subjectivity. The polarity score is a float within the range [-1.0, 1.0], where anything less than 0 suggests negative sentiment and anything greater than 0 suggests positive sentiment. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective. The polarity value has been used to retrieve sentences which mostly contain grievances of the DFY agents about the general working conditions, or the general problems they faced in their day-to-day activities. The polarity checker has further been used to form a repository of the words common to sentences that express negativity.
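The precision figures reported for the medical-infrastructure and location-finder comparisons (Tables 1 and 3) can be recomputed from columns A and B with a few lines of Python. This is a sketch for checking the arithmetic only; the function name `precision` and the handling of B = 0 (returning `None`) are our assumptions, since the text does not say how a sample with no retrieved lines is scored.

```python
def precision(a, b):
    """Precision as defined in the text: P = 100 - (|C| / B) * 100, C = B - A.

    a: relevant texts found by manual analysis (column A)
    b: texts picked up by the automated system (column B)
    Returns None when b is 0, because the ratio is then undefined
    (an assumption; the source does not cover this case).
    """
    if b == 0:
        return None
    c = b - a
    return 100 - abs(c) / b * 100


# (A, B) pairs copied from Table 1 and Table 3.
TABLE_1 = [(38, 27), (27, 14), (13, 12), (14, 9), (31, 22)]
TABLE_3 = [(24, 24), (18, 14), (24, 23), (19, 20), (18, 23)]

for name, rows in (("Table 1", TABLE_1), ("Table 3", TABLE_3)):
    rounded = [round(precision(a, b)) for a, b in rows]
    print(name, rounded)
```

The rounded values agree with the approximate percentages listed in the two tables.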
[1] The number of irrelevant texts picked up is positive, assuming A is the actual number of relevant texts.
[2] The number of relevant texts missed is negative, assuming A is the actual number of relevant texts.

Columns: A = number of relevant texts picked up by manual analysis; B = total number of texts picked up by the automated system; C = B - A = number of irrelevant texts [3] picked up / relevant texts [4] missed by the automated system; P = 100 - (|C| / B) * 100 = precision of the automated system (approx. %).

Sample (lines of text)    A     B     C      P
1 (402)                   24    24    0      100
2 (634)                   18    14    -4     71
3 (557)                   24    23    -1     96
4 (333)                   19    20    1      95
5 (392)                   18    23    5      78

Table 3 – Comparison between manual and automated analysis for Location Finder

Conclusion

Our preliminary investigation indicates the usefulness of WhatsApp chat-logs in analyzing the situation and assessing needs during a disaster. If WhatsApp chat-logs can be collected and analyzed from a large number of disaster management organizations working at the same place at the same time, it will help government agencies and NGOs to monitor both the requirement for and the availability of different resources in real time. Other advantages include faster assessment through unified analysis of every organization's activities, emergency collaborations, mutual sharing of resources, and knowledge of the sector-wise leaders of every organization.

The next step in the creation of a complete and functional package for disaster management organisations is to build a semi-supervised text analytics system using a suitable machine learning algorithm. Semi-supervised learning is suitable because we have already created a labelled data-set of satisfactory accuracy.
The principal goal here is to achieve a state of optimum collaboration between multiple parties that facilitates an all-round improvement in the present-day disaster management paradigm.

REFERENCES

1. Antoniou, N. and Ciaramicoli, M. (2013) - Social Media in Disaster Cycle - Useful Tools or Mass Distraction? Presented at 64th International Astronautical Congress, Beijing, China.
2. Bird, S. and Loper, E. (2006) - NLTK: The Natural Language Toolkit, ACL 2006, 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sydney, Australia, 17-21 July 2006.
3. Chan, J.C. (2015) - The Role of Social Media in Crisis Preparedness, Response and Recovery, VANGUARD Report. http://www.oecd.org/governance/risk/The%20role%20of%20Social%20media%20in%20crisis%20preparedness,%20response%20and%20recovery.pdf
4. Fellbaum, C. and Miller, G. (1998) - WordNet: An Electronic Lexical Database (Language, Speech and Communication), MIT Press.
5. Li, L. and Goodchild, M. F. (2010) - The Role of Social Networks in Emergency Management: A Research Agenda, International Journal of Information Systems for Crisis Response and Management, 2(4), 49-59, October-December 2010.
6. Liu, C., Chen, T. and Wei, T. (2015) - Research of the Trait and Quality on Emergency WeChat Platform Based on Service Framework and Quality Gap, The Open Cybernetics & Systemics Journal, 2015, 9, 1002-1007.
7. Meng, L., Huang, R. and Gu, J. (2013) - A Review of Semantic Similarity Measures in WordNet, International Journal of Hybrid Information Technology, Vol. 6, No. 1, January 2013.
8. White, E. T. (2014) - The Application of Social Media in Disasters, International Institute of Global Resilience, August 14, 2014.
[3] The number of irrelevant texts picked up is positive, assuming A is the actual number of relevant texts.
[4] The number of relevant texts missed is negative, assuming A is the actual number of relevant texts.