Category Archives: Natural Language Processing

Finding Twitter Conversations

A Twitter conversation is a series of tweets that usually revolve around an initial tweet. The initial tweet is defined as the root of the conversation, and the rest of the tweets in the conversation would “stem” or “branch” from the root. Finding the root is relatively simple. For example, to find the root of a conversation (or conversations) about “warfarin” a keyword search is performed. Any tweet containing the word “warfarin” would be considered to be a conversation root. The trick is how to define and find “branching”. Branches are tweets that can form a coherent conversation when attached to the root.

Natural Language Cues

One technique is to look for linguistic and textual cues to indicate possible branches. Possible initial branches are replies to the root. Replies to the root tweet cannot be found directly. They can only be found through replies to the user that tweets the root (root user) through the @ symbol (or other variations). So a @root_user string would indicate a reply to the root user, which could be a reply to the root tweet or a reply to another one of the root user’s tweets. Also, any @ in the root tweet indicates that the root user is replying, to another user, with the root tweet. These users being replied to could also potentially write tweets related to the root tweet (reply-to tweets). Figure 1 below illustrates the different components of a twitter conversation.

Example Tweet Conversation-1
Figure 1: Example Twitter Conversation

To find whether a reply or a reply-to tweet is a branch of the root tweet (and therefore part of the conversation), further natural language analysis is required. Such tweets must form a coherent conversation with each other and with the root. The key for finding coherent conversations is to find terms that are possibly related that would act as links between the tweets. For example, a drug and an adverse drug event (ADE) could be related terms. Using such links as a starting point, dependency parsing algorithms would do the heavy lifting of connecting individual tweets into a conversation. The shortest dependency path measure [Liu and Chen 2013] (see figure 2 for example) could be used to find terms with a high probability of being related terms.

Figure 2: Example of a Shortest Dependency Path for a part of the Last Tweet in Figure 1

Tweet Networks

Reply and Reply-to tweet network extend beyond the root tweet as each reply and reply-to tweet to the root can have its own reply and reply-to tweets. Reply and Reply-to tweets can form interesting networks. Well connected users or central users in such a network can be very important parts of a conversation and deserve special attention. These networks are directed graphs where each node would represent a user, and each node would be a tweet pointing from the sending user towards the receiving user. There are two main measures of centrality for each user in a reply and reply-to directed user network. Closeness centrality measures the average shortest path between a user and all the other users in the network. Betweeness centrality measures the number of shortest paths where a user is in the shortest path between two users. It is usually expressed as a probability that a certain user will be in the shortest path between any two users in the network. Both measures point out users that maybe central and important part of a conversation. The tweets of these users could form the main parts of a conversation. Figure 3 illustrates one example of these networks.

Example Tweet Network
Figure 3: Twitter Users’ Tweet Directed Graph Network


Liu, Xiao; Chen, Hsinchun; AZDrugMiner: An Information Extraction System for Mining Patient-Reported Adverse Drug Events in Online Patient Forums, ICSH 2013, LNCS 8040, pp. 134-150, (2013)