Data mining/text analytics for real-time insights, knowledge & collaboration

Background of the POC

Status meetings will likely always have a role in every project, but with the right tools and technology it is possible to dramatically reduce the volume, frequency, and length of these sit-downs while maximizing the productivity of the ones you can’t avoid.

“The status meeting is one of the most common, and most dreaded, offenders when it comes to the hours wasted each week in meetings.”

Despite groans and gripes, meetings are a necessity in virtually every project. In fact, even with so many other communication options such as IM, email, and texting, 68 percent of enterprise workers still rely on meetings to communicate with other groups and teams within the organization.

But for many projects, things have simply gotten out of hand. Meeting mayhem has turned otherwise efficient, productive projects into black holes where hours mysteriously begin to disappear as soon as someone sends an invite. Nearly 60 percent of enterprise workers say wasteful meetings interfere with their productivity, forcing them to spend nearly 40 percent of their time in meetings, only about half of which is actually productive.

Now that almost everything has turned virtual, communication over email and other digital media is of utmost importance. Employees with diverse backgrounds interpret language, signs, and other forms of communication differently.

62% of unstructured data is growing at an alarming rate

93% of all data in the digital world will be unstructured in 2022

Sending and receiving emails at work

  • A typical corporate user sends and receives about 110 messages daily.
  • The average number of corporate emails received is 75 per day.
  • The average number of legitimate corporate emails received daily is 62.
  • The average number of corporate spam emails is 13 per day.

To be honest, we may not engage with all 75 of those emails. We may skim many and flag a few for later. However, emails hold an abundance of knowledge and information. Email data mining and text analysis can therefore give stakeholders the insights they need and reduce the heavy dependency on meetings.

What does email analysis involve?

A large volume of emails is studied and analyzed to discover topics, patterns, keywords, and other useful signals. This helps draw meanings, trends, and patterns from emails in support of well-informed decisions.

Email mining can be used for many purposes:


Projects operate through highly complex communication channels and organizational hierarchies that vary significantly in their complexity, turbulence, and munificence. Such variations in context have important implications for outcomes and practices. Emails present a wealth of untapped information that sheds light on a number of key project variables of interest. Computational text analysis methods offer a highly generalizable means of tapping this information to generate objective project data.

Sources for analysis

In computing, a personal storage table (.pst) is an open proprietary file format used to store copies of messages, calendar events, and other items within Microsoft Exchange / Microsoft Outlook. Emails make up the bulk of official communication in an organization, so email repositories are implicit warehouses of knowledge about projects, people, and processes. There is value in visualizing email data and discovering the sentiments in it.

Proof of Concept

The solution outlined here was tested as a POC and proved very useful. The POC involved extracting personal storage tables (the PST files that Microsoft Office Outlook creates to back up data), pulling contacts, messages, appointments, notes, tasks, journal entries, and calendars from them, and storing these in a database. The summary consists of a list of action items extracted from inbox messages. The POC used an email analytics structure that combines data analytics and text-mining principles to mine email repositories for useful insights. The email text analysis is used to estimate the communication interactions among team members. The POC also presented an interactive visualization system that aims to analyze the project's email data.
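As a rough illustration of the action-item step, the sketch below assumes each message has already been exported from the PST file as plain RFC 5322 text; reading the PST format itself is not shown. The cue phrases and the function name `extract_action_items` are hypothetical choices for this example, not the rules used in the POC.

```python
import re
from email import message_from_string
from email.policy import default

# Hypothetical cue phrases that mark a sentence as an action item.
ACTION_CUES = re.compile(
    r"\b(please|action item|todo|to do|follow up)\b", re.IGNORECASE
)


def extract_action_items(raw_message: str) -> list[str]:
    """Return the sentences of a plain-text email that look like action items."""
    msg = message_from_string(raw_message, policy=default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    # Split on sentence-ending punctuation followed by whitespace, or on newlines.
    sentences = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [s.strip() for s in sentences if s.strip() and ACTION_CUES.search(s)]
```

A keyword filter like this is only a starting point; it will miss implicit requests and flag polite phrases that are not real actions.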

Development of Solution

The successful development of an intelligent dashboard requires the collaboration of two main stakeholders/actors: subject matter experts and text miners.

The insights sought are:

·        Valuable information in the text through semantic search

·        Specific details about a certain subject

·        Mentions of specific things in the texts

·        Structural analysis

The analysis process involves:

·        Information extraction – to get a hold on the unstructured text

·        Categorization – to classify the contents of the text and the role of its elements

·        Clustering – to group pieces of content with similar or common elements

·        Visualization – to streamline the presentation and perception of the results

·        Summarization – to form a concise presentation of what the text is about

Another point is that most status discussions and minutes of meetings (MOM) are communicated over email. Text analytics can feed these into an automated dashboard that reflects the sentiments and discussions between stakeholders and shows the list of meetings, actions, and possible status, the response time of emails to various stakeholders, out-of-office notifications with information on backup SPOCs, and so on.

Use cases were developed along with a SWOT analysis for each team. The next step was to build a solution that periodically collects all emails from everyone in the project into a common PST file or a common location, converts emails into tasks in JIRA, and assigns them to different teams.

Two dashboards were designed.

·       for each scrum team

·       for the project

The contents of the dashboards are explained below.

Categorization – classify the contents of the text and the role of its elements

·        Queries

·        New Changes

·        Other Discussions
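A minimal sketch of how emails might be sorted into these three categories follows. The keyword patterns are assumptions made for illustration; the POC's actual classifier is not described here, and a trained text classifier could replace the rules.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
CATEGORY_RULES = {
    "New Changes": re.compile(
        r"\b(change request|new requirement|scope change|enhancement)\b",
        re.IGNORECASE,
    ),
    "Queries": re.compile(
        r"\?|\b(query|clarification|could you|can you)\b", re.IGNORECASE
    ),
}


def categorize(text: str) -> str:
    """Assign an email body to one of the three dashboard categories."""
    for category, pattern in CATEGORY_RULES.items():
        if pattern.search(text):
            return category
    # Anything that matches no rule falls into the catch-all category.
    return "Other Discussions"
```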

Response Time of the Emails

·        Average Response Time of emails of all stakeholders in the project

·        Average response time from the recipient perspective of all stakeholders in the project
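The two response-time views can be sketched as below. The record layout (`id`, `sender`, `sent`, `in_reply_to`) is an assumption for this example, and "recipient perspective" is interpreted here as how long the original sender waited for a reply.

```python
from collections import defaultdict
from statistics import mean


def average_response_hours(emails, perspective="responder"):
    """Average reply delay in hours, grouped per person.

    emails: list of dicts with keys id, sender, sent (datetime) and
    in_reply_to (id of the parent message, or None).
    perspective: "responder" groups by who wrote the reply,
    "recipient" groups by who was waiting for it.
    """
    by_id = {e["id"]: e for e in emails}
    delays = defaultdict(list)
    for reply in emails:
        parent = by_id.get(reply["in_reply_to"])
        if parent is None:
            continue  # not a reply, or the parent is outside the data set
        hours = (reply["sent"] - parent["sent"]).total_seconds() / 3600
        person = reply["sender"] if perspective == "responder" else parent["sender"]
        delays[person].append(hours)
    return {person: mean(values) for person, values in delays.items()}
```

Calling the function once per perspective gives the two averages listed above; business hours and time zones are ignored in this sketch.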

List of completed meetings with a list of participants

·        Meetings, Participants, Actions and Status with the reminder to outstanding actions

·        Total Time Spent By individuals in meetings
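Total time per individual can be derived from calendar entries roughly as follows. The meeting record fields (`start`, `end`, `participants`) are assumed for illustration and would come from the appointments extracted from the PST data.

```python
from collections import defaultdict
from datetime import timedelta


def total_meeting_time(meetings):
    """Sum meeting durations per participant.

    meetings: list of dicts with keys start, end (datetime objects)
    and participants (list of names).
    """
    totals = defaultdict(timedelta)  # a missing person starts at zero duration
    for meeting in meetings:
        duration = meeting["end"] - meeting["start"]
        for person in meeting["participants"]:
            totals[person] += duration
    return dict(totals)
```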


Clustering – group pieces of content with similar or common elements

·        Possible number of escalations/issues and their details, such as the issue, who raised it, and the discussion thread with the current status
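As a simple stand-in for clustering, the sketch below groups emails into discussion threads by normalized subject line and flags threads that contain escalation wording. This is plain grouping rather than a statistical clustering method, and the escalation keywords are hypothetical.

```python
import re
from collections import defaultdict

# Strips any run of leading "RE:", "FW:" or "FWD:" prefixes.
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)
# Hypothetical wording that suggests an escalation.
_ESCALATION = re.compile(r"\b(escalat\w*|urgent|critical)\b", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    return _PREFIX.sub("", subject).strip().lower()


def group_threads(emails):
    """Group emails (dicts with subject and body) by normalized subject."""
    threads = defaultdict(list)
    for e in emails:
        threads[normalize_subject(e["subject"])].append(e)
    return dict(threads)


def escalated_threads(threads):
    """Return the thread keys in which any message uses escalation wording."""
    return [
        key
        for key, messages in threads.items()
        if any(_ESCALATION.search(m["body"]) for m in messages)
    ]
```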

Risk Management – Forecast of

·        Opportunities

·        Risks

·        Escalations

·        Issues

·        Key stakeholders ranked on a happy-to-unhappy scale based on the email contents

·        Finally, trends and themes that need immediate attention from project or senior management
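The happy-to-unhappy ranking could be approximated with a small word lexicon, as in the sketch below. The word lists and the scoring rule are assumptions for illustration only; a real deployment would more likely use a trained sentiment model.

```python
import re
from collections import defaultdict

# Tiny hypothetical lexicons; real ones would be far larger.
POSITIVE = {"thanks", "great", "good", "appreciate", "happy", "resolved"}
NEGATIVE = {"delay", "delayed", "issue", "unhappy", "escalate", "disappointed", "blocked"}


def sentiment_score(text: str) -> int:
    """Count of positive words minus count of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def rank_stakeholders(emails):
    """Rank senders from happiest to unhappiest by average score.

    emails: list of dicts with keys sender and body.
    """
    scores = defaultdict(list)
    for e in emails:
        scores[e["sender"]].append(sentiment_score(e["body"]))
    averages = {sender: sum(v) / len(v) for sender, v in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```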

Knowledge Management

·        Most used keywords and hyperlinks
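Counting the most used keywords and hyperlinks might look like the sketch below. The stopword list, the URL pattern, and the three-letter minimum for keywords are simplifying assumptions.

```python
import re
from collections import Counter

# Small hypothetical stopword list; extend as needed.
STOPWORDS = {"the", "and", "for", "are", "was", "with", "this", "that", "please"}
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


def top_keywords_and_links(bodies, n=5):
    """Return (top keywords, top hyperlinks) as lists of (item, count)."""
    words, links = Counter(), Counter()
    for body in bodies:
        # Trailing punctuation is stripped so "…/release." and "…/release" match.
        links.update(u.rstrip(".,;:") for u in URL_RE.findall(body))
        text = URL_RE.sub(" ", body).lower()  # keep URL parts out of the keywords
        words.update(
            w for w in re.findall(r"[a-z]{3,}", text) if w not in STOPWORDS
        )
    return words.most_common(n), links.most_common(n)
```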

Visualization to streamline the presentation and perception of the results

·        One Status – a consolidated view developed from the status given by each project team member, presented as the overall status

Tools Used

  • REST API, Outlook APIs, JIRA APIs
  • RapidMiner
  • R, Java, Node.js and Python
  • XOBNI reports as a reference

With all of these tools and solutions, there’s no reason everyone needs to sit together in a physical or virtual room to rehash the work they’ve done. The idea is to use this dashboard in real time and to spend the time saved on sharing creative ideas to resolve challenges, brainstorming, and collaborating.

This write-up introduces the POC and the concepts used in it, and presents the implementation ecosystem adopted for the POC.

Thank You.
