A Stack to Build Collective Intelligence Technology

Published in Coder stories

January 29, 2020

Author
Gary McDowell

CTO @ Bluenove

What is collective intelligence? Frank Escoubes, co-founder of the technology and consulting company Bluenove, describes it thus: “Collective intelligence starts in conversation and ends in engagement. From ideas to action. It is an art, a sport, a science, a culture, with a bit of magic. Call it enlightenment.” Put in the simplest terms possible, it’s the idea that we are more intelligent together than as individuals.

Hopefully, your interest is piqued; mine was 6 months ago when I joined Bluenove as CTO. I'd never worked in the collective intelligence industry before. Like you, probably, I knew little about it, because it isn't yet well known in the technical community, except among data scientists working on natural language processing (NLP) or semantic or lexicographic analysis. I saw a huge set of challenges in this space. Firstly, how do you communicate the result of a collective intelligence initiative, especially one on a massive scale, such as the Great National Debate that President Macron organized in France following the Yellow Vests protests? Secondly, collective intelligence cuts across every single sector, domain, and department.

So how do we achieve collective intelligence? Let's talk tech. The Assembl software is at the heart of our digital solution. It's now a very mature platform, built with Python on the back end, PostgreSQL for the database, and ReactJS on the front end.

More recently, we've started supplementing this stack with dockerized microservices, which can therefore be deployed individually in a Docker Swarm or a Kubernetes cluster. The front and back ends communicate through a combination of classic RESTful APIs and GraphQL.

Assembl

Although we use a fair number of open-source technologies to bring Assembl to life, a lot of bespoke code handles the concepts, data models, and domain principles specific to collective intelligence. For example, we have a rich schema that models concepts such as themes, ideas, messages, reactions, sentiments, and the links between them. At the heart of collective intelligence lies the identification of actionable ideas, so, as you might imagine, much of the complexity lies in understanding where these ideas came from, how the subject matter evolved over time, and how to traverse this network. The Bluenove strategy blends technology, methodology, and human intelligence. This means there is a strong emphasis on how the different personas or actors need to interact with the system and, because the projects are so diverse, the system has to be highly configurable. All of this adds up to a lot of logic and possible combinations, so testing becomes a very interesting challenge!
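Here is a minimal sketch of what traversing this network can look like. The data model is a hypothetical simplification, not Assembl's actual schema: each idea carries a `derivedFrom` list, and the `origins` helper walks those links breadth-first to recover the root ideas an idea grew out of.

```javascript
// Hypothetical, simplified idea network: NOT Assembl's actual schema.
// Each idea lists the ideas it was derived from.
const ideas = new Map([
  ["transport", { title: "Transport", derivedFrom: [] }],
  ["cycling", { title: "Cycling", derivedFrom: ["transport"] }],
  ["safety", { title: "Road safety", derivedFrom: [] }],
  ["bike-lanes", { title: "Protected bike lanes", derivedFrom: ["cycling", "safety"] }],
]);

// Breadth-first walk up the derivedFrom links, returning the root ideas
// (those derived from nothing) that the given idea ultimately comes from.
// The `seen` set stops the walk from looping if the network has cycles.
function origins(ideaId, graph) {
  const roots = [];
  const seen = new Set([ideaId]);
  const queue = [ideaId];
  while (queue.length > 0) {
    const id = queue.shift();
    const parents = graph.get(id)?.derivedFrom ?? [];
    if (parents.length === 0) roots.push(id);
    for (const parent of parents) {
      if (!seen.has(parent)) {
        seen.add(parent);
        queue.push(parent);
      }
    }
  }
  return roots;
}

console.log(origins("bike-lanes", ideas)); // ["safety", "transport"]
```

A real schema would also attach messages, reactions, and sentiments to each node, but the traversal pattern stays the same.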

Setting up a consultation or debate in Assembl normally starts with creating one or more phases in the administration interface. A phase is a defined period of time during which the debate takes place. Themes are attached to phases and are conceptually linked to one or more ideas that you want to explore. Themes can have sub-themes, but each branch eventually ends in a module, which is responsible for capturing and analyzing information. Modules guide users so that they contribute in a structured way. We'll look at modules in more depth shortly, but first, let's illustrate all of this with an example.

I want to run a phase from the beginning of November until the end of December. I've decided this period should be long enough to gather sufficient contributions, but not so long that participants lose interest. During this phase, I want to explore several themes, in other words, specific questions or topics I want people to discuss. Finally, to capture participants' contributions and stimulate the conversation, I add one of our modules. We have modules for conversation threads, surveys, votes, multi-column discussions, and Bright Mirror, where people can collaborate on creating design fiction.
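The example phase can be pictured as plain configuration data. This sketch is hypothetical. The field names (`start`, `end`, `themes`, `subThemes`, `module`), the theme titles, and the year 2019 are illustrative and are not Assembl's actual data model. The sketch shows the hierarchy of phase, themes, sub-themes, and modules, along with two small helpers: `isOpen` and `collectModules`.

```javascript
// Hypothetical phase configuration; names and dates are illustrative only.
const phase = {
  title: "Consultation",
  start: new Date("2019-11-01T00:00:00Z"),
  end: new Date("2020-01-01T00:00:00Z"), // exclusive: runs to the end of December
  themes: [
    {
      title: "Mobility",
      subThemes: [
        { title: "Cycling", module: { type: "thread" } },
        { title: "Public transport", module: { type: "survey" } },
      ],
    },
    { title: "Housing", module: { type: "vote" } },
  ],
};

// Is the phase open for contributions at the given instant?
function isOpen(phase, at) {
  return at >= phase.start && at < phase.end;
}

// Walk themes and sub-themes down to the modules that capture contributions,
// recording the path of theme titles that leads to each module.
function collectModules(themes, path = []) {
  return themes.flatMap((theme) => {
    const here = [...path, theme.title];
    const own = theme.module ? [{ path: here.join(" > "), type: theme.module.type }] : [];
    return own.concat(collectModules(theme.subThemes ?? [], here));
  });
}

console.log(isOpen(phase, new Date("2019-12-15T12:00:00Z")));
console.log(collectModules(phase.themes));
```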

Modules are where some of the smarts come into play, technology-wise. The thread module, for example, although classically Reddit-like in nature, lets people share posts on social media, react to posts, and translate them into their own language, and every post is sent to the IBM Watson Natural Language Understanding API. This lets us analyze each contribution more deeply, identifying sentiments, categories, concepts, entities, and keywords. Keywords, for example, let you create simple but effective word-cloud visualizations, and they also help with building the summary: extracting problems and actionable solutions.
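As a sketch of the word-cloud use case, the function below sums Watson keyword results across posts. It assumes that each analysis contains the `keywords` array shown in the sample response later in this article, with its `text`, `relevance`, and `count` fields. Weighting each keyword by relevance times count is our own illustrative choice, and `keywordWeights` is a hypothetical name.

```javascript
// Aggregate Watson NLU "keywords" arrays from many posts into word-cloud weights.
// Weight = relevance * count, summed across posts (an illustrative choice).
function keywordWeights(analyses) {
  const weights = new Map();
  for (const analysis of analyses) {
    for (const kw of analysis.keywords ?? []) {
      const key = kw.text.toLowerCase(); // merge case variants of the same keyword
      const w = (kw.relevance ?? 0) * (kw.count ?? 1);
      weights.set(key, (weights.get(key) ?? 0) + w);
    }
  }
  // Heaviest keywords first, ready to feed a word-cloud renderer.
  return [...weights.entries()]
    .map(([text, weight]) => ({ text, weight }))
    .sort((a, b) => b.weight - a.weight);
}

const cloud = keywordWeights([
  { keywords: [{ text: "deal brexit", relevance: 0.5, count: 1 }] },
  { keywords: [{ text: "Deal Brexit", relevance: 0.25, count: 2 }, { text: "trade", relevance: 0.9, count: 1 }] },
]);
console.log(cloud);
```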

The Holmes microservice

Sorry there's been no code up until now, so let's rectify that! We've built a microservice called Holmes to communicate with the IBM Watson APIs (geddit, Sherlock fans?). It is connected to a RabbitMQ message broker, because we've decoupled the application (or module) logic from long-running, secondary tasks that run asynchronously in the background. So creating a post in a thread, for example, might trigger several secondary tasks, such as translation, NLP analysis, semantic analysis, and an email notifying the author of the original post that someone has responded. None of these tasks should block users from making the post or other users from seeing it.
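The fan-out described above can be sketched in a few lines. The queue names and message fields are hypothetical. The `channel` argument only needs a `sendToQueue(queue, content, options)` method, which matches the channel API of the amqplib RabbitMQ client. Here a stand-in records the messages so that the sketch runs without a broker.

```javascript
// Hypothetical queue names, one per secondary task triggered by a new post.
const SECONDARY_TASKS = ["translation", "nlp-analysis", "semantic-analysis", "email-notification"];

// Publish one message per task and return straight away; the workers that
// consume these queues do the slow work, so posting never waits on them.
// With amqplib, `channel` would come from connection.createChannel(), and each
// queue would be declared with channel.assertQueue(name, { durable: true }).
function publishPostCreated(channel, post) {
  for (const task of SECONDARY_TASKS) {
    const message = { task, postId: post.id, text: post.body };
    // persistent: true asks RabbitMQ to store the message on disk; it only
    // survives a broker restart if the queue itself is durable.
    channel.sendToQueue(`tasks.${task}`, Buffer.from(JSON.stringify(message)), {
      persistent: true,
    });
  }
}

// Usage with a stand-in channel that just records what would be sent.
const sent = [];
const fakeChannel = {
  sendToQueue(queue, content, options) {
    sent.push({ queue, message: JSON.parse(content.toString()), options });
    return true;
  },
};
publishPostCreated(fakeChannel, { id: 42, body: "Il faut plus de pistes cyclables." });
console.log(sent.map((s) => s.queue));
```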

For the microservice tech stack, we've chosen Node.js as our primary platform, but we can also write services in Python 3. To standardize the way we build microservices, we've created code generators: Yeoman for Node.js microservices and Cookiecutter for Python-based ones. The rationale is that developers should not have to step out of their comfort zone to generate a microservice. In other words, we don't force a Python developer to install Node.js just to generate a Python microservice. The benefit of the code generators is that all the boilerplate is taken care of for me, so I can focus on business logic, unit testing, and shipping code.

We're using IBM's Natural Language Understanding API, which is very easy to get started with, especially if you use IBM's ibm-watson package for Node.js. Because it's a RESTful API, you can of course also make the calls by hand using Fetch or SuperAgent.
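Making the call by hand comes down to a single authenticated POST. The sketch below relies on several assumptions:

- Node 18+, for the global `fetch`.
- IBM's HTTP basic auth scheme, with the literal username `apikey` and your API key as the password.
- The `/v1/analyze` endpoint, under your instance's service URL.

`buildAnalyzeRequest` and `analyzeByHand` are hypothetical helper names. Check IBM's API reference before relying on this sketch.

```javascript
// Build the request for Watson NLU's analyze endpoint (assumed: POST
// {serviceUrl}/v1/analyze?version=..., basic auth as "apikey:<key>").
function buildAnalyzeRequest(serviceUrl, apiKey, payload, version = "2019-07-12") {
  const url = `${serviceUrl}/v1/analyze?version=${encodeURIComponent(version)}`;
  const auth = Buffer.from(`apikey:${apiKey}`).toString("base64");
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Basic ${auth}`,
      },
      body: JSON.stringify(payload),
    },
  };
}

// Send the request and return the parsed JSON analysis, failing loudly on HTTP errors.
async function analyzeByHand(serviceUrl, apiKey, payload) {
  const { url, init } = buildAnalyzeRequest(serviceUrl, apiKey, payload);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`NLU request failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}

// Example (needs a real service URL and key):
// const analysis = await analyzeByHand(serviceUrl, apiKey, { text: "...", features: { sentiment: {} } });
```

The request builder is kept separate from the network call, so the request can be inspected or unit-tested without contacting IBM.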

```javascript
import NaturalLanguageUnderstandingV1 from 'ibm-watson/natural-language-understanding/v1';
```

This will import the package. Next, we set up the object:

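```javascript
// Options as accepted by ibm-watson v4. From v5, the SDK instead takes
// authenticator: new IamAuthenticator({ apikey }) (from 'ibm-watson/auth')
// and serviceUrl, in place of iam_apikey and url.
const naturalLanguageUnderstanding = new NaturalLanguageUnderstandingV1({
  version: '2019-07-12',
  iam_apikey: '<YOUR_API_KEY>', // replace with your IBM Cloud API key
  url: 'https://gateway-lon.watsonplatform.net:443/natural-language-understanding/api',
});
```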

Then we launch a request:

```javascript
let result;
try {
  result = await naturalLanguageUnderstanding.analyze(data);
} catch (err) {
  // …
}
```

The input to the call, `data`, comes from the predefined API contract between the microservice and its clients. Here's an example fragment:

```json
{
  "text": "I think we should not be going ahead with a no deal brexit",
  "features": {
    "emotion": {},
    "relations": {},
    "sentiment": {},
    "syntax": {
      "sentences": true
    },
    "semantic_roles": {},
    "categories": {
      "limit": 3
    },
    "concepts": {
      "limit": 3
    },
    "keywords": {
      "sentiment": true,
      "emotion": true,
      "limit": 3
    },
    "entities": {
      "sentiment": true,
      "limit": 1
    }
  }
}
```

And a response to this request might be:

```json
{
  "usage": {
    "text_units": 1,
    "text_characters": 58,
    "features": 8
  },
  "syntax": {
    "sentences": [
      {
        "text": "I think we should not be going ahead with a no deal brexit",
        "location": [0, 58]
      }
    ]
  },
  "sentiment": {
    "document": {
      "score": -0.769502,
      "label": "negative"
    }
  },
  "semantic_roles": [
    {
      "subject": {
        "text": "I"
      },
      "sentence": "I think we should not be going ahead with a no deal brexit",
      "object": {
        "text": "we should not be going ahead with a no deal brexit"
      },
      "action": {
        "verb": {
          "text": "think",
          "tense": "present"
        },
        "text": "think",
        "normalized": "think"
      }
    }
  ],
  "relations": [],
  "language": "en",
  "keywords": [
    {
      "text": "deal brexit",
      "sentiment": {
        "score": -0.769502,
        "label": "negative"
      },
      "relevance": 0.5,
      "emotion": {
        "sadness": 0.129986,
        "joy": 0.315904,
        "fear": 0.052876,
        "disgust": 0.073678,
        "anger": 0.069467
      },
      "count": 1
    }
  ],
  "entities": [],
  "emotion": {
    "document": {
      "emotion": {
        "sadness": 0.129986,
        "joy": 0.315904,
        "fear": 0.052876,
        "disgust": 0.073678,
        "anger": 0.069467
      }
    }
  },
  "concepts": [],
  "categories": [
    {
      "score": 0.610982,
      "label": "/automotive and vehicles/cars/car culture"
    },
    {
      "score": 0.603625,
      "label": "/finance/financial news"
    },
    {
      "score": 0.581126,
      "label": "/sports/go kart"
    }
  ]
}
```
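To show how a response like this might be consumed, here is a minimal sketch that condenses it into a few fields. The field choice and the `summarize` name are hypothetical and are not part of Holmes or the Watson API. The sketch only reads the response properties visible above.

```javascript
// Condense a Watson NLU analyze response into a few reviewable fields.
function summarize(response) {
  const emotions = response.emotion?.document?.emotion ?? {};
  // Pick the emotion with the highest score, if any were returned.
  const dominant = Object.entries(emotions).sort((a, b) => b[1] - a[1])[0];
  // Sort categories by score rather than trusting the response order.
  const topCategory = [...(response.categories ?? [])].sort((a, b) => b.score - a.score)[0];
  return {
    sentiment: response.sentiment?.document?.label ?? "unknown",
    score: response.sentiment?.document?.score ?? 0,
    dominantEmotion: dominant ? dominant[0] : null,
    keywords: (response.keywords ?? []).map((k) => k.text),
    topCategory: topCategory?.label ?? null,
  };
}

// Trimmed copy of the sample response above.
const sample = {
  sentiment: { document: { score: -0.769502, label: "negative" } },
  keywords: [{ text: "deal brexit", relevance: 0.5, count: 1 }],
  emotion: { document: { emotion: { sadness: 0.129986, joy: 0.315904, fear: 0.052876, disgust: 0.073678, anger: 0.069467 } } },
  categories: [
    { score: 0.610982, label: "/automotive and vehicles/cars/car culture" },
    { score: 0.603625, label: "/finance/financial news" },
    { score: 0.581126, label: "/sports/go kart" },
  ],
};
console.log(summarize(sample));
```

On this sample, the summary reports a negative sentiment but "joy" as the dominant emotion, and "car culture" as the top category. Raw signals like these need human review before they feed into a summary.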

The algorithms (the sexy NLP stuff)

As mentioned, the detection of actionable solutions lies at the heart of collective intelligence, and the Bluenove approach blends technology, methodology, and human intelligence. So while our consultancy team is very good at identifying actionable solutions, we use NLP techniques to support this process.

At Bluenove, we have 3 different algorithms that we use at different times for different reasons. The first is a semantic analysis clustering algorithm for on-site live debates. It combines unsupervised and then supervised learning to group and classify responses into topics. After a question is put to a live audience and the responses start coming in, we launch the unsupervised learning, which creates an initial clustering of the responses. We can then typically move things around and clean up before switching to supervised learning, in which the algorithm uses the refined set of boundaries in its ongoing classification.
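The unsupervised-then-supervised flow can be sketched in a few dozen lines. This is a deliberately simple stand-in, not Bluenove's algorithm. It assumes bag-of-words vectors compared by cosine similarity. For the unsupervised step, it runs k-means from a deterministic farthest-first start. For the supervised step, it trains a nearest-centroid classifier on the labels a moderator produced after cleaning up the clusters. All function names are hypothetical.

```javascript
// Lowercase word tokens (Unicode letters and apostrophes).
function tokenize(text) {
  return text.toLowerCase().match(/[\p{L}']+/gu) || [];
}

// Map each distinct token to a vector index.
function buildVocab(texts) {
  const vocab = new Map();
  for (const t of texts) {
    for (const tok of tokenize(t)) if (!vocab.has(tok)) vocab.set(tok, vocab.size);
  }
  return vocab;
}

// Term-frequency vector; tokens outside the vocabulary are ignored.
function vectorize(text, vocab) {
  const v = new Array(vocab.size).fill(0);
  for (const tok of tokenize(text)) if (vocab.has(tok)) v[vocab.get(tok)] += 1;
  return v;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

function mean(vectors) {
  const m = new Array(vectors[0].length).fill(0);
  for (const v of vectors) v.forEach((x, i) => { m[i] += x / vectors.length; });
  return m;
}

// Index of the centroid most similar to v.
function nearest(v, centroids) {
  let best = 0;
  centroids.forEach((c, i) => { if (cosine(v, c) > cosine(v, centroids[best])) best = i; });
  return best;
}

// Step 1 (unsupervised): k-means with cosine similarity, seeded by picking
// the first vector, then repeatedly the vector least similar to those chosen.
function cluster(vectors, k, maxIter = 20) {
  const centroids = [vectors[0]];
  while (centroids.length < k) {
    let pick = 0, lowest = Infinity;
    vectors.forEach((v, i) => {
      const sim = Math.max(...centroids.map((c) => cosine(v, c)));
      if (sim < lowest) { lowest = sim; pick = i; }
    });
    centroids.push(vectors[pick]);
  }
  let labels = [];
  for (let iter = 0; iter < maxIter; iter++) {
    const next = vectors.map((v) => nearest(v, centroids));
    if (next.every((l, i) => l === labels[i])) break; // assignments stable
    labels = next;
    for (let c = 0; c < k; c++) {
      const members = vectors.filter((_, i) => labels[i] === c);
      if (members.length) centroids[c] = mean(members);
    }
  }
  return labels;
}

// Step 2 (supervised): classify new responses by the nearest centroid
// of the moderator-labelled examples.
function trainClassifier(examples, vocab) {
  const byLabel = new Map();
  for (const { text, label } of examples) {
    if (!byLabel.has(label)) byLabel.set(label, []);
    byLabel.get(label).push(vectorize(text, vocab));
  }
  const names = [...byLabel.keys()];
  const centroids = names.map((n) => mean(byLabel.get(n)));
  return (text) => names[nearest(vectorize(text, vocab), centroids)];
}

const responses = ["more bike lanes", "safer bike lanes", "cheaper public transport", "more public transport"];
const vocab = buildVocab(responses);
console.log(cluster(responses.map((t) => vectorize(t, vocab)), 2));
```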

The second algorithm, an NLP algorithm designed to find actionable solutions in text, is one we created more than 2 years ago with the company Big Datext. In the simplest terms, an actionable solution might start with “Il faut faire…” (“It is necessary to…”), but there are obviously many more markers than this. This algorithm performs particularly well on precision.

To complement it, this year we developed our newest algorithm with Inria. It is again an NLP algorithm designed to find actionable solutions in text, and it is complementary because it achieves higher recall than the one created with Big Datext. It benefits from a very large training set and, although it initially used more than 160 indicators of actionable solutions, we have settled on about 79. What's an indicator? Like “Il faut faire…” above, indicators include terms such as “améliorer” (“improve”), “au lieu de” (“instead of”), “en conclusion” (“in conclusion”), and “on pourrait” (“we could”).
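A naive way to see what an indicator does is to flag every sentence that contains one. The sketch below does only that. It uses the five French indicators quoted above, plain substring matching, and a crude sentence splitter, and it is not the Big Datext or Inria algorithm. `findActionable` is a hypothetical name.

```javascript
// Indicators quoted in the text; a real list would be much longer.
const INDICATORS = ["il faut", "améliorer", "au lieu de", "en conclusion", "on pourrait"]
  .map((m) => m.normalize("NFC"));

// Crude splitter: break after ., ! or ? followed by whitespace.
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).map((s) => s.trim()).filter(Boolean);
}

// Return each sentence containing at least one indicator, with the matches.
function findActionable(text) {
  return splitSentences(text)
    .map((sentence) => {
      const lower = sentence.toLowerCase().normalize("NFC"); // unify accented forms
      const matched = INDICATORS.filter((m) => lower.includes(m));
      return { sentence, matched };
    })
    .filter((s) => s.matched.length > 0);
}

const contribution =
  "Le quartier est bruyant. Il faut créer plus de pistes cyclables. " +
  "On pourrait aussi améliorer les horaires des bus.";
console.log(findActionable(contribution));
```

Each extra indicator tends to catch more proposals but also more false alarms, which is the precision and recall trade-off that the two algorithms sit on opposite sides of.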

When evaluating our algorithms, we generally score them on precision, recall, and F1-score (the harmonic mean of precision and recall). This is why we use all 3 algorithms in our analysis rather than relying on just one: each has different strengths on these measures. There's a nice article about this by Andreas Klintberg.
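For reference, all three scores can be computed from two sets: the sentences a detector flags and the sentences a human annotator marked as actionable. A minimal sketch, with `scores` as a hypothetical helper name:

```javascript
// Precision, recall and F1 from flagged IDs versus human-annotated IDs.
function scores(predicted, gold) {
  const truth = new Set(gold);
  const flagged = new Set(predicted);
  let tp = 0; // true positives: flagged and actually actionable
  for (const id of flagged) if (truth.has(id)) tp += 1;
  const precision = flagged.size ? tp / flagged.size : 0;
  const recall = truth.size ? tp / truth.size : 0;
  // F1 is the harmonic mean of precision and recall.
  const f1 = precision + recall ? (2 * precision * recall) / (precision + recall) : 0;
  return { precision, recall, f1 };
}

console.log(scores([1, 2, 5], [1, 2, 3, 4]));
```

For instance, suppose the annotated solutions are sentences 1 to 4 and a detector flags sentences 1, 2, and 5. The detector gets a precision of 2/3, a recall of 1/2, and an F1 of 4/7 (about 0.571).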

Conclusion

There are still lots of challenges in the field of collective intelligence. Engagement is a big one: how do you get people to participate, contribute, and return again and again? What's the best way to identify the key performance indicators of a successful debate? Is it purely about audience size, or is it more subtle than that?

It's important to bear in mind that people still learn about a subject even when they are not contributing; page views can be just as powerful as contributions.

Helping people understand what happened is fundamental and another big challenge. Good ways to convey or explain what occurred during the debate include mind maps, infographics, storyboards, and knowledge trees.

Coming up with a transformation plan is something else to consider: “I ran the debate, but what do I do next? How do I go about achieving everything I now need to do?”

So it’s true to say collective intelligence is still evolving and probably always will be. After all, democracy doesn’t sit still either!

This article is part of Behind the Code, the media for developers, by developers. Discover more articles and videos by visiting Behind the Code!

Illustration by Blok