Dun & Bradstreet - A Natural Language Interface

roscler

Aug 1, 2013

CPOL

Expanding the power and utility of the D&B API with artificial intelligence and natural language processing

Download dnb-chatscript.zip - 3.6 KB

Live Demo

The Dun & Bradstreet natural language interface for the D&B Developer Sandbox is live now.

Click here now to use it.

How To Use The Demo

There is one extra step when querying the data set: you must specify the company type you want to find in a separate entry box from the one you use for filter conditions. The final version will not have this quirk. I have written code in the past that successfully parses complex elements like company types out of a free-form sentence. It's straightforward but time-consuming, and there was not enough time to implement that feature. So for now, just remember to put the kind of company you are looking for in the Industry description entry box and the rest of your query, the part that contains the filter conditions, in the Filter Conditions entry box.

For example, don't try to enter the following into the Filter Conditions entry box:

"Accounting firms owned by women in Texas."

Instead, put "Accounting" in the Industry description entry box and "owned by women in Texas" in the Filter Conditions entry box.

Introduction

The Dun & Bradstreet API is a treasure trove of lucrative and useful business information. It is the seed of a whole new wave of commerce applications that can leverage valuable company data from the huge, detailed, and accurate databases Dun & Bradstreet maintains on millions of businesses worldwide. The trick is providing an intelligent interface that makes finding the information a user needs as effortless as possible.

Current interfaces require the user to fill out a complex form, wading through a sea of different fields across 10 different D&B databases, while having to know the correct values to enter for the various fields to find the companies they are looking for. Unfortunately this requires a lot of learning on the part of the user. Not only do they need to know how to map their query to the huge number of fields available, they also have to know the correct database(s) to use. Sadly, this leads to these common end results:

The user does not find what they want, or finds only a small subset of it.
The user gives up in frustration, incorrectly believing the data is not there.
Once the user manages to make a few queries work through a tedious, iterative process of trial and error, they stop exploring the databases. This is a shame because they end up under-utilizing the wealth of data in those databases and only repeat the queries they managed to get working.

The cure for this situation is a natural language interface that does the heavy lifting in translating the user’s query to the results they desire. They know how to ask for the data they want, just not how to ask it in a way that maps easily to the D&B API. The solution is to provide an interface that does that mapping and translation work for them.

Time Constraints

NOTE: The application presented in this article is incomplete. There just wasn't enough time to do a full implementation given the contest deadline. Because of this I do not have time to show you the C# code that takes the conformed query expressions created by the chatbot and converts them to the database scan operation that retrieves the actual D&B data shown to the user. However, I have included as a download the chatbot source files. These files will show you exactly how I used the ChatScript chatbot engine to perform entity extraction from the user’s query. Later in this article I explain how those source files work and what I generally do to convert the chatbot output to D&B queries. An experienced C# programmer should be able to use this information to recreate the natural language interface demonstrated.

Limitations

The application also has some notable gaps in its query handling capability due to time constraints. These deficiencies are not due to technical obstacles. They are relatively easy to solve and I have solved them in the past in other applications. Again, there just wasn't time, but the final version will not have these limitations. Here’s a list of them so you know why certain queries do not work:

Date related queries
Numeric quantity related queries
Only AND logic is supported, not OR. You cannot make queries like "Companies that are out of business or are bankrupt". You can make queries like "Companies that are in Texas, are bankrupt, and have legal troubles".
The many D&B data fields that just weren't implemented (See the available fields list below).

Even with these glaring omissions, the application still provides plenty of utility value and definitely shows the power and advantages of a natural language interface in this difficult query environment.

NOTE: You also cannot do queries that span databases. For example, you cannot look for bankrupt companies owned by women because that would require scanning the Public Records and Women Owned business databases. This is not a limitation of the application and is certainly not a limitation of the expansive and thorough full D&B data set. It is a limitation of the D&B Sandbox data set that fuels this demo application. That data set is a small subset of the full D&B data set. Unfortunately many of the companies in a particular sample database do not have corresponding records in the other sample databases. For example, many of the companies in the Minority owned business database do not have records in the Public Records database, making cross database scans impossible. The full D&B data set you get as a paying customer does not have this problem.

Available Field List

Here’s a list of fields you can query using the demo application:

City
State
Industry Description (Using the IndustryDesc1 field, or the Company name field if that field does not exist).
Congressional District
Bankruptcy Indicator
Out of Business Indicator
Debarment (forbidden from doing business with the Federal Government)
Businesses owned by Women, Veterans, Minorities, or the Disadvantaged
Energy efficient companies (“green” businesses)
Businesses with (or without) Legal Problems such as Lawsuits, Liens, or Judgments

The large number of other fields in the D&B data set would be easy to add to the system. Again, time was the limiting factor. But this is enough to show just how much easier it is to use simple English to query the D&B data set over existing approaches. Here is just a small set of example queries:

"Show me bankrupt companies in Texas"
"Give me a list of companies that have gone out of business in California"
"I want a list of businesses owned by Hispanic Americans in the 28th congressional district"
"What businesses are forbidden from working with the federal government?"
"Businesses that can work with the federal government that have had legal troubles"
"Businesses that can work with the federal government that have never had a lawsuit"
"Boat makers that are green companies" (Remember, "boat makers" would go in the Industry description entry box, and "that are green companies" would go in the Filter Conditions entry box.)

A huge number of other queries will also work, since there is no specific syntax or formal structure you need to adopt when asking questions of the demo application.

Implementing the Natural Language Front End

Prerequisites:

First, install the D&B Developer Sandbox API as a service. This is very easy to do and there are a lot of detailed tutorials on how to do this like this one, so there is no need to reinvent the wheel by duplicating that information here. You can find the service root URL for the Developer Sandbox here.

You need to get a basic understanding of the powerful open source ChatScript chatbot engine. That software is the front-end to this demo application that converts plain English queries to an abbreviated command language used by the C# code to make actual queries against the D&B data set. It is the part of the application that does the natural language processing. Read this short ChatScript tutorial I wrote for the Azure Developer Challenge first. It will show you how to set up your own ChatScript server and teach you the basics of ChatScript scripting.

Entity Recognition using ChatScript

The problem that needs to be solved when parsing a user’s query is called entity recognition. We need to extract the discrete elements in the user’s input that map to actual fields in the D&B data set. Let’s take a simple query to begin with:

“Show me companies that have not gone bankrupt”

This plain language query maps to the concrete field named Bankruptcyindicator, which contains the letter “B” when a company is bankrupt. Because the user asked for companies that have not gone bankrupt, we want to convert that input to this actionable condition:

Bankruptcyindicator != "B"

How do we do this using ChatScript? In the attached ChatScript source files you will find this rule that extracts this entity from a user query:

u: ( $bot=bankrup _~query_relation *~2 bankrupt* )
    ^QUERYOUTPUT('_0 (Bankruptcyindicator) (B))

The “u” character tells ChatScript that we want to respond to questions or statements. This element in the rule pattern:

$bot=bankrup

Tells ChatScript that we want to select the chatbot named “bankrup” from our set of currently active chatbots. This is covered in the section titled The Chatbot Entity Gauntlet below. The next element in the rule is:

_~query_relation

This tells ChatScript that we want to capture any word in the ~query_relation concept set. Here is the definition of that concept set:

concept: ~query_relation [not never less_than less_than_or_equal_to greater_than greater_than_or_equal_to]

If the element is found it is captured to a variable we can inspect later. If not, the rule will still match if the rest of the rule pattern matches. This element is used to capture logical relations that the user has specified. In our sample query above this would capture the word “not” in the query so we know the user wants companies that have not gone bankrupt.

Following that element in the match pattern is this:

*~2

This allows us to skip words between the query relation element and the next element in the match pattern, which is the stemmed word “bankrupt*”. What this element says is “skip up to 2 words between the element to the left of me (~query_relation) and the element to the right of me (bankrupt*), but it is OK if we don’t find any words between those two elements”. This pattern allows us to match phrases that have filler words between the query relation and bankrupt like:

not bankrupt (zero filler words)
not gone bankrupt (1 filler word “gone”)

The last element in our match pattern is:

bankrupt*

This element uses the * wildcard as a simple form of stemming. It tells ChatScript to match the word “bankrupt” or any word that starts with those letters, such as “bankruptcy”. The net result of this rule is a flexible pattern matching operation that handles most of the ways a user will query for bankrupt companies.
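For readers more comfortable with regular expressions than with ChatScript patterns, here is a rough C# analogue of the rule pattern described above. It is only an illustrative sketch, not part of the demo application: the class and method names are hypothetical, only the words "not" and "never" from the concept set are included, and it covers only the case where a relation word is present in the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BankruptcyPatternSketch
{
    // Approximates the ChatScript pattern: _~query_relation *~2 bankrupt*
    private static readonly Regex Pattern = new Regex(
        @"\b(?<rel>not|never)\b" +   // captured relation word (the role played by _0)
        @"(?:\s+\w+){0,2}?" +        // zero, one, or two filler words (like *~2)
        @"\s+bankrupt\w*",           // "bankrupt" or any word that starts with it
        RegexOptions.IgnoreCase);

    // Returns the captured relation word in lower case,
    // or an empty string when the pattern does not match.
    public static string MatchRelation(string userInput)
    {
        Match m = Pattern.Match(userInput);
        return m.Success ? m.Groups["rel"].Value.ToLowerInvariant() : string.Empty;
    }
}
```

Calling MatchRelation with "Show me companies that have not gone bankrupt" captures the word "not", while an input with three or more words between the relation and "bankrupt" does not match.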

The QueryOutput Macro

The second line of the bankruptcy entity extractor is this:

^QUERYOUTPUT('_0 (Bankruptcyindicator) (B))

In the attached download you will find this code in the control file:

# ======================== MACROS =====================

outputmacro: ^QUERYOUTPUT(^relphrase ^fields ^values)
	if(^relphrase)
	{
		$$rel = unknown
		if (^relphrase == not){$$rel = n}
		if (^relphrase == never){$$rel = n}
		if (^relphrase == less_than){$$rel = lt}
		if (^relphrase == less_than_or_equal_to){$$rel = lte}
		if (^relphrase == greater_than){$$rel = gt}
		if (^relphrase == greater_than_or_equal_to){$$rel = gte}
		if (^relphrase == equals){$$rel = eq}
		if (^relphrase == equal_to){$$rel = eq}
	}
	else
	{
		$$rel = "equals"
	}
	{$$rel ^fields : ^values}

QUERYOUTPUT is an outputmacro, which is a user defined function that produces text the chatbot will output in response to a query. The bulk of the code is devoted to taking the query relation found in the user’s input (if any), and converting it to a conformed string that our C# query processing code understands. In our sample query, this code would translate “not” to the single character “n”. Finally it creates the formatted output string that is passed on to the C# query processing code.
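To make the relation conversion easier to follow, here is the same mapping written as a small C# lookup. This is a sketch that mirrors the macro's logic as shown above, not code from the demo application; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class RelationCodesSketch
{
    // Relation words and the conformed codes the macro assigns to them.
    private static readonly Dictionary<string, string> Codes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "not", "n" },
            { "never", "n" },
            { "less_than", "lt" },
            { "less_than_or_equal_to", "lte" },
            { "greater_than", "gt" },
            { "greater_than_or_equal_to", "gte" },
            { "equals", "eq" },
            { "equal_to", "eq" }
        };

    // Mirrors the macro: no relation phrase yields "equals",
    // an unrecognized phrase yields "unknown".
    public static string Conform(string relationPhrase)
    {
        if (string.IsNullOrEmpty(relationPhrase))
        {
            return "equals";
        }
        return Codes.TryGetValue(relationPhrase, out var code) ? code : "unknown";
    }
}
```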

What does the formatted output look like after we’re done pre-processing the sample query? Let’s go back to the QUERYOUTPUT macro call found in the bankruptcy entity extractor:

^QUERYOUTPUT('_0 (Bankruptcyindicator) (B))

The first element is this:

'_0

It tells ChatScript to pass on the query relation found to QUERYOUTPUT. In this case it is the word “not” from the _~query_relation variable capture element in the match pattern. The second element is:

(Bankruptcyindicator)

As noted above, that is the name of the actual field found in the D&B dataset, the one we need to use when scanning the D&B PublicRecords database for bankrupt companies. So the second parameter in the call to QUERYOUTPUT is the field list. The third and last element is:

(B)

That is the field value we need to find in the Bankruptcyindicator field to select a bankrupt company for our search results. Let’s take a look again at the last line in the QUERYOUTPUT definition:

{$$rel ^fields : ^values}

Given the parameter values we have established above, the actual string that would be received by the C# post-processing code would be:

{n ( Bankruptcyindicator ): ( B )}

This tells the post-processing code that:

The logical relation is negation (not)
The field we need to compare against is the single field named Bankruptcyindicator
The single value we need to find to consider a company bankrupt is the letter “B”
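As an illustration of how the post-processing side might read that string, here is a minimal C# parsing sketch. It is not the demo's actual code: the SubQueryTriplet type, the TripletParserSketch class, and the choice to split fields and values on whitespace are all assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container for one chatbot result.
public sealed class SubQueryTriplet
{
    public string Relation { get; }
    public string[] Fields { get; }
    public string[] Values { get; }

    public SubQueryTriplet(string relation, string[] fields, string[] values)
    {
        Relation = relation;
        Fields = fields;
        Values = values;
    }
}

public static class TripletParserSketch
{
    // Expected shape: {relation ( field list ): values}
    private static readonly Regex Shape = new Regex(
        @"^\{\s*(?<rel>\S+)\s*\((?<fields>[^)]*)\)\s*:\s*(?<values>[^}]*)\}$");

    public static SubQueryTriplet Parse(string chatbotOutput)
    {
        Match m = Shape.Match(chatbotOutput.Trim());
        if (!m.Success)
        {
            throw new FormatException("Unrecognized chatbot output: " + chatbotOutput);
        }

        // The value list may or may not be wrapped in parentheses.
        string values = m.Groups["values"].Value.Trim().Trim('(', ')');

        return new SubQueryTriplet(
            m.Groups["rel"].Value,
            SplitWords(m.Groups["fields"].Value),
            SplitWords(values));
    }

    private static string[] SplitWords(string text)
    {
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Parsing the sample string above yields the relation "n", the single field Bankruptcyindicator, and the single value "B".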

Chatbot Rule Summary

The main point of each chatbot is to extract a single entity.
Each chatbot has one or more rules to accomplish this task.
The output of a matching rule is the concrete information the C# post-processing code needs to execute a database search, a triplet consisting of the logical relation, applicable database fields, and applicable field values for the listed database fields.

The Chatbot Entity Gauntlet

The final piece of the puzzle is the Chatbot Entity Gauntlet. There is a simple block of code in the C# post-processing code that looks like this:

// Create the Chatbot Entity Extraction Gauntlet
List<string> listChatBotNames = new List<string>();

listChatBotNames.Add("cong");
listChatBotNames.Add("table");
listChatBotNames.Add("legal");
listChatBotNames.Add("outofb");
listChatBotNames.Add("bankrup");
listChatBotNames.Add("state");
listChatBotNames.Add("city");
listChatBotNames.Add("debar");
listChatBotNames.Add("minorit");
// Query each available sub-parser (named chatbot).
foreach (var chatBotName in listChatBotNames)
{
    // Send the user's message to this chatbot on the D&B ChatScript server.
    string strResponse = GetChatBotResponse(strMessage, strLoginName, chatBotName);
    // Accumulate responses and process.
    …
}

The code above has been excerpted and abbreviated for clarity. But as you can see, our ChatScript instance is queried once for each currently supported entity that we need to extract. In our current example, when the “bankrup” chatbot is selected, the actual call would look like this:

GetChatBotResponse("Show me companies that have not gone bankrupt", "guest", "bankrup");

The output from this call would be, as indicated above:

{n ( Bankruptcyindicator ): ( B )}
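The body of GetChatBotResponse is not shown in this article. One possible shape for it is sketched below, based on my understanding of the ChatScript client/server protocol: the client sends the login name, the bot name, and the message, each terminated by a NUL byte, and then reads the reply until the server closes the connection. The host and port defaults, the class name, and the BuildPayload helper are assumptions for this example, so check them against your own ChatScript server setup.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

public static class ChatScriptClientSketch
{
    // Builds the request bytes: login name, bot name, and message, each NUL-terminated.
    public static byte[] BuildPayload(string message, string loginName, string botName)
    {
        return Encoding.UTF8.GetBytes(loginName + "\0" + botName + "\0" + message + "\0");
    }

    // Hypothetical implementation; host and port are assumed defaults.
    public static string GetChatBotResponse(
        string message, string loginName, string botName,
        string host = "localhost", int port = 1024)
    {
        using (var client = new TcpClient(host, port))
        using (NetworkStream stream = client.GetStream())
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            byte[] payload = BuildPayload(message, loginName, botName);
            stream.Write(payload, 0, payload.Length);

            // Read everything the server sends until it closes the connection.
            return reader.ReadToEnd().Trim();
        }
    }
}
```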

For other queries there will be different responses of course and for longer queries with multiple extractions, there will be multiple sub-query triplets like the one above. For example, this user query:

Show me companies that have not gone bankrupt in Texas and are not debarred

Would return the following sub-query triplets when the Chatbot gauntlet loop is done:

{n ( Bankruptcyindicator ): ( B )}

{n ( Debarrment ): ( Y )}

{unknown ( StateAbbrv ): Texas}

This list of triplets would be converted to the concrete database query that returns the search results to the user.
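The conversion code itself is outside the scope of this article, but the general idea can be sketched in a few lines of C#. The example below is a simplified illustration under several assumptions: records are plain dictionaries of field name to value, each condition names a single field and a single value, the relation "n" means negation, and every other relation is treated as equality. All type and method names are hypothetical. Conditions are combined with AND logic only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical single-field condition derived from a sub-query triplet.
public sealed class FilterCondition
{
    public string Relation { get; }
    public string Field { get; }
    public string Value { get; }

    public FilterCondition(string relation, string field, string value)
    {
        Relation = relation;
        Field = field;
        Value = value;
    }
}

public static class AndFilterSketch
{
    // True when the record satisfies one condition.
    // A missing field never equals the value, so it passes a negated condition.
    public static bool Satisfies(IDictionary<string, string> record, FilterCondition c)
    {
        record.TryGetValue(c.Field, out var actual);
        bool equal = string.Equals(actual, c.Value, StringComparison.OrdinalIgnoreCase);
        return c.Relation == "n" ? !equal : equal;
    }

    // Keeps only the records that satisfy every condition (AND logic).
    public static List<IDictionary<string, string>> Filter(
        IEnumerable<IDictionary<string, string>> records,
        IEnumerable<FilterCondition> conditions)
    {
        List<FilterCondition> all = conditions.ToList();
        return records.Where(r => all.All(c => Satisfies(r, c))).ToList();
    }
}
```

A real implementation would also need to handle the numeric relations (lt, lte, gt, gte) and multi-field triplets, which this sketch deliberately leaves out.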

Conclusion

The Dun & Bradstreet API opens the door to a whole new wave of exciting business applications. With its wealth of vital statistics on millions of businesses, it alone can be the source of many intriguing and lucrative software applications. When it is combined with other data sets, only your imagination limits what can be done with it. From selecting specific classes of business to sell to or service, to creating maps that highlight companies of a given service type, to creating maps for people who only want to engage with businesses that are environmentally friendly or have not had legal problems, a huge number of powerful new ideas for app development are waiting for you to explore.

NOTE: As indicated above, with the attached ChatScript source files an experienced C# developer should have everything they need to use the same techniques I did here to build their own natural language interface to the Dun & Bradstreet Developer Sandbox API.