
Calculating the Probability of Related Events Based on Bayes' Theorem Using MongoDB Aggregation Framework in C#

29 Jan 2016, CPOL, 3 min read

Introduction

The MongoDB Aggregation Framework helps to analyse survey data. One of the most common analyses done on survey data is calculating the conditional probability of an event. This article demonstrates, by way of an example, how to calculate the probability of dependent events using Bayes' theorem.

Background

Bayes' theorem is stated mathematically as the following equation:

P(A|X) = [P(X|A) P(A)]/P(X) = [P(X|A) P(A)] / [P(X|A) P(A) + P(X | ~A) P(~A)]

Where A and X are events.

  • P(A) and P(X) are the probabilities of A and X without regard to each other.
  • P(A | X), a conditional probability, is the probability of observing event A given that X is true.
  • P(X | A) is the probability of observing event X given that A is true.

Suppose we want to know the probability of having breast cancer given we have a positive test result.

i.e.,  A  =  the event of having breast cancer
       X  =  the event of testing positive
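
The second form of the equation follows from the law of total probability: a positive test comes either from someone who has the condition or from someone who does not. As a short sketch of that step, writing \lnot A for the ~A used in the text:

```latex
\begin{aligned}
P(X) &= P(X \cap A) + P(X \cap \lnot A) \\
     &= P(X \mid A)\,P(A) + P(X \mid \lnot A)\,P(\lnot A), \\[4pt]
P(A \mid X) &= \frac{P(X \cap A)}{P(X)}
             = \frac{P(X \mid A)\,P(A)}{P(X \mid A)\,P(A) + P(X \mid \lnot A)\,P(\lnot A)}.
\end{aligned}
```

Note that the products P(X|A) P(A) and P(X|~A) P(~A) are joint probabilities, not conditional ones; only their sum gives P(X).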

Using the Code

Let’s assume we have tested 200 volunteers, some of whom have a preexisting diagnosis of breast cancer. We keep the records in a MongoDB database as follows:

JSON
{ "_id" : ObjectId("56a92adab2326d187c099531"), "patientName" : "cxb yyy", "mammogramResult" : 0, "diagnosedBefore" : 0 }
{ "_id" : ObjectId("56a92adab2326d187c09953d"), "patientName" : "pxx yyy", "mammogramResult" : 0, "diagnosedBefore" : 1 }
{ "_id" : ObjectId("56a92adab2326d187c099540"), "patientName" : "sxx yyy", "mammogramResult" : 1, "diagnosedBefore" : 0 }
{ "_id" : ObjectId("56aa4cda358f09b19d447bab"), "patientName" : "fxx rrr", "mammogramResult" : 1, "diagnosedBefore" : 1 }

Based on the device manufacturer’s specification, let’s assume the mammogram has the following attributes:

  • 80% of mammograms detect breast cancer when it is there (and therefore 20% miss it).
  • 9.6% of mammograms detect breast cancer when it’s not there (and therefore 90.4% correctly return a negative result).

We want to know the probability of having breast cancer given that the mammogram test gave a positive result.

Let:

  • A: The event of having breast cancer
  • X: The event of a positive mammogram result
  • P(X) is the probability of having a positive mammogram test result
  • P(A) is the probability of having breast cancer
  • P(A | X) is the probability of having breast cancer given a positive test result
  • P(X | A) is the probability of a positive test result given that you have breast cancer

From the given mammogram specification, we have the following figures:

  • Mammogram true positive rate = 80%
  • Mammogram false negative rate = 20%
  • Mammogram false positive rate = 9.6%
  • Mammogram true negative rate = 90.4%

Let’s aggregate our sample data in the survey collection (the database is called bayesdb):

C#
// Requires: using MongoDB.Bson; using MongoDB.Driver;
var connectionString = "mongodb://localhost:27017";
var client = new MongoClient(connectionString);
var db = client.GetDatabase("bayesdb");
var col = db.GetCollection<BsonDocument>("survey");
// An empty filter document matches every document in the collection.
var surveyList = await col.Find(new BsonDocument()).ToListAsync();
var totalSurveyCount = surveyList.Count;
Console.WriteLine("Count of collected survey: {0}", totalSurveyCount);

First, determine how many of those tested have a preexisting diagnosis of breast cancer by grouping on diagnosedBefore. To find out how many tested positive and negative in the survey, group on mammogramResult in the same way.

C#
// Count the documents for each distinct diagnosedBefore value.
var previousDiagnosis = col.Aggregate()
    .Group(new BsonDocument { { "_id", "$diagnosedBefore" },
                              { "count", new BsonDocument("$sum", 1) } });
var results = await previousDiagnosis.ToListAsync();
// Count the documents for each distinct mammogramResult value.
var mammogramTestResults = await col.Aggregate()
    .Group(new BsonDocument { { "_id", "$mammogramResult" },
                              { "count", new BsonDocument("$sum", 1) } }).ToListAsync();
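
To make the effect of the $group stage concrete, here is a minimal in-memory sketch of the same count-per-value operation using LINQ. It does not touch MongoDB; the GroupCountSketch class, the CountBy helper and the tuple rows are hypothetical names introduced only for illustration, and the rows mirror the two flags of the distinct sample records shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupCountSketch
{
    // Counts rows per key value, analogous to
    // { $group: { _id: "$field", count: { $sum: 1 } } }.
    public static Dictionary<int, int> CountBy(
        IEnumerable<(int MammogramResult, int DiagnosedBefore)> rows,
        Func<(int MammogramResult, int DiagnosedBefore), int> key)
    {
        return rows.GroupBy(key).ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        // Hypothetical stand-ins for survey documents, reduced to their two flags.
        var rows = new List<(int MammogramResult, int DiagnosedBefore)>
        {
            (0, 0),
            (0, 1),
            (1, 0),
            (1, 1),
        };

        var byDiagnosis = CountBy(rows, r => r.DiagnosedBefore);
        var byResult = CountBy(rows, r => r.MammogramResult);

        // Prints: diagnosedBefore=1: 2, mammogramResult=1: 2
        Console.WriteLine("diagnosedBefore=1: {0}, mammogramResult=1: {1}",
            byDiagnosis[1], byResult[1]);
    }
}
```

Looking counts up by key, as the dictionary does here, avoids relying on the order in which groups come back.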

From the survey data, we calculate the following parameters:
P(A): the probability of having breast cancer:

C#
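// P(A) is the probability of having cancer => cancerExistingConditionsCount / totalCount = ~1%
// Select the group whose _id is 1 (assuming diagnosedBefore = 1 marks a prior diagnosis);
// the order of $group output is not guaranteed, so First() alone may return the wrong group.
var PreviousDiagnosisPositive = results.First(g => g["_id"].ToInt32() == 1);
int PositiveCount = PreviousDiagnosisPositive["count"].ToInt32();
double PrA = (double)PositiveCount / totalSurveyCount;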

P(~A): the probability of NOT having cancer = 1 - P(A)

C#
// P(~A) is the probability of not having cancer
double PrnA = 1 - PrA;

P(X|A) P(A): the probability of having breast cancer and getting a positive result (true positive), where P(X|A) is the mammogram's true positive rate

C#
var PrXA = mammogramConstants.mammogramTP * PrA;

The struct mammogramConstants contains all the necessary constants.

C#
public struct mammogramConstants
{
    public const double mammogramTP = 0.800;
    public const double mammogramFN = 0.200;
    public const double mammogramFP = 0.096;
    public const double mammogramTN = 0.904;
}

P(X | ~A) P(~A): the probability of not having breast cancer and still getting a positive result (false positive), where P(X | ~A) is the mammogram's false positive rate

C#
var PrXnA = mammogramConstants.mammogramFP * PrnA;

P(X): the probability of getting any positive result (True Positive + False Positive)

i.e., P(X) = P(X | A) P(A) + P(X | ~A) P(~A)

C#
var PrX = PrXA + PrXnA;

Finally:

P(A|X): the probability of having cancer given a positive test result

C#
var PrAX = PrXA / PrX;

Show Result:

C#
Console.WriteLine("Based on the given survey data, a positive result means:");
Console.WriteLine("you are {0:0.00}% likely to have cancer.", PrAX * 100);
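
The whole calculation can also be checked without a database. The following is a minimal, self-contained sketch rather than the article's own code: the BayesSketch class and Posterior method are hypothetical names, and the prior of 2 out of 200 volunteers is an assumption chosen to match the "~1%" noted in the code comments above.

```csharp
using System;
using System.Globalization;

public static class BayesSketch
{
    // P(A|X) = P(X|A) P(A) / [P(X|A) P(A) + P(X|~A) P(~A)]
    public static double Posterior(double prior, double truePositiveRate, double falsePositiveRate)
    {
        double truePositives = truePositiveRate * prior;          // P(X|A) * P(A)
        double falsePositives = falsePositiveRate * (1 - prior);  // P(X|~A) * P(~A)
        return truePositives / (truePositives + falsePositives);  // numerator / P(X)
    }

    public static void Main()
    {
        // Assumption: 2 of the 200 volunteers had a prior diagnosis, i.e. P(A) = 1%.
        double prior = 2.0 / 200;

        // Rates taken from the mammogram specification: 80% true positive, 9.6% false positive.
        double posterior = Posterior(prior, 0.800, 0.096);

        // Prints: 7.76%
        Console.WriteLine(string.Format(CultureInfo.InvariantCulture, "{0:0.00}%", posterior * 100));
    }
}
```

Keeping the formula in one pure function makes it easy to see how strongly the result depends on the prior: the same test applied to a group with a higher P(A) yields a much higher P(A|X).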

Points of Interest

Even though the mammogram detects 80% of the breast cancers that are present, that figure is P(X|A), not P(A|X). Based on Bayes' theorem and the data in the survey collection, a positive result means only about a 7.76% chance of having breast cancer, because the roughly 1% of volunteers who have cancer are far outnumbered by the false positives among the remaining 99%.

History

  • Version 2.0

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) Comcast Cable
United States