Introduction
“When will you or your team be able to complete this project, software or feature?” How often do you hear this question from your bosses, project leads or managers? I hear it a lot, and it is always an uncomfortable experience to give an estimate based on gut feeling, without any preparation. I looked for answers in books, but estimation is a difficult subject to master: it is a science as well as an art.
Estimation is a huge subject, and this article covers only some of its most important aspects. It is written for people who are taking their first steps into the world of estimation, and who want to answer their bosses with an estimate grounded in the science and proven principles of estimation rather than in judgment alone.
After reading this article you should be able to estimate size, effort and schedule for future projects. A second objective is to help you build metrics for your own organization's or group's projects. I first discuss the challenges you can face during estimation, the purpose of estimation and some misconceptions about it. Then I discuss the benefits of estimation, what constitutes an estimate, and briefly the kinds of techniques you can use. Next I share my own estimation experience and walk you through the estimation of two of my past projects. Finally, I discuss metrics and historical data.
Challenges in estimation
When estimating a software project we often overlook factors that affect the overall estimate. Consider the following factors before giving an estimate.
Personnel experience
The individuals involved in a project affect the estimate: experienced developers usually work faster than junior developers. It is therefore very important to know who will actually perform the tasks during project execution.
Project size and type of project
Project size also affects estimation. You cannot simply scale up your experience with small projects to estimate large ones, because studies have shown that effort grows faster than linearly with size. Larger projects require more integration effort between components, and they need larger teams. In a larger team, communication and coordination among members become extra work in their own right, which increases the total effort required to complete the project.
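To see why coordination cost grows so quickly, it helps to count the pairwise communication channels in a team, n(n − 1)/2 for n people. This is a standard illustration (popularized by Fred Brooks), not a model taken from this article, and the team sizes below are arbitrary examples.

```python
def communication_paths(team_size: int) -> int:
    """Number of pairwise communication channels in a team of `team_size` people."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


for n in (2, 5, 10, 20):
    # 2 -> 1, 5 -> 10, 10 -> 45, 20 -> 190 channels
    print(f"{n:>2} people: {communication_paths(n):>3} channels")
```

Doubling the team from 10 to 20 people roughly quadruples the number of channels, which is one reason effort does not scale linearly.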
The type of project is another important factor. An organization experienced in developing desktop applications will likely take longer to complete a web application. So when you build metrics or historical data and make estimates, take the project type into account.
Level of uncertainty
Every project passes through several stages, from inception to final production release: for example inception, requirements, design, development, testing and delivery to the production environment. At the very start of a project there is a great deal of uncertainty about what the end product will be, because requirements may change at any stage and new parameters or problems can appear at any time. Consequently, an estimate made at the start of a project is less accurate than one made in later stages.
Uncertainty decreases as the project progresses from inception to implementation. If you re-estimate during later stages of development, you will get a better estimate than your initial one.
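This narrowing of uncertainty is often pictured as the "cone of uncertainty" from Steve McConnell's Software Estimation: Demystifying the Black Art. The sketch below applies the phase multipliers published there to a hypothetical point estimate of 10 staff-months; the multipliers come from that book, while the function and constant names are my own.

```python
# (phase, low multiplier, high multiplier) from McConnell's cone of uncertainty.
CONE_OF_UNCERTAINTY = [
    ("Initial concept", 0.25, 4.0),
    ("Approved product definition", 0.5, 2.0),
    ("Requirements complete", 0.67, 1.5),
    ("User interface design complete", 0.8, 1.25),
    ("Detailed design complete", 0.9, 1.1),
]


def estimate_range(point_estimate: float, phase: str) -> tuple[float, float]:
    """Return the (low, high) range around a point estimate for the given phase."""
    for name, low, high in CONE_OF_UNCERTAINTY:
        if name == phase:
            return point_estimate * low, point_estimate * high
    raise KeyError(f"unknown phase: {phase}")


# Hypothetical point estimate of 10 staff-months, re-ranged at each phase.
for phase, _, _ in CONE_OF_UNCERTAINTY:
    low, high = estimate_range(10.0, phase)
    print(f"{phase:<32} {low:5.2f} to {high:5.2f} staff-months")
```

The same point estimate spans 2.5 to 40 staff-months at the initial concept, but only 9 to 11 once the detailed design is complete.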
Different Assumptions
Another factor to consider is the programming language, because some languages have a much larger ecosystem than others, with tools that are not available elsewhere. If you choose a language supported by a large community, you will have more tools and more readily available help, and this community support greatly improves the productivity of the development team.
Similarly, when estimating a project you must consider how much time the developers actually have available: will they work on the project full time, or will they also be engaged in other projects during development?
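A minimal sketch of how partial availability stretches calendar time, assuming effort is known in person-hours and a 30-hour working week (the figure listed later among my organization's parameters). The function name, the 120-hour task and the 50% availability are hypothetical.

```python
def calendar_weeks(effort_hours: float, hours_per_week: float, availability: float) -> float:
    """Calendar weeks needed when only `availability` (0..1] of each week goes to the project."""
    if not 0 < availability <= 1:
        raise ValueError("availability must be in (0, 1]")
    return effort_hours / (hours_per_week * availability)


# Hypothetical task of 120 person-hours with 30-hour working weeks.
print(calendar_weeks(120, 30, 1.0))  # 4.0 weeks, developer fully available
print(calendar_weeks(120, 30, 0.5))  # 8.0 weeks, developer shared with another project
```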
Estimation Purpose and Approaches
There are two situations in which you have to estimate a project. In the first, your boss gives you a feature list or requirements list and asks how much time is needed to implement all of the features. You estimate the project and give the estimate to your boss, who folds it into the subtotal of a larger project. In this situation there is no time limit and you are not under deadline pressure. This situation is rare.
In the second situation you have a deadline and a list of features to implement, and your task is to estimate which features you can deliver before the deadline. This is the situation we face most of the time. Sometimes, even with all available resources, the work cannot be delivered by the deadline; then you have to negotiate either the deadline or the number of features to implement.
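In the deadline-driven situation, one simple way to prepare for the negotiation is to walk the prioritized feature list and see which features fit into the hours available before the deadline. This is a hypothetical greedy sketch: the feature names, hour estimates and capacity are invented, and it keeps any later feature that still fits after an earlier one was deferred.

```python
def fit_features(features: list[tuple[str, float]], capacity_hours: float):
    """Walk (name, estimated_hours) pairs in priority order and keep those that still fit."""
    accepted, deferred, used = [], [], 0.0
    for name, hours in features:
        if used + hours <= capacity_hours:
            accepted.append(name)
            used += hours
        else:
            deferred.append(name)  # candidate for negotiation
    return accepted, deferred, used


# Hypothetical backlog, already sorted by priority, and one developer
# working 30 hours a week for the 5 weeks left before the deadline.
backlog = [("Login", 40), ("Reports", 80), ("Export", 30), ("Dashboard", 60)]
capacity = 1 * 30 * 5
accepted, deferred, used = fit_features(backlog, capacity)
print(accepted)  # ['Login', 'Reports', 'Export']
print(deferred)  # ['Dashboard']
```

Whatever ends up in `deferred` is what you bring to the negotiation: either the deadline moves or those features are dropped.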
Estimation alone cannot guarantee that the project will be completed on the committed date. Completing the project according to the estimate also requires project control and good project management skills. In software project management, estimation is just one part, which helps with planning.
Benefits of Estimation
An estimate gives you a framework to control the project: you can measure your performance against it at any time during the project. Estimation enables better resource allocation and provides the foundation on which plans are built. Good estimates also provide the basis for better risk management and high visibility of project progress.
What do we do in estimation?
When you are given an estimation task, you have to estimate three things: size, effort and schedule. You can use one or more techniques to calculate each of these. Using several techniques is a better idea, because it lets you sanity-check your estimates: if two techniques give very different results, one of the assumptions behind your estimates is probably wrong.
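A minimal sketch of such a sanity check, assuming a tolerance of 25% relative to the larger estimate. The threshold is an arbitrary assumption rather than a standard value, and the second estimate in each usage line is invented.

```python
def estimates_agree(a: float, b: float, tolerance: float = 0.25) -> bool:
    """True if two positive estimates differ by at most `tolerance` of the larger one."""
    if a <= 0 or b <= 0:
        raise ValueError("estimates must be positive")
    return abs(a - b) / max(a, b) <= tolerance


# Hypothetical effort estimates (staff-months) from two different techniques.
print(estimates_agree(2.07, 2.4))  # True: about 14% apart
print(estimates_agree(2.07, 4.0))  # False: about 48% apart, recheck the assumptions
```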
We estimate size first, because most effort and schedule equations take size as an input. Size is usually expressed in LOC (lines of code) or in function points, the two major units of software size. Since LOC cannot be counted at the start of a project, function points are calculated instead, and the LOC required to implement them can then be estimated. The conversion from function points to LOC depends on the programming language: for example, implementing one function point in C# takes 40 to 80 lines of code, with a median of 55. Other units of size include the number of user stories, use cases or web pages.
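The function point to LOC conversion is a simple multiplication. The sketch below uses the C# ratios quoted above (40, 55 and 80 LOC per function point); the dictionary and function names are hypothetical.

```python
# Lines of C# per function point, as quoted in the text.
CSHARP_LOC_PER_FP = {"low": 40, "median": 55, "high": 80}


def fp_to_loc(function_points: float, loc_per_fp: dict = CSHARP_LOC_PER_FP) -> dict:
    """Convert a function point count into low/median/high LOC estimates."""
    return {level: function_points * ratio for level, ratio in loc_per_fp.items()}


print(fp_to_loc(79))  # {'low': 3160, 'median': 4345, 'high': 6320}
```

For another language you would pass a different ratio table rather than change the function.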
Effort is calculated directly from the size of the software, and there are many models for doing so. Effort is the time one person would need to complete the project, expressed in staff-months, staff-weeks or person-hours. How many hours make up a week or a month depends on company policy; in my organization, for example, a week has 32 hours. If the team has more than one person, account for that too, because it reduces the schedule.
Once effort is estimated, you can derive the schedule for completing the project; there are models and techniques for this as well. Schedule is expressed in months or weeks. One important point about schedule estimation is compression: how far can the schedule be shortened by adding more and more people? In software projects it cannot be compressed indefinitely. If 4 people need 4 months to complete a project (16 staff-months), naive arithmetic says 16 people could finish it in 1 month and a few hundred people could finish it in a single day. That is obviously not possible. Beyond a certain point the schedule cannot be compressed; that region is called the impossible zone. According to research, a schedule cannot be compressed by more than about 25% of its nominal value. One reason for this limit is that shortening the schedule requires more developers, which increases communication problems among team members. Another is that in a shorter time more tasks must be done in parallel, and when those tasks depend on one another, errors in one task propagate to the tasks that depend on it.
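A sketch of the impossible-zone check, taking the roughly 25% compression limit above at face value, so the shortest feasible schedule is about 75% of the nominal one. The function names are hypothetical, and the 4-month nominal schedule mirrors the example in the text.

```python
MAX_COMPRESSION = 0.25  # roughly 25% compression limit cited in the text


def shortest_feasible_months(nominal_months: float) -> float:
    """Shortest schedule outside the impossible zone for a given nominal schedule."""
    return nominal_months * (1 - MAX_COMPRESSION)


def in_impossible_zone(requested_months: float, nominal_months: float) -> bool:
    """True if the requested schedule is shorter than the shortest feasible one."""
    return requested_months < shortest_feasible_months(nominal_months)


nominal = 4.0  # months, as in the 4-people, 4-months example
print(shortest_feasible_months(nominal))  # 3.0
print(in_impossible_zone(1.0, nominal))   # True
print(in_impossible_zone(3.5, nominal))   # False
```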
Now let's discuss the estimation flow. In its simplest form it is:
Size ---> Effort ---> Schedule
First estimate the size; from the size, estimate the effort; and from the effort, calculate the schedule. The examples later in this article apply this process.
What are the different types of techniques?
Estimation techniques fall into two categories. The first is based on scientific methods such as COCOMO, which provide mathematical equations for calculating effort and schedule. The second is based on empirical methods, such as expert judgment and informal comparison to past projects. Software-based tools use the scientific methods and calibrate them with historical data.
Some techniques suit sequential development and others suit iterative development. Similarly, some techniques are tuned for use at the start of a project and others for later stages. Some also depend on project size, with different techniques for large, medium and small projects. Describing these techniques is beyond the scope of this article, so I leave them for the reader to explore.
Example estimation for Two Projects
To illustrate the estimation process and some of its concepts, I use two of my previous projects. Fortunately, I kept a log for each of them, with entries in this format:
Date, Time given to project, Project-related task description
e.g.: 30th April, 2 hours, implement file parsing for project A (name of the project)
I record these entries in my personal log journal, and from them I can easily calculate the total time spent to complete a project.
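A minimal sketch of turning such a log into total hours and staff-months. It assumes each entry follows the comma-separated format above, that the duration is the first word of the second field, and that a staff-month is 132 hours (my organization's figure, listed later). The log entries are hypothetical, and filtering by project is a plain substring match.

```python
HOURS_PER_STAFF_MONTH = 132  # my organization's figure


def total_hours(log_lines, project=None):
    """Sum the hours in 'Date, N hours, task description' entries.

    If `project` is given, only entries whose description contains it are counted
    (a simple substring match).
    """
    total = 0.0
    for line in log_lines:
        _date, duration, task = (part.strip() for part in line.split(",", 2))
        if project is not None and project not in task:
            continue
        total += float(duration.split()[0])  # "2 hours" -> 2.0
    return total


# Hypothetical log entries in the format shown above.
log = [
    "30th April, 2 hours, implement file parsing for project A",
    "1st May, 3.5 hours, fix parser bug for project A",
    "1st May, 1 hour, review design for project B",
]
hours_a = total_hours(log, "project A")
print(hours_a, round(hours_a / HOURS_PER_STAFF_MONTH, 4))
```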
For each of the two projects, I calculated the total function points from the description of its features. I used function points because they are easy to derive from a feature list and requirements document. From the function points I calculated the expected number of lines of C# needed to implement them, and then compared that with the actual number of lines in the finished project.
In this way I estimated the size and compared it with the actual size in SLOC (source lines of code), and found them surprisingly close. I did the same for effort and schedule, comparing each estimate with the actual effort and schedule. This study taught me a lot about estimation techniques.
I share my experience with each project below. To protect organization data, I cannot disclose the real names of the projects, so I call them Project A and Project B.
(1) Estimation for Project A
I now estimate the size, effort and schedule from the requirements or feature list of Project A, and then compare the estimates with the actual size, effort and schedule. This shows how estimation works and how you can check it after completing a project.
SIZE for Project A
As discussed earlier, I used function points to estimate size. Briefly, Project A is data acquisition and processing software that displays the acquired data on screen.
Function point estimation takes into account all external inputs, external outputs, internal logical files, external interface files and external queries. For Project A:
External Inputs | Complexity |
Data Stream | High |
Configuration files | Low |
User Selection | Medium |
User Input | Low |
Each parameter is assigned a complexity of low, medium or high, based on how complex it will be to write the code for that parameter. Similarly, we estimate the external outputs:
External Output | Complexity |
Telemetry Data1 screen | Medium |
Telemetry timing screen | Low |
Telemetry Data2 Screen | Medium |
Status Screen | Medium |
Next, the internal logical files:
Internal Logical Files | Complexity |
Storage file | Medium |
Intermediate buffer | High |
Intermediate Result | Low |
Channel Files | Low |
Now the external interface files
External Interface Files | Complexity |
External Interface for Data2 | Low |
External Queries
None.
Grading all of these files and inputs is only possible after carefully analyzing the specification or requirements. Missing any important parameter introduces an error into the estimate, which is why function point estimates are more accurate when performed by an experienced developer.
After identifying all possible parameters one can calculate the function points as follows:
Parameters | Low | Medium | High |
External Inputs | 2*3=6 | 1*4=4 | 1*6=6 |
External Outputs | 1*4=4 | 3*5=15 | 0*7=0 |
External Queries | 0*3=0 | 0*4=0 | 0*6=0 |
Internal Logical Files | 2*7=14 | 1*10=10 | 1*15=15 |
External Interface Files | 1*5=5 | 0*7=0 | 0*10=0 |
In each cell, the first number of the multiplication is the count of parameters in that category and complexity, and the second is the fixed weight defined by the function point method.
Adding up all the cells gives 79, the unadjusted function point count. You can take this unadjusted value as the size, or adjust it with a multiplier derived from past projects (historical data) or from the type of project you are estimating. I tried three multipliers:
Multiplier | Adjusted Function Point |
1 | 79 |
1.2 | 94.8 |
0.8 | 63.2 |
Multiplying the unadjusted count by the chosen multiplier gives the adjusted function point count.
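The unadjusted count and the three adjusted counts can be reproduced with a few lines of code. The weights are the standard function point weights used in the table above and the counts are Project A's; the names `WEIGHTS`, `unadjusted_fp` and `project_a` are my own.

```python
# Standard function point weights (low, medium, high) used in the table above.
WEIGHTS = {
    "EI": (3, 4, 6),     # external inputs
    "EO": (4, 5, 7),     # external outputs
    "EQ": (3, 4, 6),     # external queries
    "ILF": (7, 10, 15),  # internal logical files
    "EIF": (5, 7, 10),   # external interface files
}


def unadjusted_fp(counts: dict) -> int:
    """counts maps each category to a (low, medium, high) tuple of item counts."""
    return sum(
        n * w
        for category, ns in counts.items()
        for n, w in zip(ns, WEIGHTS[category])
    )


# Project A counts from the tables above.
project_a = {
    "EI": (2, 1, 1),
    "EO": (1, 3, 0),
    "EQ": (0, 0, 0),
    "ILF": (2, 1, 1),
    "EIF": (1, 0, 0),
}
ufp = unadjusted_fp(project_a)
print(ufp)  # 79
for multiplier in (1.0, 1.2, 0.8):
    print(multiplier, round(ufp * multiplier, 1))
```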
A conversion table translates function points into source lines of code (SLOC). Each function point takes 40 to 80 lines of C# code, with a median of 55:
Adjusted Function Points | Median C# LOC | Range of C# LOC |
79 | 4345 | 3160 to 6320 |
94.8 | 5214 | 3792 to 7584 |
63.2 | 3476 | 2528 to 5056 |
This is how I estimated the size of Project A. Now let's check it against reality: are these numbers close to the real size? Since I have the actual code of Project A, I can count its lines of code.
I used a utility to count the lines of code. I excluded blank lines and designer-generated code, but included comments, because a lot of effort went into writing good comments. The actual code has 4700 lines, very close to the estimated median of 4345.
I therefore settled on a multiplier of 1 for adjusting the function points and recorded it in my historical data.
EFFORT for Project A
For effort estimation I used the ISBSG (International Software Benchmarking Standards Group) method. The ISBSG equation takes the function point count and the maximum team size as inputs and returns an effort estimate. Since I have already estimated the size of Project A and only one person did the work, I can estimate the effort required to complete it.
ISBSG equation for desktop software:
ISBSG Effort = 0.157 * FunctionPoints ^ 0.591 * MaximumTeamSize ^ 0.810
ISBSG Effort = 0.157 * 79 ^ 0.591 * 1 ^ 0.810
             ≈ 2.08 staff-months
From my log register, the actual effort was:
Actual effort = 291 person-hours = 2.2045 staff-months (at 132 hours per staff-month)
The estimate is therefore within about 6% of the actual effort.
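The effort estimate and its error are easy to check in code. This sketch implements the ISBSG desktop equation quoted above and measures the error relative to the actual effort; the function names are hypothetical.

```python
def isbsg_desktop_effort(function_points: float, max_team_size: int = 1) -> float:
    """ISBSG desktop equation as quoted in the text; returns staff-months."""
    return 0.157 * function_points ** 0.591 * max_team_size ** 0.810


def relative_error(estimate: float, actual: float) -> float:
    """Absolute difference as a fraction of the actual value."""
    return abs(estimate - actual) / actual


estimate = isbsg_desktop_effort(79)  # roughly 2.08 staff-months
actual = 291 / 132                   # 291 logged hours at 132 hours per staff-month
print(f"estimate {estimate:.2f}, actual {actual:.2f}, error {relative_error(estimate, actual):.0%}")
```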
SCHEDULE for Project A
The schedule can be calculated with the basic schedule equation:
Schedule (months) = 3.0 * StaffMonths ^ (1/3)
                  = 3.0 * 2.2045 ^ (1/3)   (using the actual effort for Project A)
                  ≈ 3.9 months
The coefficient 3.0 may be 2.0 or 4.0 depending on your organization's historical data; if you have no historical data, use 3.0.
This schedule equation works best for medium to large projects executed sequentially.
The project was actually completed in 3 months, roughly in line with the equation's output.
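The schedule equation is easy to evaluate for different coefficients, which shows how much the organization-specific constant matters. A sketch using Project A's actual effort of 291 hours at 132 hours per staff-month; the function name is hypothetical.

```python
def basic_schedule_months(staff_months: float, coefficient: float = 3.0) -> float:
    """Basic schedule equation: months = coefficient * staff_months ** (1/3)."""
    return coefficient * staff_months ** (1 / 3)


effort_a = 291 / 132  # Project A actual effort in staff-months
for coefficient in (2.0, 3.0, 4.0):
    print(coefficient, round(basic_schedule_months(effort_a, coefficient), 2))
```

With the default 3.0 the result is about 3.9 months, while the 2.0 coefficient gives about 2.6 months, closer to the 3 months actually observed; this is the kind of adjustment historical data lets you make.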
Another way to estimate the schedule is to compare against past projects completed in the same area within your own organization; such comparison gives a more accurate estimate than any other method.
Estimation software gives more accurate results, but it usually needs calibration data from your own organization's past projects.
(2) Estimation for Project B
I followed the same procedure for Project B, estimating size, effort and schedule and comparing each with the actual values. Below I simply go through the same steps.
SIZE for Project B
External Inputs | Complexity |
Serial data | High |
Screen control | Medium |
Site Testing Data Files | Low |
Tiles | Medium |
User Movement control | High |
Simulation In File | Medium |
External Outputs | Complexity |
Screen display | High |
Screen Status | Low |
Internal Logical Files | Complexity |
Tiles Files | High |
Result in Excel | Low |
External Interface File | Complexity |
Tiles downloading | High |
External Queries
None.
Estimating the unadjusted function points:
Parameters | Low | Medium | High |
External Inputs | 1*3=3 | 3*4=12 | 2*6=12 |
External Outputs | 1*4=4 | 0 | 1*7=7 |
External Queries | 0 | 0 | 0 |
Internal Logical Files | 1*7=7 | 0 | 1*15=15 |
External Interface Files | 0 | 0 | 1*10=10 |
Here the unadjusted function point count is 66.
Adjusting with the same three multipliers:
Multiplier | Adjusted function points |
1 | 66 |
0.8 | 52.8 |
1.2 | 79.2 |
Adjusted Function Points | Median C# LOC | Range of C# LOC |
66 | 3630 | 2640 to 5280 |
52.8 | 2904 | 2112 to 4224 |
79.2 | 4356 | 3168 to 6336 |
That completes the size estimate.
The actual size of Project B is 2761 LOC, excluding automatically generated code and blank lines.
Compared with the actual size, the median estimate has about 30% error with multiplier 1 and about 5% error with multiplier 0.8.
Here the 0.8 multiplier produces function points and LOC very close to the actual values, so I noted it for this kind of project in my historical data. For simplicity, you may pick any multiplier and then stick with it.
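Choosing the multiplier can be made mechanical: pick the one whose median LOC estimate lands closest to the actual size. A sketch, assuming the C# median of 55 LOC per function point used above and the actual sizes reported in this article (4700 LOC for Project A, 2761 LOC for Project B); the function name is hypothetical.

```python
def best_multiplier(
    ufp: float,
    actual_loc: int,
    multipliers: tuple = (0.8, 1.0, 1.2),
    loc_per_fp: float = 55,
) -> float:
    """Return the multiplier whose median LOC estimate is closest to the actual LOC."""
    return min(multipliers, key=lambda m: abs(ufp * m * loc_per_fp - actual_loc))


print(best_multiplier(66, 2761))  # 0.8 (Project B)
print(best_multiplier(79, 4700))  # 1.0 (Project A)
```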
Effort for Project B
ISBSG Effort = 0.157 * 52.8 ^ 0.591 * 1 ^ 0.810   (taking 52.8 as the total function points)
             = 1.6367 staff-months
The actual effort to complete the project, from my log register, is:
Actual effort = 201 person-hours = 1.522 staff-months (at 132 hours per staff-month)
The estimate is therefore within about 7.5% of the actual effort.
Schedule for Project B
Estimating the schedule with the basic schedule equation:
Estimated schedule (months) = 3.0 * StaffMonths ^ (1/3)
                            = 3.0 * 1.522 ^ (1/3)   (using the actual effort)
                            ≈ 3.45 months
From my log register, the project actually took about 3 months.
Historical Data and its Importance
My assumption
Before diving deep into estimation, I assumed that historical or industry data had to cover more than 100 projects. That is not the case: data from only 2 to 3 projects is enough to estimate future projects, to feed calibration software and to calibrate other estimation techniques such as COCOMO. I therefore encourage readers to start collecting their organization's data even if they have completed only two projects so far.
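As an example of calibrating with only two projects, the coefficient of the basic schedule equation can be back-solved from each project's actual effort and duration and then averaged, which gives roughly 2.5 for the two projects in this article. This is a simple sketch of one possible calibration, not the procedure any particular tool uses; averaging the per-project coefficients is my assumption, and the function name is hypothetical.

```python
def calibrate_schedule_coefficient(projects) -> float:
    """Average of months / staff_months ** (1/3) over (staff_months, months) pairs."""
    coefficients = [months / staff_months ** (1 / 3) for staff_months, months in projects]
    return sum(coefficients) / len(coefficients)


# Actual effort (hours / 132) and duration of the two projects in this article.
history = [(291 / 132, 3.0), (201 / 132, 3.0)]
print(round(calibrate_schedule_coefficient(history), 2))
```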
Industry data differs from organization data. Industry data is collected from organizations working on the same type of project; it is less accurate, because every organization has its own environment, which affects the estimate. Organization data, or historical data, comes from a single organization, the one you currently work for. Historical data is very important and yields more accurate estimates.
How did I make my historical data?
Having understood historical data and cleared up my mistaken assumption, I gathered information on my past projects. Although I completed many projects in the past, I have no records or logs for most of them, only for the two most recent.
What should you collect to build historical data? For each past project, record its size, effort and schedule. Data from 2 to 3 projects is enough for an accurate estimate.
In my organization, the parameters are:
1 day = 6 person-hours
1 week = 30 person-hours
1 month = 132 person-hours
1 month = 22 days
You can adjust these parameters according to your own organization.
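These parameters are easiest to keep in one place as constants. A minimal sketch using the values above; the names are hypothetical, and the assertion simply checks that 22 six-hour days make up a 132-hour month.

```python
HOURS_PER_DAY = 6
HOURS_PER_WEEK = 30
HOURS_PER_MONTH = 132
DAYS_PER_MONTH = 22

# 22 six-hour days make up one 132-hour month, so the parameters are consistent.
assert HOURS_PER_DAY * DAYS_PER_MONTH == HOURS_PER_MONTH


def convert_hours(hours: float, unit: str) -> float:
    """Convert person-hours into person-days, person-weeks or person-months."""
    per_unit = {"day": HOURS_PER_DAY, "week": HOURS_PER_WEEK, "month": HOURS_PER_MONTH}
    return hours / per_unit[unit]


print(round(convert_hours(291, "month"), 4))  # 2.2045
print(convert_hours(201, "day"))              # 33.5
```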
Below is the historical data I collected for estimating future projects:
Historical data matrix:
Project Name | Size (LOC) | Size (FP) | Effort (hours) | LOC/Hour | FP/Hour | Schedule |
Project A | 5000 | 79 | 291 | 17.18 | 0.271 | 3 months |
Project B | 2761 | 52.8 | 201 | 13.7363 | 0.262 | 3 months |
These figures come from the two example projects discussed above.
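The derived columns (LOC/Hour and FP/Hour) follow directly from size and effort. A sketch that recomputes them from the table; the `ProjectRecord` class is a hypothetical structure, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class ProjectRecord:
    name: str
    size_loc: int
    size_fp: float
    effort_hours: float
    schedule_months: float

    @property
    def loc_per_hour(self) -> float:
        return self.size_loc / self.effort_hours

    @property
    def fp_per_hour(self) -> float:
        return self.size_fp / self.effort_hours


history = [
    ProjectRecord("Project A", 5000, 79, 291, 3),
    ProjectRecord("Project B", 2761, 52.8, 201, 3),
]
for record in history:
    print(f"{record.name}: {record.loc_per_hour:.2f} LOC/hour, {record.fp_per_hour:.3f} FP/hour")
```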
Use in Estimation software
A very good COCOMO-based estimation tool is available from the University of Southern California (USC) website at this link. Another tool, from Construx, is free to use and can be downloaded from here. Both tools can be calibrated with historical data for more accurate estimates.
Both tools estimate effort and schedule well when given calibration data, also known as historical data. Try them and see how they can help you.
Productivity metrics
From historical data you can measure your organization's productivity. From the two examples above I derive the following productivity metrics:
Productivity per person-day:
Project Name | LOC per person-day | FP per person-day |
Project A | 103.08 | 1.626 |
Project B | 82.4178 | 1.572 |
With these metrics I can estimate any future project I am asked to develop.
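A minimal sketch of applying these metrics, assuming a new project has already been sized in function points (the 60 FP below is invented) and using the best and worst historical productivity to bracket the effort. The 22 days per month is my organization's figure; the names are hypothetical.

```python
# FP per person-day for each past project, from the productivity table above.
FP_PER_PERSON_DAY = {"Project A": 1.626, "Project B": 1.572}
DAYS_PER_MONTH = 22  # my organization's figure


def effort_range_days(function_points: float, productivity: dict = FP_PER_PERSON_DAY) -> tuple[float, float]:
    """(optimistic, pessimistic) person-days using the best and worst past productivity."""
    rates = productivity.values()
    return function_points / max(rates), function_points / min(rates)


low, high = effort_range_days(60)  # hypothetical 60-FP project
print(f"{low:.1f} to {high:.1f} person-days "
      f"({low / DAYS_PER_MONTH:.2f} to {high / DAYS_PER_MONTH:.2f} person-months)")
```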
In conclusion, I urge everyone to collect historical data. After just 2 projects, you can use techniques such as COCOMO and software-based tools to estimate future projects.
Conclusion
In this article I briefly introduced estimation and the challenges we face during the estimation process. I described the estimation process using two projects and shared some of my own experience. I also described the importance of managing historical data and provided guidelines and an example for doing so. You can start estimating your future projects with the information in this article right away. Your initial estimates may contain errors, but your skills will improve after a couple of projects. After all, estimation is a science as well as an art.
Additional Resources
1) Roger S. Pressman, Software Engineering: A Practitioner's Approach (book)
2) Steve McConnell, Software Estimation: Demystifying the Black Art (book)