This is called Video Conferencing, so when searching for information you should use that term.
There are several parts to a project like this.
1) Capturing raw video from camera and audio from microphone.
2) Compressing the video and audio.
3) Transmitting and receiving video and audio.
4) Decompressing the video and audio.
5) Displaying the video and playing the audio.
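To make the five stages concrete, here is a minimal sketch of how they chain together. Everything in it is a stand-in I chose for illustration: the frames are synthetic bytes rather than camera output, zlib takes the place of a real video codec, and an in-memory queue takes the place of the network. The function names are hypothetical and do not come from any of the libraries mentioned in this answer.

```python
import queue
import zlib

WIDTH, HEIGHT = 8, 4  # tiny assumed frame size so the demo stays fast


def capture_frame(index):
    """Stage 1 stand-in: a synthetic grayscale frame, not a real camera."""
    return bytes((index + x) % 256 for x in range(WIDTH * HEIGHT))


def compress(raw):
    """Stage 2 stand-in: zlib, NOT a real video codec such as H.264."""
    return zlib.compress(raw)


def decompress(data):
    """Stage 4 stand-in: the inverse of compress()."""
    return zlib.decompress(data)


def run_pipeline(frame_count):
    # Stage 3 stand-in: an in-memory queue instead of a network protocol.
    network = queue.Queue()
    for i in range(frame_count):
        network.put(compress(capture_frame(i)))  # sender side

    shown = []
    while not network.empty():
        frame = decompress(network.get())  # receiver side
        shown.append(frame)  # Stage 5 stand-in: collect instead of display
    return shown


frames = run_pipeline(3)
print(len(frames), "frames received,", len(frames[0]), "bytes each")
```

In a real system each stand-in would be swapped for the corresponding piece discussed below, while the order of the stages stays the same.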
Capturing
There are several articles here on Code Project related to capturing video from web cams. I believe most of them use DirectShow.
Chesnokov Yuriy has written excellent articles on video processing, compression and analytics and also has an article on video capturing:
Video Preview and Frames Capture to Memory with SampleGrabber in Buffered Mode.
Here is another article, which is really about solving Sudoku puzzles, but it does capture video from a web cam and I think it is worth looking at:
Realtime Webcam Sudoku Solver
Here are a couple more articles like that:
An Easy Video-processing Framework by Grabbing Frames as Bitmaps Using DirectShow
Yet another Web Camera control
I don't really have a lot of references for audio capturing, but here is one to get you started:
Play or Capture Audio Sound. Send and Receive as Multicast (RTP)
Compressing audio and video
There are so many audio and video codecs that it can be difficult to zero in on any specific ones. I recommend H.264 for the video and a speech-targeted codec for the audio. I have used Speex in the past, but that project is end of life; its own site now recommends Opus as the successor.
Overall, I recommend the FFmpeg libraries (not the ffmpeg command-line tool itself, but libavcodec, libavformat, etc.). It can be challenging to get them working, but there is so much functionality that they are worth researching.
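To see why the compression step cannot be skipped, a rough calculation of the uncompressed data rate helps. The numbers below are assumptions picked for illustration (a 640x480 web cam at 30 frames per second with 24-bit color, and a 1 Mbit/s target for the compressed stream); they are not measurements of any particular camera or codec.

```python
def raw_video_bits_per_second(width, height, bytes_per_pixel, fps):
    """Data rate of uncompressed video, in bits per second."""
    return width * height * bytes_per_pixel * fps * 8


# Assumed capture format: 640x480, 3 bytes per pixel (24-bit RGB), 30 fps.
raw_bps = raw_video_bits_per_second(640, 480, 3, 30)

# Assumed target bitrate for the compressed stream (illustrative only).
target_bps = 1_000_000

print(f"uncompressed: {raw_bps / 1_000_000:.1f} Mbit/s")
print(f"needed compression ratio: about {raw_bps / target_bps:.0f}:1")
```

Under these assumptions the raw stream is roughly 221 Mbit/s, which is far more than a typical Internet connection can carry for a single participant.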
Transmitting and Receiving
You do not mention whether this conferencing should work within an organization or is meant for your friends and family on the Internet. Either way, to develop something flexible that works in most environments, you need to settle on standardized protocols (such as RTP) and video formats (such as H.264).
RTP was created for this purpose and is allowed by most organizations' IT departments. If you make up your own protocol, the traffic will most likely be blocked by some firewalls.
The FFmpeg libraries also have built-in RTP functionality, but I do not know whether it extends far enough to support actual video conferencing.
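To give a feel for what RTP adds on top of the raw payload, here is a small sketch that builds and parses the 12-byte fixed RTP header (version, marker bit, payload type, sequence number, timestamp, SSRC). The helper names are hypothetical, the payload is fake data, and payload type 96 is simply a value from the dynamic range that is commonly negotiated for H.264. The parser assumes no CSRC entries and no header extension, so it is not a complete RTP implementation.

```python
import struct

RTP_VERSION = 2
HEADER_FORMAT = "!BBHII"  # 12-byte fixed header in network byte order
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type=96, marker=False):
    """Prepend a fixed RTP header (no padding, no extension, no CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        HEADER_FORMAT,
        byte0,
        byte1,
        seq & 0xFFFF,            # sequence number wraps at 16 bits
        timestamp & 0xFFFFFFFF,  # timestamp wraps at 32 bits
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def parse_rtp_packet(packet):
    """Split a packet into header fields and payload (fixed header only)."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack(
        HEADER_FORMAT, packet[:HEADER_SIZE]
    )
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[HEADER_SIZE:],
    }


packet = build_rtp_packet(b"fake-h264-data", seq=1, timestamp=3000,
                          ssrc=0x1234ABCD, marker=True)
fields = parse_rtp_packet(packet)
print(fields["version"], fields["payload_type"], fields["seq"], len(packet))
```

The receiver uses the sequence number to detect lost or reordered packets and the timestamp to schedule playback, which is a large part of why a standard protocol is easier than inventing your own.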
Decompressing audio and video
Again I will point to FFmpeg as a full-featured package for decompressing audio and video.
Displaying video and playing audio
If you go through the articles I have linked to here, presentation of the multimedia will be more or less covered as well.
There is a great article here to get you started:
Examples to create your Conferencing System in .NET, C# VOIP & Video Conferencing Systems using H.323 and TAPI 3.
ConferenceXP is a video conferencing project originally started by Microsoft and later open-sourced. I don't know if it is still being actively developed, but it is a fairly large project, so there is a lot to study.
The author of the first article link I gave also wrote an article about utilizing the ConferenceXP API to build your own system:
How to use the managed RTP API classes in .NET to create your multicasting systems
Best of luck to you.
Soren Madsen