In simplified terms, the Parallel library will use as many threads as you have logical cores in your CPU. If you've got a 4-core CPU with Hyper-Threading, that's 8 logical cores, so by default Parallel will run up to 8 threads out of the thread pool at a time.
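As a rough sketch (the loop body here is just a placeholder, and the cap of 4 is an arbitrary example), you can see the logical core count and override Parallel's default degree of parallelism via ParallelOptions:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ParallelDemo
{
    static void Main()
    {
        // Environment.ProcessorCount reports logical cores (8 on a
        // 4-core Hyper-Threaded CPU) - roughly what Parallel targets
        // by default.
        Console.WriteLine($"Logical cores: {Environment.ProcessorCount}");

        var options = new ParallelOptions
        {
            // Cap the number of concurrently executing iterations.
            // The default (-1) lets the scheduler decide.
            MaxDegreeOfParallelism = 4
        };

        Parallel.For(0, 16, options, i =>
        {
            // Placeholder work; a real loop body would go here.
            Thread.Sleep(100);
        });
    }
}
```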
Now, your CPU is sitting at 15% because those threads are all blocked on I/O-bound operations: all 8 of them are spending most of their time waiting for a web request to complete.
You could run many more threads, since you have CPU resources sitting idle, but spinning up a thread is an expensive operation: it takes roughly 200,000 CPU cycles to create one and about 6,000 to tear it down. Memory is also an issue. Each thread you spin up also reserves about 1 MB of memory for its stack, so spin up 1,000 threads and you've consumed 1 GB of memory before those threads have executed a single instruction!
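As a minimal sketch of that stack cost: the Thread constructor has an overload that lets you request a smaller stack reservation than the ~1 MB default (the runtime/OS may round your request up to a minimum). The 256 KB figure below is just an illustrative choice, not a recommendation:

```csharp
using System;
using System.Threading;

class StackSizeDemo
{
    static void Main()
    {
        // Request a 256 KB stack instead of the ~1 MB default
        // reservation. Useful only when you truly need many threads
        // and know their stack depth is shallow.
        var worker = new Thread(() => Console.WriteLine("working"),
                                maxStackSize: 256 * 1024);
        worker.Start();
        worker.Join();
    }
}
```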
Context switching is also a problem. If you spin up 1,000 threads on an 8-core CPU, only 8 of them can actually run at any point in time. Your threads end up sharing the CPU with thousands of other threads from every other process in the system, including Windows itself. A context switch, where a core switches from executing one thread to another, is also fairly expensive, at around 8,000 CPU cycles.
Throwing threads at a problem is not a "silver bullet" for performance. You have to know what's causing your performance problem, where the bottlenecks are, and how to address them efficiently so the overhead of threading stays to a minimum.
Start reading here[^].