|
[nvidia-smi.exe]
Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU
[nvcc --version]
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
[nvidia-smi.exe] after reboot
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 471.68       Driver Version: 471.68       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4           WDDM  | 00000000:01:00.0 Off |                    0 |
| N/A   34C    P8     7W /  75W |    146MiB /  7680MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       852    C+G   Insufficient Permissions        N/A      |
+-----------------------------------------------------------------------------+
modified 12-Mar-24 13:27pm.
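For anyone who wants to capture the same information in a script rather than reading the table by eye, here is a minimal sketch. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` / `--format=csv,noheader` options; the function names `parse_gpu_csv` and `query_gpus` are made up for this example. It also assumes `nvidia-smi` exits with a non-zero code when it cannot talk to the GPU (as with the "GPU is lost" message above), which is worth verifying on your own system.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; the order matters for parsing.
QUERY_FIELDS = ["name", "driver_version", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Turn `--format=csv,noheader` output into a list of dicts, one per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) == len(QUERY_FIELDS):
            gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Return GPU info, or None if nvidia-smi is missing or reports an error."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--query-gpu=" + ",".join(QUERY_FIELDS), "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Assumption: failures such as "GPU is lost" give a non-zero exit code.
        print(result.stdout.strip() or result.stderr.strip())
        return None
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    print(query_gpus())
```

Running this before and after a reboot gives a compact record of the driver version and memory use that is easier to paste into a post than the full table.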
|
|
|
|
|
Okay, just a random thought: Do you know if your YOLOv5 was installed with the dependencies necessary to run on GPU? Perhaps it was 'not detected' previously, so those required dependencies were not downloaded?
Maybe it's worth trying to reinstall YOLOv5?
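One way to test that idea directly is to ask PyTorch itself. The sketch below is only an illustration: `gpu_report` is a hypothetical helper name, and it has to be run with the same Python interpreter the YOLOv5 module uses (where that lives depends on the install, so it is not shown here). It relies only on `torch.cuda.is_available()`, `torch.cuda.get_device_name()` and `torch.version.cuda`.

```python
import importlib.util


def gpu_report():
    """Describe whether PyTorch is installed and whether it can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this Python environment"

    import torch

    if not torch.cuda.is_available():
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        return (
            f"PyTorch {torch.__version__} found, but CUDA is not available "
            f"(built for CUDA: {torch.version.cuda})"
        )
    return (
        f"PyTorch {torch.__version__} sees {torch.cuda.get_device_name(0)} "
        f"(built for CUDA {torch.version.cuda})"
    )


if __name__ == "__main__":
    print(gpu_report())
```

If the report says "built for CUDA: None", the environment has a CPU-only PyTorch build, which would match the "dependencies not downloaded" theory.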
|
|
|
|
|
Just reinstalled YOLOv5. The GPU still isn't showing as available to enable. Thank you for the suggestion though.
|
|
|
|
|
It may not be relevant, but I am having similar issues with YOLO not working on 2 CodeProject installs. Both systems where I am having problems are using a Tesla P4 with the NVIDIA GRID drivers, CUDA 11.8 and cuDNN 8.9.
CodeProject.AI Server: AI the easy way.[^]
|
|
|
|
|
Sounds very similar. Does your object detection work as if there are no issues, as if the Tesla is processing images?
|
|
|
|
|
Actually, no. I took a picture from one of my BlueIris alerts and tried Object Detection in the explorer. On both systems where I am having issues, object detection immediately reports back "No predictions returned". I then ran the same picture through object detection on a temp machine that I threw an OS and CodeProject on just to see what would happen, and it works perfectly: it correctly detected a person, a car, even a backpack. I installed CodeProject exactly the same way on all 3 systems and the OS is the same; the only difference is the hardware. I can't imagine why the Tesla GPU would cause this, but it's starting to feel like that's part of my issue.
|
|
|
|
|
I just noticed that you are using YOLOv5 3.1. And based on your first post, your Tesla P4 is running CUDA 12.4:
18:46:20:GPU (Primary): Tesla P4 (8 GiB) (NVIDIA)
18:46:20: Driver: 551.61, CUDA: 12.4 (up to: 12.4), Compute: 6.1, cuDNN: 8.9
According to its description, YOLOv5 3.1 targets CUDA 10 or 11 for older GPUs:
Object Detection (YOLOv5 3.1)
2024-02-08
GPL-3.0
Provides Object Detection using YOLOv5 3.1 targeting CUDA 10 or 11 for older GPUs.
Project by Chris Maunder, Matthew Dennis, based on Deepstack. Uses Python, PyTorch, YOLO.
Your CUDA version is above what the 3.1 description lists. Maybe that's why it is not using the GPU?
Now, if you look at the YOLOv5 6.2 description, it targets CUDA 11.5+, which matches your Tesla P4's setup:
Object Detection (YOLOv5 6.2)
2024-02-08
GPL-3.0
Provides Object Detection using YOLOv5 6.2 targeting CUDA 11.5+, PyTorch < 2.0 for newer GPUs.
Project by Matthew Dennis, based on Ultralytics YOLOv5.
My computer with a Tesla P4 runs YOLOv5 6.2, and it is using the GPU.
I hope this solves it.
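The comparison above can be written down as a small rule of thumb. This is only a sketch based on the two module descriptions quoted in this post, not anything taken from CodeProject.AI itself: `parse_cuda_version` and `suggest_module` are hypothetical names, and the cut-off at 11.5 simplifies the overlap between "CUDA 10 or 11" and "CUDA 11.5+".

```python
import re


def parse_cuda_version(log_line):
    """Extract (major, minor) from a line like 'Driver: 551.61, CUDA: 12.4 (up to: 12.4), ...'."""
    match = re.search(r"CUDA:\s*(\d+)\.(\d+)", log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def suggest_module(cuda_version):
    """Apply the rule of thumb from the two module descriptions (an assumption, not an official rule)."""
    if cuda_version is None:
        return "unknown"
    if cuda_version >= (11, 5):
        return "Object Detection (YOLOv5 6.2)"
    if cuda_version >= (10, 0):
        return "Object Detection (YOLOv5 3.1)"
    return "unknown"


if __name__ == "__main__":
    line = "Driver: 551.61, CUDA: 12.4 (up to: 12.4), Compute: 6.1, cuDNN: 8.9"
    print(suggest_module(parse_cuda_version(line)))
```

Feeding it the system-info line from the server log is enough to see which of the two modules the descriptions point to.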
|
|
|
|
|
Thank you! Based on your suggestion I did a full tear-down and used the GRID driver, CUDA 11.8, cuDNN-8.9.4 and YOLOv5 6.2. Everything worked for a good 10 minutes and then it crashed. Do you recommend a specific CUDA version?
|
|
|
|
|
This is what I have for my GPU:
GPU (Primary): Tesla P4 (8 GiB) (NVIDIA)
Driver: 538.15, CUDA: 12.2 (up to: 12.2), Compute: 6.1, cuDNN: 8.9
I'm using driver version 538.15 from the following link: Drivers for NVIDIA RTX Virtual Workstation (vWS) | Compute Engine Documentation | Google Cloud[^].
Note: The entry to the left of 538.15 says the CUDA version is 16.3. I don't know why it says that, because it seems to be incorrect.
If possible, please post the log so we know what errors your CodeProject AI ran into. It should give some idea what caused the crash. And if it is something deeper, it might be helpful for Sean and others.
|
|
|
|
|
OMG! Thank you! I updated to driver 538.15 and CUDA has been going strong for 12 hours! I believe this has now been resolved; I can now do other AI projects.
[Issue Resolved]
|
|
|
|
|
That is great news! I hope it just keeps going non-stop!
|
|
|
|
|
I performed a fresh install of Ubuntu Server 24.04, CUDA, cuDNN and CodeProject.AI and encountered the following issues.
- Dependency installation failure: The installation of CodeProject.AI itself completes without any issues. However, I ran into problems during the "FINAL REQUIREMENTS" portion of the installation, which installs the additional components required for Face Processing and Object Detection (YOLOv5 6.2).
pushd "/usr/bin/codeproject.ai-server-2.5.4/" && bash setup.sh && popd
- Service instability: Although I could enable and start the CodeProject.AI-Service, it was not stable, shutting down and restarting every 5 to 30 seconds.
Server version: 2.5.4
System: Linux
Operating System: Linux (Ubuntu 22.04)
CPUs: Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz (Intel)
1 CPU x 4 cores. 4 logical processors (x64)
GPU (Primary): NVIDIA RTX A2000 12GB (12 GiB) (NVIDIA)
Driver: 550.54.14, CUDA: 12.4 (up to: 12.4), Compute: 8.6, cuDNN: 9.0.0
System RAM: 16 GiB
Platform: Linux
BuildConfig: Release
Execution Env: Native (SSH)
Runtime Env: Production
.NET framework: .NET 7.0.16
Default Python: 3.10
Go Version:
Video adapter info:
Device 1234:
Driver Version
Video Processor
GA106 [RTX A2000 12GB] (rev a1):
Driver Version
Video Processor
System GPU info:
GPU 3D Usage 0%
GPU RAM Usage 884 MiB
Global Environment variables:
CPAI_APPROOTPATH = <root>
CPAI_PORT = 32168
modified 5 days ago.
|
|
|
|
|
There is a known bug in 2.5.4 that prevented the server from starting.
Please upgrade to 2.5.6.
We have pulled the 2.5.4 release.
"Mistakes are prevented by Experience. Experience is gained by making mistakes."
|
|
|
|
|
Thank you @matthew-dennis! Please excuse my ignorance, but where do I download 2.5.6? I'm only seeing 2.5.4.
|
|
|
|
|
I'm facing the same problem... 2.5.6 seems to be available only for Windows and not for native Linux...
|
|
|
|
|
At least I have company now. I thought I was just really daft, being unable to find the 2.5.6 Linux version. In the meantime, I just keep a tmux session open running:
sudo bash /usr/bin/codeproject.ai-server-2.5.4/start.sh
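To put a number on how often the server drops out, whether it runs as a service or from a tmux session, something like the following sketch could be left running alongside it. It assumes the server listens on port 32168 (the `CPAI_PORT` value shown in the system info earlier in this thread) on the same machine; `port_is_open`, `count_outages` and `watch` are hypothetical helper names, and a successful TCP connection is only a rough stand-in for "the server is up".

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def count_outages(samples):
    """Count up-to-down transitions in a sequence of booleans (True = reachable).

    The server is assumed to be up before the first sample, so a first
    sample of False counts as one outage.
    """
    outages = 0
    previous = True
    for up in samples:
        if previous and not up:
            outages += 1
        previous = up
    return outages


def watch(host="localhost", port=32168, checks=60, interval=1.0):
    """Poll the port `checks` times and return how many times the server dropped out."""
    samples = []
    for _ in range(checks):
        samples.append(port_is_open(host, port))
        time.sleep(interval)
    return count_outages(samples)
```

Calling `watch()` with the defaults takes about a minute; with restarts every 5 to 30 seconds it should report several outages, while a stable server should report zero.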
|
|
|
|
|
Hi Matthew, I have since upgraded to 2.6.2 and experience nearly identical behaviour. Following the same steps as above gets me up and running with the missing libraries installed, but unfortunately the service still keeps starting and stopping.
|
|
|
|
|
Are you running the installer under sudo?
cheers
Chris Maunder
|
|
|
|
|
Hi Chris, yes, I ran the installer as sudo. I looked in my history and the exact command was:
sudo dpkg -i codeproject.ai-server_2.6.2_Ubuntu_x64.deb
Please let me know if there is anything you would like me to try.
|
|
|
|
|
We've just updated the Ubuntu version to 2.6.4. Curious to know if 2.6.4 works better for you.
Thanks,
Sean Ewington
CodeProject
|
|
|
|
|
Thanks Sean! Okay, this looks MUCH better. Everything seemed to go without a hitch. I didn't have to manually install any libs, the service fired up, there were no shutdowns, and it appears stable after a reboot.
I did observe the following errors in the logs, but otherwise it all appears to be working:
Pillow reported an initial failed check, then subsequently reported already installed.
10:12:01:FaceProcessing: Python packages will be specified by requirements.linux.cuda11_5.txt
10:12:09:FaceProcessing: - Installing Pandas, a data analysis / data manipulation tool... (✅ checked) done
10:12:19:FaceProcessing: - Installing CoreMLTools, for working with .mlmodel format models... (✅ checked) done
10:12:25:FaceProcessing: - Installing OpenCV, the Open source Computer Vision library... (✅ checked) done
10:12:26:FaceProcessing: - Installing Pillow, a Python Image Library... (❌ failed check) done
10:12:33:FaceProcessing: - Installing SciPy, a library for mathematics, science, and engineering... (✅ checked) done
10:12:33:FaceProcessing: - Installing PyYAML, a library for reading configuration files...Already installed
10:13:23:FaceProcessing: - Installing PyTorch, an open source machine learning framework... (✅ checked) done
10:14:23:FaceProcessing: - Installing TorchVision, for working with computer vision models... (✅ checked) done
10:14:34:FaceProcessing: - Installing Seaborn, a data visualization library based on matplotlib... (✅ checked) done
10:14:34:FaceProcessing: Installing Python packages for the CodeProject.AI Server SDK
10:14:35:FaceProcessing: Searching for python3-pip...All good.
10:14:36:FaceProcessing: Ensuring PIP compatibility... done
10:14:36:FaceProcessing: Python packages will be specified by requirements.txt
10:14:37:FaceProcessing: - Installing Pillow, a Python Image Library...Already installed
10:14:37:FaceProcessing: - Installing Charset normalizer...Already installed
10:14:41:FaceProcessing: - Installing aiohttp, the Async IO HTTP library... (✅ checked) done
10:14:43:FaceProcessing: - Installing aiofiles, the Async IO Files library... (✅ checked) done
10:14:45:FaceProcessing: - Installing py-cpuinfo to allow us to query CPU info... (✅ checked) done
10:14:45:FaceProcessing: - Installing Requests, the HTTP library...Already installed
I don't ultimately use the following, but see a "self-test failed" from the auto install on launch.
10:14:57:Module FaceProcessing started successfully.
10:15:15:ObjectDetectionYOLOv5Net: Downloading ObjectDetectionYOLOv5Net-CUDA-1.10.1.zip...Expanding... done.
10:15:15:ObjectDetectionYOLOv5Net: Moving contents of ObjectDetectionYOLOv5Net-CUDA-1.10.1.zip to bin...done.
10:15:25:ObjectDetectionYOLOv5Net: Downloading YOLO ONNX models...Expanding... done.
10:15:25:ObjectDetectionYOLOv5Net: Moving contents of yolonet-models.zip to assets...done.
10:15:31:ObjectDetectionYOLOv5Net: Downloading Custom YOLO ONNX models...Expanding... done.
10:15:31:ObjectDetectionYOLOv5Net: Moving contents of yolonet-custom-models.zip to custom-models...done.
10:15:31:ObjectDetectionYOLOv5Net: Scanning modulesettings for downloadable models...No models specified
10:15:32:ObjectDetectionYOLOv5Net: Self test: Self-test failed
10:15:32:ObjectDetectionYOLOv5Net: Module setup time 00:00:37
10:15:32:ObjectDetectionYOLOv5Net: Setup complete
10:15:32:ObjectDetectionYOLOv5Net: Total setup time 00:00:38
10:15:32:Module ObjectDetectionYOLOv5Net installed successfully.
Thanks again for this update!
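For longer install logs, picking out lines like the Pillow "failed check" and the "Self-test failed" by eye gets tedious. Here is a minimal sketch of a filter, assuming log lines have the `HH:MM:SS:Module: message` shape shown above; `find_failures` is a hypothetical helper name and the two search phrases are simply the ones that appear in this log.

```python
import re

# Phrases taken from the log excerpt in this post.
FAILURE_PATTERN = re.compile(r"failed check|Self-test failed", re.IGNORECASE)
# Timestamp, module name, then the rest of the message.
LINE_PATTERN = re.compile(r"^(\d{2}:\d{2}:\d{2}):([^:]+):\s*(.*)$")


def find_failures(log_text):
    """Return (time, module, message) for every log line that reports a failure."""
    failures = []
    for line in log_text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match and FAILURE_PATTERN.search(match.group(3)):
            failures.append(match.groups())
    return failures


if __name__ == "__main__":
    sample = (
        "10:12:26:FaceProcessing: - Installing Pillow, a Python Image Library... (failed check) done\n"
        "10:15:32:ObjectDetectionYOLOv5Net: Self test: Self-test failed\n"
    )
    for when, module, message in find_failures(sample):
        print(when, module, message)
```

Lines without a module prefix, such as "Module FaceProcessing started successfully.", are skipped, so the output only lists the steps worth a second look.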
|
|
|
|
|
Hello,
I'm getting these errors when trying to run the setup. I also noticed that when I try to run a check, it fails on missing modules.
Checking for .NET 7.0...Checking SDKs...Upgrading: .NET is 6.0.100-preview.3.21202.5 [C:\Program Files\dotnet\sdk]
(not checked) Done
Already installed
'""' is not recognized as an internal or external command,
operable program or batch file.
Modules has not been installed
08:26:12:detect_adapter.py: Traceback (most recent call last):
08:26:12:detect_adapter.py: File "C:\Program Files\CodeProject\AI\modules\ObjectDetectionYOLOv5-6.2\detect_adapter.py", line 13, in <module>
08:26:12:detect_adapter.py: from module_runner import ModuleRunner
08:26:12:detect_adapter.py: File "../../SDK/Python\module_runner.py", line 30, in <module>
08:26:12:detect_adapter.py: import aiohttp
08:26:12:detect_adapter.py: ModuleNotFoundError: No module named 'aiohttp'
modified 11-Mar-24 12:07pm.
|
|
|
|
|
I managed to update dotnet and a few other things and finally got the installation to proceed. It now fails at one point, and I've just downloaded the very latest version:
21:41:18:detect_adapter.py: Traceback (most recent call last):
21:41:18:detect_adapter.py: File "C:\Program Files\CodeProject\AI\modules\ObjectDetectionYOLOv5-6.2\detect_adapter.py", line 13, in <module>
21:41:18:detect_adapter.py: from module_runner import ModuleRunner
21:41:18:detect_adapter.py: File "../../SDK/Python\module_runner.py", line 30, in <module>
21:41:18:detect_adapter.py: import aiohttp
21:41:18:detect_adapter.py: ModuleNotFoundError: No module named 'aiohttp'
|
|
|
|
|
|
Your last issue appears to be that the module did not install correctly, possibly due to Internet or PyPI issues.
Please uninstall the Object Detection (YOLOv5 6.2) module and re-install it with the cache disabled.
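After re-installing, a quick way to confirm the Python packages actually landed is to check whether they can be found by the interpreter the module uses. The sketch below is only an illustration: `missing_modules` is a hypothetical helper, and the list of names is an assumption drawn from the SDK packages named in an installer log elsewhere in this thread (aiohttp, aiofiles, py-cpuinfo, Requests, Pillow), not an official requirements list.

```python
import importlib.util

# Import names, which can differ from the pip package names
# (Pillow is imported as PIL, py-cpuinfo as cpuinfo).
REQUIRED = ["aiohttp", "aiofiles", "cpuinfo", "requests", "PIL"]


def missing_modules(names=REQUIRED):
    """Return the names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

It needs to be run with the module's own Python environment rather than the system Python, otherwise the result says nothing about what the module will see at startup.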
|
|
|
|