|
This one's beyond my pay grade. If you've configured Docker to have this much RAM it should be lavishing thanks on you, not core dumping. That's just inconsiderate.
I did see mention of host-system core dumps when a Docker container hits its assigned RAM max, but that doesn't seem to be the case here.
I wonder if it's not an out-of-memory issue, but rather a memory access / memory corruption issue?
cheers
Chris Maunder
|
Thanks for thinking about this. I also think it's a memory access issue, but why isn't everybody using this Docker container getting it? Docker provides such a consistent environment that it's really hard to figure out why it's only my Docker container that doesn't work. The amount of system resources is probably the biggest variable not controlled by the container, but as you point out, there is no shortage. I have watched the memory consumption using "docker stats" once a second, and the memory consumption does not gradually increase over the 5-10 minute lifetime of the container as you might expect it to with a memory leak.
|
Well, it turns out it's probably a memory issue and not a memory access issue. Based partly on a whim and partly on a "docker out of memory" thread unrelated to ai-server, I limited the file handles in the docker-compose file as follows:
ulimits:
  nofile:
    soft: 65536
    hard: 65536
and that appears to have resolved or at least mitigated the issue. To be clear, the number of files was unlimited prior to my change. The ai-server has been up more than 4 hours, which is 3 hours 50 minutes longer than it has ever run before. It is happily matching faces using only 3.1 GB of RAM. I have not yet tried to prove that the number of file handles increases until it consumes all of the memory, but I'm wondering if ai-server spends its free time grabbing file handles as fast as it can when they are unlimited.
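For anyone who wants to watch this themselves, here's a rough sketch of a monitoring loop (assuming the container is named ai-server in docker-compose, and that the server process is PID 1 inside the container):
# print the server's open file descriptor count once a second;
# a leak shows up as a steadily climbing number
while true; do
  docker exec ai-server sh -c 'ls /proc/1/fd | wc -l'
  sleep 1
done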
It's still very curious that nobody else has reported this. Maybe it has to do with Fedora, but it seems to me that Docker running under Fedora should look the same as Docker running under any other distribution from inside the container.
I have some time to do further troubleshooting in the next few days.
|
I've got some answers.
CodeProject.AI Server does, in fact, continuously open new file handles at the rate of about 120/minute on my system, up to the limit if one exists. If there is no limit, it keeps going until it consumes all system memory. The reason Fedora is different (I think) is that Fedora made a decision not to impose limits on Docker itself, due to the overhead of enforcing those limits, and suggests that limits be established on individual containers using cgroups instead. This "out of memory" error would inevitably occur on any distribution not enforcing file limits on Docker by default. That may only be Fedora and Red Hat at this time.
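For what it's worth, the same cap can be applied without a compose file, since --ulimit is a standard docker run flag (a sketch; the image name is the one used elsewhere in this thread, and the port mapping is an assumption):
# cap open files per container on distributions that don't enforce a default
docker run -d -p 32168:32168 --ulimit nofile=65536:65536 codeproject/ai-server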
I reduced the file open limit to 1024 on ai-server and observed it for a while. It gets up to the limit, then bounces back down to about 440 files and starts over. It doesn't crash. The file handles that keep increasing are FIFOs.
This is definitely a bug that needs to be addressed.
|
We had an issue that eventually led to many file handles / watchers being set at startup. There's a check for this at startup and a warning is issued, but as to it creating a bucketload more each second, that's bizarre. It would be handy to know which process is adding the handles: a module or the server itself.
Thanks,
Sean Ewington
CodeProject
|
A file handle is left open every time one of these child processes exits:
Quote: futex(0x55d234f6aba4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1671, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
write(62, "\21", 1) = 1
rt_sigreturn({mask=[]}) = 202
futex(0x55d234f6aba4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
futex(0x55d234f6aba4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1675, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
write(62, "\21", 1) = 1
rt_sigreturn({mask=[]}) = 202
futex(0x55d234f6aba4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1677, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
write(62, "\21", 1) = 1
rt_sigreturn({mask=[]}) = 202
futex(0x55d234f6aba4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1680, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
write(62, "\21", 1) = 1
rt_sigreturn({mask=[]}) = 202
futex(0x55d234f6aba4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1682, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
write(62, "\21", 1) = 1
rt_sigreturn({mask=[]}) = 202
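That write(62, "\21", 1) in the SIGCHLD handler looks like the classic self-pipe pattern (octal \21 is 17, SIGCHLD's number on Linux), which would fit a pipe being created per child and never closed. A quick way to confirm the leak is FIFO-shaped (a sketch; substitute the server's actual PID):
# FIFO descriptors appear as symlinks to "pipe:[inode]" under /proc;
# run inside the container and watch the count climb
ls -l /proc/<server-pid>/fd | grep -c 'pipe:'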
|
Could you do me a favour, please? The process ID is given by si_pid (e.g. si_pid=1671). Can you do this for a process ID you've recently spotted?
- Identify the Process with PID 1671:
ps -p 1671 -o comm=
It should spit out the app name
- Identify the Parent Process:
ps -p 1671 -o ppid=
eg output will be '1234'
- Identify the Parent Process Name:
ps -p 1234 -o comm=
It should spit out the parent app name
- List Open Files for Parent Process:
lsof -p 1234
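Or, if it's easier, the same chain as one snippet (1671 stands in for whichever si_pid you catch; the child may have already exited by the time you run it):
pid=1671                                # an si_pid from the strace output
parent=$(ps -o ppid= -p "$pid" | tr -d ' ')
ps -o comm= -p "$pid"                   # child process name
ps -o comm= -p "$parent"                # parent process name
lsof -p "$parent"                       # parent's open files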
cheers
Chris Maunder
|
Chris, I already tried to figure out what was starting all those processes, but they don't last long enough. I have never seen one of the additional processes even with a ps aux. However, there may be another way to answer the question. I had already turned off all of the modules except Face Processing, so I turned that one off too. With no modules active, the FIFO file handles continue to accumulate. For what it's worth, lsof attributes all of the FIFOs to CodeProject.
|
I am more than happy to help troubleshoot in any way that I can. I suspect, though, that any server running from the same Docker image is doing the same thing. I created a completely independent RPi instance using an RPi 4 with 8GB RAM and a newly downloaded image (Linux pi8-rpi 6.6.31+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.31-1+rpt1 (2024-05-29) aarch64 GNU/Linux). I added Docker, downloaded codeproject/ai-server:rpi64, and set up the docker-compose file exactly like the example on your site. In other words, it is completely vanilla.
The open files limit (by default) is 1048576. The CodeProject.AI.Server.dll process is doing exactly the same thing it does in my Fedora environments and at about the same rate:
Quote: futex(0x55a20d65b8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=237742, si_uid=0, si_status=0, si_utime=0, si_stime=2} ---
write(64, "\21", 1) = 1
rt_sigreturn({mask=[]}) = 367791007160
The number of FIFO file handles increases until it hits the limit, then drops back to around 800 (probably lower; I'm likely not catching the minimum).
|
It does make sense that it would be the server, but thanks for confirming that.
Do you have the Explorer and/or dashboard open when you're seeing file handles grow? If so, and if you close both, do the file handles stabilise?
My guess is it's TCP/IP connections. The question is: where?
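If it is sockets, something like this would show them (a sketch; run inside the container with the server's PID; lsof's -a ANDs the selectors together):
# list only the TCP endpoints held by the server process
lsof -a -p <server-pid> -i TCP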
cheers
Chris Maunder
|
I'm running the 12_2 CUDA Docker version...
This is what I see:
I also tried the CUDA 11 Docker version.
Is it just me or did this change?
|
Hi all. I am a long-term Windows and BlueIris user but a novice with Linux etc.
In an effort to use the mesh capabilities of CodeProject.AI with BlueIris, I have managed to get Mendel running on a Google Coral Dev Board and now want to install CodeProject.AI on the dev board, and I am struggling, so would really appreciate assistance, please.
I couldn't find any specific guidance for this board, so I am following the general installation guide.
sudo apt install dotnet-sdk-7.0 appears to be failing with this output:
mendel@coy-apple:~$ sudo apt install dotnet-sdk-7.0
Reading package lists... Done
Building dependency tree... Done
E: Unable to locate package dotnet-sdk-7.0
E: Couldn't find any package by glob 'dotnet-sdk-7.0'
E: Couldn't find any package by regex 'dotnet-sdk-7.0'
mendel@coy-apple:~$
What am I doing wrong, please?
|
Mendel is essentially Debian, so you could try using the Ubuntu .deb installer.
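Failing that, Microsoft's dotnet-install script tends to work on Debian-derived arm64 systems where the apt packages aren't available. A sketch (the script installs to ~/.dotnet by default):
# fetch and run Microsoft's .NET install script for the 7.0 channel
wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh
chmod +x dotnet-install.sh
./dotnet-install.sh --channel 7.0
export PATH="$HOME/.dotnet:$PATH"   # make dotnet visible in this shell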
cheers
Chris Maunder
|
I'm using CodeProject.AI 2.6.5 and when installing LlamaChat the following error occurs:
Installing simple Python bindings for the llama.cpp library...(❌ failed check) done
Soon after:
23:37:26:LlamaChat: Traceback (most recent call last):
23:37:26:LlamaChat: File "C:\Program Files\CodeProject\AI\modules\LlamaChat\llama_chat_adapter.py", line 16, in <module>
23:37:26:LlamaChat: from llama_chat import LlamaChat
23:37:26:LlamaChat: File "C:\Program Files\CodeProject\AI\modules\LlamaChat\llama_chat.py", line 7, in <module>
23:37:26:LlamaChat: from llama_cpp import ChatCompletionRequestSystemMessage, \
23:37:26:LlamaChat: ModuleNotFoundError: No module named 'llama_cpp'
What is happening?
|
Thanks very much for your message. It could be that the module did not install correctly. Could you please try re-installing it?
If the same thing happens, could you please go to C:\Program Files\CodeProject\AI\modules\LlamaChat and share your install.log (as well as the System Info tab from your CodeProject.AI Server dashboard)?
Thanks,
Sean Ewington
CodeProject
|
Hello. I have already reinstalled the entire CodeProject.AI setup on two PCs, but neither of them worked. Here is the log:
=============================
2024-05-31 00:20:36: Installing CodeProject.AI Analysis Module
2024-05-31 00:20:36: ======================================================================
2024-05-31 00:20:36: CodeProject.AI Installer
2024-05-31 00:20:36: ======================================================================
2024-05-31 00:20:36: 95.3Gb of 976Gb available on
2024-05-31 00:20:36: General CodeProject.AI setup
2024-05-31 00:20:36: Creating Directories...done
2024-05-31 00:20:36: GPU support
2024-05-31 00:20:36: CUDA Present...Yes (CUDA 12.2, No cuDNN found)
2024-05-31 00:20:37: ROCm Present...No
2024-05-31 00:20:37: Checking for .NET 7.0...Checking SDKs...Upgrading: .NET is 0
2024-05-31 00:20:37: Current version is 0. Installing newer version.
2024-05-31 00:20:37: 'winget' não é reconhecido como um comando interno
2024-05-31 00:20:37: ou externo, um programa operável ou um arquivo em lotes.
2024-05-31 00:20:39: Reading LlamaChat settings.......done
2024-05-31 00:20:39: Installing module LlamaChat 1.4.4
2024-05-31 00:20:39: Installing Python 3.9
2024-05-31 00:20:39: Python 3.9 is already installed
2024-05-31 00:20:46: Creating Virtual Environment (Local)...done
2024-05-31 00:20:46: Confirming we have Python 3.9 in our virtual environment...present
2024-05-31 00:20:46: Downloading mistral-7b-instruct-v0.2.Q4_K_M.gguf
2024-05-31 00:32:41: Moving mistral-7b-instruct-v0.2.Q4_K_M.gguf into the models folder.
2024-05-31 00:32:41: Installing Python packages for LlamaChat
2024-05-31 00:32:41: Installing GPU-enabled libraries: If available
2024-05-31 00:32:42: Ensuring Python package manager (pip) is installed...done
2024-05-31 00:32:52: Ensuring Python package manager (pip) is up to date...done
2024-05-31 00:32:52: Python packages specified by requirements.cuda12_2.txt
2024-05-31 00:32:59: - Installing the huggingface hub...(✅ checked) done
2024-05-31 00:33:01: - Installing disckcache for Disk and file backed persistent cache...(✅ checked) done
2024-05-31 00:33:09: - Installing NumPy, a package for scientific computing...(✅ checked) done
2024-05-31 00:33:25: - Installing simple Python bindings for the llama.cpp library...(❌ failed check) done
2024-05-31 00:33:25: Installing Python packages for the CodeProject.AI Server SDK
2024-05-31 00:33:26: Ensuring Python package manager (pip) is installed...done
2024-05-31 00:33:28: Ensuring Python package manager (pip) is up to date...done
2024-05-31 00:33:28: Python packages specified by requirements.txt
2024-05-31 00:33:32: - Installing Pillow, a Python Image Library...(✅ checked) done
2024-05-31 00:33:32: - Installing Charset normalizer...Already installed
2024-05-31 00:33:36: - Installing aiohttp, the Async IO HTTP library...(✅ checked) done
2024-05-31 00:33:39: - Installing aiofiles, the Async IO Files library...(✅ checked) done
2024-05-31 00:33:41: - Installing py-cpuinfo to allow us to query CPU info...(✅ checked) done
2024-05-31 00:33:42: - Installing Requests, the HTTP library...Already installed
2024-05-31 00:33:42: Scanning modulesettings for downloadable models...No models specified
2024-05-31 00:33:42: Traceback (most recent call last):
2024-05-31 00:33:42: File "C:\Program Files\CodeProject\AI\modules\LlamaChat\llama_chat_adapter.py", line 16, in <module>
2024-05-31 00:33:42: from llama_chat import LlamaChat
2024-05-31 00:33:42: File "C:\Program Files\CodeProject\AI\modules\LlamaChat\llama_chat.py", line 7, in <module>
2024-05-31 00:33:42: from llama_cpp import ChatCompletionRequestSystemMessage, \
2024-05-31 00:33:42: ModuleNotFoundError: No module named 'llama_cpp'
2024-05-31 00:33:43: Self test: Self-test passed
2024-05-31 00:33:43: Module setup time 00:13:05.67
2024-05-31 00:33:43: Setup complete
2024-05-31 00:33:43: Total setup time 00:13:06.86
Installer exited with code 0
===============================
|
Can you please paste the info from the System Info tab here? Otherwise we're just guessing what system you have.
The translation is "'winget' is not recognized as an internal command", which means you're missing some bits. I'm guessing there may be other issues the installer is having due to the language on your machine not being English.
cheers
Chris Maunder
|
Server version: 2.6.5
System: Windows
Operating System: Windows (Microsoft Windows 10.0.19045)
CPUs: AMD Ryzen 9 5900X 12-Core Processor (AMD)
1 CPU x 12 cores. 24 logical processors (x64)
GPU (Primary): NVIDIA GeForce RTX 3060 (12 GiB) (NVIDIA)
Driver: 536.25, CUDA: 12.2.91 (up to: 12.2), Compute: 8.6, cuDNN:
System RAM: 64 GiB
Platform: Windows
BuildConfig: Release
Execution Env: Native
Runtime Env: Production
Runtimes installed:
.NET runtime: 7.0.10
.NET SDK: Not found
Default Python: Not found
Go: Not found
NodeJS: Not found
Rust: Not found
Video adapter info:
NVIDIA GeForce RTX 3060:
Driver Version 31.0.15.3625
Video Processor NVIDIA GeForce RTX 3060
System GPU info:
GPU 3D Usage 44%
GPU RAM Usage 10,6 GiB
Global Environment variables:
CPAI_APPROOTPATH = <root>
CPAI_PORT = 32168
|
Thanks very much for that. We're narrowing down a possible reason for this, but to confirm, could you please change your logging level in the Server logs tab to Information, then try to re-install the module again, and share those module install logs with us?
Thanks,
Sean Ewington
CodeProject
|
Hello,
I changed the log level to 'trace'. Here is the information:
17:36:50:LlamaChat doesn't appear in the Process list, so can't stop it.
17:36:51:Call to run Uninstall on module LlamaChat has completed.
17:37:10:Preparing to install module 'LlamaChat'
17:37:10:Downloading module 'LlamaChat'
17:37:11:Installing module 'LlamaChat'
17:37:11:Installer script at 'C:\Program Files\CodeProject\AI\setup.bat'
17:37:12:LlamaChat: Installing CodeProject.AI Analysis Module
17:37:12:LlamaChat: ======================================================================
17:37:12:LlamaChat: CodeProject.AI Installer
17:37:12:LlamaChat: ======================================================================
17:37:12:LlamaChat: 154.1Gb of 476Gb available on
17:37:12:LlamaChat: General CodeProject.AI setup
17:37:12:LlamaChat: Creating Directories...done
17:37:12:LlamaChat: GPU support
17:37:13:LlamaChat: CUDA Present...Yes (CUDA 12.5, No cuDNN found)
17:37:13:LlamaChat: ROCm Present...No
17:37:15:LlamaChat: Reading LlamaChat settings.......done
17:37:15:LlamaChat: Installing module LlamaChat 1.4.4
17:37:15:LlamaChat: Installing Python 3.9
17:37:15:LlamaChat: Python 3.9 is already installed
17:37:26:LlamaChat: Creating Virtual Environment (Local)...done
17:37:26:LlamaChat: Confirming we have Python 3.9 in our virtual environment...present
17:37:26:LlamaChat: Downloading mistral-7b-instruct-v0.2.Q4_K_M.gguf
17:40:00:LlamaChat: Moving mistral-7b-instruct-v0.2.Q4_K_M.gguf into the models folder.
17:40:00:LlamaChat: Installing Python packages for LlamaChat
17:40:00:LlamaChat: Installing GPU-enabled libraries: If available
17:40:02:LlamaChat: Ensuring Python package manager (pip) is installed...done
17:40:13:LlamaChat: Ensuring Python package manager (pip) is up to date...done
17:40:13:LlamaChat: Python packages specified by requirements.cuda12.txt
17:40:21:LlamaChat: - Installing the huggingface hub...(✅ checked) done
17:40:23:LlamaChat: - Installing disckcache for Disk and file backed persistent cache...(✅ checked) done
17:40:32:LlamaChat: - Installing NumPy, a package for scientific computing...(✅ checked) done
17:40:51:LlamaChat: - Installing simple Python bindings for the llama.cpp library...(❌ failed check) done
17:40:51:LlamaChat: Installing Python packages for the CodeProject.AI Server SDK
17:40:53:LlamaChat: Ensuring Python package manager (pip) is installed...done
17:40:55:LlamaChat: Ensuring Python package manager (pip) is up to date...done
17:40:55:LlamaChat: Python packages specified by requirements.txt
17:40:58:LlamaChat: - Installing Pillow, a Python Image Library...(✅ checked) done
17:40:59:LlamaChat: - Installing Charset normalizer...Already installed
17:41:04:LlamaChat: - Installing aiohttp, the Async IO HTTP library...(✅ checked) done
17:41:06:LlamaChat: - Installing aiofiles, the Async IO Files library...(✅ checked) done
17:41:09:LlamaChat: - Installing py-cpuinfo to allow us to query CPU info...(✅ checked) done
17:41:10:LlamaChat: - Installing Requests, the HTTP library...Already installed
17:41:10:LlamaChat: Scanning modulesettings for downloadable models...No models specified
17:41:11:LlamaChat: Traceback (most recent call last):
17:41:11:LlamaChat: File "C:\Program Files\CodeProject\AI\modules\LlamaChat\llama_chat_adapter.py", line 16, in <module>
17:41:11:LlamaChat: from llama_chat import LlamaChat
17:41:11:LlamaChat: File "C:\Program Files\CodeProject\AI\modules\LlamaChat\llama_chat.py", line 7, in <module>
17:41:11:LlamaChat: from llama_cpp import ChatCompletionRequestSystemMessage, \
17:41:11:LlamaChat: ModuleNotFoundError: No module named 'llama_cpp'
17:41:11:LlamaChat: Self test: Self-test passed
17:41:11:LlamaChat: Module setup time 00:03:58.04
17:41:11:LlamaChat: Setup complete
17:41:11:LlamaChat: Total setup time 00:03:59.33
17:41:11:Module LlamaChat installed successfully.
17:41:11:Module LlamaChat not configured to AutoStart.
17:41:11:Installer exited with code 0
|
Hello,
Any feedback on this problem?
|
Quote: Installing simple Python bindings for the llama.cpp library...(❌ failed check) done
This is the issue. It's not able to install llama-cpp-python.
In the llama folder under /modules, there is a file requirements.cuda12.txt
Can you edit that and change
https://abetlen.github.io/llama-cpp-python/whl/cu121/ to https://abetlen.github.io/llama-cpp-python/whl/cu124/
(121 to 124)
then open a terminal in the llama directory and type ..\..\setup and it'll rerun the setup for that module.
The issue, I think, is that you're on CUDA 12.5 and our requirements.txt files only support up to CUDA 12.2. It could be failing due to that. The best match I can find is for 12.4, so it's worth a try.
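If the setup script still reports a failed check, installing the wheel by hand from the module's Python environment is also worth a try (a sketch; the index URL is the cu124 one above, and activating the module's venv first is assumed):
# pull a prebuilt CUDA 12.4 wheel of the llama.cpp bindings
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124/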
cheers
Chris Maunder
|
That did not work.
I uninstalled the module and installed it again. During the installation I quickly modified the files (requirements.cuda12....) and also created a final file, cuda12_5.txt; the installation pulled this file, but in the end it failed with the same error:
"Installing simple Python bindings for the llama.cpp library...(❌ failed check)"
I installed CUDA 12.5 after trying to install LlamaChat, so CUDA 12.5 shouldn't be the cause... Previously it was CUDA 11.8.
Any other suggestions?
|
I'm currently running 2.6.2 and it is working fine. 2.6.2 was easier to install than previous versions and reflects amazing work by the team!
If I read the release notes, they only say: "2.6.5 Various installer fixes".
Given that upgrades may or may not be fast, or even successful, I would not choose to upgrade solely for installer fixes...
But, in the UI, I see: "An update to version 2.6.5 is available. Download.
Support for external modules and module updates."
OK, that's a different matter... do I need to upgrade to get the updated modules? I do not see any modules available for update in the Modules control panel. I thought that was the point of modules?
Is an upgrade recommended if I already have 2.6.2 installed and functioning?
Do I need to upgrade in order to update modules or should module updates be available in 2.6.2?
A little more clarity would be helpful.
|