Colab: Memory Allocation

I am facing the below error on Google Colab in one of my code loops.
The selected runtime is GPU, and it shows RAM as full, while disk usage is only 35 GB / 358 GB.
Can anyone suggest where I am going wrong?
The error snippet is as below:
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 405, in _handle_workers
pool._maintain_pool()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 246, in _maintain_pool
self._repopulate_pool()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 239, in _repopulate_pool
w.start()
File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/usr/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 66, in _launch
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Thanks
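The traceback shows os.fork() failing while a multiprocessing pool repopulates its workers, so it is the VM's RAM that is exhausted, not its disk. A common mitigation is to cap the pool size; here is a minimal sketch, where the work function and inputs are hypothetical stand-ins since the original loop is not shown:

import multiprocessing as mp

def work(item):
    # Stand-in for the real per-item task from the original loop.
    return item * 2

if __name__ == "__main__":
    # Fewer workers means fewer fork() calls and less duplicated memory,
    # which avoids "OSError: [Errno 12] Cannot allocate memory".
    with mp.Pool(processes=2) as pool:
        results = pool.map(work, range(10))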

Related

Pytorch dataloaders : OSError: [Errno 9] Bad file descriptor

Description of the problem
The error occurs if num_workers > 0, but when I set num_workers = 0 the error disappears, though this slows down the training speed. I think multiprocessing really matters here. How can I solve this problem?
env
Docker, Python 3.8, PyTorch 1.11.0+cu113
error output
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/resource_sharer.py", line 149, in _serve
send(conn, destination_pid)
File "/opt/conda/lib/python3.8/multiprocessing/resource_sharer.py", line 50, in send
reduction.send_handle(conn, new_fd, pid)
File "/opt/conda/lib/python3.8/multiprocessing/reduction.py", line 184, in send_handle
sendfds(s, [handle])
File "/opt/conda/lib/python3.8/multiprocessing/reduction.py", line 149, in sendfds
sock.sendmsg([msg], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])
OSError: [Errno 9] Bad file descriptor
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/resource_sharer.py", line 151, in _serve
close()
File "/opt/conda/lib/python3.8/multiprocessing/resource_sharer.py", line 52, in close
os.close(new_fd)
OSError: [Errno 9] Bad file descriptor

Traceback (most recent call last):
File "save_disp.py", line 85, in <module>
test()
File "save_disp.py", line 55, in test
for batch_idx, sample in enumerate(TestImgLoader):
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1207, in _next_data
idx, data = self._get_data()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1173, in _get_data
success, data = self._try_get_data()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1011, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/opt/conda/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 295, in rebuild_storage_fd
fd = df.detach()
File "/opt/conda/lib/python3.8/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/opt/conda/lib/python3.8/multiprocessing/reduction.py", line 189, in recv_handle
return recvfds(s, 1)[0]
File "/opt/conda/lib/python3.8/multiprocessing/reduction.py", line 159, in recvfds
raise EOFError
EOFError
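A workaround that is often suggested for this class of failure (an assumption here, not from the original report) is to switch PyTorch's tensor-sharing strategy from file descriptors to the file system, so workers no longer exhaust the per-process fd limit; raising the limit with ulimit -n is the other common option. A minimal sketch:

import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset

# Share tensors via the file system instead of file descriptors,
# so DataLoader workers do not run out of fds.
mp.set_sharing_strategy("file_system")

if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(100, 3))  # stand-in for the real dataset
    loader = DataLoader(dataset, batch_size=8, num_workers=4)
    for batch_idx, (sample,) in enumerate(loader):
        pass  # the real test step goes here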

I am getting an error while running docker-compose up in Windows Server 2016

I am getting the below error while running docker-compose up in Windows Server 2016.
The docker-compose version is 1.27.4, build 40524192.
Traceback (most recent call last):
File "site-packages\docker\api\client.py", line 205, in _retrieve_server_version
File "site-packages\docker\api\daemon.py", line 181, in version
File "site-packages\docker\utils\decorators.py", line 46, in inner
File "site-packages\docker\api\client.py", line 228, in _get
File "site-packages\requests\sessions.py", line 543, in get
File "site-packages\requests\sessions.py", line 530, in request
File "site-packages\requests\sessions.py", line 643, in send
File "site-packages\requests\adapters.py", line 449, in send
File "site-packages\urllib3\connectionpool.py", line 677, in urlopen
File "site-packages\urllib3\connectionpool.py", line 392, in _make_request
File "http\client.py", line 1244, in request
File "http\client.py", line 1290, in _send_request
File "http\client.py", line 1239, in endheaders
File "http\client.py", line 1026, in _send_output
File "http\client.py", line 966, in send
File "site-packages\docker\transport\npipeconn.py", line 32, in connect
File "site-packages\docker\transport\npipesocket.py", line 23, in wrapped
File "site-packages\docker\transport\npipesocket.py", line 72, in connect
File "site-packages\docker\transport\npipesocket.py", line 59, in connect
pywintypes.error: (2, 'CreateFile', 'The system cannot find the file specified.')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "docker-compose", line 3, in <module>
File "compose\cli\main.py", line 67, in main
File "compose\cli\main.py", line 123, in perform_command
File "compose\cli\command.py", line 69, in project_from_options
File "compose\cli\command.py", line 132, in get_project
File "compose\cli\docker_client.py", line 43, in get_client
File "compose\cli\docker_client.py", line 170, in docker_client
File "site-packages\docker\api\client.py", line 188, in __init__
File "site-packages\docker\api\client.py", line 213, in _retrieve_server_version
docker.errors.DockerException: Error while fetching server API version: (2, 'CreateFile', 'The system cannot find the file specified.')
[4332] Failed to execute script docker-compose
Thanks in advance.
I just ran into the exact same symptoms. Turns out Docker wasn't running, and starting Docker solved the problem.
Your Docker daemon is probably not running on your machine. Open Docker Desktop and go to the Containers section to see whether it is running. If it is running, restart it, then go back to your project and run docker-compose up --build.
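To confirm this diagnosis programmatically, a quick check with the Docker SDK for Python works as well; a minimal sketch:

import docker

try:
    client = docker.from_env()
    client.ping()  # talks to the daemon over the named pipe / socket
    print("Docker daemon is reachable")
except docker.errors.DockerException as exc:
    # This is the same failure mode the docker-compose traceback above hits.
    print(f"Docker daemon is not reachable: {exc}")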

Use Google default credentials on local docker run

I have the same problem as asked in this question, but the provided solution does not work for me.
Basically, I want to run my Docker image, with entrypoint run_query.py, locally. I run into credential issues when I try to run a BigQuery job.
When I run
docker run -v ~/.config/:/root/.config my-image-name --param1 ...
I get this error:
Traceback (most recent call last):
File "run_query.py", line 97, in <module>
query_params=params)
File "run_query.py", line 54, in create_table
query_job = client.query(query, job_config=job_config)
File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/client.py", line 2467, in query
query_job._begin(retry=retry, timeout=timeout)
File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/job.py", line 3156, in _begin
super(QueryJob, self)._begin(client=client, retry=retry, timeout=timeout)
File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/job.py", line 638, in _begin
retry, method="POST", path=path, data=self.to_api_repr(), timeout=timeout
File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/client.py", line 558, in _call_api
return call()
File "/usr/local/lib/python3.7/dist-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "/usr/local/lib/python3.7/dist-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/usr/local/lib/python3.7/dist-packages/google/cloud/_http.py", line 419, in api_request
timeout=timeout,
File "/usr/local/lib/python3.7/dist-packages/google/cloud/_http.py", line 277, in _make_request
method, url, headers, data, target_object, timeout=timeout
File "/usr/local/lib/python3.7/dist-packages/google/cloud/_http.py", line 315, in _do_request
url=url, method=method, headers=headers, data=data, timeout=timeout
File "/usr/local/lib/python3.7/dist-packages/google/auth/transport/requests.py", line 444, in request
self.credentials.before_request(auth_request, method, url, request_headers)
File "/usr/local/lib/python3.7/dist-packages/google/auth/credentials.py", line 133, in before_request
self.refresh(request)
File "/usr/local/lib/python3.7/dist-packages/google/oauth2/credentials.py", line 198, in refresh
self._scopes,
File "/usr/local/lib/python3.7/dist-packages/google/oauth2/_client.py", line 248, in refresh_grant
response_data = _token_endpoint_request(request, token_uri, body)
File "/usr/local/lib/python3.7/dist-packages/google/oauth2/_client.py", line 124, in _token_endpoint_request
_handle_error_response(response_body)
File "/usr/local/lib/python3.7/dist-packages/google/oauth2/_client.py", line 60, in _handle_error_response
raise exceptions.RefreshError(error_details, response_body)
I also tried to use -v ~/.config/gcloud/:/root/.config/gcloud, but I get the same result.
Keep in mind that using this image in a Kubeflow Pipeline works smoothly.
Did I misinterpret the solution from the previous question? What am I missing?
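One way to take application-default discovery out of the picture, sketched here under the assumption that a service-account key is available (the key path and mount below are hypothetical), is to pass credentials to the BigQuery client explicitly:

from google.cloud import bigquery
from google.oauth2 import service_account

# Hypothetical key location; mount it into the container, e.g.
#   docker run -v ~/keys:/keys my-image-name --param1 ...
credentials = service_account.Credentials.from_service_account_file("/keys/sa.json")
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

Alternatively, re-running gcloud auth application-default login on the host and re-mounting ~/.config/gcloud is a commonly reported way to clear a RefreshError caused by stale user credentials.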

docker-compose command is giving an error while building

I am pretty new to Docker. I had been running my application on Docker successfully for many days.
Now when I run the command docker-compose up -d, it gives me the error below:
Traceback (most recent call last):
File "docker-compose", line 3, in <module>
File "compose/cli/main.py", line 88, in main
File "compose/cli/main.py", line 140, in perform_command
File "compose/cli/main.py", line 900, in up
File "compose/project.py", line 385, in up
File "compose/project.py", line 590, in warn_for_swarm_mode
File "site-packages/docker/api/daemon.py", line 73, in info
File "site-packages/docker/utils/decorators.py", line 47, in inner
File "site-packages/docker/api/client.py", line 179, in _get
File "site-packages/requests/sessions.py", line 488, in get
File "site-packages/requests/sessions.py", line 466, in request
File "site-packages/requests/sessions.py", line 641, in
merge_environment_settings
File "site-packages/requests/utils.py", line 605, in get_environ_proxies
File "site-packages/requests/utils.py", line 589, in should_bypass_proxies
File "urllib.py", line 1510, in proxy_bypass
File "urllib.py", line 1484, in proxy_bypass_macosx_sysconf
ValueError: negative shift count
Failed to execute script docker-compose
I searched for this error, ValueError: negative shift count, a lot, but found nothing useful.
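The traceback dies in urllib's proxy_bypass_macosx_sysconf, i.e. while requests inspects the macOS proxy configuration, so the usual suspects are the entries under System Preferences > Network > Proxies. A commonly reported workaround, offered here as an assumption rather than a confirmed fix, is to short-circuit proxy lookup before invoking docker-compose:

import os
import subprocess

# Setting no_proxy makes urllib take the environment-based bypass path
# instead of the buggy proxy_bypass_macosx_sysconf() one.
env = dict(os.environ, no_proxy="*")
subprocess.run(["docker-compose", "up", "-d"], env=env, check=True)

The same effect can be had by exporting no_proxy='*' in the shell before running docker-compose.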

What does this memory error of cvs2git mean?

I am trying to migrate a very big CVS repository (12 GB) to Git with cvs2git. In doing so, I get the following error in pass 10:
----- pass 10 (BreakSymbolChangesetCyclesPass) -----
Breaking symbol changeset dependency cycles...
Traceback (most recent call last):
File "/usr/bin/cvs2git", line 70, in ?
git_main(os.path.basename(sys.argv[0]), sys.argv[1:])
File "/usr/lib/python2.4/site-packages/cvs2svn_lib/main.py", line 119, in git_main
main(progname, run_options, pass_manager)
File "/usr/lib/python2.4/site-packages/cvs2svn_lib/main.py", line 96, in main
pass_manager.run(run_options)
File "/usr/lib/python2.4/site-packages/cvs2svn_lib/pass_manager.py", line 181, in run
the_pass.run(run_options, stats_keeper)
File "/usr/lib/python2.4/site-packages/cvs2svn_lib/passes.py", line 1174, in run
for (changeset, time_range) in self.changeset_graph.consume_graph(
File "/usr/lib/python2.4/site-packages/cvs2svn_lib/changeset_graph.py", line 355, in consume_graph
for (changeset, time_range) in self.consume_nopred_nodes():
File "/usr/lib/python2.4/site-packages/cvs2svn_lib/changeset_graph.py", line 285, in consume_nopred_nodes
(
File "/usr/lib/python2.4/site-packages/cvs2svn_lib/changeset_graph.py", line 58, in __init__
self._nodes = [
File "/usr/lib/python2.4/site-packages/cvs2svn_lib/indexed_database.py", line 118, in __getitem__
return self._fetch(offset)
File "/usr/lib/python2.4/site-packages/cvs2svn_lib/indexed_database.py", line 107, in _fetch
return self.serializer.loadf(self.f)
File "/usr/lib/python2.4/site-packages/cvs2svn_lib/serializer.py", line 117, in loadf
return unpickler.load()
MemoryError
And here are my memory statistics:
MemTotal: 4017036 kB
MemFree: 1830728 kB
Does anybody have an idea how I could fix this?
I have found a solution:
Upgrading my Python version from 2.4.3 to 2.6.8 fixed the problem.
