Profiling Google Cloud Dataflow job - google-cloud-dataflow

What would be the best way to profile a dataflow job if the scale does not permit doing so locally?
In the past, I tried using jstack to check what the Java threads are doing on the worker instances, but that doesn't seem to work anymore.
Of course I can use stopwatches and log the measured timing data, but I was hoping maybe there is a better way.
Update: The instructions here still seem to work, with the only difference that instead of installing java with apt-get install openjdk-7-jdk, I had to download it from Oracle's site.
Thanks,
GB

As mentioned in the question, you can install jstack if you install the JDK.
We have a Github issue tracking the need for user-code profiling -- check there for progress.

Related

Regarding scikit_learn installation

Hello and thank you for looking at this. I have been using docker to create various multi-arch builds, and I have noticed some interesting behavior. When running buildx for image creation, scikit_learn 0.21.3 takes a very long time to download and install (about 2.5 hours). However, when using the regular docker build command on a single arch, it only takes about 10 minutes. The reason I have to use this specific version of scikit_learn is an error from my application, which is unable to find the sklearn.utils.linear_assignment_ module.
The error I receive is:
ModuleNotFoundError: No module named 'sklearn.utils.linear_assignment_'
The only version I have been able to run so far is 0.21.3. Having said that, I have found that more recent versions install much faster, but they do not have the linear assignment module, which is a dependency of my application. Any help or guidance would be greatly appreciated.
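One possible way around the missing module, assuming the application code can be edited and SciPy is available in the image, is a small shim over scipy.optimize.linear_sum_assignment, which solves the same assignment problem. The sketch below mimics the old return shape (an array of row/column index pairs); it has not been checked against the application in question.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def linear_assignment(cost_matrix):
    """Return an (n, 2) array of [row, column] pairs minimising the total cost.

    Stand-in for the function that used to live in
    sklearn.utils.linear_assignment_, built on SciPy's solver.
    """
    rows, cols = linear_sum_assignment(np.asarray(cost_matrix))
    return np.column_stack((rows, cols))


cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
pairs = linear_assignment(cost)
print(pairs.tolist())  # [[0, 1], [1, 0], [2, 2]]
```

With a shim like this, the scikit_learn version would no longer need to be pinned for the sake of that one import.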

Influx CLI driving me really crazy, cannot open the CLI

I am installing influxdb_2.0.9 on Ubuntu, following the instructions here:
https://docs.influxdata.com/influxdb/v2.0/install/?t=Linux
I downloaded it, set it up, and started the influxd daemon. I can connect via a browser on localhost:8086, so it appears to be functioning. I am going to use the Python API anyway, but this really drives me crazy...
I can't get to the CLI. Whether I copy the binary to /usr/local/bin or anywhere else, or run it right in its directory via ./influx, it just prints a help message, as if I had typed ./influx -h.
The funny thing is that when I install influxdb-client and run its influx binary, it starts the CLI, but that client is meant for version 1.6.4 and does not seem to connect properly to influxd 2.0.9 running on localhost. Maybe I could configure it somehow, but that seems like a bad idea anyway.
I purged everything and tried to reinstall everything, and even manually deleted all the empty directories, so anything named influx on my Ubuntu 20.04 is gone. When I then just follow the instructions online, everything seems to work except the CLI, which I really need.
Just FYI, if I install via sudo apt-get install influxdb influxdb-client, it works perfectly, but that is the older version 1.6.4.
I tried installing the deb package via dpkg as well, but it made no difference. I'm running Ubuntu 20.04. There is the option of running an older influxdb (1.6.x), but I really don't want to do that.
Has anyone had the same problem? I could not find a solution online in the last hour, so I'm asking here. This really drives me crazy though... Thanks for your time. Q.
Thank you guys for your time.

Well, I have come to realize that the old CLI is gone and is not implemented in the new version the way it used to be. In 2.0.0 it was started by
influx repl
But now I found a message.
I have no idea why they decided to deprecate this without posting much information about it.
I will finish this post myself and leave it here; maybe someone will find it useful in the future.
Now I will try to 'build the REPL from source', wish me luck lol.
--EDIT
Found a clone, closing the thread.
https://github.com/influxdata/influxdb/issues/19986

Compile Tensorflow from source with Docker to get CPU speed up

I am looking for a way to set up or modify an existing Docker image for tensorflow so that it is installed with the SSE4, AVX, AVX2, and FMA instructions enabled for CPU speed up. So far I have found how to install from source using bazel in How to Compile Tensorflow... and CPU instructions not compiled.... Neither of these explains how to do this within Docker.
So I think what I am looking for is what needs to be added to an existing docker image that installs without these options, so that I get a compiled version of tensorflow with the CPU options enabled. The existing docker images do not do this because they are meant to run on as many machines as possible.
I am using Ubuntu 14.04 on a Linux PC. I am new to docker, but I have installed tensorflow outside docker and have it working without the CPU warnings I get when I use the docker images. I may not need this for speed, but I have seen posts claiming that the speed up can be significant. I searched for existing docker images that do this and could not find anything. I need this to work with the GPU, so it needs to be compatible with nvidia-docker.
I just found this docker support for bazel, and it might provide an answer, but I do not understand it well enough to know for sure. I believe it is saying that you cannot build tensorflow with bazel inside a Dockerfile, and that you have to build the Dockerfile using bazel instead. Is my understanding correct, and is this the only way to get a docker image with tensorflow compiled from source? If so, I could still use help with how to do it while still getting the other dependencies that I would get from an existing docker image for tensorflow.

Dockerfiles that build with CPU support can be found here.
Hope that helps! Spent many a late night here on Stack Overflow and Github Issues and stuff. Now it's my turn to give back! :)
The GPU stuff in particular is really hairy - especially when enabling the XLA/JIT/AOT stuff as well as the Graph Transform Tools.
Lots of hacks embedded in my Dockerfiles. Feel free to review and ask me questions!

The contributing guidelines mention building TensorFlow from source with Docker to run the unit tests:
Refer to the
CPU-only developer Dockerfile and
GPU developer Dockerfile
for the required packages. Alternatively, use the said
Docker images, e.g.,
tensorflow/tensorflow:nightly-devel and tensorflow/tensorflow:nightly-devel-gpu
for development to avoid installing the packages directly on your system.
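Before compiling for SSE4, AVX, AVX2, and FMA, it can help to confirm which of them the build machine actually reports. The sketch below parses the flags line of /proc/cpuinfo on Linux and maps each flag to the corresponding GCC-style -m option; passing these to bazel via --copt is an assumption about the build setup, not something taken from the Dockerfiles mentioned above.

```python
# Map /proc/cpuinfo flag names to the GCC-style -m options that enable them.
FLAG_TO_COPT = {
    "sse4_1": "-msse4.1",
    "sse4_2": "-msse4.2",
    "avx": "-mavx",
    "avx2": "-mavx2",
    "fma": "-mfma",
}


def supported_copts(cpuinfo_text):
    """Return the -m options whose CPU flag appears in the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return [opt for flag, opt in FLAG_TO_COPT.items() if flag in present]
    return []


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Sample text in place of read_cpuinfo() so the sketch runs anywhere.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 sse4_1 sse4_2 avx fma\n"
print(supported_copts(sample))  # ['-msse4.1', '-msse4.2', '-mavx', '-mfma']
```

An image built with options the host CPU lacks will fail at runtime with an illegal instruction, which is the portability trade-off the question mentions.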

How can I preinstall software on travis-ci?

We use travis-ci for continuous integration. I'm troubled by the fact that our build process takes too long (~30 minutes). We depend on several Ubuntu packages which we fetch using apt-get, among others python-pandas.
We also have some of our own debs which we fetch over HTTPS and dpkg install. Finally, we have several pip/pypi requirements, such as Django, Flask, Werkzeug, numpy, pycrypto, selenium.
It would be nice to be able to at least pre-package some of these requirements. Does travis support something like this? How can I prepackage some of these requirements? Is it possible to build a custom travis base VM and start the build from there (perhaps using docker)? Especially the apt-get requirements from the default Ubuntu precise repository as well as the pip requirements should be easy to include.

So while this question is already answered, the answer doesn't actually provide a solution path. You can use cache directives in travis to cache your built packages for future travis runs.
cache:
  directories:
    - $HOME/.pip-cache/
    - $HOME/virtualenv/python2.7
install:
  - pip install -r requirements.txt --download-cache "$HOME/.pip-cache"
Now your package content is saved for your next travis build. You can similarly store slow-to-retrieve resources in other directories and cache them.

Currently Travis-CI doesn't support such a feature. There are related issues currently open, though, such as custom VMs, running Docker in an OpenVZ container (Spotify seems to have a somewhat working example linked in this issue), using Linux Containers (LXC), and using KVM.
Some of those have workarounds mentioned in the issues, I'd give those a try until something more substantial is supported by Travis-CI. I'd also suggest reaching out to Travis-CI support and see if they have any suggestions (maybe there's something coming out soon that could help).

modrails - rogue ruby processes consuming 100% cpu

I'm having ruby instances from mod_rails go "rogue" -- these processes are no longer listed in passenger-status and utilize 100% cpu.
Other than installing god/monit to kill the instance, can anyone give me some advice on how to prevent this? I haven't been able to find anything in the logs that helps.

If you're using Linux, you can install the "strace" utility to see what the Ruby process is doing that's consuming all the CPU. That will give you a good low-level view. It should be available in your package manager. Then you can:
$ sudo strace -p 22710
Process 22710 attached - interrupt to quit
...lots of stuff...
(press Ctrl+C)
Then, if you want to stop the process in the middle and dump a stack trace, you can follow the guide on using GDB in Ruby at http://eigenclass.org/hiki.rb?ruby+live+process+introspection, specifically doing:
gdb --pid=(ruby process)
session-ruby
stdout_redirect
(in other terminal) tail -f /tmp/ruby_debug.(pid)
eval "caller"
You can also use the ruby-debug Gem to remotely connect to debug sockets you open up, described in http://duckpunching.com/passenger-mod_rails-for-development-now-with-debugger
There also seems to be a project on Github concerned with debugging Passenger instances that looks interesting, but the documentation is lacking:
http://github.com/ddollar/socket-debugger/tree/master
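To find the PID to hand to strace or gdb, the runaway processes can be picked out of ps output. The sketch below is a hypothetical helper, assuming a Linux ps that accepts -eo pid,pcpu,comm; the 90% threshold is arbitrary, and the %CPU that ps reports is averaged over the process lifetime rather than instantaneous.

```python
import subprocess


def busy_processes(ps_output, name="ruby", min_cpu=90.0):
    """Return (pid, cpu_percent) for commands containing `name` at or above `min_cpu`."""
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, cpu, command = parts
        if name in command and float(cpu) >= min_cpu:
            found.append((int(pid), float(cpu)))
    return found


def snapshot():
    """Capture a process listing with PID, %CPU and command name columns."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Sample listing in place of snapshot() so the sketch runs anywhere.
sample = "  PID %CPU COMMAND\n22710 99.8 ruby\n  812  0.3 apache2\n22790  1.2 ruby\n"
print(busy_processes(sample))  # [(22710, 99.8)]
```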

I had a ruby process related to Phusion Passenger, which consumed lots of CPU, even though it should have been idle.
The problem went away after I ran
date -s "`date`"
as suggested in this thread. (That was on Debian Squeeze)
Apparently, the problem was related to a leap second, and could affect many other applications like MySQL, Java, etc. More info in this thread on LKML.

We saw something similar to this with very long running SQL queries.
MySQL would kill the queries because they exceeded the long running limit and the thread never realized that the query was dead.
You may want to check the database logs.

This is a recurring issue with Passenger. I've seen this problem many times while helping people who ran Ruby on Rails with Passenger. I don't have a fix, but you might want to try this: http://www.modrails.com/documentation/Users%20guide%20Apache.html#debugging_frozen
