Embedding IPython REPL in Docker

Following this wonderful article, I'm trying to use the IPython REPL for debugging my Flask app. The idea is that you run import IPython; IPython.embed() at the point where you want to look around the state of your project.
I'm developing my app in a Docker container to make it easier to run with other services. I tried inserting this line into a views.py function like so:
@page.route('/', methods=['GET', 'POST'])
def index():
    form = SearchForm()
    if form.validate_on_submit():
        results = request.form.get('search')
        # drop into an interactive IPython shell at this point
        import IPython; IPython.embed()
        return render_template('page/index.html', form=form, results=results)
    return render_template('page/index.html', form=form)
When a valid POST request is made through the form, I see the following output from Docker:
website_1 | IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.
website_1 | In [1]: Do you really want to exit ([y]/n)?
Then I see gunicorn logging the POST and GET requests. It would seem Docker automatically shuts down IPython and continues to render_template.
I'm wondering if there is any way to get this to work as an actual breakpoint, as described in the article. I'd love to be able to take a look around my code this way. Thanks in advance for any advice.
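The "Do you really want to exit ([y]/n)?" line is the prompt IPython shows when it reads end-of-file on stdin, which suggests that the process inside the container has no interactive terminal attached. Below is a minimal sketch of a guard that checks for this before embedding. The helper name stdin_is_interactive is hypothetical, and the sketch only detects the situation; whether a terminal is attached depends on how the container is started, which is not covered here.

```python
import io
import sys


def stdin_is_interactive(stream=None):
    """Return True only if the stream looks like an attached terminal."""
    stream = sys.stdin if stream is None else stream
    try:
        return bool(stream.isatty())
    except (AttributeError, ValueError):
        # Objects without isatty(), or closed streams, cannot host a REPL.
        return False


# An in-memory stream stands in for the detached stdin of a container.
print(stdin_is_interactive(io.StringIO("")))
```

In the view, the call could then become "if stdin_is_interactive(): IPython.embed()", so that requests are not silently passed through a shell that exits immediately.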


nltk.download('punkt') giving output as false

When I try to install nltk and download the punkt resource using nltk.download('punkt'), I get the following errors. I have tried many alternative code snippets and changing networks.
Please help with this error.
After running:
df['num_words'] = df['text'].apply(lambda x: len(nltk.word_tokenize(x)))
I get the error:
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/english.pickle
I tried some alternative code, such as:
import nltk
import ssl
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # older Python versions do not verify HTTPS certificates by default
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context
I also tried changing networks, since in some places I read that this is a server issue.
Try launching the Jupyter notebook session as administrator (open the command prompt or Anaconda prompt as administrator).
The last option would be to download the corpus manually. You may find this helpful in your case.

how to set up proper printout destination for dask multiprocessing in jupyter notebook on linux

I am using dask in a Jupyter notebook on a Linux server to run Python functions on multiple CPUs. The Python functions contain standard print statements. I would like the print output to be shown in the Jupyter notebook right below the cell. However, the output was all shown in the console instead. Can anyone explain why this happens and how to make print output from functions run by dask appear in the notebook, or in both the console and the notebook?
The following is a simplified version of the problem:
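import dask
import functools
from dask import compute, delayed
iter_list = [0, 1]  # assumed input, matching the "Meme 0, Meme 1" output described below
def iFunc(item):
    # assumed body: the original was lost; a plain print reproduces the described output
    print('Meme', item)
# Calling this function directly prints normally to the
# notebook below the cell, as desired.
iFunc(0)
with dask.config.set(scheduler='processes', num_workers=2):
    ret = compute([delayed(iFunc)(item) for item in iter_list])
# Surprisingly, Meme 0 and Meme 1 are printed only to the console,
# not the notebook. Not desired, and hard to debug. Any clue?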
The whole point of dask is leveraging multiple threads, processes, or nodes/machines to distribute work. The workers you create are therefore not on the same thread as your client, and may not be on the same process, or even the same machine (or like, in the same country) as your client, depending on how you set up your cluster.
If you start a LocalCluster from your jupyter notebook, whether you're using threads or processes, you should see printed output appearing as output in the cells which execute jobs on the workers:
In [1]: import dask.distributed as dd
In [2]: client = dd.Client(processes=4)
In [3]: def job():
   ...:     print("hello from a worker!")
In [4]: client.submit(job).result()
hello from a worker!
However, if a different process is spinning up your workers, it is up to that process to decide how to handle stdout. So if you're spinning up workers using the jupyterlab terminal, stdout will appear there. If you're spinning up workers in a kubernetes pod, stdout will appear in the worker logs. Dask doesn't actively manage standard out, so it's up to you to handle this. Note that this also applies to logging - neither stdout nor logs are captured by dask. This is actually a really important design feature - many distributed systems have their own systems for managing the standard out & logging of nodes, and dask does not want to impose its own parallel/conflicting system for handling output. The main focus of dask is executing the tasks, not managing a distributed logging system.
That said, dask does have the infrastructure for passing around messages, and this is something the package could support. There is an open issue and pull request attempting to add this ability as a feature, but it looks like there are a lot of open design questions that would need to be resolved before this could be added. Many of them revolve around the issues I raised above - how to add a clean distributed logging feature without overburdening the scheduler, complicating the already complex set of configuration options, or overriding the important, existing logging systems users rely on. The dask core team seems to agree that this is a good idea, if the tough design questions can be resolved.
You certainly always have the option of returning messages. For example, the following would work:
In [10]: import time
    ...: def job():
    ...:     return_blob = {"diagnostics": {}, "messages": [], "return_val": None}
    ...:     start = time.time()
    ...:     return_blob["diagnostics"]["start"] = start
    ...:     try:
    ...:         return_blob["messages"].append("raising error")
    ...:         # this causes a ZeroDivisionError
    ...:         return_blob["return_val"] = 1 / 0
    ...:     except Exception as e:
    ...:         return_blob["diagnostics"]["error"] = e
    ...:     return_blob["diagnostics"]["end"] = time.time()
    ...:     return return_blob
In [11]: client.submit(job).result()
{'diagnostics': {'start': 1644091274.738912,
'error': ZeroDivisionError('division by zero'),
'end': 1644091274.7389162},
'messages': ['raising error'],
'return_val': None}
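The same "return your messages" idea can be generalised by capturing whatever a function prints and handing it back together with the result. The following is a minimal sketch using only the standard library; capture_prints is a hypothetical helper, not part of dask, and it is shown here called directly rather than through a scheduler. Note that contextlib.redirect_stdout swaps sys.stdout for the whole process while the call runs, so with thread-based workers output from other tasks could end up in the same buffer.

```python
import contextlib
import functools
import io


def capture_prints(func):
    """Wrap func so it returns (result, printed_text) instead of only printing."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        buffer = io.StringIO()
        # Everything written to sys.stdout during the call lands in the buffer.
        with contextlib.redirect_stdout(buffer):
            result = func(*args, **kwargs)
        return result, buffer.getvalue()
    return wrapper


@capture_prints
def job(item):
    print("Meme", item)
    return item * 2


value, printed = job(3)
print(printed, end="")  # re-emit the captured text where the caller runs
```

Submitted to a worker, for example with client.submit(job, 3).result(), the wrapped function should return the same tuple, assuming it can be serialized to the worker; the notebook can then print the captured text itself.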

Is it possible to run a Python program within a Java "GraalVM" program?

From the GraalVM examples, they have code like this to run a single line of Python code:
context.eval("python", "\nprint('Hello polyglot world Python!');");
Yes that works fine in a Java program.
I can also run a Python program from the command line using the "graalpython" program.
My question is how do I run a python program from the Java example I mentioned above?
context.eval("python", "\nprint('Hello polyglot world Python!');");
I tried using the "file:" argument, but that didn't work or I'm doing something wrong.
For example, this did not work:
context.eval("python", "file: /path_to_python/test.py");
This line of code gives me:
Original Internal Error:
java.lang.RuntimeException: not implemented
So, maybe that answers my question, but I have to believe you can run a python script from a GRAAL program like you can a single line of code. Hence, this posting.
Is running a python program from within a Java program using graal "eval" supported? If so, I would very much appreciate an example of usage.
Thanks very much.
You need to build a Source object in order to eval a file:
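File file = new File("/path_to_python/test.py");
Source source = Source.newBuilder("python", file).build(); // build() can throw IOException
context.eval(source); // evaluate the file with the same Context used for the one-liner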

Use stderr in lua io.popen to determine faulty function call

I'm making a function that can read the metadata of the current song playing in spotify. This is being programmed in lua since it is an implementation for awesome wm. I got the following line to get all the metadata that I can later use.
handle = io.popen('qdbus org.mpris.MediaPlayer2.spotify /org/mpris/MediaPlayer2 org.mpris.MediaPlayer2.Player.Metadata | awk -F: \'{$1=\"\";$2=\"\";print substr($0,4)}\'')
However, when Spotify is not running I don't get the expected information, and qdbus writes an error to the stderr stream. I want to use the fact that qdbus writes to the error stream to detect a fault and stop the program there. (This should also catch any other errors not related to whether Spotify is running.)
My understanding is that Lua's popen uses popen3, which can separate stdout and stderr, but all my efforts so far have been fruitless and my error stream is always empty. Is it possible to check for a non-nil value in stderr in order to detect a faulty call to qdbus (or awk)?
I think you can redirect stderr to stdout in the call to popen like this:
handle = io.popen("somecommand 2>&1")
If you want to differentiate stderr and stdout, you cannot do it with the io library but you can with luaposix. See this answer for instance.
You can check out juci.exec, which I wrote for the JUCI webgui. I struggled with the same problem and ended up using luaposix for this kind of thing when I really need two separate streams. My implementation also gives you the program exit code, which is good for testing for errors: https://github.com/mkschreder/juci/blob/master/juci/lua/core.lua

How to monitor elasticsearch using nagios

I would like to monitor elasticsearch using nagios.
Basically, I want to know if elasticsearch is up.
I think I can use the elasticsearch Cluster Health API (see here)
and use the 'status' that I get back (green, yellow or red), but I still don't know how to use nagios for that (nagios is on one server and elasticsearch is on another).
Is there another way to do that?
I just found that - check_http_json. I think I'll try it.
After a while, I managed to monitor elasticsearch using NRPE.
I wanted to use the elasticsearch Cluster Health API, but I couldn't use it from another machine due to security issues.
So, on the monitoring server I created a new service whose check_command is check_nrpe!check_elastic. Then, on the remote server where elasticsearch runs, I edited the nrpe.cfg file to add the following:
command[check_elastic]=/usr/local/nagios/libexec/check_http -H localhost -u /_cluster/health -p 9200 -w 2 -c 3 -s green
This is allowed, since the command is run on the remote server itself, so there are no security issues here.
It works!
I'll still try the check_http_json command that I posted in my question, but for now my solution is good enough.
After playing around with the suggestions in this post, I wrote a simple check_elasticsearch script. It returns the status as OK, WARNING, and CRITICAL corresponding to the "status" parameter in the cluster health response ("green", "yellow", and "red" respectively).
It also grabs all the other parameters from the health page and dumps them out in the standard Nagios format.
Shameless plug: https://github.com/jersten/check-es
You can use it with ZenOSS/Nagios to monitor cluster health, data indices, and individual node heap usage.
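The status mapping described above can be sketched in a few lines of Python 3 using only the standard library. This is an illustration, not the code of the linked plugin: health_to_nagios is a hypothetical helper that takes the JSON body of a _cluster/health response (fetching it is left out), and the numeric values are the standard Nagios plugin exit codes.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

STATUS_TO_STATE = {
    "green": (OK, "OK"),
    "yellow": (WARNING, "WARNING"),
    "red": (CRITICAL, "CRITICAL"),
}


def health_to_nagios(response_body):
    """Translate a _cluster/health JSON body into (exit_code, message)."""
    try:
        health = json.loads(response_body)
        status = str(health["status"]).lower()
    except (ValueError, KeyError, TypeError):
        return UNKNOWN, "UNKNOWN - could not parse cluster health response"
    # Any status other than green/yellow/red is reported as UNKNOWN.
    code, label = STATUS_TO_STATE.get(status, (UNKNOWN, "UNKNOWN"))
    return code, "%s - cluster status is %s" % (label, status)


code, message = health_to_nagios('{"cluster_name": "demo", "status": "yellow"}')
print(message)
```

A real plugin would print the message and then call sys.exit(code), since Nagios reads the state from the process exit code.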
You can use this Python script to monitor your Elasticsearch cluster. The script checks your IP:port for the Elasticsearch status. This one and more Python scripts for monitoring Elasticsearch can be found here.
# Python 2 code (urllib2 and the "except X, e" syntax)
from nagioscheck import NagiosCheck, UsageError
from nagioscheck import PerformanceMetric, Status
import urllib2
import optparse
try:
    import json
except ImportError:
    import simplejson as json
class ESClusterHealthCheck(NagiosCheck):
    def __init__(self):
        # initialise the base class before registering options
        NagiosCheck.__init__(self)
        self.add_option('H', 'host', 'host', 'The cluster to check')
        self.add_option('P', 'port', 'port', 'The ES port - defaults to 9200')
    def check(self, opts, args):
        host = opts.host
        port = int(opts.port or '9200')
        # query the cluster health endpoint
        try:
            response = urllib2.urlopen(r'http://%s:%d/_cluster/health'
                                       % (host, port))
        except urllib2.HTTPError, e:
            raise Status('unknown', ("API failure", None,
                         "API failure:\n\n%s" % str(e)))
        except urllib2.URLError, e:
            raise Status('critical', (e.reason))
        response_body = response.read()
        try:
            es_cluster_health = json.loads(response_body)
        except ValueError:
            raise Status('unknown', ("API returned nonsense",))
        # map the cluster status onto a Nagios state
        cluster_status = es_cluster_health['status'].lower()
        if cluster_status == 'red':
            raise Status("CRITICAL", "Cluster status is currently reporting as "
                         "Red")
        elif cluster_status == 'yellow':
            raise Status("WARNING", "Cluster status is currently reporting as "
                         "Yellow")
        else:
            raise Status("OK",
                         "Cluster status is currently reporting as Green")
if __name__ == "__main__":
    # assumed entry point; the original line was lost
    ESClusterHealthCheck().run()
I wrote this a million years ago, and it might still be useful: https://github.com/radu-gheorghe/check-es
But it really depends on what you want to monitor. The above measures:
if Elasticsearch responds to HTTP
if ingestion rate drops under the defined levels
if total number of documents drops under the defined levels
But of course there's much more that might be interesting. From query time to JVM heap usage. We wrote a blog post about the most important ones here: https://sematext.com/blog/top-10-elasticsearch-metrics-to-watch/
Elasticsearch has APIs for all these, so you may be able to use a generic check_http_json to get the needed metrics. Alternatively, you may want to use something like Sematext Monitoring for Elasticsearch, which gets these metrics out of the box, then forward threshold/anomaly alerts to Nagios. (disclosure: I work for Sematext)
