What snitch am I running? - datastax-enterprise

I recently ran:
sudo nodetool describecluster
and got the following output:
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
This confused me, because in cassandra.yaml on each of my nodes I have the following:
endpoint_snitch: GossipingPropertyFileSnitch
In fact, I can't even see DynamicEndpointSnitch as a valid option in the cassandra.yaml file.
Are the two the same thing?
Am I just misinterpreting the output of nodetool?
As always - Thanks!
-Gavin.

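Cassandra's dynamic snitching feature wraps the snitch specified in the cassandra.yaml file with the DynamicEndpointSnitch. So the two are not the same thing: nodetool reports the wrapper, while your GossipingPropertyFileSnitch is still in use underneath it. The wrapper sorts endpoints by latency using an adapted phi failure detector, which provides a way to select the highest-performing nodes for reads.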
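To make the wrapping concrete, here is a minimal Python sketch of the idea. It is not Cassandra's implementation: the class names, the record_latency method, and the plain moving-average score are hypothetical stand-ins, whereas the real DynamicEndpointSnitch is written in Java and uses a more elaborate scoring scheme.

```python
from collections import defaultdict, deque


class StaticSnitch:
    """Hypothetical stand-in for the snitch configured in cassandra.yaml."""

    def sort_by_proximity(self, endpoints):
        # A topology-only ordering; here simply alphabetical.
        return sorted(endpoints)


class DynamicSnitchSketch:
    """Wraps another snitch and reorders its result by observed latency."""

    def __init__(self, wrapped, window=100):
        self.wrapped = wrapped
        # Keep only the most recent `window` latency samples per endpoint.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record_latency(self, endpoint, millis):
        self.samples[endpoint].append(millis)

    def score(self, endpoint):
        history = self.samples[endpoint]
        # Endpoints with no samples score 0.0 (assumption of this sketch).
        return sum(history) / len(history) if history else 0.0

    def sort_by_proximity(self, endpoints):
        base = self.wrapped.sort_by_proximity(endpoints)
        # sorted() is stable, so ties keep the wrapped snitch's order.
        return sorted(base, key=self.score)


snitch = DynamicSnitchSketch(StaticSnitch())
snitch.record_latency("10.0.0.1", 40.0)
snitch.record_latency("10.0.0.2", 5.0)
snitch.record_latency("10.0.0.3", 12.0)
ordered = snitch.sort_by_proximity(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ordered)
```

The configured snitch still decides the baseline order; the wrapper only reorders it using the latencies it has observed.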

Related

How to run a python script in the background on azure

I have a uni project in which I have to run a number of machine learning algorithms (SVM, ME, Naive Bayes, etc.) and perform a grid search on each to find the optimal set of hyperparameters. Running all of these would take an exceedingly long time (48-168 hours in total, though run in batches), and since my computer becomes more or less unusable while they run, I was looking for a way to run my code externally. The scripts are in Python, and my plan was to run them on Azure to make use of its "Azure for students" $100 credit.
My original plan was to use Azure's ML notebook section and run the Python scripts in the terminal it provides. The problem with this route is that, as far as I can tell, the computation stops when the browser closes. I looked into it and found some articles mentioning a combination of 'ctrl-z', 'bg', and 'disown' to disconnect the process from the shell, but I thought there should definitely be a better way to do it. (I also wasn't sure how this would work in my case, where 8 processes run at once via GridSearchCV's n_jobs=-1 feature.)
I then realized a better way to do this would be to use pipelines. My intent was to create a number of pipelines of the form:
(Import data in xlsx file) -> (python script to run ML) -> (export data to working directory)
And then run them until all the work is completed. In the first stage I used the parameters,
And I got the error,
My intention was to have the Excel file pipe into the Python script as a data frame, but this implementation (and all the others I've tried) isn't working.
My first question is: how do I get the Excel data to pipe into the Python script properly?
My second question is: is there a better way to go about this? Would running it in the shell be easier? If so, how do I ensure it keeps running while my browser is closed? Are there other services that would be better? My main criteria are price (cheap) and time limit (the ability to run for a long time), but any suggestions would be greatly appreciated.
I also tried using Google Colab. This worked, but it felt slower than running on my computer.
To run a grid search with AzureML, you would use a Sweep job. The simplest way to kick off a Sweep is via the CLI. See here for an example.
$schema: https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json
type: sweep
trial:
  command: >-
    python hello-sweep.py
    --A ${{inputs.A}}
    --B ${{search_space.B}}
    --C ${{search_space.C}}
  code: src
  environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
inputs:
  A: 0.5
compute: azureml:cpu-cluster
sampling_algorithm: random
search_space:
  B:
    type: choice
    values: ["hello", "world", "hello_world"]
  C:
    type: uniform
    min_value: 0.1
    max_value: 1.0
objective:
  goal: minimize
  primary_metric: random_metric
limits:
  max_total_trials: 4
  max_concurrent_trials: 2
  timeout: 3600
display_name: hello-sweep-example
experiment_name: hello-sweep-example
description: Hello sweep job example.
You can start that job using the AzureML v2 CLI with the following command:
az ml job create -f hello-sweep.yml
That will create max_total_trials number of jobs for different parameter combinations as defined in the search_space governed by the sampling_algorithm, which can be random, grid or bayesian.
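As a rough illustration of what random sampling over a search space means, the following Python sketch draws max_total_trials parameter combinations from a space shaped like the one above. It is not AzureML code; the sample_trials helper and the dictionary layout are assumptions made for this example.

```python
import random


def sample_trials(search_space, max_total_trials, seed=0):
    """Draw one random value per parameter for each trial."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_total_trials):
        params = {}
        for name, spec in search_space.items():
            if spec["type"] == "choice":
                params[name] = rng.choice(spec["values"])
            elif spec["type"] == "uniform":
                params[name] = rng.uniform(spec["min_value"], spec["max_value"])
            else:
                raise ValueError(f"unsupported type: {spec['type']}")
        trials.append(params)
    return trials


# Hypothetical in-memory copy of the search_space section of the YAML.
search_space = {
    "B": {"type": "choice", "values": ["hello", "world", "hello_world"]},
    "C": {"type": "uniform", "min_value": 0.1, "max_value": 1.0},
}
trials = sample_trials(search_space, max_total_trials=4)
for trial in trials:
    print(trial)
```

Each dictionary corresponds to one trial job, whose values would be substituted into the ${{search_space.B}} and ${{search_space.C}} placeholders of the command.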
The actual job that is started is defined under trial. You need a program or script that you can execute from a command line and that takes its parameters on that command line. command is the command that is executed, code is a folder on the local machine that contains the script or program you want to run, and environment is a registered environment in your workspace. azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest is one that is predefined in AzureML, but you can also create your own.
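For illustration, a trial script compatible with the command above might look like the sketch below. The real hello-sweep.py is not shown in this answer, so the parameter types and the metric computation are assumptions. In an actual sweep the script would also need to log random_metric to the run (for example through MLflow), which is omitted here to keep the sketch dependency-free.

```python
import argparse


def parse_args(argv=None):
    """Parse the --A, --B and --C parameters passed by the sweep command."""
    parser = argparse.ArgumentParser(description="Hypothetical sweep trial")
    parser.add_argument("--A", type=float, default=0.5)
    parser.add_argument("--B", type=str, default="hello")
    parser.add_argument("--C", type=float, default=1.0)
    return parser.parse_args(argv)


def run_trial(args):
    # Placeholder objective; a real trial would train and evaluate a model.
    return args.A * args.C + len(args.B)


if __name__ == "__main__":
    args = parse_args()
    print(f"random_metric={run_trial(args)}")
```

Keeping the argument parsing separate from the work makes the script easy to try locally with fixed values before submitting the sweep.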
If you prefer Python, here is the same thing done in Python.
See here for a blog post on How to do hyperparameter tuning using Azure ML.

Drake Installation Freeze

I am trying to install the Python bindings of Drake. After running make -j, it freezes. I believe I have done everything correctly in the previous steps. Can anyone help? I am running Ubuntu 18.04 with Python 3.6.9.
Thank you in advance. It looks like this.
[Screenshot: frozen terminal]
Use make (no -j flag) or make -j1 because bazel (which is called internally during the build) handles the parallelism of the build (and of tests) and will set the number of jobs to the number of cores by default (appears to be 8 in your case).
To adjust the parallelism to reduce the number of jobs to less than the number of cores, create a file named user.bazelrc at the root of the repository (same level as the WORKSPACE file) with the content
test --jobs=N
for some N less than the number of cores that you have.
See also https://docs.bazel.build/versions/master/guide.html#bazelrc.
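As a small sketch of that advice, the Python snippet below picks a job count below the detected core count and builds the corresponding user.bazelrc line. The bazelrc_jobs_line helper and the choice of roughly half the cores are assumptions for illustration, and nothing is written to disk here.

```python
import os


def bazelrc_jobs_line(cores, command="test"):
    """Return a bazelrc line limiting jobs to about half the cores (min 1)."""
    jobs = max(1, cores // 2)
    return f"{command} --jobs={jobs}"


# os.cpu_count() can return None, so fall back to a single core.
cores = os.cpu_count() or 1
print(bazelrc_jobs_line(cores))
```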
From the screenshot, it doesn't look like the Drake build system is doing anything wrong, but make -j is probably trying to do too many things in parallel. Try starting with -j4, and if it still freezes, go down to -j2, and so on.
Possibly out of memory.
A hacky solution is to change the CMakeLists.txt file to set the max number of jobs bazel uses by adding --jobs N (where N is the number of jobs you allow concurrently) after ${BAZEL_TARGETS} like so
ExternalProject_Add(drake_cxx_python
  SOURCE_DIR "${PROJECT_SOURCE_DIR}"
  CONFIGURE_COMMAND :
  BUILD_COMMAND
    ${BAZEL_ENV}
    "${Bazel_EXECUTABLE}"
    ${BAZEL_STARTUP_ARGS}
    build
    ${BAZEL_ARGS}
    ${BAZEL_TARGETS}
    --jobs 1
  BUILD_IN_SOURCE ON
  BUILD_ALWAYS ON
  INSTALL_COMMAND
    ${BAZEL_ENV}
    "${Bazel_EXECUTABLE}"
    ${BAZEL_STARTUP_ARGS}
    run
    ${BAZEL_ARGS}
    ${BAZEL_TARGETS}
    --
    ${BAZEL_TARGETS_ARGS}
  USES_TERMINAL_BUILD ON
  USES_TERMINAL_INSTALL ON
)

Can Telegraf combine/add value of metrics that are per-node, say for a cluster?

Let's say I have some software running on a VM that emits two metrics, which are fed through Telegraf and written into InfluxDB. The metrics are the number of successfully handled HTTP requests (S) and the number of failed HTTP requests (F) on that VM. However, I might configure three such VMs, each emitting those two metrics.
Now suppose I would like a computed metric that is the sum of S across all VMs, and another that is the sum of F across all VMs, stored as new metrics at various instants of time. Is this something that can be achieved using Telegraf? Or is there a better, more efficient, more elegant way?
Kindly note that my knowledge of Telegraf and InfluxDB are theoretical, as I've recently started reading up about them, so I have not actually tried any of the above, yet.
This isn't something Telegraf would be responsible for.
With InfluxDB 1.x, you'd use a TICKscript or a Continuous Query to calculate the sum and write the result as a new sampled value.
Roughly, this would look like:
CREATE CONTINUOUS QUERY "sum_sample_daily" ON "database"
BEGIN
  SELECT sum(*) INTO "daily_measurement" FROM "measurement" GROUP BY time(1d)
END
CQ docs
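To show the computation itself, independent of InfluxDB, here is a minimal Python sketch that sums per-VM counters into cluster-wide totals per time bucket. The sample layout (time, host, S, F) and the 60-second bucket are assumptions made for this example.

```python
from collections import defaultdict


def cluster_totals(samples, bucket_seconds=60):
    """Sum S and F over all hosts within each time bucket."""
    totals = defaultdict(lambda: {"S": 0, "F": 0})
    for sample in samples:
        # Align the timestamp to the start of its bucket.
        bucket = sample["time"] - sample["time"] % bucket_seconds
        totals[bucket]["S"] += sample["S"]
        totals[bucket]["F"] += sample["F"]
    return dict(totals)


# Hypothetical samples from three VMs; times are seconds since some epoch.
samples = [
    {"time": 0, "host": "vm1", "S": 10, "F": 1},
    {"time": 5, "host": "vm2", "S": 20, "F": 0},
    {"time": 9, "host": "vm3", "S": 5, "F": 2},
    {"time": 61, "host": "vm1", "S": 7, "F": 3},
]
totals = cluster_totals(samples)
print(totals)
```

This is the same grouping that GROUP BY time(...) performs in the query above: the host is ignored and only the time bucket determines which values are added together.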

How to transform "Tags Values" in Telegraf

How can I transform the Tag Values in Telegraf?
I am trying to import Web access logs into InfluxDB with Telegraf. However, some of the URL PATHs include identifiers (session IDs, product IDs, etc).
I need to search and aggregate per path type (ids excluded), therefore, I can't(?) have them vary like that.
In the input plugin "logparser" I can use a grok extraction pattern but I can't do transformations of the values extracted that I know of.
And the only processor plugin (in between Input and Output) is merely a "printer".
I can't find any clean way of doing this with Telegraf. Maybe I could use some gimmicks in Telegraf (multiple Grok parsers plus exclusions/inclusions?), but after quite extensive attempts I didn't manage to make anything work, and it appeared quite fiddly.
This is only half an answer but:
I managed to achieve what I was after with LogStash instead, outputting to InfluxDB (LogStash has its own InfluxDB output plugin). This is less desirable, since I now have to run both Telegraf and LogStash, but it works.
I've created a feature request on Telegraf's GitHub:
https://github.com/influxdata/telegraf/issues/2667
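For reference, the transformation described in the question can be sketched in a few lines of Python, for example as a preprocessing step before the logs are shipped. The rules below (purely numeric segments and long hexadecimal strings count as identifiers) and the :id and :token placeholders are assumptions for illustration, not Telegraf or LogStash functionality.

```python
import re

# Hypothetical rules: numeric segments and long hex strings are identifiers.
_NUMERIC = re.compile(r"^\d+$")
_HEX_ID = re.compile(r"^[0-9a-fA-F]{16,}$")


def normalize_path(path):
    """Replace identifier-like path segments with fixed placeholders."""
    segments = []
    for segment in path.split("/"):
        if _NUMERIC.match(segment):
            segments.append(":id")
        elif _HEX_ID.match(segment):
            segments.append(":token")
        else:
            segments.append(segment)
    return "/".join(segments)


print(normalize_path("/products/12345/reviews"))
print(normalize_path("/session/9f86d081884c7d659a2feaa0c55ad015/cart"))
```

Normalizing before tagging keeps the number of distinct tag values small, which is what makes searching and aggregating per path type practical.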

Why does Riak store all my documents on only one node? n_val is equal to 3

I have built a 5-node cluster using Riak 2.0pre11 on EC2 servers. I installed Riak, got it working, then repeated the same actions on 4 more servers using a bash script. At that point I used riak-admin cluster join riak@node1.example.com on nodes 2 thru 5 to form a cluster.
Using the Python Riak client, I wrote a script to send 10,000 documents to Riak. It works fine, and I wrote another script to retrieve a doc, which also worked fine. Other than specifying the use of protobufs, I haven't specified any other options when storing keys. I stored all the docs via a connection to node1.
However Riak seems to be storing all 3 replicas on the same node, in other words the storage used on node1 is about 3x the original HTML docs.
The script connected to node 1, and that is where all docs are stored. I changed the script to connect to node 2 and send 10,000 more, which also all ended up on node 1. I used the command du -h /data/riak/bitcask to verify the aggregate stored size of the objects. On nodes 2 thru 4 there are only a few KB, which is the overhead of an empty Bitcask datastore.
For each document I specified the key similar to this
http://www.example.com/blogstore/007529.html4787somehash4787947:2014-03-12T19:14:32.887951Z
The first part of all keys is identical (testing); only the .html name and the ISO 8601 timestamp differ. Is it possible that I have somehow subverted the perfect hashing function?
Basically I used a default config. What could be wrong? Since Riak 2.0 uses a different config format, here is a fragment of the generated config for riak-core in the old format:
{riak_core,
 [{enable_consensus,false},
  {platform_log_dir,"/var/log/riak"},
  {platform_lib_dir,"/usr/lib/riak/lib"},
  {platform_etc_dir,"/etc/riak"},
  {platform_data_dir,"/var/lib/riak"},
  {platform_bin_dir,"/usr/sbin"},
  {dtrace_support,false},
  {handoff_port,8099},
  {ring_state_dir,"/datapool/riak/ring"},
  {handoff_concurrency,2},
  {ring_creation_size,64},
  {default_bucket_props,
   [{n_val,3},
    {last_write_wins,false},
    {allow_mult,true},
    {basic_quorum,false},
    {notfound_ok,true},
    {rw,quorum},
    {dw,quorum},
    {pw,0},
    {w,quorum},
    {r,quorum},
    {pr,0}]}]}
If the bitcask directory only grows on a single node, it sounds like the nodes might not be communicating. Please run riak-admin member-status to verify that all nodes in the cluster are active.
Once you have issued the riak-admin cluster join <node> command on every node joining the cluster, you also need to run riak-admin cluster plan to verify that the plan is correct, and then commit it with riak-admin cluster commit. Until the plan is committed, the join is only staged and the other nodes hold no data. These commands are described in greater detail here.
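On the hashing concern raised in the question: keys that share a long common prefix still spread across the ring, because the hash depends on the whole key. The Python sketch below illustrates this with SHA-1 and a ring of 64 partitions. It is a simplification and an assumption of this example: Riak hashes an encoded bucket/key pair rather than the plain string used here, so the partition numbers will not match a real cluster.

```python
import hashlib

RING_SIZE = 64  # same value as ring_creation_size in the config above


def partition_for(bucket, key, ring_size=RING_SIZE):
    """Map a bucket/key pair onto one of ring_size partitions via SHA-1."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")  # position on a 160-bit ring
    # Scale the 160-bit position down to a partition index in [0, ring_size).
    return position * ring_size >> 160


# Hypothetical keys that differ only in the document number.
keys = [
    f"http://www.example.com/blogstore/{n:06d}.html"
    for n in range(200)
]
partitions = {partition_for("testing", k) for k in keys}
print(len(partitions), "of", RING_SIZE, "partitions used")
```

Since near-identical keys land on many different partitions, a skew toward a single node points at cluster membership rather than at the key naming scheme.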
