Ansible parse value from stdout [closed]

I'm running an Ansible playbook with some commands against a network device (Juniper) to check its status.
The output looks like this:
"Monitor Failure codes:",
" CS Cold Sync monitoring FL Fabric Connection monitoring",
" GR GRES monitoring HW Hardware monitoring",
" IF Interface monitoring IP IP monitoring",
" LB Loopback monitoring MB Mbuf monitoring",
" NH Nexthop monitoring NP NPC monitoring ",
" SP SPU monitoring SM Schedule monitoring",
" CF Config Sync monitoring",
" ",
"Cluster ID: 1",
"Node Priority Status Preempt Manual Monitor-failures",
"",
"Redundancy group: 0 , Failover count: 0",
"node0 1 secondary no no None ",
"node1 125 primary no no None ",
"",
"Redundancy group: 1 , Failover count: 0",
"node0 1 secondary yes no None ",
"node1 125 primary yes no None"
I want to filter out and then run some checks on the segments under Redundancy group: 0 and 1 (the rest of the output is of no interest). I can't figure out the best way to parse them out: write a filter plugin, or just use a regexp? Ideally I would like to separate redundancy group 0 and 1 so I can check them separately, but that's not required.

If you need some complex processing, then yes, write a filter plugin.
But some basic processing can be done with regular expressions and other filters; see this example:
- debug:
    msg: "Name={{ node_name }}, val1={{ node_val_1 }}"
  vars:
    node_name: "{{ item.split()[0] }}"
    node_val_1: "{{ item.split()[1] }}"
    regexp: '^Redundancy group: 0[\s\S]+?^$'
    new_line: "\n"
  loop_control:
    label: "{{ node_name }}"
  with_items: >
    {{ (
         (mylines | join(new_line) + new_line)
         | regex_search(regexp, multiline=true)
       ).splitlines()[1:] }}
This searches for the range of lines starting with Redundancy group: 0 and ending with an empty line, then feeds those lines into the looped debug module.
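If you do decide to go the filter-plugin route, a minimal sketch could look like the following (the plugin file name and filter name are my own invention, and it assumes the device output lines are available as a list such as mylines above):

# filter_plugins/redundancy_groups.py -- hypothetical plugin file
import re


def redundancy_group(lines, group):
    """Return the node rows of one 'Redundancy group: N' block as lists of fields."""
    text = "\n".join(lines) + "\n"
    pattern = r"^Redundancy group: %d ,.*?\n(.*?)^$" % int(group)
    match = re.search(pattern, text, flags=re.MULTILINE | re.DOTALL)
    if not match:
        return []
    # Each node line becomes e.g. ['node0', '1', 'secondary', 'no', 'no', 'None']
    return [line.split() for line in match.group(1).splitlines() if line.strip()]


class FilterModule(object):
    def filters(self):
        return {"redundancy_group": redundancy_group}

In a play this could then be used as {{ mylines | redundancy_group(0) }} and {{ mylines | redundancy_group(1) }}, which keeps the two groups separate so you can check them independently.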

Related

Timestamps taken by rosbag play (and rqt_bag) and by rosbag.Bag's read_messages() differ

There is something very strange happening with some rosbags I have.
These rosbags contain messages of type sensor_msgs/Image among other topics.
So I do:
First scenario
On one terminal I run rostopic echo /the_image/header because I am not interested in the actual data, just the header info.
In another terminal I run rosbag play --clock the_bag.bag
With this I get
seq: 7814
stamp:
secs: 1625151029
nsecs: 882629359
frame_id: ''
---
seq: 7815
stamp:
secs: 1625151029
nsecs: 934761166
frame_id: ''
---
seq: 7816
stamp:
secs: 1625151029
nsecs: 986241550
frame_id: ''
---
seq: 7817
stamp:
secs: 1625151030
nsecs: 82884301
frame_id: ''
---
Second Scenario
I do the same as in the previous scenario, but instead of rosbag play I run rqt_bag the_bag.bag and, once there, I right-click the messages to publish them.
With that I get similar values, but the first messages are skipped (I have reported that problem before; it is not the subject of this question).
Third Scenario
Here comes the weird part. Instead of doing the above, I have a Python script that does:
import rosbag

timestamps = []
image_idx = 0
for topic, msg, t in rosbag.Bag("the_bag.bag").read_messages():
    if topic == '/the_image':
        timestamps.append((image_idx, t))
        image_idx += 1

with open("timestamps.txt", 'w') as f:
    for idx, t in timestamps:
        f.write('{0},{1},{2}\n'.format(str(idx).zfill(8), t.secs, t.nsecs))
So, as you can see, I open the bag, get a list of timestamps, and write them to a text file.
Which gives:
00000000,1625151029,987577614
00000001,1625151030,33818541
00000002,1625151030,88932237
00000003,1625151030,170311084
00000004,1625151030,232427083
00000005,1625151030,279726253
00000006,1625151030,363255375
00000007,1625151030,463079346
00000008,1625151030,501315763
00000009,1625151030,566104245
00000010,1625151030,586694806
As you can see the values are totally different!!!
What could be happening here?
This is a known "issue" with rostopic echo and bag files. I put "issue" in quotes because it's not necessarily a bug, just a product of how rostopic works. To spare you the somewhat obscure implementation details: this essentially happens because rospy.rostime does not get initialized correctly when you just play a bag file and echo it, even if you set /use_sim_time to true.
To give some clarity on what you're seeing: the timestamps coming out of your Python script are correct and the rostopic ones are not. If you need the timestamps to be 100% correct with rostopic, you can use the -b flag, like: rostopic echo -b the_bag.bag /my_image_topic
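If you want to see both clocks side by side, you can also read the bag directly and print the record time t next to the stamp stored in each message header (a small sketch along the lines of the script in the question; the topic name /the_image is taken from the question and may need adjusting):

import rosbag

# Compare the bag record time (t) with the stamp carried inside each message header.
with rosbag.Bag("the_bag.bag") as bag:
    for topic, msg, t in bag.read_messages(topics=['/the_image']):
        print('record: {0}.{1:09d}  header: {2}.{3:09d}'.format(
            t.secs, t.nsecs, msg.header.stamp.secs, msg.header.stamp.nsecs))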

Missing creation time for LVM volume

One of our storage arrays has an LVM volume group in which several logical volumes do not have the creating host name and time set (the LV Creation host, time attribute).
These were created quite a long time ago, but unfortunately (since the date is missing) I can't say exactly when; it could be many years, even 8-10.
We now want to move it under Proxmox 6 (shared LVM via fiber, currently used by Proxmox 3), which we can't do because Proxmox 6 requires the creation time.
I couldn't find a command to set an exact time for this.
Can anyone tell me how to set the host name and creation time on an LVM logical volume?
Regards,
Laszlo
Sorry, I can't format this in a comment.
So, an example of the lvdisplay output (one volume shown, but there are 32 in total, 5 of which have no creation time):
The LV Creation host, time attribute is empty.
It was probably created under Proxmox 3 - is it possible that this attribute did not exist back then?
--- Logical volume ---
LV Path /dev/stvg1/vm-103-disk-1
LV Name vm-103-disk-1
VG Name stvg1
LV UUID sbSgG8-4nuw-RwG5-skxe-J9e1-OkGo-ImEMfO
LV Write Access read/write
LV Creation host, time ,
LV Status available
# open 0
LV Size 1.50 TiB
Current LE 393216
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:20
And the Proxmox error message when trying to read the LVM volumes:
Result verification failed (400)
[9].ctime: type check ('integer') failed - got ''
[8].ctime: type check ('integer') failed - got ''
[4].ctime: type check ('integer') failed - got ''
[7].ctime: type check ('integer') failed - got ''
[10].ctime: type check ('integer') failed - got ''

Configure Prometheus alerting rules combining the status of 2 different instances

I'm trying to configure in Prometheus Alertmanager an alert that fires when the status of 2 different hosts is down.
To explain better, I have these pairs of hosts (host=instance):
host1.1
host1.2
host2.1
host2.2
host3.1
host3.2
host4.1
host4.2
...
and I need an alert that fires when both hosts of the SAME pair are DOWN:
expr = ( icmpping{instance=~"hostX1"}==0 and icmpping{instance=~"hostX2"}==0 )
(I know that the syntax is not correct, I just wanted to underline that X refers to the same number in both icmpping conditions)
Any hint?
The easiest way is perhaps to generate a label at ingestion time reflecting this logic, using relabel_configs:
relabel_configs:
  - source_labels: [host]
    regex: ^(.*)\.\d+$
    target_label: host_group
It will generate the label you need for matching:
host=host1.1 => host_group=host1
host=host1.2 => host_group=host1
You can then use it for your alerting rules.
sum by (host_group) (icmpping) == 0
If this is not possible, you can use label_replace to achieve the same (it works only on instant vectors):
sum by (host_group) (label_replace(icmpping, "host_group", "$1", "host", "(.*)\\.\\d+")) == 0
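If you want to sanity-check the grouped expression before wiring it into an alerting rule, one option is to evaluate it against the Prometheus HTTP query API, for example with a small script like this (the Prometheus URL is a placeholder):

import requests

PROMETHEUS_URL = "http://localhost:9090"  # placeholder address

# Any host_group returned by this query currently has all of its members down.
query = 'sum by (host_group) (icmpping) == 0'
resp = requests.get(PROMETHEUS_URL + "/api/v1/query", params={"query": query})
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    print("both hosts down in group:", result["metric"].get("host_group"))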

Neo4j slow concurrent merges

I have been experiencing some extremely bad slowdowns in Neo4j, and having spent a few days on the issue now, I still can't figure out why. I'm really hoping someone here can help. I've also tried the Neo4j Slack support group already, but to no avail.
My setup is as follows: the back-end is a Django app that connects through the official driver (pip package neo4j-driver==1.5.0) to a dockerized Neo4j Enterprise 3.2.3 instance. The data we write is added in infrequent bursts of around 15 concurrent merges to the same portion of the graph, and is triggered when a user interacts with some part of our product (each interaction causing a separate merge).
Each merge operation is the following query:
MERGE (m:main:entity:person {user: $user, task: $task, type: $type, text: $text})
ON CREATE SET m.source = $list, m.created = $timestamp, m.task_id = id(m)
ON MATCH SET m.source = CASE
                          WHEN $source IN m.source THEN m.source
                          ELSE m.source + $list
                        END
SET m.modified = $timestamp
RETURN m.task_id AS task_id;
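For context, each merge is issued from the Django side roughly like this (a simplified sketch of typical 1.x driver usage with placeholder connection details, not our real code; merge_query is the statement above):

from neo4j.v1 import GraphDatabase  # neo4j-driver 1.x import path

# Placeholder connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def run_merge(merge_query, params):
    # Each request handler opens its own session, so ~15 of these can be
    # in flight at the same time against the same portion of the graph.
    with driver.session() as session:
        result = session.run(merge_query, params)
        return result.single()["task_id"]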
A PROFILE of this query shows that the individual processing time is in the ms range. We have tested running it 100+ times in quick succession with no issues, and we have a node key configured in the schema.
The running system, however, seems to seize up, and in the list of running queries we see execution times for these queries hit as high as 2 minutes!
Does anyone have any clues as to what may be going on?
Further system info:
ls data/databases/graph.db/*store.db* | du -ch | tail -1
249.6M total
find data/databases/graph.db/schema/index -regex '.*/native.*' | du -hc | tail -1
249.6M total
ps
1 root 297:51 /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -cp /var/lib/neo4j/plugins:/var/lib/neo4j/conf:/var/lib/neo4j/lib/*:/var/lib/neo4j/plugins/* -server -Xms8G -Xmx8G -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow -XX:+AlwaysPr
printenv | grep NEO
NEO4J_dbms_memory_pagecache_size=4G
NEO4J_dbms_memory_heap_maxSize=8G
The machine has 16GB total memory and there is nothing else running on it.

Cost of each pipeline job

My team at Moloco runs a lot of Dataflow pipelines (hourly and daily, mostly batch jobs), and from time to time we wish to calculate each pipeline's total cost to identify what improvements we can make to save costs.
In the past few weeks, one of our engineers usually goes to the job monitoring UI webpage (via https://console.cloud.google.com/dataflow?project=$project-name), and manually calculates the cost by looking up the number of workers, worker machine type, total PD and memory used, etc.
Recently, we noticed that now the page shows the "resource metrics" which will help us save our time when it comes to calculating the costs (along with the new pricing model that was announced a while ago).
On the other hand, because we run about 60-80 Dataflow jobs every day, it is time-consuming for us to calculate the cost per job.
Is there a way to obtain total vCPU, memory, and PD/SSD usage metrics via an API given a job id, perhaps via PipelineResult or from the log of the master node? If it is not supported now, do you plan to support it in the near future?
We are wondering if we should consider writing our own script or something that would extract the metrics per job id, and calculate the costs, but we'd prefer we don't have to do that.
Thanks!
I'm one of the engineers on the Dataflow team.
I'd recommend using the command line tool to list these metrics and writing a script to parse the metrics from the output string and calculate your cost based on those (a sketch of such a script follows the sample output below). If you want to do this for many jobs, you can also list your jobs using gcloud beta dataflow jobs list. We are working on solutions to make this easier to obtain in the future.
Make sure you are using gcloud 135.0.0+:
gcloud version
If not you can update it using:
gcloud components update
Login with an account that has access to the project running your job:
gcloud auth login
Set your project:
gcloud config set project <my_project_name>
Run this command to list the metrics and grep the resource metrics:
gcloud beta dataflow metrics list <job_id> --project=<my_project_name> | grep Service -B 1 -A 3
Your results should be structured like so:
name:
  name: Service-mem_mb_seconds
  origin: dataflow/v1b3
scalar: 192001
updateTime: '2016-11-07T21:23:46.452Z'
--
name:
  name: Service-pd_ssd_gb_seconds
  origin: dataflow/v1b3
scalar: 0
updateTime: '2016-11-07T21:23:46.452Z'
--
name:
  name: Service-cpu_num
  origin: dataflow/v1b3
scalar: 0
updateTime: '2016-11-07T21:23:46.452Z'
--
name:
  name: Service-pd_gb
  origin: dataflow/v1b3
scalar: 0
updateTime: '2016-11-07T21:23:46.452Z'
--
name:
  name: Service-pd_gb_seconds
  origin: dataflow/v1b3
scalar: 12500
updateTime: '2016-11-07T21:23:46.452Z'
--
name:
  name: Service-cpu_num_seconds
  origin: dataflow/v1b3
scalar: 50
updateTime: '2016-11-07T21:23:46.452Z'
--
name:
  name: Service-pd_ssd_gb
  origin: dataflow/v1b3
scalar: 0
updateTime: '2016-11-07T21:23:46.452Z'
--
name:
  name: Service-mem_mb
  origin: dataflow/v1b3
scalar: 0
updateTime: '2016-11-07T21:23:46.452Z'
The relevant ones for you are:
Service-cpu_num_seconds
Service-mem_mb_seconds
Service-pd_gb_seconds
Service-pd_ssd_gb_seconds
Note: these metric names will soon change to:
TotalVCPUUsage
TotalMemoryUsage
TotalHDDPersistentDiskUsage
TotalSSDPersistentDiskUsage
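For the scripting part, a rough sketch of what the parse-and-price step could look like, using --format=json instead of grep so the structure shown above is easier to walk (the rates are placeholders to fill in from the Dataflow pricing page, and the JSON output is assumed to mirror the YAML structure above):

import json
import subprocess

# Placeholder per-unit-second rates; fill these in from the current Dataflow
# pricing page (mem_mb_seconds is in MB-seconds, the others in vCPU-seconds
# or GB-seconds).
RATES = {
    "Service-cpu_num_seconds": 0.0,
    "Service-mem_mb_seconds": 0.0,
    "Service-pd_gb_seconds": 0.0,
    "Service-pd_ssd_gb_seconds": 0.0,
}

def job_cost(job_id, project):
    """Sum metric value * rate over the resource metrics of one Dataflow job."""
    out = subprocess.check_output([
        "gcloud", "beta", "dataflow", "metrics", "list", job_id,
        "--project", project, "--format", "json",
    ])
    total = 0.0
    for metric in json.loads(out):
        name = metric.get("name", {}).get("name")
        if name in RATES and "scalar" in metric:
            total += float(metric["scalar"]) * RATES[name]
    return total

print(job_cost("<job_id>", "<my_project_name>"))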
