So, using Datadog is proving to be quite difficult when attempting to parse some of the data we have.
For example, imagine we have a log that has the properties
Type
Count
Cat
3
Dog
4
How do we get a Datadog dashboard to appreciate this data? I have full control over how the data is added to the log but I don't know how to iterate over it or parse it in a meaningful manner
For example, it may come out like
Animals:[{"Cat":3},{"Dog":4}]
Related
My goal is to render time-series data from set locations on a map. Essentially, I have about 30 predefined (static) locations in Switzerland from which I will be receiving real-time data. The data itself is relatively simple, just the signal/noise ratio of the signal we're receiving, which should be updated every few seconds or every minute. I am using InfluxDB as my database. Are there any specific setups I should be using for this kind of visualization?
My first question is: is it best to use the worldmap panel or the geomap panel at this time? I seem to be finding more information/documentation on the worldmap panel even though i have also read that geomap is (or at least will be) its replacement.
Second, I assume that since I'm using time-series data, that I should be using the Time-Series format, and not the Table format. However, I have not been able to render any data points using the time-series feature, even by following the simplest of examples in your documentation. The best I can do is use the Table feature, and internally remove previous points from my database at every iteration (so that multiple points aren't rendered at the same time for each location). Here are two screenshots of when I'm able to render data on the geomap using the Table format, and then after switching to Time-Series format that the points are no longer there (note that I have the same problem with the Worldmap application as well).
I'm able to render data using the Table method:
...but not using time series:
Thanks for any help!
For rendering timeseries data on the geomap, you must convert your lat/long fields to a single geohash field. You'll have to do that prior to inserting the lat/longs into influxDB
See this answer
I've been working on building a data analysis sheet, which is quite verbose at the moment and a bit more complicated than it should be as I've been trying to figure this out. Please note, I work doing student data in a school.
Basically, I have two sets of input data:
Data imported from a CSV file that includes test data and codes for Common Core Standards and the questions tied to those standards as a whole class summary
Data imported from a CSV file that includes individual scores by question
I am looking to construct 2 views:
A view that collates and displays data of individual standards per student that includes a dropdown to change the standard allowing a teacher to see class performance by standard in a broad view. The drop-down is populated dynamically from the input data (so staff could eventually dump data and go directly to reports)
A view that collates and displays data of individual students broken down by performance on each standard allowing a teachers to see the broader spectrum for each student. The student drop-down is populated from Source list 2.
I have been able to build the first view, but am struggling with the second. I've been able to separate the question codes and develop strings of cell references to the scoring data, including a dynamic reference to the row the selected student's score data appears on in the second source set from above.
I tried to pass through an indirect() formula into a sum() so as to process for a mean evaluation, and have encountered errors. I think SUM() doesn't process comma-separated cell reference lists from Indirect() [or in general] or there is something that I am missing to help parse it. Here is the formula I have tried:
=Sum(vlookup(D7,CCCodeManip!$A:$C,3,false))
CCCodeManip!C:C includes the created text (based on the dynamic standards and question codes, etc), here's an example of what would be found there:
'M-ADI'!M17, 'M-ADI'!N17, 'M-ADI'!O17, 'M-ADI'!P17, 'M-ADI'!Q17, 'M-ADI'!R17, 'M-ADI'!J17
I need these to be dynamic so that teachers can input different sets of standards, question, and student data and the sheet automatically collates and reports it in uniform ways (with an upward bound of 20 standards as I currently have it built)
Here is a link to the sheet I built, with names and ID anonymized. There's a CRAP TON of sub-tabs, and that's really just being able to split apart and re-combine data neatly without things error-ing out due to data overlapping, aside from a few different attempts and different approaches to parse the cell reference strings.
The first two tabs are the current status of the data views. I plan to hide a bunch of the functional stuff that is there to help pull data accurately.
The 3rd and 4th tab are the source data sets. 5th is a modified version of source data that allows me to reference things better, and I've tried to arrange the sheets most relevant towards the front of the set.
https://docs.google.com/spreadsheets/d/1fR_2n60lenxkvjZSzp2VDGyTUO6l-3wzwaV4P-IQ_5Y/edit?usp=sharing
Some have a different approach? I am aware that I might be as far as I cn go with this and perhaps should consider scripts - my coding experience is a bit out of date and my strength is more with the formulas, but I can dig into things with some direction, if anyone can help.
Ok so I noticed something.
It seems the failure is in the indirect reference:
=indirect(CCCodeManip!C3)
The string I am trying to parse via indirect is going to be generated into something like this, dynamic from reference to other data:
'M-ADI'!M17, 'M-ADI'!N17, 'M-ADI'!O17, 'M-ADI'!P17, 'M-ADI'!Q17, 'M-ADI'!R17, 'M-ADI'!J17
The indirect returns the error that the above string is not a cell reference with the #REF code.
Can someone give me a clue as to what is causing this? I am going to dig into the docs on Indirect() from google and will post anything that I find.
Perhaps it is that indirect() can't handle lists, but only specific references and arrays, which may require me a to build a sheet to do the SUM formula on for each question set (?)
So I think I figured it out, but i Ended up parsing the data differently, basically doing the sum based on individual cell references and a separate sum formula, bypassing the need to do it all at once, it jsut makes my sheets a lot dirtier! I am eventually going to see if code could do it better if I need to, but this is closed for now.
Basically, I did individual cell references to recall scores in a row, then used a separate SUM formula, and created references / structures to be able to pull those sum() results. Achieves the same end, but with extra crap on the sheet.
I've just started using SPSS after using R for about five years (I'm not happy about it, but you do what your boss tells you). I'm just trying to do a simple count based on a categorical variable.
I have a data set where I know a person's year of birth. I've recoded into a new variable so that I have their generation as a categorical variable, named Generation. I also have a question that allows for multiple responses. I want a frequency of how many times each response was collected.
I've created a multiple response variable (analyze>multiple response > Define variable sets). However, when I go to create crosstabs, the Generation variable isn't an option to select. I've tried googling, but the videos I have watched have the row variables as numeric.
Here is a google sheet that shows what I have and what I'm looking to achieve:
https://docs.google.com/spreadsheets/d/1oIMrhYv33ZQwPz3llX9mfxulsxsnZF9zaRf9Gh37tj8/edit#gid=0
Is it possible to do this?
First of all, to double check, when you say you go to crosstabs, is this Analyze > Multiple Response > Crosstabs (and not Analyze > Descriptive Statistics > Crosstabs)?
Second, with multiple response data, you are much better off working with Custom Tables. Start by defining the set with Analyze > Custom Tables > Multiple Response Sets. If you save your data file, those definitions are saved with it (unlike the Mult Response Procedure).
Then you can just use Custom Tables to tabulate mult response data pretty much as if it were a regular variable, but you have more choices about appropriate statistics, tests of significance etc. No need in the CTABLES code to explicitly list the set members.
Try CUSTOM TABLES, although this is an additional add-on modules that you need to have a licence for:
CTABLES /TABLE Generation[c] by (1_a+ 1_b + 1_c)[s][sum f8.0 'Count'].
I am able to create a time-series "hash" using the statistics gem:
=> #<OrderedHash {"2010-10-23"=>2, "2010-09-22"=>3, "2010-09-11"=>1, "2010-08-27"=>1, "2010-10-15"=>
1, "2010-09-15"=>1, "2010-08-08"=>2, "2010-10-17"=>14, "2010-10-06"=>2, "2010-09-28"=>1, "2010-10-19
"=>1, "2010-09-20"=>1}>
I want to create a simple graph with this data -- I am trying to use the Seer gem but am confused -- it looks like rather than passing in a series, you pass in a method and it runs it live based on the data.
Is there a way I can take data starting with a hash and display it?
I have used highcharts. It is a javascript library but it's really easy to use. Use ruby to get your statistics and then display them with this library.
It does not matter if the data is in a hash or an array. It depends what kinda graph you want. Looking at your data you could just create a line graph. the x-axis would be the time and the y-axis would be the amount.
Then you can just loop over the hash and for each value have the key as the y-value and the value as the x-value.
You will have to create a different hash for each type of data you wanted to store.
Google Charts (http://code.google.com/apis/chart/) is another option, although I have to say that HighCharts looks pretty impressive.
I have a lot of logfile data that I want to display dynamic graphs from, for basically arbitrary time periods, optionally filtered or aggregated by different columns (that I could pregenerate). I'm wondering about the best way to store the data in a database and access it for displaying charts, when:
the time resolution should be variable from one second to a year
there are entries that span several 'time buckets', e.g. a connection might have been open for a few days and I want to count and display the user for every hour she was connected, not just in the hour 'slot' the connection was created or finished
Are there best practices, or tools/plugins for rails that help handle this kind and amount of data? Are there maybe database engines specifically tailored towards this, or having helpful functions (e.g. CouchDB indexes)?
EDIT: I'm looking for a scalable way to handle this data and access pattern. Things we considered: Run a query for each bucket, merge in app - probably way too slow. GROUP BY timestamp/granularity - does not count connections correctly. Preprocessing data into rows by smallest granularity and downsampling on query - probably the best way.
I think you can use mysql timestamps for this.
The way I solved it in the end was to pre-process the data into per-minute buckets, so there's one row for every event and minute. That makes it easy and fast enough to select and yields correct results. To get different granularity, you can do integer arithmetic on the timestamp columns - select abs(timestamp/factor)*factor and group by abs(timestamp/factor)*factor.