We've got a product that's made up of C++ and Java parts. The C++ stuff is built using make, and the Java projects are made up of some Ant projects and some Maven 2 projects.
I'm looking for a tool that will help me get useful metrics out of the build system over time. Examples include:
* Total build time
* C++ project build time
* Java build time
* Number of compiler warnings
* Number of unit tests (run/passed/failed/errors). (Tests are written in CxxTest and JUnit.)
* Acceptance test metrics (run/passed/failed/errors)
* Total number of files
* LOC (to keep the managers happy)
There's probably loads of other metrics I could think of, but you get the idea.
Getting these metrics for a once-off report is pretty simple. What I really need is a simple tool that will let me plot these metrics over time.
A simple use case where this would be pretty useful would be compiler warnings as we could see the number of warnings trending towards zero over time. (we can't fix them all at once as it's a pretty big project and we just don't have the time for a big-bang approach). It would also help us quickly spot new warnings as they're introduced.
I've seen this question Monitoring code metrics in Java over longer time period, but I'm looking for something a little more language-agnostic.
So, to sum up: I'm looking for something that reports metrics over time, is easily extensible, has a web-based reporting GUI and is preferably cheap. (Not asking for much, huh!)
Edit: Just to be clear, we're using CruiseControl as our CI server. I just haven't seen an easy way to add metrics or time-based metrics to its output. Maybe I'm missing something obvious. I've seen this page about adding custom metrics, but it's a little clunky for me.
Ideally I'd love to write out the metrics to a file in a simple format and have something generate the charts dynamically. Ideally I'd like to turn something like the output below into a simple chart:
Build Id | Build Time | Metric        | Value
00000001 | 10:45      | TestPassRate  | 95
00000001 | 10:45      | BuildTime     | 300
00000001 | 10:45      | C++BuildTime  | 200
00000001 | 10:45      | JavaBuildTime | 50
00000001 | 10:45      | TestTime      | 50
00000002 | 11:45      | ......        |
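Turning a file like that into a chart is only a few lines of scripting. Here's a rough sketch, assuming a pipe-delimited file called build_metrics.txt with the columns above (the file name and layout are just my illustration), using Python and matplotlib:

from collections import defaultdict
import matplotlib.pyplot as plt

# metric name -> list of (build id, value) points
series = defaultdict(list)

with open("build_metrics.txt") as f:
    next(f)  # skip the header row
    for line in f:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) < 4 or not parts[3]:
            continue  # skip incomplete rows like the trailing "......"
        build_id, build_time, metric, value = parts[:4]
        series[metric].append((build_id, float(value)))

# one line per metric, plotted against build id
for metric, points in series.items():
    plt.plot([p[0] for p in points], [p[1] for p in points],
             marker="o", label=metric)

plt.xlabel("Build Id")
plt.ylabel("Value")
plt.legend()
plt.savefig("build_metrics.png")

Swap matplotlib for whatever charting you have handy; the point is that a flat file like this is trivial to parse.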
If you're using the Java CruiseControl you can get the kind of metrics you want easily. You can include arbitrary .xml in the log file and then reference any of the values in the reporting .jsp pages. That's exactly how the trend charts for PMD, Checkstyle and Javadoc errors are done. From metrics.jsp:
<jsp:useBean id="xpathData" class="net.sourceforge.cruisecontrol.chart.XPathChartData" />
<%
    xpathData.add("CheckStyle", "count(/cruisecontrol/checkstyle/file/error)");
    xpathData.add("PMD", "count(/cruisecontrol/pmd/file/violation)");
    xpathData.add("Javadoc", "count(/cruisecontrol/build//target/task[@name='javadoc']/message[@priority='warn' or @priority='error'])");
%>
<cewolf:chart id="chart" title="Coding violations" type="timeseries" xaxislabel="date" yaxislabel="violations">
    <cewolf:data>
        <cewolf:producer id="xpathData">
            <cewolf:param name="build_info" value="<%=build_info%>" />
        </cewolf:producer>
    </cewolf:data>
    <cewolf:chartpostprocessor id="xpathData" />
</cewolf:chart>
<cewolf:img chartid="chart" renderer="cewolf" width="400" height="300"/>
You can just paste this into metrics.jsp, replace the XPath queries with the XPath to your metrics, and you're good to go.
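To tie that back to the question: a small script at the end of the build can emit the arbitrary XML that gets merged into the CruiseControl log, and the xpathData.add() calls above can then point at it. A rough sketch (the buildmetrics.xml file name, the element names and the gcc-style warning match are my own inventions, not anything CruiseControl prescribes):

# count compiler warnings in the captured build output and write a small
# XML fragment for CruiseControl to merge into its build log
import xml.etree.ElementTree as ET

warnings = 0
with open("build.log") as f:            # wherever your build output is captured
    for line in f:
        if ": warning:" in line:        # gcc-style warning marker; adjust to taste
            warnings += 1

root = ET.Element("buildmetrics")
ET.SubElement(root, "metric", name="CompilerWarnings", value=str(warnings))
ET.ElementTree(root).write("buildmetrics.xml")

An XPath like sum(/cruisecontrol/buildmetrics/metric[@name='CompilerWarnings']/@value) in metrics.jsp would then chart it over time.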
You're probably looking for a CI server (continuous integration). These servers watch a source repository (CVS, Subversion, whatever) and build all projects that have changed plus all dependent projects.
Where I work, we use TeamCity, but there are lots more (there's a list on Wikipedia).
[EDIT] Most CI servers show some kind of report after the build (how long it took, how many tests ran, etc.). All you need to do is trigger a program after the build which fetches this info and saves it in a database.
You could harvest the historical build pages of the CI server, but they usually don't reach very far back, which is why it's better to save the data in a different place. If you're looking for something simple for the harvesting, try Python with Beautiful Soup. Otherwise, Java with Apache HttpClient plus JTidy is the way to go.
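If you go the Python route, something along these lines would do the harvesting. It's only a sketch: the URL, the CSS classes and the table layout are invented, since they depend entirely on your CI server's HTML.

# scrape one build result page and stash the numbers in SQLite
import sqlite3
import urllib.request

from bs4 import BeautifulSoup

html = urllib.request.urlopen("http://ci.example.com/builds/1234").read()
soup = BeautifulSoup(html, "html.parser")

# these selectors are placeholders for wherever your CI server puts the numbers
build_time = soup.find("span", class_="buildDuration").get_text(strip=True)
tests_run = soup.find("span", class_="testCount").get_text(strip=True)

conn = sqlite3.connect("build_history.db")
conn.execute("CREATE TABLE IF NOT EXISTS builds (build_id TEXT, metric TEXT, value TEXT)")
conn.executemany("INSERT INTO builds VALUES (?, ?, ?)",
                 [("1234", "BuildTime", build_time),
                  ("1234", "TestsRun", tests_run)])
conn.commit()
conn.close()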
rrdtool might provide the kind of historical view you're looking for. You just need to get your CI server to dump a build report in the right place each time it runs, and rrdtool can take it from there.
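For example, a thin wrapper run after every build could feed rrdtool and render the chart. The data source name, step/heartbeat values and file names below are just an illustration:

# sketch: keep one rrdtool database per metric and update it after every build
import os
import subprocess

RRD = "buildtime.rrd"

if not os.path.exists(RRD):
    # one GAUGE data source, daily step, keep a year of daily averages
    subprocess.check_call([
        "rrdtool", "create", RRD, "--step", "86400",
        "DS:buildtime:GAUGE:172800:0:U",
        "RRA:AVERAGE:0.5:1:365",
    ])

build_seconds = 300  # measured by your build wrapper
subprocess.check_call(["rrdtool", "update", RRD, "N:%d" % build_seconds])

# render a chart of the last 30 days
subprocess.check_call([
    "rrdtool", "graph", "buildtime.png", "--start", "-30d",
    "DEF:bt=%s:buildtime:AVERAGE" % RRD,
    "LINE2:bt#0000FF:build time",
])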
Related
I am working on a project where I need to generate the lead time for changes per application, per day.
Is there any Prometheus metric that provides lead time for changes? And how would we integrate it into a Grafana dashboard?
There is not going to be a metric or dashboard out of the box for this; the way I would approach this problem is:
You will need to instrument your deployment code with the Prometheus client library of your choice. The deployment code will need to grab the commit time; assuming you are using Git, you can use git log filtered to the folder that your application is in.
Now that you have the commit date, you can do a date diff between that and the current time (after the app has been deployed to PRD) to get the lead time of X seconds.
To get it into Prometheus, use the node_exporter (or windows_exporter) and its textfile collector to read textfiles that your deployment code writes, and surface them for Prometheus to scrape. Most of the client libraries have logic to help you write these files, and even if yours does not, the textfile format is simple enough to write directly.
You will want to surface this as a gauge metric, and have a label to indicate which application was deployed. The end result will be a single metric that you can query from grafana or set up alerts that will work for any application/folder that you deploy. To mimic the dashboard that you linked to, I am pretty sure you will want to use the over_time functions.
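A rough sketch of that approach with the Python client library (the metric name, label, application folder and textfile path are placeholders, not a convention):

# sketch of the textfile-collector approach described above
import subprocess
import time

from prometheus_client import CollectorRegistry, Gauge, write_to_textfile

APP = "billing-service"           # whichever application just got deployed
APP_DIR = "services/billing"      # its folder inside the repo

# unix timestamp of the last commit that touched this application's folder
commit_ts = int(subprocess.check_output(
    ["git", "log", "-1", "--format=%ct", "--", APP_DIR]).strip())

lead_time_seconds = time.time() - commit_ts

registry = CollectorRegistry()
g = Gauge("deployment_lead_time_seconds",
          "Seconds between last commit and deployment to PRD",
          ["application"], registry=registry)
g.labels(application=APP).set(lead_time_seconds)

# node_exporter's textfile collector picks up *.prom files from its directory
write_to_textfile("/var/lib/node_exporter/textfile_collector/lead_time.prom",
                  registry)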
I also want to note that it might be easier for you to store the deployment/lead time in a SQL database (or something else other than Prometheus) and use that as a data source in Grafana. For applications that do not deploy frequently you would easily run into missing series when using Prometheus as a datastore, and the overhead of setting up the node_exporters and the logic to manage the textfiles might outweigh the benefits if you can just INSERT into a SQL table.
Background:
We have a really simple pipeline which reads some data from BigQuery (usually ~300 MB), filters/transforms it and puts it back into BigQuery. In 99% of cases this pipeline finishes in 7-10 minutes and is then restarted again to process a new batch.
Problem:
Recently, the job has started to take >3h once in a while, maybe 2 times in a month out of 2000 runs. When I look at the logs, I can't see any errors and in fact it's only the first step (read from BigQuery) that is taking so long.
Does anyone have a suggestion on how to approach debugging of such cases? Especially since it's really the read from BQ and not any of our transformation code. We are using Apache Beam SDK for Python 0.6.0 (maybe that's the reason!?)
Is it maybe possible to define a timeout for the job?
This is an issue on either the Dataflow side or the BigQuery side, depending on how one looks at it. When splitting the data for parallel processing, Dataflow relies on an estimate of the data size. The long runtime happens when BigQuery sporadically gives a severe under-estimate of the query result size, and Dataflow, as a consequence, severely over-splits the data; the runtime then becomes bottlenecked by the overhead of reading lots and lots of tiny file chunks exported by BigQuery.
On one hand, this is the first time I've seen BigQuery produce such dramatically incorrect query result size estimates. However, as size estimates are inherently best-effort and can in general be arbitrarily off, Dataflow should control for that and prevent such oversplitting. We'll investigate and fix this.
The only workaround that comes to mind meanwhile is to use the Java SDK: it uses quite different code for reading from BigQuery that, as far as I recall, does not rely on query size estimates.
The situation right now:
Every Monday morning I manually check the JUnit results of the Jenkins jobs that ran over the weekend; using the Project Health plugin I can filter on the timeboxed runs. I then copy-paste this table into Excel and go over each test case's output log to see what failed, and note down the failure cause. Every weekend gets another tab in Excel. All this makes traceability a nightmare and causes time-consuming manual labor.
What I am looking for (and hoping that already exists to some degree):
A database that stores all failed tests for all jobs I specify. It parses the output log of a failed test case and based on some regex applies a 'tag' e.g. 'Audio' if a test regarding audio is failing. Since everything is in a database I could make or use a frontend that can apply filters at will.
For example, if I want to see all tests regarding audio failing over the weekend (over multiple jobs and multiple runs) I could run a query that returns all entries with the Audio tag.
I'm OK with manually tagging failed tests and their causes, as well as writing my own frontend. Is there a way (the Jenkins API perhaps?) to grab the failed tests (JUnit format, Jenkins plugin) and create such a system myself if it does not exist?
A good question. Unfortunately, it is very difficult in Jenkins to get such "meta statistics" that span several jobs. There is no existing solution for that.
Basically, I see two options for getting what you want:
Post-processing Jenkins-internal data to get the statistics that you need.
Feeding a database on-the-fly with build execution data.
The first option basically means automating the tasks that you do manually right now.
you can use external scripting (Python, Perl,...) to process Jenkins-internal data (via REST or CLI APIs, or directly reading on-disk data)
or you run Groovy scripts internally (which will be faster and more powerful)
It's the most direct way to go. However, depending on the statistics that you need and on your requirements regarding data persistence, you may want to go for...
The second option: more flexible and completely decoupled from Jenkins' internal data storage. You could implement it by
introducing a Groovy post-build step for all your jobs
that script parses job results and puts data of interest in a custom, external database
Statistics you'd get from querying that database.
Typically, you'd start with the first option. Once requirements grow, you'd slowly migrate to the second one (e.g., by collecting internal data via explicit post-processing scripts, putting that into a database, and then running queries on it). You'll want to cut this migration phase as short as possible, as it eventually requires the effort of implementing both options.
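To make the first option concrete, here is a rough sketch of such an external script. The Jenkins URL, job name, tag regexes and table layout are placeholders, and it assumes the JUnit plugin's testReport JSON API is available for the build:

# pull failed JUnit cases for a build via the Jenkins JSON API,
# tag them with a regex and store them in SQLite
import re
import sqlite3

import requests

JENKINS = "http://jenkins.example.com"
JOB = "nightly-regression"
BUILD = 1234

TAGS = {
    "Audio": re.compile(r"audio|alsa|codec", re.I),
    "Network": re.compile(r"socket|timeout|connection refused", re.I),
}

report = requests.get(
    "%s/job/%s/%d/testReport/api/json" % (JENKINS, JOB, BUILD)).json()

conn = sqlite3.connect("test_history.db")
conn.execute("""CREATE TABLE IF NOT EXISTS failures
                (job TEXT, build INTEGER, test TEXT, tag TEXT, details TEXT)""")

for suite in report.get("suites", []):
    for case in suite["cases"]:
        if case["status"] not in ("FAILED", "REGRESSION"):
            continue
        details = (case.get("errorDetails") or "") + (case.get("errorStackTrace") or "")
        # first matching regex wins; untagged failures get a catch-all bucket
        tag = next((t for t, rx in TAGS.items() if rx.search(details)), "Untagged")
        conn.execute("INSERT INTO failures VALUES (?, ?, ?, ?, ?)",
                     (JOB, BUILD, case["className"] + "." + case["name"], tag, details))

conn.commit()
conn.close()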
You may want to have a look at couchdb-statistics. It is far from a perfect fit, but at least seems to do partially what you want to achieve.
Some information that's important to the question, before I describe the problems and issues:
Redis Lua scripting replicates the script itself instead of replicating the single commands, both to slaves and to the AOF file. This is needed as scripts are often one or two orders of magnitude faster than executing commands in the normal way, so for a slave to be able to cope with the master replication link speed and number of commands per second this is the only solution available.
More information about this decision in Lua scripting: determinism, replication, AOF (GitHub issue).
Question
Is there any way or workaround to replicate the single commands instead of replicating the Lua script itself?
Why?
We use Redis as a natural language processing (multinomial Naive Bayes) application server. Each time you want to learn from new text, you have to update a big list of word weights; the list contains approximately 1,000,000 words. Processing time using Lua is ~350 ms per run. Processing using a separate application server (hiredis-based) is 37 seconds per run.
I'm thinking about workarounds like these:
After the computation is done, transfer the key to another (read-only) server with MIGRATE.
From time to time, save the RDB, move it to the other server and load it by hand.
Is there any other workaround to solve this?
Yes, in the near future we're gonna have just that: https://www.reddit.com/r/redis/comments/3qtvoz/new_feature_single_commands_replication_for_lua/
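For reference, this shipped as script effects replication in Redis 3.2: a script that calls redis.replicate_commands() before its first write gets the commands it generates replicated (to replicas and the AOF) instead of the script itself. A minimal redis-py sketch, with a placeholder key name and a deliberately simplified weight update:

import redis

# Lua: opt into effects replication, then bump each word's weight in a hash.
# ARGV is a flat list of word, delta pairs.
UPDATE_WEIGHTS = """
redis.replicate_commands()
for i = 1, #ARGV, 2 do
    redis.call('HINCRBYFLOAT', KEYS[1], ARGV[i], ARGV[i + 1])
end
return #ARGV / 2
"""

r = redis.Redis()
update_weights = r.register_script(UPDATE_WEIGHTS)

# update two word weights in the (placeholder) 'nb:weights' hash
updated = update_weights(keys=["nb:weights"], args=["hello", 0.3, "world", -0.1])
print("updated %d words" % updated)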
We currently use SonarQube 4.3.3 on a fairly big project with 30+ developers, 300k LOC and 2200 unit tests, and it is currently used to help us track issues in our Continuous Integration cycle (merge requests, etc.).
With the increase of custom XPath rules, we're seeing a big increase in analysis time. Moreover, we currently can't run more than one analysis at the same time (the second one hangs on "waiting for a checkpoint from Job #xxx"), so this usually makes things slow. We're seeing build times (Jenkins + JUnit + SonarQube) around 30 minutes. With the release of the 5.x branch, we'd like to know if the effort to update everything is worthwhile, so we've got some questions:
Are Java rules faster than the old XPath rules to process?
Will we be able to run more than one analysis at the same time?
I saw on SonarSource's JIRA that the 5.2 release (which will detach the analysis from the DB) is due to the 23rd of September. Will this really happen? Are the users expected to see a big performance increase with this?
We're curious to hear real-world cases on this, so any input will be highly appreciated.
Thanks!