Machine learning - Helmholtz Machine implementation [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I am looking for an implementation of the Helmholtz Machine.
References:
http://www.gatsby.ucl.ac.uk/~dayan/papers/hm95.pdf
http://www.cs.toronto.edu/~hinton/absps/helmholtz.pdf
I am looking for open source or free implementations. I prefer Java, but implementations in other languages (mainly C, C++, C# or Python) would also help.
Searching the web, I have found only abstract descriptions of this approach, without any concrete implementation. I hope an expert on the subject can point me to more information.

Deeplearning4j is an open-source implementation of various deep-learning machines that Hinton might classify as "Helmholtz." http://deeplearning4j.org/
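For readers who just want to see the shape of the algorithm, here is a minimal sketch of wake-sleep training for a Helmholtz machine with a single hidden layer of binary units, along the lines of the two papers referenced in the question. It is not taken from Deeplearning4j or any other library; the class name, method names, learning rate and toy data are all assumptions made for illustration, and it uses plain Python lists rather than a numerical library.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    """Draw independent binary units from Bernoulli probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]


class TinyHelmholtzMachine:
    """One hidden layer of binary units; every name here is hypothetical."""

    def __init__(self, n_visible, n_hidden, lr=0.05, seed=0):
        self.rng = random.Random(seed)
        self.lr = lr
        self.nv, self.nh = n_visible, n_hidden
        # Recognition model: visible -> hidden weights and hidden biases.
        self.R = [[0.0] * n_visible for _ in range(n_hidden)]
        self.r_bias = [0.0] * n_hidden
        # Generative model: hidden -> visible weights, visible biases, hidden prior.
        self.G = [[0.0] * n_hidden for _ in range(n_visible)]
        self.g_bias = [0.0] * n_visible
        self.prior = [0.0] * n_hidden

    def recognise(self, v):
        return [sigmoid(self.r_bias[j] + sum(w * x for w, x in zip(self.R[j], v)))
                for j in range(self.nh)]

    def generate(self, h):
        return [sigmoid(self.g_bias[i] + sum(w * x for w, x in zip(self.G[i], h)))
                for i in range(self.nv)]

    def wake(self, v):
        """Wake phase: infer h from data, then train the generative model."""
        h = sample(self.recognise(v), self.rng)
        p_v = self.generate(h)
        for i in range(self.nv):
            err = v[i] - p_v[i]
            self.g_bias[i] += self.lr * err
            for j in range(self.nh):
                self.G[i][j] += self.lr * err * h[j]
        for j in range(self.nh):
            self.prior[j] += self.lr * (h[j] - sigmoid(self.prior[j]))

    def sleep(self):
        """Sleep phase: dream (h, v) from the generative model, then train recognition."""
        h = sample([sigmoid(b) for b in self.prior], self.rng)
        v = sample(self.generate(h), self.rng)
        q_h = self.recognise(v)
        for j in range(self.nh):
            err = h[j] - q_h[j]
            self.r_bias[j] += self.lr * err
            for i in range(self.nv):
                self.R[j][i] += self.lr * err * v[i]

    def dream(self):
        """Generate one visible vector from the current generative model."""
        h = sample([sigmoid(b) for b in self.prior], self.rng)
        return sample(self.generate(h), self.rng)


def train(machine, data, epochs=200):
    for _ in range(epochs):
        for v in data:
            machine.wake(v)
            machine.sleep()


if __name__ == "__main__":
    toy_data = [[1, 1, 0, 0], [0, 0, 1, 1]]  # made-up patterns
    hm = TinyHelmholtzMachine(n_visible=4, n_hidden=2)
    train(hm, toy_data)
    print([hm.dream() for _ in range(5)])
```

Porting this to Java is mostly mechanical, since it only needs arrays, a random number generator and the logistic function. A deeper machine would repeat the same two delta-rule updates layer by layer.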

I have had a quick look at this on the link you gave.
I have been "working on my own with a small team on AI sentience" ..... since 1968 !!!
My thoughts are as follows:
All events happen in a "time series".
There is a past time series that has a probability "high" as far as the sentient observer is concerned.
There is a future "predicted" time series, projected ahead by the "best" (time series) model the sentient observer can create. As that time series recedes into the future, the probability of it "becoming the past time series" diminishes towards zero, and that could occur in milliseconds or in billions of years, depending on the model dynamics.
I do not think there ever is "a present time".
Unfortunately, after studying Kalman filters and predictors and utilising them in missile targeting, I have concluded that the whole "topic" of "mathematically representing" the best algorithms (i.e. models) that humans could come up with was a waste of time, as even the simplest "program" is doing a task that could not be represented by mathematical symbols. So I have concluded that "computer algorithms" "ARE" mathematical formulas, i.e. formulas that normal symbolic mathematics does not have the tools to describe (i.e. programs are superior to a complex mathematical system of notation).
Mathematics is fine for "proofs" and "big statistical ideas" but (and I am getting near the end now) I would "trust" your own instincts to create a "model" that predicts the future best, i.e. it might have to have the concept of "on alternate Wednesdays in the US" in it, and also thousands of other such non-mathematical "states" or various "axioms", which is fine!
So how, you ask, could this be mathematically correct?
Well, the answer is quite simple really: the best model is the one that is best at predicting the future!
And the future keeps popping up surprisingly often - and so it's easy to test - and keep testing !
All you need to know that you have the best "mathematics" (i.e. program) is to see how much "noise" or "deviation from prediction" exists in the prediction vs the actual outcome in the time series.
"State-Space" is the best "maths" to use for this ... i.e. assume that there is an "underlying state" and then assume that your "observations" are just flawed "noisy or just wrong" observations of that underlying state - i.e. the system output signals are "somehow" based on these "invisible" internal system states.
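To make the state-space idea concrete, here is a minimal sketch of a scalar Kalman filter. It assumes the simplest possible model, a hidden state that drifts as a random walk and is observed with additive noise; that model, the function names and the variance values are assumptions for illustration, not anything taken from MTR. The "innovations" it returns are exactly the "deviation from prediction" described above, so their mean square gives one way to compare candidate models.

```python
import random


def kalman_1d(observations, process_var, obs_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise.

    Returns (estimates, innovations), where each innovation is the
    observation minus the filter's prediction for that step.
    """
    x, p = x0, p0
    estimates, innovations = [], []
    for z in observations:
        # Predict: the state is assumed to persist, but uncertainty grows.
        p = p + process_var
        # Innovation: how far the observation deviates from the prediction.
        innovation = z - x
        # Update: blend prediction and observation using the Kalman gain.
        gain = p / (p + obs_var)
        x = x + gain * innovation
        p = (1.0 - gain) * p
        estimates.append(x)
        innovations.append(innovation)
    return estimates, innovations


def mean_squared_innovation(innovations):
    """Average squared prediction error; lower means better predictions."""
    return sum(e * e for e in innovations) / len(innovations)


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up data: a constant underlying state of 5.0 seen through noise.
    noisy = [5.0 + rng.gauss(0.0, 1.0) for _ in range(200)]
    estimates, innovations = kalman_1d(noisy, process_var=0.01, obs_var=1.0)
    print("final estimate:", estimates[-1])
    print("mean squared innovation:", mean_squared_innovation(innovations))
```

Running the same data through two filters with different assumptions and comparing their mean squared innovation is a small-scale version of "keep testing against the future".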
There is an AI sentience "computer language" called MTR that we created (mainly in the 1980s), which is designed for this sort of dynamic model creation. The downside for us (humans) is that it is designed for AI entities to use and not humans, although we are going to put a "Pascal-like" front end onto it soon to allow normal humans to use it. IBM, Intel, GCHQ, MOD, DOD etc. all had licences, but we then shelved it!
We intend to re-start the project soon.
Anyway, that's what I think; I hope it is not too abstract for your purposes!
We could say (and in this I am joking) that programmers who try to use "pure mathematics" to write programs "have the horns by the bull"?
So hopefully programmers can be much more relaxed when they do not understand the entirety of all the maths!
I hope that thought might also help any "non-maths" readers .... of this response.

Related

Where to begin for basic machine algorithms for, say, document recognition and organization?

Pardon if this question is not appropriate. It is kind of specific, and I am not asking for actual code but more for guidance on whether or not this task is worth undertaking. If this is not the place, please close the question and kindly point me in the correct direction.
Short background: I have always been interested in tinkering. I used to play with partitions and OS X scripts when I was younger, eventually reaching basic-level "general programming" aptitude before my father prohibited my computer usage. I am now going to law school and working at a law firm but I love development and I want to implement more tech innovation in the field.
Main point: At our firm, we have a busy season every year from mid-March to the first week of April (immigration + H1B deadline). We receive a lot of documents and scanned files that need to be verified, organized, and checked.
I added (very) simple lines of code to our online platform to help in organization; basically, I attached tags to all incoming documents, and once they were verified, the code would organize them by tag (like "identification doc", "work experience doc" etc.). This would make my life much easier every year, as I end up working 100+ hour weeks this season.
I want to take this many steps further with an algorithm that can check for signatures and data mismatches between documents and ultimately organize the documents so they are ready to print. Eventually, I would like to maybe even implement machine learning and a very basic neural network to automate the whole mind-numbing and painful process...
Actual Question(s): I just wanted to know the best way for me to proceed or get started. I know a decent amount of Python and Java, and we already have an online platform with the documents. What other resources would you recommend in terms of books, videos, or even classes? Is there a name for this kind of basic categorization? Can I build something like this through my own effort without an advanced degree?
Stupid and over-dramatic epilogue: Truth be told, a part of me feels like I wasted my life thus far by not pursuing what I knew I loved at the age of 12. This is my way of making amends I guess, and if I can do this then maybe I can keep doing it in law and beyond...
You don't give many specifics about the task but if you have a finite number of forms in digital form as images, then this seems very possible.
I have personally used OpenCV with Python a lot and more complex machine learning tasks have become increasingly simple in the past 10 years.
Take for example object detection to check whether there is anything in a signature field, or try extracting the date from an image.
I would suggest you start with the simplest thing that would improve your work. A small and easy task will let you build up your knowledge on how to do things.
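As an illustration of what "the simplest thing" might look like, here is a small sketch that groups verified documents by tag and flags fields whose values disagree between documents. It assumes each document is already represented as a dictionary with a name, a tag, a verified flag and some extracted fields; that structure, the field names and the sample data are all made up for the example, and getting the fields out of scanned images is a separate problem.

```python
from collections import defaultdict


def group_by_tag(documents):
    """Group verified documents by tag; unverified ones are held back."""
    groups = defaultdict(list)
    for doc in documents:
        if doc.get("verified"):
            groups[doc["tag"]].append(doc["name"])
    return dict(groups)


def find_mismatches(documents, fields):
    """Report fields whose values differ across the documents that contain them."""
    mismatches = {}
    for field in fields:
        values = {doc["name"]: doc["data"][field]
                  for doc in documents if field in doc.get("data", {})}
        if len(set(values.values())) > 1:
            mismatches[field] = values
    return mismatches


if __name__ == "__main__":
    # Hypothetical documents for one applicant.
    docs = [
        {"name": "passport.pdf", "tag": "identification doc", "verified": True,
         "data": {"full_name": "A. Example", "birth_date": "1990-01-01"}},
        {"name": "resume.pdf", "tag": "work experience doc", "verified": True,
         "data": {"full_name": "A. Example"}},
        {"name": "form.pdf", "tag": "identification doc", "verified": False,
         "data": {"full_name": "A Example", "birth_date": "1990-01-01"}},
    ]
    print(group_by_tag(docs))
    print(find_mismatches(docs, ["full_name", "birth_date"]))
```

Exact string comparison will flag harmless differences such as punctuation, so a real version would probably normalise values first; starting strict and loosening later is usually the safer order for legal paperwork.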

Concept Based Text Summarization (Abstraction) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
I am looking for an engine that does AI text summarization based on the concept or meaning of the sentence. I looked at open-source projects (like ginger, paraphrase, ace) but they don't do the job.
The way they work is that they try to find synonyms for each word and substitute them for the current words. This way they generate a lot of alternatives to a sentence, but the meaning is wrong most of the time.
I have worked with Stanford's engine to do something like highlights for an article and, based on that, extract the most important sentences, but this is still not abstraction, it's extraction.
It would also make sense that the engine I'm looking for learns over time and results are improved after each summary.
Please help out here, your help is greatly appreciated!
I don’t know of any open source project that fits your requirements for abstraction and meaning, as I understand them.
But I have some ideas about how to build such an engine and how to train it.
In a few words, I think we all keep some Bayesian-network-like structure in our minds, which helps us not only to classify data, but also to form an abstract meaning from a text or message.
Since it is impossible to extract that whole structure of abstract categories from our minds, I think it’s better to build a mechanism which allows us to reconstruct it step by step.
Abstract
The key idea of the proposed solution is to extract the meaning of a conversation using approaches that are easy for an automated computer system to operate with. This should make it possible to create a good illusion of a real conversation with another person.
The proposed model supports two levels of abstraction:
The first, less complex level consists of recognizing a group of words, or a single word, as relating to a category, an instance, or an instance attribute.
An instance is an instantiation, from a general category, of a real or abstract subject, object, action, attribute or other kind of entity. An example is a concrete relation between two or more subjects: a concrete relation between an employer and an employee, or between a concrete city and the country where it is situated, and so on.
This basic meaning-recognition approach allows us to create a bot with the ability to sustain a conversation. This ability is based on recognizing the basic elements of meaning: categories, instances and instance attributes.
The second, more complicated level is based on recognizing scenarios and storing them in the conversation context along with instances/categories, as well as using them to complete some of the recognized scenarios.
Related scenarios will be used to complete the next message of the conversation. Some scenarios can also be used to generate the next message, or to recognize a meaning element by using conditions and meaning elements from the context.
Something like this:
Basic classification should be entered manually, with later correction/addition by teachers.
Words and scenarios from a sentence in the conversation can be filled in from context.
Conversation scenarios/categories can be filled in by previously recognized instances or by instances described later in the conversation (self-learning).
Pic 1 – word detection/categorization: basic flow
Pic 2 – general system vision (big picture)
Pic 3 – meaning element classification
Pic 4 – a possible basic category structure

How to evaluate a device in relation to theories such as "senses (Visual, Auditory, Haptic) and cognition (short term and long term memory)"?

How can I evaluate a computerized device or a software application in the HCI field in relation to theories such as "Senses (Visual, Auditory, Haptic) and cognition (short term and long term memory)", based on the context where the device is used? Any help or advice is appreciated.
My guess would be that the senses part would be covered by:
how pleasing is the device/software.
how real is the application.
and following from that:
immersion
Virtual reality is a big thing in the HCI world. Fire fighters, pilots, the army etc etc use virtual worlds to do more and more of their training, it would be important for them to actually feel like they are there so they react more naturally.
What I can think of for short term and long term cognition:
menu sizes.
categorization.
how many clicks does it take to do X.
These all help a user to remember how to achieve task X and where it was located in the software. (I guess that's all long term...)
I hope this inspires you a bit. Go to http://scholar.google.com/ and find some papers on the subjects, at the very least these papers will explain how they evaluate what they are testing, if you can't find a paper that discusses the evaluation techniques themselves.
Hint: If you are studying at a university the university usually has already paid for full access to the papers. Access scholar.google from a computer at university or use a vpn to connect through your university. Direct links to the papers are located on the right of the search result. As a bonus you can configure scholar to add a link with the bibTeX information!
The first result I got was a chapter of a book on user interfaces, which is about testing the user interface. Happy hunting!
There is a set of heuristics that, at least to my knowledge, has collectively become an industry-standard method for evaluating interfaces. They were developed by Jakob Nielsen and can be found here:
http://www.useit.com/papers/heuristic/heuristic_list.html
Usually when someone says they are performing a "Heuristic Evaluation" or an "Expert Evaluation" that means they are basing it off of these 10 criteria. It's possible your professor is looking for something along these lines. I had a very similar experience in two courses I took recently, I had to write papers evaluating several interfaces on Nielsen's Heuristics.
A couple other useful links:
http://www.westendweb.com/usability/02heuristics.htm
http://www.morebusiness.com/getting_started/website/d913059671.brc
http://www.stcsig.org/usability/topics/articles/he-checklist.html
Hope this helps, good luck!

RUP (Rational Unified Process) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
I have chosen to use the development method RUP (Rational Unified Process) in my project. This is a method I've never used before. I've also included some elements from Scrum in the development process. The question is: what should the requirement specifications contain in a RUP model? Functional and non-functional requirements? And what should be included in a technical analysis and in security requirements for RUP? I can’t find any information. Notes about this would be helpful.
I hope people with RUP experience can share some useful experiences.
RUP has 3 main parts:
Roles
Activities
Work Products
Each ROLE performs an ACTIVITY and as a result produces a WORK PRODUCT...
For example, Analyst [Role] performs Develop Vision [Activity], and as a result we have a Vision [Work Product]...
Besides this, RUP gives us some GUIDELINES and CHECKLISTS to do our ACTIVITIES and WORK PRODUCTS right...
RUP gives us templates for WORK PRODUCTS, but they are just there to give an idea of what they may look like...
For the vision, say, you can use the RUP template, but you can also just take a post-it note and write an "elevator statement" like this:
For [target customer]
Who [statement of the need or opportunity]
The (product name) is a [product category]
That [statement of key benefit; that is, the compelling reason to buy]
Unlike [primary competitive alternative]
Our product [statement of primary differentiation]
Work products can even be simple statements that you write on your wiki... They can be in any form...
They need not be "static written" docs... They can even be a "video".
For instance, instead of writing Software Architecture docs [Architecture Notebook in OpenUP] you can just create a video in which your team explains the main architecture on a whiteboard...
WARNING ABOUT RUP WORK PRODUCT TEMPLATES:
DO NOT BECOME A TEMPLATE ZOMBIE. YOU SHOULD NOT FILL IN EVERY PART OF THEM...
YOU SHOULD ASK YOURSELF WHAT KIND OF BENEFIT YOU WILL GET BY WRITING THIS... IF YOU HAVE NO VALID ANSWER, DO NOT WRITE IT...
DOCUMENTATION SHOULD HAVE REAL REASONS; DO NOT MAKE DOCUMENTATION JUST FOR "DOCUMENTATION"...
RUP has a rich set of WORK PRODUCTS... So choose the minimum set that gives you the most benefit...
For a typical project you will generally have these Requirements Work Products:
Vision: What do we do, and why? Agreement of stakeholders...
Supplementary Specification [System-Wide Requirements in OpenUP]: Generally captures the non-functional [a term I do not like] or "quality" [which I like] requirements of the system.
Use-Case Model: Captures functional requirements as use cases.
Glossary: To make concepts clear...
RUP is commercial but OpenUP is not... So you can look at the OpenUP WORK PRODUCT templates just to get an idea of what kind of info is recorded in them...
Download it from the Eclipse Process Framework Project (http://www.eclipse.org/epf/downloads/configurations/pubconfig_downloads.php) and start reading from the index page.
Lastly, you can find these WORK PRODUCTS used in an agile manner in Larman's book Applying UML and Patterns...
And again: DO NOT BECOME A TEMPLATE ZOMBIE!!!
Try the Rational Unified Process page at Wikipedia for an overview.
The core requirements should be documented in the project description. RUP tends to place a lot of emphasis on "use cases"; however, it is very important not to lose sight of the original requirements at all levels of detail, because these will answer the "Why?" questions. If the developers only see the use cases, they will know What they are supposed to build (effectively the functional requirements) but not Why it is required. Unless the developers have easy access to the original analysts, this can cause very serious problems.

Software development metrics and reporting [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I've had some interesting conversations recently about software development metrics, in particular how they can be used in a reasonably large organisation to help development teams work better. I know there have been Stack Overflow questions about which metrics are good to use - like this one, but my question is more about which metrics are useful to which stakeholders, and at what level of aggregation.
As an example, my view is that code coverage is a useful metric in the following ways (and maybe others):
For a team's own internal use when combined with other measurements.
For facilitating/enabling/mentoring teams, where it might be instructive when considered on a team-by-team basis as a trend (e.g. if team A and B have coverage this month of 75 and 50, I'd be more concerned with team A than B if the previous month they'd had 80 and 40).
For senior management when presented as an aggregated statistic across a number of teams or a whole department.
But I don't think it's useful for senior management to see this on a team-by-team basis, as this encourages artificial attempts to bolster coverage with tests that merely exercise, rather than test, code.
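To illustrate the two levels of aggregation I have in mind, here is a small sketch using the team A and team B numbers from the example. The function name is made up, and the department figure is a plain unweighted average, which is an assumption; weighting by lines of code or team size might be more appropriate.

```python
def coverage_report(previous, current):
    """Per-team change in coverage plus an unweighted department average."""
    trends = {team: current[team] - previous[team]
              for team in current if team in previous}
    aggregate = sum(current.values()) / len(current)
    return trends, aggregate


previous_month = {"A": 80, "B": 40}
this_month = {"A": 75, "B": 50}

# The per-team trend is what a mentor would look at.
# The single aggregate is what senior management would see.
trends, aggregate = coverage_report(previous_month, this_month)
print(trends)     # {'A': -5, 'B': 10}
print(aggregate)  # 62.5
```

Team A has the higher absolute coverage but the negative trend, which is why it is the one I'd be more concerned about.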
I'm in an organisation with a couple of levels in its management hierarchy, but where the vast majority of managers are technically minded and able (with many still getting their hands dirty). Some of the development teams are leading the way in driving towards agile development practices, but others lag, and there is now a serious mandate from the top for this to be the way the organisation works. A couple of us are starting a programme to encourage this. In this sort of an organisation, what sort of metrics do you think are useful, to whom, why, and at what level of aggregation?
I don't want people to feel their performance is being assessed based on a metric that they can artificially influence; at the same time, the senior management are going to want some sort of evidence that progress is being made. What advice or caveats can you provide based on experience in your own organisations?
EDIT
We are definitely wanting to use metrics as a tool for organisational improvement not as a tool for individual performance measurement.
A tale from personal experience. Apologies for the length.
A few years ago our development group tried setting "proper" measurable objectives for individuals and team leaders. The experiment lasted for just one year, because hard metrics didn't really work very well for individual objectives (see my question on the subject for some links and further discussion).
Note that I was a team leader, and involved in planning it all with my technical boss and the other team leaders, so the objectives weren't something dictated from on high by clueless upper management -- at the time we really wanted them to work. It is also worth noting that the bonus structure inadvertently encouraged competition between developers. Here are my observations on the things we tried.
Customer-visible issues
In our case, we counted outages on the service we provided to customers. In a shrink-wrapped product it might be the number of bugs reported by customers.
Advantages: This was the only real measure that was visible to upper management. It was also the most objective, being measured outside the development group.
Disadvantages: There weren't that many outages -- just around one per developer for the whole year -- which meant that failing or exceeding the objective was a matter of "pinning blame" for the few outages that did occur in each team. This led to bad feeling and loss of morale.
Amount of work completed
Advantages: This was the only positive measure. Everything else was "we notice when bad things happen," which was demoralising. Its inclusion was also necessary because, without it, a developer who did nothing all year would exceed all the other objectives, which clearly wouldn't be in the interests of the company. Measuring the amount of work completed checked the natural optimism of developers when estimating task size, which was useful.
Disadvantages: The measure of "work completed" was based on estimates provided by the developers themselves (usually a good thing), but making it part of their objectives encouraged gaming of the system to inflate estimates. We had no other viable measure of work completed: I think the only possible valuable way of measuring productivity is "impact on the company bottom line," but most developers are so far removed from direct sales that this is rarely practical at an individual level.
Defects found in new production code
We measured defects introduced into new production code during the year, as it was felt that bugs from previous years should not count against any individual in this year's objectives. Defects spotted by internal quality teams were included in the count even if they didn't impact customers.
Advantages: Surprisingly few. The time lag between the introduction of a defect and its discovery meant that there was really no immediate feedback mechanism to improve code quality. Macro trends at a team level were more useful.
Disadvantages: There was a heavy focus on the negative, since this objective was only invoked when a defect was found and we needed someone to blame for it. Developers were reluctant to record defects they found themselves, and a simple count meant that minor bugs were as bad as severe problems. Since the number of defects per individual was still quite low, the number of minor and severe defects didn't even out as it might with a larger sample. Old defects were not included, so the group's reputation for code quality (based on all bugs found) did not always match the measurable introduced-this-year count.
Timeliness of project delivery
We measured timeliness as the percentage of work delivered to internal QA teams by the stated deadline.
Advantages: Unlike counting defects, this was a measure that was under immediate, direct control of the developers, as they effectively decided when the work was complete. The presence of the objective focused the mind on completing tasks. This helped the team commit to realistic amounts of work, and improved the perception by internal customers of the development group's ability to deliver on promises.
Disadvantages: As the only objective directly under the developers' control, it was maximised at the expense of code quality: on the day of a deadline, given the choice between saying a task is complete or doing further testing to improve confidence in its quality, the developer would choose to mark it complete and hope any resulting bugs never come to the surface.
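For reference, the timeliness figure itself is trivial to compute once delivery dates are recorded. The sketch below assumes each task has a deadline and an optional delivery date; the field names and sample dates are hypothetical, and our real data lived in a task management tool rather than in dictionaries.

```python
from datetime import date


def timeliness(tasks):
    """Percentage of tasks delivered to QA on or before their deadline."""
    if not tasks:
        return None
    on_time = sum(
        1 for t in tasks
        if t["delivered"] is not None and t["delivered"] <= t["deadline"]
    )
    return 100.0 * on_time / len(tasks)


if __name__ == "__main__":
    # Made-up tasks: two on time, one late, one never delivered.
    tasks = [
        {"deadline": date(2024, 3, 1), "delivered": date(2024, 2, 28)},
        {"deadline": date(2024, 3, 1), "delivered": date(2024, 3, 1)},
        {"deadline": date(2024, 3, 1), "delivered": date(2024, 3, 5)},
        {"deadline": date(2024, 3, 1), "delivered": None},
    ]
    print(timeliness(tasks))
```

Note that the number says nothing about the quality of what was delivered, which is exactly the weakness described above.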
Complaints from internal customers
To gauge how well developers communicated with internal customers during development and subsequent support of their software, we decided that the number of complaints received about each individual would be recorded. The complaints would be validated by the manager, to avoid any possible vindictiveness.
Advantages: Really nothing I can recall. Measured at a sufficiently large group level it becomes a more useful "customer satisfaction" score.
Disadvantages: Not only highly negative, but also a subjective measure. As with other objectives, the numbers for each individual were around the zero mark, which meant that a single comment about someone could mean the difference between "infinitely exceeded" and "did not meet".
General comments
Bureaucracy: While our task management tools held much of the data for these metrics, there was still quite a lot of manual effort involved to collate it all. The time spent obtaining all the numbers was not enjoyable, generally focused on negative aspects of our work and may not even have been reclaimed by increased productivity.
Morale: For the measures where individuals were blamed for problems, not only did those with "bad" scores feel demotivated, but so did those with "good" scores, as they didn't like the loss in team morale and sometimes felt they were ranked higher not because they were better but because they were luckier.
Summary
So what did we learn from the episode? In later years we tried to re-use some of the ideas but in a "softer" way, where there was less emphasis on individual blame and more on team improvement.
It is impossible to define objectives for individual developers that are objectively measurable, add value to the company and cannot be gamed, so don't bother to try.
Customer issues and defects can be counted at a wider team level, if the location of the defect is unequivocally the responsibility of that team -- that is, you don't ever have to play the "blame game".
Once you measure defects only at the level of responsibility for a code module, you can (and should) measure old bugs as well as new ones, since it is in that group's interest to eliminate all defects.
Measuring defect counts at a group level increases the sample size per group, and so anomalies between minor and severe defects are smoothed out and a simple "number of bugs" measure can mean something, such as to see if you are improving month-on-month.
Include something that upper management care about, because keeping them happy is your primary purpose as a development group. In our case it was customer-visible outages, so even if the measure is sometimes arbitrary or seemingly unfair, if it's what the bosses are measuring then you need to take notice too.
Upper management don't need to see metrics they don't have in their own objectives. This way it avoids the temptation to blame individuals for errors.
Measuring timeliness of project delivery did change developer behaviour and put a focus on completing tasks. It improved estimation and allowed the group to make realistic promises. If it were easy to collect the timeliness information then I would consider using it again at a team level to measure improvement over time.
All of this doesn't help when you are required to set measurable objectives for individual developers, but hopefully the ideas will be more useful for team improvement.
The key thing about metrics is knowing what you are using them for. Are you using them as a tool for improvement, a tool for reward, a tool for punishment, etc.? It sounds like you're planning to use them as a tool for improvement.
The number one principle when setting metrics is to keep the information relevant so that the person receiving it can use it to make a decision. Most likely a senior manager cannot dictate the micro level of whether you need more tests, less complexity, etc. But a team leader can do that.
Therefore, I don't believe a measure of code coverage is going to be useful to management beyond the individual team. At the macro level, the organisation is probably interested in:
Cost of delivery
Timeliness of delivery
Scope of delivery & external quality
Internal quality won't be high on their list of things to cover off. It's a development team's mission to make it clear that internal quality (maintainability, test coverage, self-documenting code, etc) is a key factor in achieving the other three.
Therefore you should target metrics to more senior managers which cover off those three such as:
Overall Velocity (note that comparing velocity between teams is often artificial)
Expected vs Actual scope delivered to agreed timelines
Number of production defects (possibly per capita)
And measure things like code coverage, code complexity, cut 'n' paste score (code repetition using flay or similar), method length, etc at a team level where the recipients of the information can really make a difference.
A metric is a way of answering a question about a project, team or company. Before you start looking for the answers, you need to decide what questions you want to ask.
Typical questions include:
what is the quality of our code?
is the quality improving or degrading over time?
how productive is the team? Is it improving or degrading?
how effective is our testing?
...and so on.
Each question will require a different set of metrics to answer. Collecting metrics without knowing what questions you want answered is at best a waste of time and at worst counterproductive.
You also need to be aware that there is an 'uncertainty principle' at work - unless you are very careful the act of collecting metrics will change people's behaviour, often in unexpected and sometimes detrimental ways. This is especially so if people believe they are being evaluated on the metrics, or worse still have the metrics tied to some reward or punishment scheme.
I recommend reading Gerald Weinberg's Quality Software Management Vol 2: First Order Measurement. He goes into a lot of detail on software metrics, but says the most important are often what he calls "Zero Order Measurement" - asking people their opinion on how a project is going. All four volumes in the series are expensive and hard to get hold of, but well worth it.
Software writing
What must be optimised?
CPU(s) use, memory(s) use, memory cache(s) use, user time use, code size at run-time, data size at run-time, graphics performance, file access performance, network access performance, bandwidth use, code conciseness and readability, electricity use, (count of) distinct API calls used, (count of) distinct methods and algorithms used, maybe more.
How much must it be optimised?
It must be optimised the minimum reasonable amount (except in areas where surpassing acceptance test criteria is desirable) required to pass acceptance tests, facilitate maintenance, facilitate audit and meet user requirements.
("... for legal/illegal input test data and legal/illegal test events in all test states at all required test data volumes and test request volumes for all current and future test integration scenarios.")
Why the minimum reasonable amount?
Because optimised code is harder to write and so costs more.
What leadership is required?
Coding standards, basic structure, acceptance criteria and guidance on levels of optimisation required.
How can success of software writing be measured?
Cost
Time
Acceptance test passes
Extent to which the acceptance tests that are worth surpassing are actually surpassed
User approval
Ease of maintenance
Ease of audit
Degree of absence of over-optimisation
What cost/time should be ignored in assessing aggregate performance of programmers?
Wasted cost/time incurred because of requirements (inc architecture) changes
Extra cost/time incurred because of deficiencies in platforms/tools
But this cost/time should be included in assessing aggregate performance of teams (inc architects, managers).
How can success of architects be measured?
Other measures plus:
Instances of avoiding, early on, the effects of deficiencies in platforms/tools
Degree of absence of changes in architecture
As I said in "What is the fascination with code metrics?", metrics involve:
different populations, meaning the scope of interest is not the same for a developer as for a manager
trends, meaning any metric is meaningless in itself without its associated trend, which is what lets you decide whether to act on it or ignore it.
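To illustrate the point about trends, here is a minimal Python sketch that turns a series of snapshots of one metric into an act/ignore signal. It is not what any particular tool does: the function names, the `tolerance` threshold and the sample data are all assumptions made for the example, and the trend is estimated with a plain least-squares slope.

```python
from statistics import mean


def trend_slope(values):
    """Least-squares slope of a metric over equally spaced snapshots."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two snapshots to compute a trend")
    xs = list(range(n))
    x_bar = mean(xs)
    y_bar = mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def classify(values, tolerance=0.5):
    """Label a series as rising, falling or stable (tolerance is an assumed threshold)."""
    slope = trend_slope(values)
    if slope > tolerance:
        return "rising"
    if slope < -tolerance:
        return "falling"
    return "stable"


# Hypothetical data: average cyclomatic complexity per weekly snapshot.
weekly_complexity = [10.0, 11.0, 12.0, 13.0, 14.0]
print(classify(weekly_complexity))
```

A single value of 14.0 says little on its own; the same value at the end of a steadily rising series is a reason to look closer.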
We are using a tool able to provide:
lots of micro-level metrics (interesting for developers), with trends.
lots of rules with multi-level (UI, Data, Code) static analysis capabilities
lots of aggregation rules (meaning that the vast number of metrics is condensed into several domains of interest, suitable for higher-level populations)
The result is an analysis which can be drilled down, from high-level aggregation domains (security, architecture, practices, documentation, ...) all the way to a single line of code.
The current feedback is:
project managers can get defensive very quickly when some rules are not respected and their overall score drops significantly as a result.
Each study has to be re-tailored to respect each project's quirks.
The benefit is the definition of a contract in which exceptions are acknowledged but the rules to be respected are defined.
higher levels (IT department, stakeholders) use the overall scores as just one element of their evaluation of the progress made.
They actually look more closely at other elements based on delivery cycles: how often are we able to iterate and put an application into production? How many errors did we have to solve before that release (in terms of merges, or in terms of pre-production environments not being set up correctly)? What immediate feedback is generated by a new release of an application?
So:
which metrics are useful to which stakeholders, and at what level of aggregation
At high level:
the (static analysis) metrics are actually the result of low-level metric aggregations, and organized by domains.
Other metrics (more "operational-oriented", based on the release cycle of the application, and not just on the static analysis of the code) are taken into account
The actual ROI is achieved through other actions (like six-sigma studies)
At lower level:
the static analysis is enough (but has to cover multi-tier applications, sometimes developed in several languages)
the actions are driven by the trends and by the importance of each metric
the study has to be approved/supported by all levels of hierarchy to be accepted/acted upon (in particular, budget for the ensuing refactoring has to be validated)
If you have some Lean background/knowledge, then I would suggest the system that Mary Poppendieck recommends (that I've already mentioned in this previous answer). This system is based on three holistic measurements that must be taken as a package:
Cycle time
From product concept to first release or
From feature request to feature deployment or
From bug detection to resolution
Business Case Realization (without this, everything else is irrelevant)
P&L or
ROI or
Goal of investment
Customer Satisfaction
e.g. Net Promoter Score
The aggregation level is product/project level and I believe that these metrics are helpful for everybody (developers should never forget that they don't write code for fun, they write code to create value and should always keep that in mind).
Teams may (and actually do) use technical metrics to measure conformance to quality standards, which are integrated in the Definition of Done (as "no increase of the technical debt"). But high quality is not an end in itself; it's just a means to achieve short cycle time (to be a fast company), which is the real target (along with Business Case Realization and Customer Satisfaction).
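Two of these measurements are easy to compute once the raw data exists. The following Python sketch shows one way to do it, assuming you record a request date and a deployment date per feature and collect 0-10 survey ratings; the function names and sample data are made up for illustration. The Net Promoter Score uses the usual definition: percentage of promoters (ratings 9-10) minus percentage of detractors (ratings 0-6).

```python
from datetime import date
from statistics import median


def cycle_time_days(requested, deployed):
    """Days from feature request to feature deployment."""
    return (deployed - requested).days


def net_promoter_score(ratings):
    """NPS from 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("no ratings to score")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical features: (requested, deployed) date pairs.
features = [
    (date(2024, 3, 1), date(2024, 3, 15)),
    (date(2024, 3, 4), date(2024, 3, 9)),
    (date(2024, 3, 10), date(2024, 4, 9)),
]
cycle_times = [cycle_time_days(req, dep) for req, dep in features]
print(median(cycle_times))  # 14

# Hypothetical survey answers to "how likely are you to recommend us?"
print(net_promoter_score([10, 9, 8, 7, 6, 3, 10, 9, 9, 5]))  # 20.0
```

The median is used rather than the mean so that one unusually slow feature does not dominate the picture; that is a design choice of this sketch, not part of the Lean recommendation.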
This is a bit of a side note to the main question, but I had a very similar experience to the one in Paul Stephenson's answer above. One thing I would add to that is about collection of data and visibility of metrics.
In our case, the development director was meant to collate a bunch of data from various disparate systems and distribute individual metric results once a month. This often didn't happen, as it was a time-consuming job and he was a busy man.
The results of this were:
Unhappy developers, as performance bonuses were based on metrics and people didn't know how they were getting on.
Time-consuming entry of the same data into several different systems.
If you are going down this route, you need to be sure that all metric data can be collated automatically and is easily visible to those it affects.
One of the interesting approaches that's currently getting some hype is Kanban. It's fairly Agile. What's particularly interesting is that it permits a metric of "work done" to be applied. I haven't used/encountered this in actual practice yet, but I'd like to work towards getting a kanban-ish flow going at my job.
Interestingly, I just finished reading Peopleware, and the authors strongly discourage making individual metrics visible to superiors (even direct managers), but argue that aggregate metrics should be very visible.
As far as code-specific metrics go, I think it's good for a team to know the state of the code at the current time, and to know the trends affecting the code as it matures and grows.
The question is obviously not focussed on .NET, but I think the .NET product NDepend has done a lot of work to define and document common metrics that are useful.
The documentation section on metrics is educational reading, even if you're not doing .NET.
Software metrics have been with us for a long time and as best I can tell nothing to date has emerged individually or in aggregate that is capable of guiding projects during development. The nut of the problem is that we want to use objective measures and these can only measure what has happened, not what is happening or about to happen.
By the time we have measured, analyzed and interpreted some series of metrics we are reacting to things that have already gone wrong, or very occasionally, gone right.
I don't want to underplay the importance of learning from objective metrics but I do want to point out that this is a reactive not a pro-active response.
Developing a "confidence index" may be a better way of monitoring whether a project is on track or headed for trouble. Try developing a voting system where a reasonable number of representatives from each project area of interest are asked to anonymously vote their confidence from time to time. Confidence is voted in two areas:
1) Things are on track
2) Things will continue to be on track or get back on track.
These are purely subjective measurements from the people closest to the "action".
Feed the results into a Kanban type chart where the columns represent voting areas and you should have a pretty good idea where to focus your attention. Use question 1 to evaluate whether management reacted to the previous voting cycle appropriately. Use question 2 to identify where management should focus next.
This idea is based on each of us having a comfort level within our own area of responsibility. Our confidence level is a product of experience, knowledge within our domain of expertise, the number and severity of problems we are facing, the amount of time we have to accomplish our tasks, the quality of the information we are working with and a whole bunch of other factors.
MBWA (Management By Walking Around) is often touted as one of the most effective tools we have - this is a variation of it.
This technique is not much use at the level of individual teams because it only reflects the general mood of the team. Kind of like using someone’s watch to tell them the time. However, at higher levels of management it should be quite informative.
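As a rough illustration of the voting scheme described above, here is a small Python sketch that aggregates anonymous votes per project area. The 1-5 scale, the field names `on_track` and `outlook`, and the sample areas are assumptions for the example; the answer itself does not prescribe a scale or a data format.

```python
from collections import defaultdict
from statistics import mean


def confidence_index(votes):
    """Average both questions per area.

    Each vote is (area, on_track, outlook) with no voter identity,
    so the ballots stay anonymous. Scores are assumed to run from
    1 (no confidence) to 5 (full confidence).
    """
    by_area = defaultdict(lambda: {"on_track": [], "outlook": []})
    for area, on_track, outlook in votes:
        by_area[area]["on_track"].append(on_track)
        by_area[area]["outlook"].append(outlook)
    return {
        area: {question: mean(scores) for question, scores in answers.items()}
        for area, answers in by_area.items()
    }


def focus_order(index):
    """Areas sorted by question 2, least confident first."""
    return sorted(index, key=lambda area: index[area]["outlook"])


# Hypothetical voting cycle.
votes = [
    ("testing", 4, 2),
    ("testing", 2, 2),
    ("design", 5, 4),
    ("deployment", 3, 3),
]
index = confidence_index(votes)
for area in focus_order(index):
    print(area, index[area]["on_track"], index[area]["outlook"])
```

Each printed row corresponds to one column of the chart: the first figure (question 1) shows how the previous cycle was handled, and the ordering by the second figure (question 2) suggests where to look next.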

Resources