Is there a programmatic way to gather JIRA search term data for analysis? - jira

I would like to analyze the search terms submitted by our JIRA users so we can formalize best practices for creating subjects and descriptions.
I'd like to avoid having to pull the search terms out of log files, where I believe they live if the right log levels are set.
I am familiar with jira-python and server-side JIRA customization, but this one's stumping me.
Is there a programmatic way to generate a list of the search terms submitted to JIRA? (Client-side/API is ideal, but server-side is okay too.)
Appreciate any advice folks can share, pointers to references, and so forth!

There is no API for this, but you can get this information from the webserver logs or by looking at saved_searches directly in the database. Note that the database approach captures only the saved filters, not every query performed.
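For the database route, something along these lines would do it. The table and column names below just follow the mention of saved_searches above and are assumptions; check the actual schema of your JIRA version (and which database backs it) before relying on this.

```python
# Rough sketch: pull saved filters straight out of the JIRA database.
# The table/column names are assumptions (they follow the "saved_searches"
# mention above) -- check your JIRA version's schema before trusting this.
import psycopg2  # assuming a PostgreSQL-backed JIRA instance

conn = psycopg2.connect(host="localhost", dbname="jiradb",
                        user="jira_readonly", password="change-me")
cur = conn.cursor()
# Hypothetical table and columns; adjust to whatever your schema actually uses.
cur.execute("SELECT owner, filter_name, search_terms FROM saved_searches")
for owner, name, terms in cur.fetchall():
    print(f"{owner}\t{name}\t{terms}")
conn.close()
```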

What about catching everything that is submitted to the search field (client-side) and logging it on the server side (e.g., with a POST to a servlet)?
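A minimal sketch of the server-side half of that idea, written here in Python with Flask purely for illustration (in a real JIRA deployment it would more likely be a Java servlet or filter); the /log-search endpoint and field names are made up:

```python
# Minimal logging endpoint the client-side hook would POST search terms to.
# Endpoint path and form field names are illustrative only.
from datetime import datetime, timezone
from flask import Flask, request

app = Flask(__name__)

@app.route("/log-search", methods=["POST"])
def log_search():
    term = request.form.get("term", "")
    user = request.form.get("user", "anonymous")
    # Append one tab-separated line per search to a flat file.
    with open("search_terms.log", "a") as fh:
        fh.write(f"{datetime.now(timezone.utc).isoformat()}\t{user}\t{term}\n")
    return "", 204

if __name__ == "__main__":
    app.run(port=5000)
```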

Related

Jira: how to know how many people have visited an issue

I want to know how many people have visited a particular issue in order to gauge its popularity (I can't rely on the number of watchers of the issue). Is there any way (the Jira DB or anything else) by which I can find out how many people (just the count) have visited any particular issue?
The question could also be phrased as: what are the top 10 most-visited issues in a given week?
Seb's earlier answer provides a possible solution for JIRA Cloud. I am not aware of any off-the-shelf product for behind-the-firewall installations of JIRA, and I do not believe that views are tracked anywhere in the JIRA database.
For behind-the-firewall instances, you could certainly write a script to parse the JIRA access logs (stored in $JIRA_HOME/logs/access_log*) to count issue accesses that way.
The JIRA access logs are stored in a format that is similar to the Apache access log format, so you just need to parse out accesses to individual issues by looking for URLs of the format "http://MYJIRA/browse/ABC-123".
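A quick sketch of that log-parsing approach in Python (the log location comes from the answer above; the exact log line format can vary between JIRA versions, so treat the regex as a starting point):

```python
# Count issue views by scanning the JIRA access logs for /browse/<ISSUE-KEY>
# requests, then print the ten most-viewed issues.
import collections
import glob
import os
import re

ISSUE_RE = re.compile(r"GET\s+\S*/browse/([A-Z][A-Z0-9]*-\d+)")

counts = collections.Counter()
for path in glob.glob(os.path.expandvars("$JIRA_HOME/logs/access_log*")):
    with open(path, errors="replace") as log:
        for line in log:
            match = ISSUE_RE.search(line)
            if match:
                counts[match.group(1)] += 1

for issue, views in counts.most_common(10):
    print(f"{issue}\t{views}")
```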
Out of the box this is not possible. Jira does not log view counts for single issues.
You could check whether there is a plugin for this at https://marketplace.atlassian.com/search?application=jira
For example, https://marketplace.atlassian.com/plugins/communardo.connect.usage.statistic.addon looks like it could fit your requirements, but I have never used it myself.

Craigslist or Kijiji - Is it possible to extract a poster's email address?

Hi and thanks in advance for helping me with my question.
Is it possible to write a script that would extract the following information when provided with a Craigslist or Kijiji post, e.g. http://toronto.en.craigslist.ca/tor/atq/3346994296.html:
1. email address (default one provided by Craigslist)
2. items in the post
3. address of the poster
Items 1-3 above can be obtained manually, but I would like to just input a posting or ad ID and be able to extract this information automatically.
The short answer to this question is...
Yes, automatically extracting the listed info from web pages similar to the one provided as an example can be done by a relatively simple script.
In general, this activity [of automatically extracting info from web pages] is known as Web Scraping, a particular form of Data Scraping.
There are both off-the-shelf products that can be used (with little or no programming involved; the desired pages, and the desired fields within the pages, are specified by way of configuration), as well as software libraries which can be used with scripting languages such as Python or Java and which facilitate parsing HTML pages and, more generally, provide support for the various tasks associated with this activity.
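For example, here is a minimal Python sketch using the requests and BeautifulSoup libraries. It only pulls the page title and any mailto: links, since the exact HTML structure of a Craigslist or Kijiji posting changes over time and the selectors would need adjusting for anything more specific:

```python
# Fetch a posting and extract the page title and any mailto: links.
import requests
from bs4 import BeautifulSoup

url = "http://toronto.en.craigslist.ca/tor/atq/3346994296.html"  # example from the question
resp = requests.get(url, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

title = soup.title.get_text(strip=True) if soup.title else ""
emails = [a["href"].replace("mailto:", "")
          for a in soup.find_all("a", href=True)
          if a["href"].startswith("mailto:")]

print("Title:", title)
print("Emails:", emails)
```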
Aside from technical considerations, you need to consider the etiquette and legality of performing this kind of scraping. While some data and sites may be explicitly copyright-protected, it is always a good idea to perform big scraping jobs at low-traffic hours and to throttle the requests so as not to burden the host site unduly. Also, many sites may provide an API or data dumps that supply the same info in a simpler and more controlled fashion.

How best to aggregate site statistics (especially search demand)

I am working on a Rails application that uses Sunspot/Solr for search. I have been asked to log (or capture in some way) each search that happens on the site: the query, the user who performed the search, the result count, and so on, so that the company can report on what people are searching for (demand), among other things.
Before I go and create a table that will receive an ever-growing number of rows of search data, I'm wondering if anyone has done this in a better way? Can I use analytics (Google?) in some way for this? Is there some kind of service I can send this information to, from which we could easily pull or create reports?
In short, is there some better/smarter way than creating my own table and storing this all in our own DB?
I have never done this myself, but here are some thoughts.
If you just need to store that data, I think you should do it yourself.
If you also need to provide a way to analyse the data, then yes, see whether anything already exists (I'm not sure, but it seems Google Analytics only supports internal search done through their search bar).
If your client already has some BI tool, they just need a way to access the data, and it would be easier to have it in a DB you own and can easily query than to go through a provider's API.
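To make the "own table" option concrete, here is a small sketch of its shape. The question is about a Rails app, so this Python/SQLite version is only meant to illustrate the columns involved; an ActiveRecord migration would mirror the same fields.

```python
# One row per search, with the fields the question lists.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("search_log.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS searches (
        id           INTEGER PRIMARY KEY,
        query        TEXT NOT NULL,
        user_id      TEXT,
        result_count INTEGER,
        searched_at  TEXT NOT NULL
    )
""")

def log_search(query, user_id, result_count):
    """Record a single search; call this wherever the Solr query is issued."""
    conn.execute(
        "INSERT INTO searches (query, user_id, result_count, searched_at) "
        "VALUES (?, ?, ?, ?)",
        (query, user_id, result_count, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

log_search("red widgets", "user-42", 17)  # hypothetical example row
```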

Search Engine without crawling?

Is there a way to collect web content for use in a search engine without going through the web crawling phase? Is there any alternative to web crawling?
Thanks
No, to collect the content you have to...collect the content. :-)
Yes (and sort-of no).
:)
You can download existing data dumps from various websites (wikipedia, stackoverflow, etc.) and construct a partial index that way. It obviously won't be a complete index of the internet.
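As a rough sketch of that data-dump route, here is how a tiny partial index could be built from a Wikipedia pages-articles dump in Python; the dump file name and XML element names are assumptions based on the dumps available at the time of writing, so adjust them to whatever you actually download:

```python
# Build a tiny word -> page-title index from a Wikipedia XML dump
# instead of crawling.
import bz2
import collections
import re
import xml.etree.ElementTree as ET

index = collections.defaultdict(set)   # word -> set of page titles

with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as fh:
    title = None
    for _event, elem in ET.iterparse(fh):
        tag = elem.tag.rsplit("}", 1)[-1]      # drop the XML namespace prefix
        if tag == "title":
            title = elem.text
        elif tag == "text" and title and elem.text:
            for word in set(re.findall(r"[a-z]+", elem.text.lower())):
                index[word].add(title)
        elif tag == "page":
            elem.clear()                       # keep memory bounded

print(len(index), "distinct terms indexed")
```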
You could also use meta-search to construct your search engine. This is where you use the APIs of other search engines and use THEIR search results as the basis of your index. Examples include citosearch and opensearch. DuckDuckGo uses Yahoo's BOSS API (and now Yahoo uses Bing...) as part of its search engine.
There are also real-time streaming APIs that you could use instead of crawling the web. Look at datasift as an example. There are lots more resources you could cleverly use to avoid or minimize crawling.
If you want to stay up to date with the latest content on pages, you can use something like the PubSubHubbub protocol to get push notifications for subscribed links.
Or use paid services like Superfeedr that make use of the same protocol.
Directly or indirectly, you have to crawl the web in order to get the content.
Well, if you don't want to crawl, you can follow a wiki-like approach, where users submit links to sites (with a title, description and tags), so that a collaborative link collection can be built.
To avoid spam, a +/- system can be used to vote useful sites or tags up and useless ones down.
To stop spammers from mass-voting the SERPs, you can weight votes by user reputation (a small sketch of this follows below).
User reputation can be gained by submitting useful sites, or perhaps by tracing usage patterns, and by taking other abuse patterns into account as well.
Well, you get the point, I think.
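As a toy illustration of reputation-weighted voting (the log-based weight here is just one plausible choice, not anything standard):

```python
# Each up/down vote counts for more as the voter's reputation grows, which
# blunts mass voting from freshly created spam accounts.
import math

def vote_weight(reputation):
    """A brand-new account (rep ~0) contributes almost nothing; weight grows slowly."""
    return math.log1p(max(reputation, 0))

def score(votes):
    """votes: iterable of (direction, voter_reputation), direction is +1 or -1."""
    return sum(direction * vote_weight(rep) for direction, rep in votes)

# Two established users upvote, three throwaway accounts downvote:
print(score([(+1, 5000), (+1, 120), (-1, 1), (-1, 1), (-1, 1)]))
```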
As spammers gradually discover the weaknesses of traditional search engines (see Google bombs, content scraper sites, etc.), a community-based approach may work. But it would suffer severely from the cold-start effect, and while the community is small the system is easy to abuse and poison...
At least Wikipedia and Stack Exchange have not been spammed to useless levels so far...
PS: http://xkcd.com/810/

Resume parser in Ruby (Rails plugin / gem)

Is there any Ruby gem / Rails plugin available for parsing a résumé and importing that information into an object/form?
I may be wrong, but I don't think you'll find anything completely automated to do this, because a résumé (or CV) can be structured in so many different ways and can contain very different types of data. Any completely automated solution is likely to have accuracy problems, since it is technically a difficult problem to solve.
You may find this answer useful.
Here are some other suggestions that might help:
Require a user to enter their details into a form on your website instead of uploading a Word document. You'll then be able to explicitly ask for the data you want and you'll be able to store the data in a structure that suits you. However, this may be too much of a barrier to entry for your users.
Allow a user to submit the URL of their résumé published using the hResume microformat. Sites like LinkedIn already publish résumés in this format. There is a Ruby gem, mofo, which can parse microformats, including hResume. However, not all users will have an online résumé like this.
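As a rough illustration of the microformat idea (the answer points to the Ruby gem mofo; this sketch uses Python with BeautifulSoup instead, and the class names such as hresume, fn, skill and experience are taken from the hResume draft, so verify them against the markup you actually receive):

```python
# Pull a few hResume fields out of a published online resume.
import requests
from bs4 import BeautifulSoup

url = "http://example.com/online-resume"   # hypothetical hResume URL
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

resume = soup.find(class_="hresume")
if resume is None:
    print("No hResume markup found")
else:
    name = resume.find(class_="fn")
    skills = [el.get_text(strip=True) for el in resume.find_all(class_="skill")]
    experience = [el.get_text(" ", strip=True)
                  for el in resume.find_all(class_="experience")]
    print("Name:", name.get_text(strip=True) if name else "?")
    print("Skills:", skills)
    print("Experience entries:", len(experience))
```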
