Trying to make a search engine for issues - machine-learning

Our company has a lot of issue data stored in a database. We want to create a search engine so that people can check how issues were previously dealt with. We cannot use any third-party API because the data is sensitive and we want to keep everything in-house. Right now the approach is as follows:
Clean up the data and then use Doc2Vec to represent each issue as a vector.
Find the closest 5 issues using some distance metric.
The problem is that the results are not at all useful. Most of the data is a one-liner plus some issue description. There are spelling mistakes, stack traces, and other noise.
Is this the right approach or should we switch to something else?
Right now we are testing on 200K records.
Thanks for the help.
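For reference, the "find the closest 5 issues" step from the question can be sketched as below. This is only a minimal illustration in plain Python, assuming cosine similarity as the distance metric and assuming the issue vectors have already been produced by some model; the names `cosine_similarity`, `closest_issues`, and the toy `issue_vecs` data are hypothetical and not from the original post.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def closest_issues(query_vec, issue_vecs, k=5):
    """Return the ids of the k issues most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), issue_id)
        for issue_id, vec in issue_vecs.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [issue_id for _, issue_id in scored[:k]]


# Toy 2-dimensional vectors standing in for real document embeddings.
issue_vecs = {
    "ISSUE-1": [1.0, 0.0],
    "ISSUE-2": [0.9, 0.1],
    "ISSUE-3": [0.0, 1.0],
}
print(closest_issues([1.0, 0.0], issue_vecs, k=2))
```

A brute-force scan like this is linear in the number of issues per query, which may or may not be acceptable at 200K records.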

Related

Does anybody know a source of multi-electrode (several channels) recordings of multiple neurons (I prefer real data, not artificial)?

I mean data recorded from multiple neurons with a multi-electrode array. I need this data as the input for my experiment.
Google just released a Beta Dataset Search Engine.
You can find data here.
A good way to tackle stuff like this is to email the various labs that work with or generate the data you are interested in. A lot of labs I have worked for in the past have tons of data lying around that is not used, or not useful to them in any current way, and people are generally enthusiastic about your interest in their study.
Additionally, there are many projects funded with the idea of sharing data and tools for the benefit of science. One such project is the miniscope project from UCLA, which has a ton of calcium imaging data lying around and very helpful people willing to share it and assist you in the analysis. I am sure a quick Google search can help you find similar labs more specialized in electrophysiology than in calcium imaging.
I hope you find what you are looking for!

Best way to graph some data

I want to display a line graph with details such as wind speed, gusts, lulls, etc. for a paragliding app I am making for our local club. What is the best way to go about it?
The current data is just on the web, so I would like something similar that I can pinch-zoom, maybe:
http://www.acthpa.org/wind/
Willy Weather has an awesome app that displays this very nicely. Does anyone know what they might have used to make it? The website is very similar to their iPhone app:
http://wind.willyweather.com.au/vic/western-district/wild-dog-beach.html
After looking around at various third-party libraries, I'm using Core Plot :-)
I've put together a really simple class for displaying line graphs, check it out at https://github.com/johnyorke/JYGraphViewController
Perhaps rotate the phone for the graph view then swipe up and down to go between the different metrics you wish to show?
There are various third party graphing libraries for iOS. I looked into CorePlot a while back, but did not end up needing it. It looked full-featured, although I've seen posts that it is rather complex and involved to use.

Create a Diagram from Database Information

I've looked at lots of libraries but wasn't able to find one that could help me. I have already looked at Diagramo, GoJS, and lots of canvas libraries that let you draw your diagram in the browser.
I'm developing an application that creates a cause-and-effect diagram from information that users type in. This information is saved in a database, and I need to generate the diagram from the database. It looks something like this:
http://www.fao.org/WAIRdocs/x5405s/x5405s0h.gif
Do you know any good libraries that could help me? I'm using Ruby on Rails (RoR) for development.
What's wrong with either of these?
I'm doing exactly this with GoJS and it's great - unfortunately the price of that library is high, though.

multiple choice test mark reader - where to start?

I was assigned a project (in school) for automated multiple choice test scoring and I do not know where to start.
I think this is a fairly popular kind of program and you probably already know about it: you input a scanned image file of the answer sheet and it returns the results.
Everything I know about computer vision is a few examples of photo editing with OpenCV. I hope you can give me a few keywords related to the problem or maybe a couple of blog articles, documents and related libraries.
Are there any free open-source programs that I can refer to?
Thanks!
Edit: Added 2 examples of the answer sheet (sorry that I cannot find a sheet in English):
I think there are basically two steps to the problem:
1. Bring the form into a normalized position.
2. Now that you know where the boxes are, look at them by thresholding the gray values in each region.
Which methods to use for step 1 depends on your actual images and how much they vary. Do you have some example images you can upload?
Also I think it is a good idea, especially if you are a beginner, to start with some simple examples and work your way up from there by adding more and more variation.
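The second step above (thresholding the gray values inside each known box) can be sketched in plain Python as follows. This is only a minimal illustration, assuming the sheet has already been normalized, that the image is a list of rows of 0-255 gray values, and that box coordinates are known in advance; the function names, the `threshold` and `min_fill` values, and the toy image are all assumptions made for the example, not part of the original answer.

```python
def fill_ratio(image, box, threshold=128):
    """Fraction of pixels inside box that are darker than threshold.

    box is (top, left, bottom, right), with bottom and right exclusive.
    """
    top, left, bottom, right = box
    dark = 0
    total = 0
    for row in image[top:bottom]:
        for pixel in row[left:right]:
            total += 1
            if pixel < threshold:
                dark += 1
    return dark / total if total else 0.0


def marked_choice(image, boxes, min_fill=0.5):
    """Return the label of the darkest box, or None if no box is filled enough."""
    ratios = {label: fill_ratio(image, box) for label, box in boxes.items()}
    best = max(ratios, key=ratios.get)
    return best if ratios[best] >= min_fill else None


# Toy 4x8 "scan": the left half (box A) is white, the right half (box B) is black.
image = [[255] * 4 + [0] * 4 for _ in range(4)]
boxes = {"A": (0, 0, 4, 4), "B": (0, 4, 4, 8)}
print(marked_choice(image, boxes))
```

With real scans the same idea would typically be applied to an image loaded through a library such as OpenCV, and the thresholds would need tuning against sample sheets.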

Ideal method for storing hierarchical data in HDF5

Hello Oracles of StackOverflow,
First time I managed to ask a question on stack overflow, so feel free to throw your cabbages at me. (or correct the way I should be asking my question)
I have this problem. I'm using HDF5 to store massive quantities of cookie information.
My data is structured in the following way:
CookieID -> Event -> Key_value Pair
There are multiple events for each cookieID. But only one key_value pair per event.
I'd like to know the best way to store this in HDF5.
Currently, I'm storing each cookie as a separate table within a group in the HDF5 file, using the cookieID as the name of the table. Unfortunately for me, with 10,000,000 cookies, HDF5 (or specifically PyTables) doesn't approve of this type of storage.
Specifically, it throws this error:
group ``/CookieData`` is exceeding the recommended maximum number of children (16384)
I'm wondering if you could recommend the best way of storing this information.
Should I create a flat table? Should I keep this method? Is there something else I can do?
Help is appreciated. Thanks for reading.
Several hours of research later, I've discovered that what I was attempting to do was categorically impossible.
The following link gives details as to the impossibility of using HDF5 with variable-length nested children.
I've decided to go with a flat file for the time being and hope that this is more efficient than a database store. The problem with a flat file in the end is that I have to replicate values in the file, which otherwise should not exist.
If anyone else has any better ideas it would be appreciated.
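To make the flat layout concrete, here is a minimal sketch in plain Python, independent of any HDF5 library. It assumes the nested data is available as a mapping from cookie ID to a list of `(event, key, value)` tuples, and it adds a small index of row ranges so that all events for one cookie can be read back as a contiguous slice. The names `flatten` and `events_for` and the sample data are hypothetical. Note that the cookie ID is repeated on every row, which is the value replication mentioned above.

```python
def flatten(cookies):
    """Turn {cookie_id: [(event, key, value), ...]} into flat rows plus an index.

    Returns (rows, index), where rows is a list of
    (cookie_id, event, key, value) tuples grouped by cookie_id, and
    index maps cookie_id -> (start, stop) row positions.
    """
    rows = []
    index = {}
    for cookie_id in sorted(cookies):
        start = len(rows)
        for event, key, value in cookies[cookie_id]:
            rows.append((cookie_id, event, key, value))
        index[cookie_id] = (start, len(rows))
    return rows, index


def events_for(rows, index, cookie_id):
    """Return all flat rows belonging to one cookie (empty list if unknown)."""
    start, stop = index.get(cookie_id, (0, 0))
    return rows[start:stop]


cookies = {
    "c2": [("click", "page", "home")],
    "c1": [("view", "page", "cart"), ("buy", "sku", "42")],
}
rows, index = flatten(cookies)
print(events_for(rows, index, "c1"))
```

In an HDF5 file the rows would presumably live in one large table with fixed-width columns, so the file holds a single table rather than ten million child nodes.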

Resources