I am planning to create an application that extracts citation metadata. By automating this process, I hope to reduce the human effort it takes to search for cited information and make it easily accessible.
So far, I have read various scientific papers and found that FLUX-CiM (by Eli Cortez) has the highest efficiency of all the existing models. Unfortunately, there is no existing implementation of it anywhere, and no details regarding its implementation (libraries used, programming language, or supporting algorithms) are provided in the paper.
So I would like to know whether there is any implementation of it, or whether there is any existing application similar to FLUX-CiM (i.e. unsupervised, as that is more reliable). I am very new to machine learning, and I would also like to know of any tutorials or starting points for learning to implement such an application.
I'm not too familiar with the various tools, but AnyStyle seems to have the functionality you're looking for, and is machine-learning based.
Repo: https://github.com/inukshuk/anystyle-parser
Demo: http://anystyle.io/
I'm trying to understand where SBE (Specification by Example) complements or replaces traditional requirements documentation. The "levels of requirements" diagram shows three levels of traditional software requirements.
Which of the items below (from the diagram) does SBE replace, and which ones does it complement:
Vision and Scope Document
Business Requirements
Use Case Document
User Requirements
Business Rules
Software Requirements Specification
System Requirements
Functional Requirements
Quality Attributes
External Interfaces
Constraints
My naive understanding of SBE is that it is just an alternative form of the Software Requirements Specification. Is this correct?
BDD and SBE are normally used by Agile teams, who don't focus as much on documentation as traditional software development teams do.
BDD is the art of using examples in conversation to illustrate behaviour. SBE then uses those examples as a way of specifying the behaviour you decide to address (I always think of it as a subset of BDD, since talking through examples often ends up eliminating scope, discovering uncertainty, or finding different options, none of which end up as specifications).
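To make this concrete, an example that gets promoted to a specification typically ends up executable. Here is a minimal sketch using Cucumber-JVM; the scenario, step wording, and account logic are all hypothetical, just to show the shape:

```java
// Hypothetical Gherkin scenario these steps implement:
//   Scenario: Withdrawing within the balance
//     Given an account with a balance of 100
//     When the customer withdraws 30
//     Then the remaining balance is 70
import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;
import static org.junit.jupiter.api.Assertions.assertEquals;

public class AccountSteps {
    private int balance; // stands in for a real Account object

    @Given("an account with a balance of {int}")
    public void anAccountWithABalanceOf(int amount) {
        balance = amount;
    }

    @When("the customer withdraws {int}")
    public void theCustomerWithdraws(int amount) {
        balance -= amount;
    }

    @Then("the remaining balance is {int}")
    public void theRemainingBalanceIs(int expected) {
        assertEquals(expected, balance);
    }
}
```

The point is not the tooling, though: the conversation that produced the example is where the value lies.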
There are a couple of things that are hard to do with BDD. One of them is anything which isn't discrete in nature, or which needs to be true throughout the lifetime of the system - non-functionals, quality attributes, constraints, etc. It's hard to talk through examples of these. These continuous aspects of requirements lend themselves better to monitoring, and monitoring itself is discrete, so BDD can even be used to help manage them.
Since an initial vision is usually created to help the company make money, save money, or protect existing revenue (stopping customers going elsewhere, for instance), you can even come up with examples of how the project will do this. In fact, if you can't, the project is likely to fail anyway. So BDD / SBE can also be used to help complement an initial vision and scope.
Therefore, BDD / SBE can complement all of these documents, and in Agile teams, the documents themselves are usually replaced by conversations about the requirements and rules (illustrated by examples), story cards to represent placeholders for those conversations, and perhaps some lightweight capture of those conversations on a Wiki.
It is unlikely that any Agile team captures all of their examples up-front, as this leads to excessive investment in the requirements and tends to turn the project into a traditional Waterfall/SDLC project instead.
This blog post I wrote about BDD in the Large may also be of interest.
I want to store a large number of data points for user actions, like likes, tags, etc. (I have plans for both e-commerce and document management).
With the data points, I want to support functions such as
"users who loved X loved Y,Z" recommendations
"fetch more stuff similar to X,Y" clustering.
By production-ready and real-time, I mean that I can enter data points and make queries at the same time; the server takes care of answering queries and updating scores by itself.
I searched around the interwebs, and the solutions that come up fall into one of two categories:
Data-mining libraries that are mostly academic-oriented and are meant for large batch operations, not for heavy real-time queries
Hadoop/Mahout, which is production-ready and supports real-time updates and queries, but has a steep learning curve and is tough to administer.
For recommenders, Mahout has a non-distributed recommender implementation that does not use Hadoop. In fact, this is the only part that is real-time; the Hadoop-based parts are not.
I think there is little learning curve to it; see here and here for a pretty complete writeup.
Mahout in Action chapters 2-5 cover this quite well too.
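For a flavour of that non-distributed part, here is a minimal sketch of a user-based recommender using Mahout's Taste API; the file name, neighbourhood size, and user ID are placeholders, and the CSV is assumed to hold userID,itemID,preference rows:

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class LikesRecommender {
    public static void main(String[] args) throws Exception {
        // Each line of the file: userID,itemID,preference
        DataModel model = new FileDataModel(new File("likes.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Consider the 10 most similar users when recommending
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        // "users who loved X also loved Y, Z": top 3 items for user 1
        List<RecommendedItem> items = recommender.recommend(1, 3);
        for (RecommendedItem item : items) {
            System.out.println(item.getItemID() + " : " + item.getValue());
        }
    }
}
```

This in-memory recommender is the real-time part referred to above; the Hadoop-based jobs are batch.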
Please understand that for useful recommendations, the various parameters of such a system must be carefully fine-tuned. The out-of-the-box functionality many systems have (Oracle Data Mining, Microsoft Data Mining Extensions, etc.) offers just the core functionality.
So in the end, you will not get around the "steep learning curve", I guess. That is why you need experts for data mining. If there were a point-and-click solution, it would already be integrated everywhere.
Example "similar items". I laughed hard, when Amazon once recommended me to buy two products: Debian Linux Administrators Handbook and ... Debian Linux Admininstrators Handbook WITH CD.
I hope you get the key point of this example: to a plain algorithm, the two books appear "similar", and thus a sensible combination. To a human, it is pointless to buy the same book twice. You need to teach such rules to any recommendation system, as they cannot be trivially learned from the data. There will always be good results and useless results, and you need to tune and parameterize the system carefully.
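In Mahout's Taste API, for example, one place to encode such a rule is an IDRescorer passed to recommend(). Here is a minimal sketch; the near-duplicate test is a hypothetical helper that you would have to implement against your own catalogue (title similarity, product groups, etc.):

```java
import org.apache.mahout.cf.taste.recommender.IDRescorer;

// Filters out recommendations that are near-duplicates of an item the
// user already owns (e.g. the same book with and without a CD).
public class DuplicateFilteringRescorer implements IDRescorer {
    private final long alreadyOwnedItemID;

    public DuplicateFilteringRescorer(long alreadyOwnedItemID) {
        this.alreadyOwnedItemID = alreadyOwnedItemID;
    }

    @Override
    public double rescore(long itemID, double originalScore) {
        return originalScore; // no re-weighting here, only filtering
    }

    @Override
    public boolean isFiltered(long itemID) {
        // isNearDuplicate is hypothetical: real logic would consult your
        // catalogue to decide whether two items are "the same" product.
        return isNearDuplicate(itemID, alreadyOwnedItemID);
    }

    private boolean isNearDuplicate(long a, long b) {
        return a == b; // placeholder; the real test is domain-specific
    }
}
```

You would pass it as recommender.recommend(userID, 5, new DuplicateFilteringRescorer(ownedItemID)). Other systems have analogous hooks, but the rule itself always has to come from you.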
A recent announcement by Google about the Google Prediction API sounded very interesting. It could be useful for a project that is coming up, and would probably do a better job than some custom code I was considering.
However, there is some vendor lock-in: Google retains the trained model and could later choose to overcharge me for it. It occurred to me that there are probably open-source equivalents, if I am willing to host the training myself (I am) and live without their ability to throw hardware at the problem at a moment's notice.
The last time I looked at third-party machine-learning code was many years ago, and there were a lot of details that needed to be carefully considered and customised for each project. Google appears to have hidden those decisions and takes care of them for you. To me, this is still indistinguishable from magic, but I would like to hear whether others can do the same.
So my question is:
What alternatives to the Google Prediction API exist which:
categorise data with supervised machine learning,
can be easily configured (or don't need configuration) for different kinds and scales of data-sets, and
are open-source and self-hosted (or, at the very least, provide royalty-free use of your model, without a dependence on a third party)?
Maybe Apache Mahout?
PredictionIO is an open source machine learning server for software developers to create predictive features, such as personalization, recommendation and content discovery.
I have been looking recently at tools like the Google Prediction API; one of the first ones I was pointed to was the Weka machine-learning toolkit, which could be worth checking out for anyone looking.
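For anyone who wants to try it, here is a minimal sketch of supervised classification with Weka's Java API; the file name is a placeholder, and the last attribute of the ARFF file is assumed to be the class label:

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaExample {
    public static void main(String[] args) throws Exception {
        // Load a dataset in Weka's ARFF format
        Instances data = new DataSource("training.arff").getDataSet();
        // Assume the last attribute is the class label
        data.setClassIndex(data.numAttributes() - 1);

        // Train a C4.5-style decision tree
        J48 tree = new J48();
        tree.buildClassifier(data);

        // Estimate accuracy with 10-fold cross-validation
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

Weka also ships a GUI (the Explorer), so you can experiment before writing any code.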
I'm not sure if it's relevant, but directededge seems to be doing exactly that :)
There is a good free-to-use service, Yandex Predictor, with a quota of 100,000 requests per day. It works on text only, and supports several languages and spelling correction.
You need to get a free API key; then you can use the simple RESTful API, which supports JSON, XML, and JSONP as output.
Unfortunately I cannot find documentation in English. You can use Google Translate.
I can translate docs if there is some demand.
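For what it's worth, calling the service amounts to a plain HTTP GET. Here is a minimal Java sketch; note that the endpoint path and parameter names are my reading of the Russian docs, so treat them as assumptions and verify them against the official documentation:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class YandexPredictorExample {
    public static void main(String[] args) throws Exception {
        String apiKey = "YOUR_API_KEY"; // obtained for free from Yandex
        String q = URLEncoder.encode("machine lea", "UTF-8");
        // Assumed endpoint and parameters; double-check the (Russian) docs.
        String url = "https://predictor.yandex.net/api/v1/predict.json/complete"
                + "?key=" + apiKey + "&q=" + q + "&lang=en&limit=3";

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON with completion suggestions
            }
        }
    }
}
```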
Sorry if this is yet another question with no response :) - feel free to close it. There are a bunch of questions on SO along the lines of: is the technology/language/database used in medicine quite different from that used in traditional areas like banking or industry? The responses tend to be either 1) there is no difference, or 2) it's vague and hard to answer given the lack of standards.
But medical imaging is attractive not only for the general humanistic and scientific reasons; the job requirements are strict and obvious: C++/COM/ActiveX/C#, some open-source libraries, DICOM/HL7, Python. It looks like a separate specialty - you don't need to explain in an interview what exactly you did.
So my question is: Is medical imaging the separate specialty it appears to be to an outsider? Do the vendors mostly go in the same direction, so that you can switch between them without changing your world view, as happens in enterprise software? Or is it just a kind of C++ programming that is routinely interchanged with other areas such as image processing, trading, drivers, operating-system programming, etc.?
I would say that programming is universal. Whether you're developing a web application or designing for an embedded system, you will face similar challenges. However, some aspects do change. And in this case, I would say that the focus on algorithms is what sets medical imaging apart from other fields.
I'm not an expert, but there is definitely some advanced math and algorithms involved in medical imaging. For example, consider image registration - a common algorithm used in the field. Not only would an MI expert have to have a good mathematical understanding of the registration, but he would also have to be able to readily implement and optimize it - not a trivial task.
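To make the registration point concrete, here is a deliberately naive sketch of intensity-based registration: a brute-force search for the integer translation that minimizes the sum of squared differences between two images. Real methods add sub-pixel optimization, multi-resolution pyramids, and richer transforms; this only illustrates the core idea:

```java
// Naive translation-only image registration by exhaustive SSD search.
public class TranslationRegistration {

    // Mean squared difference between the fixed image and the moving
    // image shifted by (dx, dy), computed over the overlapping region.
    static double ssd(double[][] fixed, double[][] moving, int dx, int dy) {
        double sum = 0;
        int count = 0;
        for (int y = 0; y < fixed.length; y++) {
            for (int x = 0; x < fixed[0].length; x++) {
                int mx = x + dx, my = y + dy;
                if (my >= 0 && my < moving.length && mx >= 0 && mx < moving[0].length) {
                    double d = fixed[y][x] - moving[my][mx];
                    sum += d * d;
                    count++;
                }
            }
        }
        return count > 0 ? sum / count : Double.MAX_VALUE;
    }

    /** Returns the shift {dx, dy} with the lowest SSD within +/- maxShift. */
    static int[] register(double[][] fixed, double[][] moving, int maxShift) {
        int[] best = {0, 0};
        double bestScore = Double.MAX_VALUE;
        for (int dy = -maxShift; dy <= maxShift; dy++) {
            for (int dx = -maxShift; dx <= maxShift; dx++) {
                double score = ssd(fixed, moving, dx, dy);
                if (score < bestScore) {
                    bestScore = score;
                    best = new int[] {dx, dy};
                }
            }
        }
        return best;
    }
}
```

Even this toy version hints at the optimization questions (search strategy, similarity metric, boundary handling) that an MI expert has to answer at scale.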
What tools are available for metamodelling?
Especially for developing diagram editors; at the moment I am trying out Eclipse GMF.
I am wondering what other options are out there.
Is any comparison available?
Your question is simply too broad for a single answer, for several reasons.
First, meta-modelling is not a settled term, but rather a very fuzzy thing, covering modelling models of models and reaching out to terms like MDA.
Second, there are numerous options for developing diagram editors - going the Eclipse way is surely a nice option.
To get you at least started in the Eclipse department:
have a look at MOF, the architecture for "meta-modelling" from the OMG (the people who maintain UML)
from there, approach EMOF, a subset of MOF which is supported by the Eclipse Modeling Framework in the incarnation of Ecore
building something on top of GMF might indeed be a good idea, because that is the route existing diagram editors for the Eclipse platform take (e.g. Omondo's EclipseUML)
there are a lot of tools in the Eclipse ecosystem that can utilize Ecore - and GMF does indeed build on top of Ecore itself (see the sketch below)
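For a taste of what working with Ecore looks like, here is a minimal sketch that defines a tiny metamodel programmatically with the EMF API; the package and class names are invented for the example:

```java
import org.eclipse.emf.ecore.EAttribute;
import org.eclipse.emf.ecore.EClass;
import org.eclipse.emf.ecore.EPackage;
import org.eclipse.emf.ecore.EcoreFactory;
import org.eclipse.emf.ecore.EcorePackage;

public class TinyMetamodel {
    public static void main(String[] args) {
        EcoreFactory factory = EcoreFactory.eINSTANCE;

        // A metamodel is an EPackage containing EClasses
        EPackage library = factory.createEPackage();
        library.setName("library");
        library.setNsPrefix("lib");
        library.setNsURI("http://example.org/library");

        // Define a "Book" metaclass with a "title" attribute
        EClass book = factory.createEClass();
        book.setName("Book");

        EAttribute title = factory.createEAttribute();
        title.setName("title");
        title.setEType(EcorePackage.Literals.ESTRING);
        book.getEStructuralFeatures().add(title);

        library.getEClassifiers().add(book);
        System.out.println("Defined metaclass: " + book.getName());
    }
}
```

GMF's generated editors manipulate instances of exactly this kind of model.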
Dia has an API for this - I was able to fairly trivially frig their UML editor into a basic ER modelling tool by changing the arrow styles. With a DB reverse-engineering tool I found on SourceForge (it took the schema and spat out Dia files), you could use this to document databases. While what I did was fairly trivial, the API was quite straightforward, and it didn't take me that long to work out how to make the change.
If you're of a mind to try out Smalltalk, there used to be a Smalltalk meta-CASE framework called DOME which does this sort of thing. If you download VisualWorks, DOME is one of the contributed packages.
GMF is a nice example. At the core of this sits EMF/Ecore, as computerkram says. Ecore is also used as the base of Eclipse's UML2. The prestige use case and proof of concept for GMF is certainly UML2 Tools.
Although it is generally a UML tool, I would look at StarUML. It supports additional modules beyond what is already built in. If it doesn't have what you need built in or as a module, I suppose you could make your own, but I don't know how difficult that is.
Meta-modeling is mostly done in Smalltalk.
You might want to take a look at MOOSE (http://moose.unibe.ch). There are a lot of tools being developed for program understanding. Most are Smalltalk-based; there is also some Java and C++ work.
Two of the most impressive tools are CodeCity and Mondrian. CodeCity can visualize code development over time; Mondrian provides scriptable visualization technology.
And of course there is the classic HotDraw, which is also available in Java.
For web development there is also Magritte, providing meta-descriptions for Seaside.
I would strongly recommend you look into DSM (Domain-Specific Modeling) as a general topic; meta-modeling is directly related. There are Eclipse-based tools like GMF that currently require Java coding, but they integrate nicely with other Eclipse tools and UML. However, there are two other classes of tool out there:
MetaCase, which I will call a pure DSM tool, as it focuses on letting a developer/modeler create a usable graphical model without nearly as much coding. Additionally, it can be easily deployed for others to use. GMF and Microsoft's beta Software Factory/DSM tool fall into this category.
Pure meta-modeling tools, which are not intended for DSM tooling, code generation, and the like. I do not follow these tools as closely, as I am interested in applications that generate tooling for SMEs, domain experts, and others to use and contribute value to an active project - not in modeling for modeling's sake, or just documentation and theory.
If you want to learn more about the first category, the tooling applications for DSM/meta-modeling, then check out my post "DSMForum.org great resources, worth a look", or just navigate directly to DSMForum.org.
In case you are interested in something related to modelling rather than code generation, have a look at adoxx.org. As a metamodelling platform it provides functionality and mechanisms to quickly develop your own DSL, and it lets you focus on the model's needs (business requirements, conceptual-level design/specification). There is an active community from academia and practice developing prototypical as well as commercial applications based on the platform. Could be interesting...