I was going through my mail and saw that Gmail automatically suggested turning "coming Friday around 5pm" into an event on 21st Feb. I am surprised; how does Gmail do this?
I mean, how did it correctly figure out that "this Friday" meant the coming Friday, and also that the 5 PM is linked with Friday?
I am a newbie in NLP and machine learning, so if someone could explain it to me in layman's terms I would be very glad.
I don't think this needs a lot of machine learning as such. A bit of NLP is helpful to get the dependencies from the sentence but even that isn't strictly necessary.
You could start off by just looking for keywords ("monday", "tuesday", etc.) and then look around them to see what is nearby: "last monday", "next monday", "coming monday", "previous monday" and so on. These are called window features because they provide a window of +/- 1, 2, 3 ... tokens around the feature you are interested in (here, "monday"). The "around 5pm" you could theoretically also get from just looking at window features, though I don't have an intuition as to how noisy that would be. Try to think of all the ways of expressing time in that context and then think of how those ways can be mixed up with something else. Off the top of my head it would seem relatively easy to do that.
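The window idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the modifier list, the time pattern, the window size and the function names are all made up here, and nothing in it reflects how Gmail actually works.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
MODIFIERS = {"next", "coming", "this", "last", "previous"}  # assumed list
TIME_RE = re.compile(r"^\d{1,2}(:\d{2})?(am|pm)$")          # e.g. 5pm, 10:30am

def window_features(text, size=3):
    """For each weekday mention, collect what sits within +/- size tokens."""
    tokens = re.findall(r"[a-z0-9:']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in WEEKDAYS:
            continue
        # Tokens just before and just after the weekday keyword.
        window = tokens[max(0, i - size):i] + tokens[i + 1:i + 1 + size]
        hits.append({
            "weekday": tok,
            "modifier": next((w for w in window if w in MODIFIERS), None),
            "time": next((w for w in window if TIME_RE.match(w)), None),
        })
    return hits

def resolve_coming(weekday, today):
    """Date of the first such weekday strictly after `today`."""
    delta = (WEEKDAYS.index(weekday) - today.weekday()) % 7
    return today + timedelta(days=delta or 7)

print(window_features("I can't do Friday but I can meet Wednesday at 4pm"))
print(resolve_coming("friday", date(2024, 2, 21)))
```

With a window of 3, the "4pm" in the second sentence ends up attached to Wednesday rather than Friday, simply because it is too far from "Friday" to fall inside that window. A wider window would start to produce the kind of noise mentioned above.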
Anyhow, the other way is to use a dependency parser to extract the grammatical relations between the elements in the sentence. This requires you to part-of-speech (POS) tag the sentence (after splitting it into tokens). The POS tagger would need to be trained to recognize that "friday" and "monday" are nouns, perhaps even that they are temporal expressions; the same goes for "5pm" and "around 5pm". That does require machine learning, and a lot of it. The benefit Google has over others is that they have a lot of data, which gives them lots and lots of examples of different ways of expressing what is essentially the same thing. This gives their models a lot of breadth. Once you've got the sentence POS tagged, you feed it to a dependency parser (such as the Stanford Dependency Parser), which tells you what the relations between the different tokens in the sentence are.
Again, Google has a lot of data, which helps. On top of all this, Google has had years to hone the output of the models so that when a model isn't entirely sure what is going on, it won't highlight or extract the result. In terms of actually applying NLP in the real world this last step is very important, because it gives people confidence in what the system is doing. Basically, if the software isn't sure what is happening, it should do nothing, because doing something risks doing the wrong thing, which then reduces people's confidence in the system as a whole.
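The "do nothing when unsure" rule is simple to express in code. A minimal sketch, assuming the model hands back candidate interpretations with a confidence score each; the function name, the candidate format and the 0.9 threshold are invented for illustration.

```python
def suggest_event(candidates, threshold=0.9):
    """Return the best interpretation, or None to stay silent.

    candidates: list of (label, confidence) pairs from some upstream model.
    threshold:  assumed cut-off; a real system would tune this on data.
    """
    if not candidates:
        return None
    label, score = max(candidates, key=lambda pair: pair[1])
    # Below the threshold we show nothing rather than risk a wrong suggestion.
    return label if score >= threshold else None

print(suggest_event([("Fri 17:00", 0.95), ("Fri 05:00", 0.03)]))
print(suggest_event([("Fri 17:00", 0.55), ("Sat 17:00", 0.45)]))
```

The second call returns None: the user sees no suggestion at all, which costs less trust than a wrong one.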
Releasing a reliable easy to use NLP application requires a tradeoff between the quality of the NLP/Machine Learning and general software engineering to hide all the parts where the NLP fails from the users.
Try sending yourself email(s) with time expressed in different ways and see which ones Google gets and which ones it doesn't. For instance
Can we meet Friday next week?
How about coffee next week's Friday at 2pm
I can't do Friday but I can meet Wednesday at 4pm
and so on, it's always interesting to poke holes in technology. It can also reveal quite a lot about what it is doing, and how it is doing it.
Pardon if this question is not appropriate. It is kind of specific and I am not asking for actual code but moreso guidance on whether or not this task is worth undertaking. If this is not the place, please close the question and kindly point me in the correct direction.
Short background: I have always been interested in tinkering. I used to play with partitions and OS X scripts when I was younger, eventually reaching basic-level "general programming" aptitude before my father prohibited my computer usage. I am now going to law school and working at a law firm but I love development and I want to implement more tech innovation in the field.
Main point: At our firm, we have a busy season every year from mid-March to the first week of April (immigration + H1B deadline). We receive a lot of documents and scanned files that need to be verified, organized, and checked.
I added (very) simple lines of code to our online platform to help with organization; basically, I attached tags to all incoming documents, and once they were verified, the code would organize them by tag (like "identification doc", "work experience doc" etc.). This would make my life much easier every year, as I end up working 100+ hour weeks this season.
I want to take this many steps further with an algorithm that can check for signatures and data mismatches between documents and ultimately organize the documents so they are ready to print. Eventually, I would like to maybe even implement machine learning and a very basic neural network to automate the whole mind-numbing and painful process...
Actual Question(s): I just wanted to know the best way for me to proceed or get started. I know a decent amount of Python and Java, and we already have an online platform with the documents. What other resources would you recommend in terms of books, videos, or even classes? Is there a name for this kind of basic categorization? Can I build something like this through my own effort without an advanced degree?
Stupid and over-dramatic epilogue: Truth be told, a part of me feels like I wasted my life thus far by not pursuing what I knew I loved at the age of 12. This is my way of making amends I guess, and if I can do this then maybe I can keep doing it in law and beyond...
You don't give many specifics about the task but if you have a finite number of forms in digital form as images, then this seems very possible.
I have personally used OpenCV with Python a lot and more complex machine learning tasks have become increasingly simple in the past 10 years.
Take for example object detection (e.g. 1, 2) to check whether there is anything in a signature field or try extracting the date from an image (e.g 1, 2).
I would suggest you start with the simplest thing that would improve your work. A small and easy task will let you build up your knowledge on how to do things.
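As an example of a "small and easy" first task, here is a rough sketch of checking whether a signature field contains any ink. It assumes the scan is already available as rows of grayscale values (0 = black, 255 = white) and that you know where the field is on the form. The function name, the box format and both thresholds are my own placeholders. In practice you would load the image with something like OpenCV, and a check this crude cannot tell a signature from a coffee stain.

```python
def field_has_ink(image, box, dark_below=128, min_fraction=0.02):
    """Return True if enough pixels inside `box` are dark.

    image: list of rows, each a list of 0-255 grayscale values.
    box:   (top, left, bottom, right), bottom/right exclusive (assumed layout).
    """
    top, left, bottom, right = box
    pixels = [value for row in image[top:bottom] for value in row[left:right]]
    if not pixels:
        raise ValueError("signature box is empty or outside the image")
    dark = sum(1 for value in pixels if value < dark_below)
    return dark / len(pixels) >= min_fraction

# Tiny synthetic "scans": a blank page and one with a pen stroke.
blank = [[255] * 10 for _ in range(10)]
signed = [row[:] for row in blank]
for col in range(2, 8):
    signed[5][col] = 0

print(field_has_ink(blank, (2, 2, 8, 8)))
print(field_has_ink(signed, (2, 2, 8, 8)))
```

Flagging the forms where this returns False for manual review would already remove one step from the checking process.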
I need to solve the following problem:
I am working on a system where you, as a user, can ask for a request to be attended to. To give you an idea, it is something like Uber: as a user you can, at any time, ask for a car to pick you up.
I have the history of those requests (time and location) for the last two years, and now I want to predict the number of "jobs requested" for the next hour, day or week. I know some machine learning algorithms and procedures.
What do you think is the best way (or algorithm) to tackle this task?
Look into stochastic processes, specifically Markov chains.
A Markov chain is a mathematical method for calculating the probability of moving from your current state to another state in the future.
Take a look; it could be quite helpful if you want to approximate the number of job requests.
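To make that concrete, here is a minimal sketch of estimating transition probabilities from history. It assumes you first bucket each hour's request count into a few demand levels such as "low" and "high"; that bucketing, and the names below, are my own choices.

```python
from collections import Counter, defaultdict

def transition_probabilities(states):
    """Estimate P(next state | current state) from an observed sequence."""
    counts = defaultdict(Counter)
    # Count every consecutive (current, next) pair in the history.
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }

# Hypothetical hourly demand levels, oldest first.
history = ["low", "low", "high", "low", "high", "high"]
probs = transition_probabilities(history)
print(probs["low"])   # chances for the hour following a "low" hour
```

Given the current hour's level, the row for that level is the forecast distribution for the next hour. A plain chain like this ignores time of day, so it is a rough baseline at best.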
Since you have data for the last 2 years, time series analysis would help you find any pattern, hourly or weekly as required. You can also check whether a pattern exists for a particular location in a particular time period. In the Uber case, for example: how many requests were made between 12 noon and, say, 3 p.m. over the last month? That might give you a pattern to expect in the coming days.
I would go with time series (if some pattern can be found).
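A very simple way to look for such a pattern is to average past requests per hour-of-week slot and use that average as the forecast. The sketch below is a seasonal baseline under my own assumptions (function names, no location dimension, no trend), not a full time series model. It only counts weeks that contain at least one request.

```python
from collections import defaultdict
from datetime import datetime

def hour_of_week_profile(timestamps):
    """Average number of requests per (weekday, hour) slot per observed week."""
    totals = defaultdict(int)
    weeks = set()
    for ts in timestamps:
        iso = ts.isocalendar()
        weeks.add((iso[0], iso[1]))              # (ISO year, ISO week)
        totals[(ts.weekday(), ts.hour)] += 1
    n_weeks = len(weeks) or 1
    return {slot: total / n_weeks for slot, total in totals.items()}

def forecast(profile, when):
    """Expected requests in the hour containing `when` (0.0 if never seen)."""
    return profile.get((when.weekday(), when.hour), 0.0)

# Hypothetical history: three requests one Monday noon, one the next Monday.
history = [
    datetime(2024, 1, 1, 12, 5), datetime(2024, 1, 1, 12, 20),
    datetime(2024, 1, 1, 12, 40), datetime(2024, 1, 8, 12, 10),
]
profile = hour_of_week_profile(history)
print(forecast(profile, datetime(2024, 1, 15, 12, 0)))
```

If a baseline like this already tracks the real numbers reasonably well, that is evidence the weekly pattern exists and a proper seasonal model is worth the effort.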
I want to create reports for sequential, predetermined periods.
Essentially, I want to be able to:
Set a time period, for example from the 10th of one month to the 9th of the next. Then I want to be able to run a report and have the current period attached to the report, which in this example would be August 10, 2011 to September 9, 2011. Then suppose 6+ months later I run the report again, even though it's 6+ months later the period should be September 10, 2011 to October 9, 2011.
I've thought of creating a period model that would have 'begin', 'end', and 'current' fields. The 'begin' and 'end' fields would hold the numeric day values, i.e. continuing with the above example, 10 and 9 respectively. The 'current' field would hold the current end period date, which (using the above example) would be September 9, 2011. With the 'current' period and the 'beginning' and 'end' values I could then create the next logical period on demand. Further, I'd also have the opportunity to modify the period as needed.
While the above approach should work, it doesn't seem that efficient of an approach. Are there alternative approaches that are better? What can I do, if anything, to improve my approach?
Thanks.
There is an ideal technique to solve your problem. It's called test-driven development. It's ideal in this case since you can describe the solution without having to code it. Just by describing the solution, you develop many clues about what you should actually be coding.
Since you might really have no idea what code to write yet, Cucumber seems ideal in your case. You can describe the problem you want to solve in English, run that using Cucumber, and it will actually point you all the way towards the solution.
If you have some idea about what code you want to write, you might be better off using Test::Unit or RSpec directly. These methods are closer to the actual code: the problem you want to solve needs to be written in Ruby.
Pragmatic Programmer has excellent books about these two techniques.
Since the problem you've described is of particular value only to you, I think you will be hard pressed to find many actual implementations on Stack Overflow. Using Cucumber to describe the problem for yourself seems the best way to go forward.
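To show what "describe the solution first" might look like for the period problem in the question, here is a small sketch. It is written in Python rather than Ruby purely for illustration. The function name is hypothetical, and it assumes the period's begin day is 28 or lower so that the day exists in every month. The assertions at the bottom are the description; the function is the least code that satisfies them.

```python
from datetime import date, timedelta

def next_period(current_end):
    """Given the end date of the current period, return the next (begin, end).

    The next period begins the day after `current_end` and ends the day
    before the same day-of-month one month later.
    """
    begin = current_end + timedelta(days=1)
    if begin.month == 12:
        year, month = begin.year + 1, 1
    else:
        year, month = begin.year, begin.month + 1
    # Assumes begin.day exists in the following month (day <= 28 is safe).
    end = date(year, month, begin.day) - timedelta(days=1)
    return begin, end

# The example from the question: after Aug 10 - Sep 9 comes Sep 10 - Oct 9.
assert next_period(date(2011, 9, 9)) == (date(2011, 9, 10), date(2011, 10, 9))
# Crossing a year boundary.
assert next_period(date(2011, 12, 9)) == (date(2011, 12, 10), date(2012, 1, 9))
```

Note that only the 'current' end date needs to be stored; the begin and end day numbers follow from it, which may simplify the model proposed in the question.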
I am sure there are many developers out here who have teams spread across different time zones. What are some of the challenges people face, and what's the best way to tackle them?
I currently work in a different time zone than the rest of my team. The biggest challenge is early in the morning and late in the afternoon when some of us haven't started work for the day or some have already left for the day.
It's just part of the work effort and we all respect each other's valuable time. If it's something critical (and that's a relative term) then we just call the team member or page/text the whole team. If that happens then we all respond as needed. No big deal. Because of the respect factor, we know to only use this if necessary.
During the normal working day we just use the standard stuff like email, phone, and IM.
In all honesty, any company that has split development of a project across time zones is out of touch with the realities of engineering. The MBAs, in an attempt to save themselves a buck or two, foist upon their engineers the unenviable position of altering the schedules of their lives - leading to high stress, longer work hours, lower morale, and higher turnover. Quality suffers, ship dates suffer, feature lists suffer. The only increase you'll see in your project is the bug count.
You can have engineering projects split up like this if you don't have any need for low-latency communication between project units. In other words - if they are working on almost entirely independent segments of the system.
We have a team that is distributed or has to work across 3 or 4 different timezones. In the course of this we have faced several challenges primarily relating to communication.
Meetings are tough to arrange at a convenient time for all team members to attend, so there can sometimes be a need to have subsets of the team meeting, or to forego the team meeting approach for an individual update approach where one primary team member is responsible for a particular overseas team.
Another issue is work handover and communication. For example, we have a resource in India and if they have a problem that causes them to stop work, it can take 2 or 3 days out of their schedule if we don't respond quickly enough, all due to the time difference. Therefore, it is imperative that we not only schedule diverse work to fill these delays but that we also respond to their queries in a timely manner. We often assign testing tasks to resources in this particular timezone as that is often an asynchronous task without an end.
In addition, you need to have a good change management system and code repository. The more asynchronous you can make the communication channels, the better, and this goes for information exchange as well (such as source and issue tracking).
There is no reason why you can't make distributed teams work, especially in the current age where we can work from almost anywhere as long as we have a link to the Internet. However, it's important to know where your bottlenecks are in a project and to ensure that work is distributed accordingly.
If you don't have a great process that has been tuned specifically for this kind of scenario, then working across time zones is going to kill your project deadline. At least one side should be flexible enough to adjust its hours for meetings. But of course that will eventually create frustration among the team members.
Check out this SO thread which talks about Outsourcing and its practical issues, I think you will get some points from there too https://stackoverflow.com/questions/111948/outsourcing
Or Outsourcing Tag - https://stackoverflow.com/questions/tagged/outsourcing
We have these kinds of issues with support - we are using 3 commercial SDKs and the support teams are in distant time zones (8-10 hours difference). Moreover, not all work days overlap.
This fact had a big impact on my reverse engineering abilities :)
Plan, plan and plan some more. A few other things to note:
1) Be aware of local holidays if the team is in other locales; different countries may have different holidays. For example, some sects celebrate some Christian holidays 2 weeks later than most Christians (I'm thinking of some Orthodox sects).
2) Plan on meetings at a specific time that may be outside the normal work hours. This was particularly true when there was a 13 hour time difference between the rest of the team and myself.
3) Be aware of "core hours" with the time change. For example, if I'm on Pacific time and want to update something in New Jersey on Eastern time and I do it at 5pm Pacific time, that is 8pm Eastern time, and there may not be anyone there to notice the change or test it before the following morning. That may mean some support person gets up at 4:30am Pacific time, as in the East some have started to show up for work and go, "Huh? Why isn't this working like it did yesterday?"
There are also the other obvious things, such as being aware of whether various alphabets are involved (e.g. Latin, Cyrillic, Arabic), as this may affect how a computer interprets some text that is entered.
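The "core hours" point is easy to check mechanically before making a change. A small sketch, with loud assumptions: it uses fixed UTC offsets for Pacific and Eastern standard time and therefore ignores daylight saving, and the 9-to-17 core window and the function name are made up. Real code should use proper time zone data instead.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for illustration only; these ignore daylight saving time.
PACIFIC = timezone(timedelta(hours=-8), "PST")
EASTERN = timezone(timedelta(hours=-5), "EST")

def in_core_hours(moment, tz, start_hour=9, end_hour=17):
    """True if `moment` falls inside the (assumed) core hours in zone `tz`."""
    local = moment.astimezone(tz)
    return start_hour <= local.hour < end_hour

# The 5pm Pacific update from point 3.
deploy = datetime(2024, 1, 15, 17, 0, tzinfo=PACIFIC)
print(deploy.astimezone(EASTERN).strftime("%H:%M %Z"))
print(in_core_hours(deploy, EASTERN))
```

A check like this could gate a deployment script so that changes only go out while someone at the receiving site is around to notice them.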
Put another way: what code have you written that cannot fail? I'm interested in hearing from those who have worked on projects dealing with heart monitors, water testing, economic fundamentals, missile trajectories, or the O2 concentration on the space shuttle.
How did you prepare for writing this sort of code: methodologically, intellectually, and emotionally?
Edit
I've marked this wiki in case the rep issue is keeping people from replying. I thought there would be a good deal more perspective on this issue than there has been.
While I am not personally involved in what is described there, this article will hopefully contribute to the spirit of your question: They Write the Right Stuff.
I wrote a driver for a blood pressure measuring device for hospital use. If it "fails", the patient will not have his blood pressure checked at the scheduled time; if his blood pressure is abnormal, no alarm (in the larger system) will be triggered. Such an event could be clinically significant.
My approach was to thoroughly read the spec/documentation in a non-work environment (to avoid the temptation to start coding right away), then read it again at work. After that, I summarized the possible states and actions on paper and "flowcharted" an algorithm, and annotated all the potential real-world "bad events" (cables getting unplugged, batteries dying, etc). Finally, I wrote and rewrote the driver three times, each with different mechanisms (e.g. FSM), and compared their results. Each iteration helped me identify weaknesses I hadn't yet discovered. The third rewrite was the "official" result. I reviewed each iteration with my co-worker.
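For readers who have not seen the FSM style mentioned above, here is a generic table-driven sketch. It is not the driver described in this answer; the states, event names and the "anything unexpected goes to FAULT" rule are invented purely to illustrate how listing states and bad events on paper maps onto code.

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"
    MEASURING = "measuring"
    DONE = "done"
    FAULT = "fault"

# Every allowed transition is written down explicitly (hypothetical events).
TRANSITIONS = {
    (State.IDLE, "start"): State.MEASURING,
    (State.MEASURING, "reading_ok"): State.DONE,
    (State.DONE, "reset"): State.IDLE,
    (State.FAULT, "reset"): State.IDLE,
}

def step(state, event):
    """Advance one event; anything not in the table (cable unplugged,
    battery dead, an event arriving in the wrong state) lands in FAULT."""
    return TRANSITIONS.get((state, event), State.FAULT)

def run(events, state=State.IDLE):
    """Feed a sequence of events through the machine and return the end state."""
    for event in events:
        state = step(state, event)
    return state

print(run(["start", "reading_ok"]))
print(run(["start", "cable_unplugged"]))
```

Because the table is data, it can be reviewed line by line against the paper flowchart, and each real-world "bad event" becomes a test case.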
Emotional preparation consisted of convincing myself that should the unthinkable happen, at least I wasn't willfully negligent -- just incompetent (the old "I'm only human" excuse). ;-)
I have written a computer interface to an MRI machine. It had no chance of hurting the end user as it was just record management, but it could potentially have given an incorrect diagnosis or omitted important information.
Tests, lots and lots of tests.
Unit tests, mid and high level tests. Simulate all possible input combinations. Also a great deal of testing with the hardware itself. Testing must be done in a complete and methodical way. It should take a great deal more time to test than to write.
Error Reporting
All errors must be reported and be obvious. If it won't hurt the patient to do so, fail fast.
For something that is actively keeping a person alive things are even worse. It must never stop working. If it fails it needs to restart and keep trying. Redundant internals are also a must in case the hardware fails.
At the wrong company it can be a really difficult kind of situation to work in. However, if things are going well, you are well funded and release pressure is not high, it can be a very rewarding space to work in.
Not really an answer, but:
I've got a friend who writes embedded control software for laser eye surgery machines. When he had laser eye surgery himself, he made sure to go to an ophthalmologist who used his company's system. I have great admiration for this guy. I can't think of a piece of software I've ever written whose level of quality was high enough that I'd trust my own eyesight to it.
Right now I'm working on some base code for a system that retrieves medical patient information from clinics and hospitals for a medical billing office. We're starting out with a smaller client and a long break-in period to ensure quality, but eventually this code needs to securely handle a large variety of report formats from a number of clients at different facilities.
It's not quite in the same scale as your examples, but a bad mistake could result in the wrong people being billed or the right person billed to a defunct address (screwing up credit reports) or open people up to identity theft, so it's still pretty critical. Oh yeah, and it could mean doctors don't get paid quite as quick. That's important, too, especially from a business perspective, but not in the same class as data protection and integrity.
I've heard crazy stories of the processes used to write code at NASA for the space shuttles. Every line of code has about 10-20 lines of documentation, along with tests, full revision history, etc. Every time a bug is found, not only is the code evaluated and repaired, but the entire procedure of writing code, the entire command chain, etc. is reviewed to answer the question: "What went wrong in our process that allowed this bug to get included in the first place?"
While nothing quite so important as an MRI machine or a blood pressure monitor, I did get tapped to do a rewrite of Blackjack when I worked for an online gambling provider. Blackjack is by far the most popular online game, and millions of dollars were going to go through this software (and did).
I wrote the game engine separate from the server and the client, and used Test Driven Development to ensure that what I was assuming was coming through in the results. I also had a wrapper "server" that had console output that would allow me to play. This was actually only useful in that it mimicked the real server interface, since playing a text version of blackjack isn't very fun or easy ("You draw a 10. You now have a 10 and a 6, while the dealer has a 6 showing. [bsd] >")
The game is still being run on some sites out there, and to my knowledge, has never had any financial bugs after years of play.
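To give a flavour of what testing a game engine in isolation can look like, here is a tiny sketch of one rule, scoring a hand, with its checks. It is an illustration only and not the engine described above; the function name and the card representation are made up.

```python
def hand_value(cards):
    """Best blackjack total for ranks such as "A", "K", "10", "6".

    Aces count as 11 unless that would bust the hand, in which case
    they are downgraded to 1 one at a time.
    """
    total = 0
    aces = 0
    for rank in cards:
        if rank == "A":
            aces += 1
            total += 11
        elif rank in ("J", "Q", "K"):
            total += 10
        else:
            total += int(rank)
    while total > 21 and aces:
        total -= 10          # turn one ace from 11 into 1
        aces -= 1
    return total

# Checks written alongside the rule, test-first style.
assert hand_value(["10", "6"]) == 16      # the hand from the console example
assert hand_value(["A", "K"]) == 21
assert hand_value(["A", "A", "9"]) == 21
assert hand_value(["K", "Q", "5"]) == 25  # bust
```

Because the function takes plain data and returns a number, it can be exercised without a server or client running, which is the point of keeping the engine separate.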
My first "real" software job was writing a GUI app for planning stereotactic brain surgery. Testing, testing, testing... absolutely no formal methods or engineering-style thinking, just younger programmers cranking it out. When they started talking about using the software to control a robotic arm with a laser, without any serious engineering methods in place, I got a bit worried and left for more officey lands.
I've created an information system application for the local government's culture and tourism department on the island of Bali. It was installed at several tourist destinations, providing extensive information about the culture, maps, accommodation, etc.
If it failed, tourists probably couldn't get the information they needed most, might get cheated by brokers, or might get lost somewhere :)