Predicting the number of job requests for the next day - machine learning

I need to solve the following problem:
I am working on a system where you, as a user, can submit a request to be attended to. To give you an idea, it is something like Uber: as a user you can, at any time, ask for a car to pick you up.
I have the history of those requests (time and location) for the last two years, and now I want to predict the number of "jobs requested" for the next hour, day, or week. I know some machine learning algorithms and procedures, but I am not sure which one to use here.
What do you think is the best way (or algorithm) to tackle this task?

Stochastic processes / Markov chains
A Markov chain is a mathematical method for calculating the probability of moving from your current state to another state in the future.
Take a look; it could be quite helpful if you want to approximate the number of job requests.
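As a toy illustration of that idea (not from the answer itself): assuming hourly demand is bucketed into a few discrete states, a transition matrix estimated from the two years of history lets you project the state distribution forward. The states and matrix below are invented for the example.

```python
# Minimal Markov-chain sketch: discrete demand states and an invented
# transition matrix (in practice you would estimate P from the history).
import numpy as np

states = ["low", "medium", "high"]
# P[i][j] = probability of moving from state i to state j in the next hour
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
])

current = np.array([0.0, 1.0, 0.0])   # currently in "medium" demand
next_hour = current @ P               # state distribution one hour ahead
print(dict(zip(states, next_hour.round(2))))
```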

Since you have data for the last 2 years, time series analysis would help you figure out patterns, whether hourly or weekly, as required. You can also check whether a pattern exists for a particular location in a particular time period. In the case of Uber, for example, how many requests were made between 12 noon and, say, 3 p.m. over the last month might give you a pattern to expect in the coming days.
I would go with time series (if some pattern can be found).
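A minimal forecasting sketch along those lines, assuming the raw request timestamps have already been aggregated into an hourly count series (the toy data, model orders, and series name below are all assumptions, not from the question):

```python
# Seasonal ARIMA sketch for hourly request counts with a 24-hour cycle.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# toy stand-in for the hourly counts you would build from two years of requests
idx = pd.date_range("2016-01-01", periods=24 * 90, freq="H")
rng = np.random.default_rng(0)
hourly = pd.Series(rng.poisson(5 + 3 * np.sin(2 * np.pi * idx.hour / 24)),
                   index=idx, name="requests")

# SARIMA with a daily seasonal cycle; tune the orders on your own data
model = SARIMAX(hourly, order=(1, 0, 1), seasonal_order=(1, 1, 1, 24))
fit = model.fit(disp=False)

print(fit.forecast(steps=24))                          # next day, hour by hour
print(fit.forecast(steps=24 * 7).resample("D").sum())  # next week, daily totals
```

A simpler baseline worth trying first is "same hour last week", which captures most of the weekly pattern with no model at all.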


How to estimate number of days and log daily work in Jira? [closed]

I am a newbie to the Agile methodology and I am somewhat apprehensive when working on the Jira user stories assigned to me. I have the following questions.
How do I provide an estimate for a task or user story assigned to me? What factors should I consider? There are some stories for which I have to do some research and then develop. While other members of the team provide estimates in days for their user stories, I feel afraid to ask for more.
Once I am working on a user story, how should I log my work? When I am researching a topic, do those hours also count, or only the hours when I am actually writing code? Also, out of the total 9 hours per day, do I have to log the full 9 hours for the day, or just the hours in which I actually worked, not counting lunch and meeting times?
If I don't log 9 hours per day, the number of days I work will surpass the assigned number of days. Is that so?
Any help is appreciated.
How do I provide an estimate for a task or user story assigned to me? What factors should I consider? There are some stories for which I have to do some research and then develop.
An approach that works for me is to start with a guess of how long something will take and then add some extra time for:
Any uncertainty
Unusually complex tasks
Tasks with lots of dependencies
While other members of the team provide estimates in days for their user stories, I feel afraid to ask for more.
My advice would be to under promise and over deliver. For example, if you estimate 2 days and it actually takes less than 2 days then people will be happy. If you continually underestimate how much time it takes to do tasks then it will be disruptive and unpopular.
Once I am working on a user story, how should I log my work? When I am researching a topic, do those hours also count, or only the hours when I am actually writing code?
Everything you do towards completing a task should be included in the estimate. That includes if you have to do research or background reading. Remember that when you learn something new it is valuable to your organisation as it improves your capability. They should want you to be learning!
If I don't log 9 hours per day, the number of days I work will surpass the assigned number of days. Is that so?
In development we usually estimate in ideal days. An ideal day imagines if you only worked on the one task and had no other distractions. The number of ideal days worked is never the same as the number of actual days worked. It is not unusual for an ideal day to take 1.5 or more real days.
This is what I do in my company:
i. Business requirements are gathered by the manager/product owner (PO) together with the business owner/stakeholders.
ii. The PO and developers list all the features that need to be implemented, turning the business scope into features.
iii. Discussion to assign high/medium/low priorities for v1, v1.1, v1.2 and so on (Google "Minimum Viable Product (MVP)"), agreed with the business owner.
iv. The product owner/manager/developers and the designer come up with wireframes (usually sketches) for the features discussed earlier.
v. Once the wireframes are confirmed by the product owner and developers with the stakeholders, the designer produces the final UI.
vi. Create stories; from there you plan how long each story will take. If a feature is huge, make it an Epic, split it into smaller stories, and plan the implementation time (based on the developer who is going to do it, NOT your manager deciding your implementation time). If you are afraid to ask for more and then can't finish the task on time, that's your problem. Always be honest and frank with everyone; that's your team, they won't bite. If you fail, everyone fails. And if you finish earlier, pick new stories from the Backlog and plan your estimation better in the next sprint. You can ask for some "buffer" time to research features whose implementation is unclear when estimating stories, before implementation officially starts.
Log ONLY your working hours on those stories. Never log your lunch time (of course!). My previous company did "realistic" and "honest" Agile, putting a realistic estimate of 6 hours/day for developers (admitting that devs, being human, spend the other 2 hours watching cat videos, drinking water, taking toilet breaks, flirting, chit-chatting, slacking, etc.).
Refer to point 2. You should plan your estimates better, based on your experience and on how long you actually spend on your tasks. Finishing all your tasks much earlier than the estimated time, or over-committing, is considered a bad/failed sprint. Improve it in the next sprint.

Where to begin with basic machine learning algorithms for, say, document recognition and organization?

Pardon me if this question is not appropriate. It is kind of specific, and I am not asking for actual code but more for guidance on whether or not this task is worth undertaking. If this is not the place, please close the question and kindly point me in the correct direction.
Short background: I have always been interested in tinkering. I used to play with partitions and OS X scripts when I was younger, eventually reaching basic-level "general programming" aptitude before my father prohibited my computer usage. I am now going to law school and working at a law firm but I love development and I want to implement more tech innovation in the field.
Main point: At our firm, we have a busy season every year from mid-March to the first week of April (immigration + H1B deadline). We receive a lot of documents and scanned files that need to be verified, organized, and checked.
I added (very) simple lines of code to our online platform to help with organization; basically, I attached tags to all incoming documents, and once they were verified, the code would organize them by tag (like "identification doc", "work experience doc", etc.). This has made my life much easier every year, as I end up working 100+ hour weeks during this season.
I want to take this many steps further with an algorithm that can check for signatures and data mismatches between documents and ultimately organize the documents so they are ready to print. Eventually, I would like to maybe even implement machine learning and a very basic neural network to automate the whole mind-numbing and painful process...
Actual Question(s): I just wanted to know the best way for me to proceed or get started. I know a decent amount of python and java, and we have an online platform already with the documents. What other resources would you recommend in terms of books, videos, or even classes? Is there a name for this kind of basic categorization? Can I build something like this through my own effort without an advanced degree?
Stupid and over-dramatic epilogue: Truth be told, a part of me feels like I wasted my life thus far by not pursuing what I knew I loved at the age of 12. This is my way of making amends I guess, and if I can do this then maybe I can keep doing it in law and beyond...
You don't give many specifics about the task, but if you have a finite number of forms available digitally as images, then this seems very possible.
I have personally used OpenCV with Python a lot, and even fairly complex machine learning tasks have become increasingly simple in the past 10 years.
Take, for example, object detection (e.g. 1, 2) to check whether there is anything in a signature field, or try extracting the date from an image (e.g. 1, 2).
I would suggest you start with the simplest thing that would improve your work. A small and easy task will let you build up your knowledge on how to do things.
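As a concrete first step of that kind, here is a small sketch of the "is there anything in the signature field?" check with OpenCV, assuming scanned forms with a known layout; the filename, crop coordinates, and ink threshold are invented and would need tuning on your documents:

```python
# Check whether a fixed signature box on a scanned form contains ink.
import cv2

img = cv2.imread("scanned_form.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
signature_box = img[900:1000, 200:600]      # y1:y2, x1:x2 of the signature field

# dark marks on a light page become white pixels after inverted Otsu thresholding
_, ink = cv2.threshold(signature_box, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
ink_ratio = cv2.countNonZero(ink) / ink.size

print("signed" if ink_ratio > 0.02 else "probably blank", round(ink_ratio, 4))
```

Note that Otsu thresholding can exaggerate noise on a completely blank crop, so in practice you might also compare against a fixed intensity threshold.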

How to store last 'n' day/week/month/year aggregates on a stream of data?

What is the best way to store data so that I can get real-time answers to queries like "give me the count of failed transactions over the last 2 weeks" or "give me the count of accounts created in the last 2 years from now"? Counting rows every time is not an option, as the number of individual entries in the table is huge and the count may take hours to compute.
I am only interested in computing aggregates in real time over a rolling window. Also, I do not want to retain data older than 2 years, and I want it to be removed automatically.
Is there a standard way of solving this problem? Would services like Redshift/Kinesis be helpful?
Thanks in anticipation.
For most data warehousing solutions, we construct aggregate tables with resolution down to the business date, which makes reporting on 2 or more years of data very fast. Kinesis can certainly help Redshift ingest data at high throughput, which would then allow you to update the day's aggregate counts in real time. The only difficulty with this approach is that you need to know ahead of time which aggregations you want to report on so you can set them up, but a decent business analyst should be able to provide the majority of the required metrics at the start.
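To make the pre-aggregation idea concrete, a small sketch in pandas (the metric names, dates, and two-year retention cut-off are assumptions for illustration; in production the daily table would live in the warehouse and be updated by the streaming job):

```python
# One row per business date per metric; rolling-window queries become
# sums over a small table, and expiry is a cheap filter on the index.
import pandas as pd

daily = pd.DataFrame(
    {"failed_transactions": 3, "accounts_created": 12},   # toy constant counts
    index=pd.date_range("2015-01-01", "2016-12-31", freq="D"),
)

today = pd.Timestamp("2016-12-31")
last_2_weeks = daily.loc[today - pd.Timedelta(days=13):today, "failed_transactions"].sum()
last_2_years = daily.loc[today - pd.DateOffset(years=2):today, "accounts_created"].sum()

# retention: drop anything older than two years
daily = daily.loc[daily.index > today - pd.DateOffset(years=2)]
print(last_2_weeks, last_2_years)
```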

How does Gmail extract times and dates from text?

I was going through my mail and saw that Gmail automatically suggested adding "coming Friday around 5pm" as an event on 21st Feb. I am surprised at how Gmail does this.
I mean, how did it correctly figure out that this Friday meant the coming Friday, and also that the 5 PM is linked to Friday?
I am a newbie in NLP and machine learning, so if someone could explain it to me in layman's terms I would be very glad.
I don't think this needs a lot of machine learning as such. A bit of NLP is helpful for getting the dependencies out of the sentence, but even that isn't strictly necessary.
You could start off by just looking for keywords ("monday", "tuesday", etc.) and then looking around them to see what is nearby: "last monday", "next monday", "coming monday", "previous monday" and so on. These are called window features because they look at a window of +/- 1, 2, 3, ... tokens around the feature you are interested in ("monday"). The "around 5pm" you could theoretically also get from window features alone; I don't have an intuition for how noisy that would be. Try to think of all the ways of expressing time in that context, and then of how those expressions could be confused with something else. Off the top of my head it seems relatively easy to do.
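A rough sketch of that keyword-plus-window idea (purely illustrative; the regexes and token window are made up, and this is certainly not how Gmail actually does it):

```python
# Find weekday keywords, then look a few tokens around them for a
# modifier ("next", "coming", ...) and a nearby clock time.
import re

DAYS = r"(monday|tuesday|wednesday|thursday|friday|saturday|sunday)"
MODIFIERS = {"next", "coming", "this", "last", "previous"}
TIME = re.compile(r"\b(?:around\s+)?\d{1,2}(?::\d{2})?\s*(?:am|pm)\b", re.I)

def extract_day_mentions(text, window=3):
    tokens = text.lower().split()
    hits = []
    for i, tok in enumerate(tokens):
        if re.fullmatch(DAYS, tok.strip(".,?!")):
            left = tokens[max(0, i - window):i]            # window features to the left
            modifier = next((w for w in left if w in MODIFIERS), None)
            context = " ".join(tokens[max(0, i - window):i + window + 1])
            time_match = TIME.search(context)              # clock time near the weekday
            hits.append((tok, modifier, time_match.group(0) if time_match else None))
    return hits

print(extract_day_mentions("How about coffee coming Friday around 5pm?"))
# -> [('friday', 'coming', 'around 5pm')]
```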
Anyhow, the other way is to use a dependency parser to extract the grammatical relations between the elements of the sentence. This requires you to part-of-speech (POS) tag the sentence (after splitting it into tokens). The POS tagger would need to be trained to recognise that "friday" and "monday" are nouns, perhaps even that they are temporal expressions; the same goes for "5pm" and "around 5pm". That does require machine learning, and a lot of it. The advantage Google has over others is that they have a lot of data, which gives them lots and lots of examples of the different ways of expressing what is essentially the same thing. This gives their models a lot of breadth. Once you've got the sentence POS tagged, you feed it to a dependency parser (such as the Stanford Dependency Parser), which tells you what the relations between all the tokens in the sentence are.
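To see what that pipeline's output looks like, here is a small sketch using spaCy as a stand-in for the tagger and dependency parser (the answer names the Stanford Dependency Parser; spaCy is just a convenient way to inspect the same kind of structure, and the exact tags will vary by model):

```python
# POS tags, dependency relations, and temporal entities for a sample sentence.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Can we meet coming Friday around 5pm?")

for token in doc:
    # token.dep_ is the grammatical relation, token.head the word it attaches to
    print(f"{token.text:10} pos={token.pos_:6} dep={token.dep_:10} head={token.head.text}")

for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. "Friday" -> DATE, "5pm" -> TIME
```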
Again, Google has a lot of data, which helps. On top of all this, Google has had years to hone the output of the models, so that when a model isn't entirely sure what is going on it won't highlight or extract the result. In terms of actually applying NLP in the real world, this last step is very important because it gives people confidence in what the system is doing. Basically, if the software isn't sure what is happening, do nothing, because doing something risks doing the wrong thing, which then reduces people's confidence in the system as a whole.
Releasing a reliable, easy-to-use NLP application requires a trade-off between the quality of the NLP/machine learning and the general software engineering needed to hide the cases where the NLP fails from the users.
Try sending yourself emails with times expressed in different ways and see which ones Google gets and which ones it doesn't. For instance:
Can we meet Friday next week?
How about coffee next week's Friday at 2pm
I can't do Friday but I can meet Wednesday at 4pm
and so on; it's always interesting to poke holes in technology. It can also reveal quite a lot about what it is doing, and how it is doing it.

How to Create and Manage Time Periods

I want to create reports for sequential, predetermined periods.
Essentially, I want to be able to:
Set a time period, for example from the 10th of one month to the 9th of the next. Then I want to be able to run a report and have the current period attached to it, which in this example would be August 10, 2011 to September 9, 2011. Then suppose I run the report again 6+ months later; even though it's 6+ months later, the period should be September 10, 2011 to October 9, 2011 (the next period in the sequence).
I've thought of creating a period model with 'begin', 'end', and 'current' fields. The 'begin' and 'end' fields would hold the numeric day values, i.e., continuing the above example, 10 and 9 respectively. The 'current' field would hold the end date of the current period, which (using the above example) would be September 9, 2011. With the 'current' value and the 'begin' and 'end' values I could then create the next logical period on demand. I would also have the opportunity to modify the period as needed.
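For concreteness, the arithmetic that model implies might look something like this (a plain-Python sketch; the function name is invented, and the month handling assumes the 'begin' day is 28 or less):

```python
# Advance from the stored 'current' period end to the next begin..end period.
from datetime import date

def next_period(begin, end, current):
    """E.g. begin=10, end=9, current=date(2011, 9, 9)
    -> (date(2011, 9, 10), date(2011, 10, 9)); store the second value
    back into 'current' after the report runs."""
    start = date(current.year, current.month, begin)
    if current.month == 12:
        finish = date(current.year + 1, 1, end)
    else:
        finish = date(current.year, current.month + 1, end)
    return start, finish

print(next_period(10, 9, date(2011, 9, 9)))   # (2011-09-10, 2011-10-09)
```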
While the above approach should work, it doesn't seem very efficient. Are there alternative approaches that are better? What can I do, if anything, to improve my approach?
Thanks.
There is an ideal technique for solving your problem: test-driven development. It's ideal in this case because you can describe the solution without having to code it. Just by describing the solution, you develop many clues about what you should actually be coding.
Since you might really have no idea what code to write yet, Cucumber seems ideal in your case. You can describe the problem you want to solve in English, run that with Cucumber, and it will point you all the way towards the solution.
If you have some idea of what code you want to write, you might be better off using Test::Unit or RSpec directly. These tools are closer to the actual code: the problem you want to solve needs to be written in Ruby.
The Pragmatic Programmers have excellent books about both techniques.
Since the problem you've described is of particular value only to you, you will be hard pressed to find many actual implementations on Stack Overflow, I think. Using Cucumber to describe the problem for yourself seems the best way forward.
