Occasionally FedEx or UPS may be unavailable to my app servers, or I need to process 100s of packages for a single transaction.
In these cases an estimate is better than nothing.
Currently I cache results for the exact parameters and for "rounded" parameters, e.g. from_zip[:2], round(weight, 10).
What techniques should I look at to do better than this?
I think a better approach would be to use some kind of interpolation to perform a proximity search for the target price. It can be as simple as finding two bounding "points" and interpolating the price of the target point, probably also using a "distance" threshold so you don't generate guesses that are too wild.
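For illustration, here is a minimal sketch of that interpolation idea, simplified to one dimension: it interpolates over weight for a fixed destination ZIP prefix. The cache layout, the (zip_prefix, weight) key, the `max_gap` threshold, and the rate values are all hypothetical, not part of any carrier API.

```python
# A sketch of the "two bounding points" idea, assuming a hypothetical
# in-memory cache keyed by (zip_prefix, weight_lbs) -> previously quoted rate.

def estimate_rate(rate_cache, zip_prefix, weight, max_gap=10.0):
    """Linearly interpolate a rate from the two cached weights that bound
    `weight` for the same ZIP prefix; return None if the gap is too wide."""
    # Collect cached (weight, rate) points for this destination prefix.
    points = sorted(
        (w, rate) for (zp, w), rate in rate_cache.items() if zp == zip_prefix
    )
    if not points:
        return None

    lower = max((p for p in points if p[0] <= weight), default=None)
    upper = min((p for p in points if p[0] >= weight), default=None)

    if lower and upper:
        (w1, r1), (w2, r2) = lower, upper
        if w2 - w1 > max_gap:      # "distance" threshold: don't guess wildly
            return None
        if w1 == w2:               # exact cache hit
            return r1
        # Linear interpolation between the two bounding points.
        return r1 + (r2 - r1) * (weight - w1) / (w2 - w1)
    return None                    # no bounding pair -> no estimate


# Usage, with placeholder rates previously cached from real FedEx/UPS responses.
cache = {("94", 1.0): 8.50, ("94", 5.0): 12.75, ("94", 10.0): 19.20}
print(estimate_rate(cache, "94", 7.0))   # ~15.33, shown to the user as an estimate
```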
Either way, it's very important to inform the users that the prices are estimates, and subject to change.
Rather than using distance, just use a zone lookup table. You should be able to download a zone chart for your account plus associated rates to create a simpler lookup.
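To make that concrete, here is a rough sketch of what such a lookup might look like once you have exported your account's zone chart and rate card. The ZIP prefixes, zones, and prices are placeholders, not real FedEx/UPS data.

```python
# Zone-chart lookup sketch: (origin ZIP3, destination ZIP3) -> zone,
# then zone + weight bracket -> price. All values are placeholders.

ZONE_CHART = {
    ("940", "100"): 8,
    ("940", "941"): 2,
}

RATE_CARD = {             # zone -> list of (max_weight_lbs, price)
    2: [(1, 7.10), (5, 9.40), (10, 13.80)],
    8: [(1, 10.90), (5, 18.20), (10, 29.60)],
}

def zone_estimate(origin_zip, dest_zip, weight):
    zone = ZONE_CHART.get((origin_zip[:3], dest_zip[:3]))
    if zone is None:
        return None
    for max_weight, price in RATE_CARD[zone]:
        if weight <= max_weight:
            return price
    return None           # heavier than the rate card covers

print(zone_estimate("94040", "10001", 3))   # placeholder zone-8 estimate: 18.20
```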
I am currently using Firestore for my iOS app and I need to implement a scalable solution for my posts feed. I need to get posts within, say, 20 miles, order them by date, and limit the number of posts fetched for pagination. Any and all database solutions would be very much appreciated! Thank you!
As a low-budget/low-time alternative to libraries, we have implemented storing the first few digits of the lat/long coordinates as a document or collection name and then accessed data that way. The first decimal place gives resolution of around 10 miles or so (exact values for longitude change depending on what latitude you are at). So in your database you could have a collection or document named something like +33.6-112.0. This would mark a reference in Firestore under which to put all data near (33.6 N, 112.0 W). Be careful with how you round the exact location data before placing it in the respective document or collection.
Then you can retrieve all the data at any location you want. This may not give you exactly 20 miles, but some client-side sorting can handle that. Note you could make the reference go to any decimal place necessary to achieve the level of precision you are looking for, to minimize database calls (to save you money) and minimize the impact on the user's cell data plan.
This is a rather simple solution with limitations, maybe suitable for an MVP, and if you're not careful it could pull far more data than anticipated.
For reference, one degree of latitude is about 69 miles, so each step in the first decimal place is roughly 7 miles. For example, the distance between (33.3 N, 0 W) and (33.5 N, 0 W) is about 14 miles.
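Here is a rough sketch of that bucketing scheme, in Python for brevity (the question itself is iOS/Swift). The key format mirrors the "+33.6-112.0" example above, and the haversine helper is for the client-side distance filtering mentioned earlier; names and precision are illustrative.

```python
import math

def bucket_key(lat, lon, places=1):
    """Build a Firestore collection/document name like '+33.6-112.0' by
    rounding coordinates to a fixed number of decimal places."""
    return f"{round(lat, places):+.{places}f}{round(lon, places):+.{places}f}"

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance, for client-side filtering of fetched posts."""
    r = 3959  # Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

print(bucket_key(33.6412, -112.0038))         # '+33.6-112.0'
print(haversine_miles(33.3, 0.0, 33.5, 0.0))  # ~13.8 miles, matching the example above
```

To approximate a 20-mile radius you would query the bucket for the user's location plus its neighbouring buckets, then drop anything the distance check says is too far away.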
Neither of those databases has native geospatial querying capabilities. You would have to use some sort of add-on library to help with that. Geofire and Geofirestore are popular for this.
I am predicting the stock price of a company. I have used day-to-day changes as the time series, but the negative changes are needed and I can't apply a log transformation to them. Is it OK if I model the sign as one more variable?
If you normalize and keep a single variable, a time-series model can handle the crests and troughs on its own. There won't be any need to model the signs separately.
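As a minimal illustration of "normalize and keep a single variable" (the prices below are made up; the real input would be your own series):

```python
import numpy as np

# Hypothetical daily closing prices.
prices = np.array([101.2, 100.5, 102.3, 101.8, 103.0, 102.1])

changes = np.diff(prices)                                  # day-to-day changes, negatives included
normalized = (changes - changes.mean()) / changes.std()    # z-score normalization

# The normalized series still carries the sign information, so a model fit on it
# (ARIMA, an RNN, etc.) sees ups and downs directly; no extra sign variable needed.
print(normalized)
```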
I faced this same dilemma some time back. I figured it's not really needed unless you also want to indicate whether the prices will go up or down in the future. Otherwise, no.
Can you give a clearer picture of what the objective is and what data you are considering?
Is it possible to aggregate measurements or create custom queries beyond the standard dateFrom/dateTo queries?
As an example, I have measurements with a time delta of 1 minute (2015-01-01T05:05:00, 2015-01-01T05:06:00, 2015-01-01T05:07:00, ...) and I would like to query the measurements at 15-minute intervals (2015-01-01T05:15:00, 2015-01-01T05:30:00, 2015-01-01T05:45:00, ...).
So far I have only come up with these solutions:
1. Using the standard API request, as in
https://tenant.cumulocity.com/measurement/measurements?dateFrom=2015-10-01&dateTo=2015-11-05
and then throwing away most of the data, which wastes a massive amount of time loading data I don't need.
2. Using CEP (Cumulocity Event Language) to generate a new measurement every 15 minutes from the nearest 1-minute measurement, which seems like overkill and not very elegant.
3. Batch-requesting the exact minute, as in
https://tenant.cumulocity.com/measurement/measurements?dateFrom=2015-11-05T05:15:00%2B01:00&dateTo=2015-11-05T05:16:00%2B01:00
which results in a massive number of API requests and also does not seem very efficient.
4. Using the /measurements/series endpoint, which gives me all series, even those I do not want, and (as far as I can tell) only offers hourly and daily aggregation options.
Is there a better way of doing this?
You have captured nearly all of the mechanisms that are currently available. There is one more possibility -- not sure if this is an option for you:
Mark every fifteenth measurement when sending it from the device, e.g. by using a different type.
I would normally use option 2. It's actually quite efficient; it's similar to a materialized view in traditional SQL, plus you can use the data everywhere and in all widgets.
Good luck :-)
Cheers,
André
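If you do go the "different type" route from the answer above, the device-side logic might look roughly like the sketch below (Python for brevity). The tenant URL, credentials, device id, fragment, and type names are placeholders, and payload details can vary between Cumulocity versions.

```python
import requests

C8Y_BASE = "https://tenant.cumulocity.com"   # placeholder tenant
AUTH = ("tenant/username", "password")       # placeholder credentials

def send_measurement(device_id, time_iso, value, counter):
    # Every 15th reading gets its own type so it can be queried directly later.
    m_type = "c8y_Aggregated15min" if counter % 15 == 0 else "c8y_Raw1min"
    payload = {
        "source": {"id": device_id},
        "time": time_iso,
        "type": m_type,
        "c8y_TemperatureMeasurement": {"T": {"value": value, "unit": "C"}},
    }
    return requests.post(
        f"{C8Y_BASE}/measurement/measurements", json=payload, auth=AUTH
    )
```

The 15-minute series can then be fetched on its own by filtering the measurement query on that type, without touching the 1-minute data.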
I would prefer the CEP solution. The rule wouldn't be that complicated. You would of course then store these measurements twice, which is not that nice, but having your desired measurement with a specific type or fragment gives you the fastest way to query it.
Instead of copying the measurement, you could just add a special fragment to the measurement every 15 min in the CEP rule. You cannot update measurements, so you would have to delete the measurement that comes in every 15 min and then create a new measurement with exactly the same values but with an added fragment (e.g. "aggregatedMeasurement": {}).
Your query then looks like this:
https://tenant.cumulocity.com/measurement/measurements?dateFrom=2015-10-01&dateTo=2015-11-05&fragmentType=aggregatedMeasurement
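From a script, that fragment-filtered query might look roughly like this (placeholder tenant and credentials; page-size limits and paging behaviour may differ between versions):

```python
import requests

resp = requests.get(
    "https://tenant.cumulocity.com/measurement/measurements",
    params={
        "dateFrom": "2015-10-01",
        "dateTo": "2015-11-05",
        "fragmentType": "aggregatedMeasurement",
        "pageSize": 2000,
    },
    auth=("tenant/username", "password"),   # placeholder credentials
)
measurements = resp.json().get("measurements", [])
print(len(measurements))   # only the 15-minute aggregated measurements
```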
One more idea for point 3:
You could use SmartREST to create a template with the query string and leave the dateFrom and dateTo as placeholders.
From the client side you then would have to make only one request using the bulking feature in SmartREST.
On the server side this would still be transformed into the individual requests, so you wouldn't gain anything in speed.
I have a problem in my new Rails project. I want to implement a feature that shows the user's profile completeness with a bar, like LinkedIn does.
I think I can use a variable to record the completeness, but I don't have any idea how to calculate it.
P.S. I have two models: a User model and an Info model.
This is, in fact, completely arbitrary. It's based entirely on which activities on the site you want to encourage.
A couple of mechanisms you can consider:
Model "accomplishments" with a completed/not completed status. Count up the ones you care about. Store the accomplishments based on activity either as they happen or at the end of the day in some batch job. For each user, calculate the percentage with the usual math (accomplishments completed/sum of available accomplishments) * 100 = percentage.
A variation of the same, but weighted based on what you consider more valuable contributions. In this case, the math is basically sum(weight_n * accomplishment_n) / total weight. A sketch of both calculations follows below.
The previous Careers.stackoverflow.com model made a geeky joke about Spinal Tap by making it possible to have counts greater than 100%. You can do that simply by undercounting the maximum accomplishments.
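A minimal sketch of both mechanisms, in plain Python rather than Rails for brevity; the accomplishment names, weights, and checks are entirely hypothetical, and in a Rails app this logic would typically live on the User model.

```python
ACCOMPLISHMENTS = {
    # name: (weight, check function over the user's info)
    "added_avatar": (1, lambda info: bool(info.get("avatar_url"))),
    "filled_bio":   (2, lambda info: bool(info.get("bio"))),
    "added_job":    (3, lambda info: bool(info.get("job_title"))),
}

def completeness(info, weighted=False):
    """Percentage of accomplishments completed, optionally weighted."""
    if weighted:
        total = sum(w for w, _ in ACCOMPLISHMENTS.values())
        done = sum(w for w, check in ACCOMPLISHMENTS.values() if check(info))
    else:
        total = len(ACCOMPLISHMENTS)
        done = sum(1 for _, check in ACCOMPLISHMENTS.values() if check(info))
    return round(100 * done / total)

info = {"avatar_url": "http://example.com/a.png", "bio": "", "job_title": "Dev"}
print(completeness(info))                 # 67: 2 of 3 accomplishments done
print(completeness(info, weighted=True))  # 67: weights 1 + 3 out of 6
```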
I have a lot of logfile data that I want to display dynamic graphs from, for basically arbitrary time periods, optionally filtered or aggregated by different columns (that I could pregenerate). I'm wondering about the best way to store the data in a database and access it for displaying charts, when:
the time resolution should be variable from one second to a year
there are entries that span several 'time buckets', e.g. a connection might have been open for a few days and I want to count and display the user for every hour she was connected, not just in the hour 'slot' the connection was created or finished
Are there best practices, or tools/plugins for rails that help handle this kind and amount of data? Are there maybe database engines specifically tailored towards this, or having helpful functions (e.g. CouchDB indexes)?
EDIT: I'm looking for a scalable way to handle this data and access pattern. Things we considered: running a query for each bucket and merging in the app (probably way too slow); GROUP BY timestamp/granularity (does not count connections correctly); preprocessing the data into rows at the smallest granularity and downsampling at query time (probably the best way).
I think you can use MySQL timestamps for this.
The way I solved it in the end was to pre-process the data into per-minute buckets, so there's one row for every event and minute. That makes it easy and fast enough to select and yields correct results. To get different granularities, you can do integer arithmetic on the timestamp columns: select floor(timestamp/factor)*factor and group by floor(timestamp/factor)*factor.
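A small sketch of that bucketing, with the equivalent MySQL shown as a comment; the table/column names and the assumption of an integer unix-timestamp column are illustrative.

```python
from collections import defaultdict

# Equivalent SQL (MySQL), assuming an integer unix-timestamp column `ts`:
#   SELECT (ts DIV 900) * 900 AS bucket, COUNT(*) AS events
#   FROM events
#   GROUP BY bucket;

def downsample(rows, factor=900):
    """Group per-minute (timestamp, count) rows into `factor`-second buckets
    (900 s = 15 min)."""
    buckets = defaultdict(int)
    for ts, count in rows:
        bucket = (ts // factor) * factor    # floor to the start of the bucket
        buckets[bucket] += count
    return dict(sorted(buckets.items()))

rows = [(1420088400 + 60 * i, 1) for i in range(30)]   # 30 one-minute rows
print(downsample(rows))   # two 15-minute buckets with 15 events each
```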