Methods of reducing URL size? - url

So, we have a very large and complex website that requires a lot of state information to be placed in the URL. Most of the time, this is just peachy and the app works well. However, there are (an increasing number of) instances where the URL length gets reaaaaallllly long. This causes huge problems in IE because of the URL length restriction.
I'm wondering, what strategies/methods have people used to reduce the length of their URLs? Specifically, I'd just need to reduce certain parameters in the URL, maybe not the entire thing.
In the past, we've pushed some of this state data into the session; however, this decreases addressability in our application (which is really important). So, any strategy which can maintain addressability would be favored.
Thanks!
Edit: To answer some questions and clarify a little, most of our parameters aren't an issue... however some of them are dynamically generated with the possibility of being very long. These parameters can contain anything legal in a URL (meaning they aren't just numbers or just letters, could be anything). Case sensitivity may or may not matter.
Also, ideally we could convert these to POST; however, due to the immense architectural changes required for that, I don't think it is really possible.

If you don't want to store that data in the session scope, you can:
Send the data as a POST parameter (in a hidden field), so data will be sent in the HTTP request body instead of the URL
Store the data in a database and pass a key (that gives you access to the corresponding database record) back and forth, which raises some scalability and possibly security issues. I suppose this approach is similar to using the session scope.

most of our parameters aren't an issue... however some of them are dynamically generated with the possibility of being very long
I don't see a way to get around this if you want to keep full state info in the URL without resorting to storing data in the session, or permanently on server side.
You might save a few bytes using some compression algorithm, but it will make the URLs unreadable; most algorithms are not very efficient on small strings, and compression does not produce predictable results.
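Just to illustrate why: a minimal sketch (Python; the parameter value is hypothetical) of compressing one query-string value and making it URL-safe again. For short strings, the compression header plus the base64 overhead often eats any savings.

    import base64
    import zlib

    value = "some/dynamically&generated=very long parameter value"
    compressed = zlib.compress(value.encode("utf-8"), 9)
    url_safe = base64.urlsafe_b64encode(compressed).decode("ascii")

    # For short inputs, url_safe is frequently no shorter than the original.
    print(len(value), len(url_safe))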
The only other ideas that come to mind are
Shortening parameter names (query => q, page => p, ...) might save a few bytes
If the parameter order is very static, using mod_rewrite-style directory structures (/url/param1/param2/param3) may save a few bytes because you don't need to include parameter names
Keep whatever data is repetitive and can be "shortened" back into numeric IDs or shorter identifiers (like place names of company branches, product names, ...) in an internal, global, permanent lookup table (London => 1, Paris => 2, ...); see the small sketch below
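A tiny sketch of that lookup-table idea (Python; the values are hypothetical):

    # Hypothetical permanent mapping kept on the server (or in config).
    PLACE_TO_ID = {"London": 1, "Paris": 2, "Berlin": 3}
    ID_TO_PLACE = {v: k for k, v in PLACE_TO_ID.items()}

    param = PLACE_TO_ID["London"]   # 1 -> goes into the URL as ?loc=1
    place = ID_TO_PLACE[param]      # "London" -> expanded again server-side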
Other than that, I think storing data on the server side, identified by a random key as @Guido already suggests, is the only real way. The upside is that you have no size limit at all: a URL like
example.com/?key=A23H7230sJFC
can "contain" as much information on server side as you want.
The downside, of course, is that in order for these URLs to work reliably, you'll have to keep the data on your server indefinitely. It's like having your own little URL shortening service... Whether that is an attractive option will depend on the overall situation.
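A minimal sketch of that idea (Python; the dict stands in for whatever persistent store you'd actually use, and the key length is arbitrary):

    import secrets

    state_store = {}  # in reality: a database table or persistent cache

    def shorten(state: dict) -> str:
        key = secrets.token_urlsafe(9)      # short, URL-safe random key
        state_store[key] = state            # must be kept around indefinitely
        return f"https://example.com/?key={key}"

    def resolve(key: str) -> dict:
        return state_store[key]             # rebuild the full state per request

    url = shorten({"query": "widgets", "filters": ["red", "large"], "page": 7})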
I think that's pretty much it!

One option, which is good when they really are navigable parameters, is to work these parameters into the first section of the URL, e.g.
http://example.site.com/ViewPerson.xx?PersonID=123
=>
http://example.site.com/View/Person/123/
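To show the idea in code (a Flask sketch purely for illustration; the question is framework-agnostic and ASP.NET routing or mod_rewrite achieves the same thing):

    from flask import Flask

    app = Flask(__name__)

    # The path segments carry the parameters, so no "PersonID=" name is needed.
    @app.route("/View/Person/<int:person_id>/")
    def view_person(person_id: int):
        return f"Viewing person {person_id}"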

If the data in the URL is automatically generated, can't you just generate it again when needed?
With so little information it is hard to suggest a solution, but I'd start by researching what RESTful architectures do in terms of using hypermedia (i.e. links) to keep state. REST in Practice (http://tinyurl.com/287r6wk) is a very good book on this very topic.

Not sure what application you are using. I have had the same problem and I use a couple of solutions (ASP.NET):
Use Server.Transfer and HttpContext (PreviousPage in .Net 2+) to get access to a public property of the source page which holds the data.
Use Server.Transfer along with a hidden field in the source page.
Use compression on the query string.

Should I worry about my API Keys being extracted from the iOS app

I need to make requests to the Google Books API from my app, which includes the API key in the URL.
I thought about just creating it as a file-private variable in my app, though this is a big problem because it would then be uploaded to GitHub.
Then I thought about environment variables, but I heard they aren't included if the app isn't run by Xcode.
I'm aware that this way the key could be extracted, but should I worry?
Can't users just use Wireshark or something similar and see the key in the URL anyway?
And I can restrict the key so it is only valid when called from my Bundle ID.
What do you think would be the best option for making the calls? I mean other than that, the app barely gets 10 downloads a week so this can't be too big of an issue, right?
Whether it is an issue entirely depends on your use case and threat model. Consider your API key public if you include it in, or send it from, your app in any way, and think about things like what people can do with it. What level of harm can they cause you? This gives you the impact. Would they be motivated, for example is there a financial benefit for them somehow? This estimates the likelihood of it happening. Together, impact × likelihood = risk, which you can either accept (do nothing about it), mitigate (decrease the impact or likelihood), eliminate (fix it) or transfer (e.g. buy some kind of insurance).
As for mitigations, can you limit the API key scope, so that only the necessary things can be done on the API with it? Can you set up rate limiting? Monitoring, alerting? I'm not familiar with the Books API, but these could be mitigating controls.
As for eliminating the risk, you should not put the API key in the app. You could set up your own server, which would hold the API key and pretty much forward requests to the Books API, augmented with the API key. Note, though, that you would still need some kind of authentication and access control on your server, otherwise it can just be used as an oracle by an attacker to perform anything in the actual Books API, the same as if they had the key, only in this case they don't need it. This role could also be fulfilled by some kind of API gateway, which can also add data to API queries.
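A bare-bones sketch of such a forwarding server (Python/Flask; the route name is made up, and real code would add authentication, rate limiting and error handling):

    import os
    import requests
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    BOOKS_API_KEY = os.environ["BOOKS_API_KEY"]   # key never ships in the app

    @app.route("/books/search")
    def search_books():
        # TODO: authenticate the calling app/user before forwarding anything.
        resp = requests.get(
            "https://www.googleapis.com/books/v1/volumes",
            params={"q": request.args.get("q", ""), "key": BOOKS_API_KEY},
            timeout=10,
        )
        return jsonify(resp.json()), resp.status_code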
Eliminating the risk is obviously more expensive. Defenses should be proportionate to risk, so you have to decide whether it is worth it.

Automatic model cache expiry in Rails

I was reading a few guides on caching in Rails but I am missing something important that I cannot reconcile.
I understand the concept of auto expiring cache keys, and how they are built off a derivative of the model's updated_at attribute, but I cannot figure out how it knows what the updated_at is without first doing a database look-up (which is exactly what the cache is partly designed to avoid)?
For example:
cache @post
Will store the result in a cache key something like:
posts/2-20110501232725
As I understand auto-expiring cache keys in Rails, if @post is updated (and the updated_at attribute is changed), then the key will change. But what I cannot figure out is how subsequent look-ups to @post will know how to get the key without doing a database look-up to GET the new updated_at value. Doesn't Rails have to KNOW what @post.updated_at is before it can access the cached version?
In other words, if the key contains the updated_at time stamp, how can you look-up the cache without first knowing what it is?
In your example, you can't avoid hitting the database. However, the intent of this kind of caching is to avoid doing additional work that is only necessary to do once every time the post changes. Looking up a single row from the database should be extremely quick, and then based on the results of that lookup, you can avoid doing extra work that is more expensive than that single lookup.
You haven't specified exactly, but I suspect you're doing this in a view. In that case, the goal would be to avoid fragment building that won't change until the post does. Iteration of various attributes associated with the post and emission of markup to render those attributes can be expensive, depending on the work being done, so given that you have a post already, being able to avoid that work is the gain achieved in this case.
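In other words (a Python-style sketch, not Rails' actual code; find_post stands in for a hypothetical single-row ORM lookup):

    def cache_key_for(post_id):
        # One cheap, indexed row read gives id and updated_at...
        post = find_post(post_id)
        # ...and those two values alone form the key, e.g. "posts/2-20110501232725",
        # so no extra query is needed to "discover" it later.
        return f"posts/{post.id}-{post.updated_at:%Y%m%d%H%M%S}"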
As I understand your question, you're trying to figure out the black magic of how caching works. Good luck.
But I think the underlying question is how do updates happen?
A cache element should have a logical key based on some part of the element, e.g. a compound key, or a key name based on the ID of the item. You build this key to call the cache fragment when you need it. The key is always the same, otherwise you can't have certainty that you're getting what you want.
One underlying assumption of caching is that the cache value is transient, i.e. if it goes away or is out of date it's not a big deal. If it is a big deal, then caching isn't the solution to your problem. Caching is meant to alleviate high load, i.e. a lot of traffic hitting the same thing in your database, similar to a weblog where 1,000,000 people might be reading a particular blog post. It's not meant to speed up your database. That is done through SQL optimization, sharding, etc.
If you use Dalli as your cache store then you can set the expiry.
https://stackoverflow.com/a/18088797/793330
http://www.ruby-doc.org/gems/docs/j/jashmenn-dalli-1.0.3/Dalli/Client.html
Essentially a caching loop in Rails AFAIK works like this:
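A rough sketch in Python-style pseudocode (cache_key_for, cache_store and render_fragment are hypothetical stand-ins, not Rails APIs):

    def cached_post_fragment(post, cache_store):
        key = cache_key_for(post)              # derived from id + updated_at
        fragment = cache_store.get(key)        # cheap cache read
        if fragment is None:
            fragment = render_fragment(post)   # expensive work, only on a miss
            cache_store.set(key, fragment)     # stale keys simply stop being read
        return fragment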
So to answer your question:
The key gets updated when you update the post; it's an operation tied to the update of that record. You can set an expiry time, which essentially accomplishes the desired result by forcing a cache update via a new lookup/cache write. As far as the cache is concerned, it's always reading the cache element that corresponds to the key. If it gets updated, then it will read the updated element, but it's not the cache's responsibility to check against the database.
What you might be looking for is something like a prepared statement (see Tenderlove on Prepared Statements), or a faster datastore, like a less safe Postgres (i.e. tuned NoSQL-style, without ACID) or a NoSQL type of database.
Also, do you have indexes in your database? DB requests will be slow without proper indexes. You might just need to "tune" your database.
Also, there is a wonderful gem called cells which allows you to do a lot more with your views, including faster returns vs. rendering partials, at least in my experience. It also has some caching functions.

Getting most recent paths visited across sessions in Rails app

I have a simple rails app with no database and no controllers. It uses High Voltage for routing queries, then uses javascript to go get data using the params hash.
A typical URL looks like this:
http://example.com/?id=37ed660aa222e61ebbbc02db
I'd like to grab the ten unique URLs users have most recently visited and pass them to a view. Note that I said users, preferably across concurrent sessions.
Is there a way to retrieve this using ActiveSupport::Notifications or Production.log? Any examples, including where the code should best go, would be greatly appreciated!
I think that Redis would be ideally suited to this. It's one of the NoSQL key-value store DBs, but its support for the value part being an ordered list, queue, etc. should make it easy to store unique URLs in a FIFO list as they are visited, limit the size of that list (discard URLs at the "old" end of the list), and retrieve the most recent N URLs to pass to your view. Your list should stay small enough that it would all stay in memory and be very fast. You might be able to do this with memcached or mongo or another one as well; I think it would be best, though, if the solution kept the stored values in memory.
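A rough sketch of that approach (Python with redis-py; the key name and list length are arbitrary choices, and in a Rails app the equivalent calls would typically live in a controller filter or Rack middleware):

    import redis

    r = redis.Redis()
    RECENT_KEY = "recent_urls"   # hypothetical key name
    MAX_RECENT = 10

    def record_visit(url: str) -> None:
        # Most-recent-first, de-duplicated, capped at MAX_RECENT entries.
        r.lrem(RECENT_KEY, 0, url)              # drop any older occurrence
        r.lpush(RECENT_KEY, url)                # push onto the "new" end
        r.ltrim(RECENT_KEY, 0, MAX_RECENT - 1)  # discard the "old" end

    def recent_urls():
        return [u.decode() for u in r.lrange(RECENT_KEY, 0, MAX_RECENT - 1)]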
If you aren't already using redis (or similar), it might seem like overkill to set it up and maintain just for this feature. But you can make it pay for itself by also using it for caching, background job processing (Resque / Sidekiq), and probably other things in your app.

Multiple Uploads to website simultaneously

I am building an ASP.NET website; the website accepts a PDF as input and processes it. I am generating an intermediate file with a particular name. But I want to know: if multiple users are using the same site at the same time, how will the server handle this?
How can I handle this? Will multi-threading do the job? What about the file names of the intermediate files I am generating? How can I make sure they won't overwrite each other? How can I achieve good performance?
Sorry if the question is too basic for you.
I'm not into .NET, but it sounds like a generic problem anyway, so here are my two cents.
Like you said, multithreading (as different requests usually run in different threads) takes care of most of that kind of problem, as every method invocation involves new objects run in a separate context.
There are exceptions, though:
- Singleton (global) objects, any of whose operations have side effects
- Other shared resources (files, etc.), which is exactly your case.
So in the case of files, I'd ponder these (mutually exclusive) alternatives:
(1) Never write the uploaded file to disk; instead hold it in memory and process it there (e.g. in a byte array). In this case you're leveraging the thread-per-request protection. This one cannot be applied if your files are really big.
(2) Choose very randomized names (like UUIDs) and write the files to a temporary location, so their names won't clash if two users upload at the same time.
I'd go with (1) whenever possible.
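A sketch of option (2), shown in Python only for brevity (the same idea applies in .NET with Guid.NewGuid() and Path.GetTempPath()):

    import tempfile
    import uuid
    from pathlib import Path

    def save_intermediate(data: bytes) -> Path:
        # A collision-proof name per request, so concurrent uploads never clash.
        path = Path(tempfile.gettempdir()) / f"upload-{uuid.uuid4().hex}.pdf"
        path.write_bytes(data)
        return path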
Best

Core Data on client (iOS) to cache data from a server Strategy

I have written many iOS apps that communicate with a backend. Almost every time, I used the HTTP cache to cache queries and parsed the response data (JSON) into Objective-C objects. For this new project, I'm wondering if a Core Data approach would make sense.
Here's what I thought:
The iOS client makes a request to the server and parses the objects from JSON into Core Data models.
Every time I need a new object, instead of hitting the server directly, I query Core Data to see if I already made that request. If that object exists and hasn't expired, I use the fetched object.
However, if the object doesn't exist or has expired (some caching logic would be applied here), I would fetch the object from the server and update Core Data accordingly.
I think having such an architecture could help with the following:
1. Avoid unnecessary queries to the backend
2. Allow full support for offline browsing (you can still make relational queries with Core Data's underlying SQLite store)
Now here's my question to SO Gods:
I know this kind of requires coding the backend logic a second time (server + Core Data), but is this overkill?
Any limitation that I have underestimated?
Any other idea?
First of all, if you're a registered iOS dev, you should have access to the WWDC 2010 sessions. One of those sessions covered a bit of what you're talking about: "Session 117, Building a Server-driven User Experience". You should be able to find it on iTunes.
A smart combination of REST / JSON / Core Data works like a charm and is a huge time-saver if you plan to reuse your code, but it will require knowledge about HTTP (and knowledge about Core Data, if you want your apps to perform well and safely).
So the key is to understand REST and Core Data.
Understanding REST means Understanding HTTP Methods (GET, POST, PUT, DELETE, ...HEAD ?) and Response-Codes (2xx, 3xx, 4xx, 5xx) and Headers (Last-Modified, If-Modified-Since, Etag, ...)
Understanding Core Data means knowing how to design your model, setting up relations, handling time-consuming operations (deletes, inserts, updates), and how to make things happen in the background so your UI stays responsive. And of course how to query locally against SQLite (e.g. for prefetching IDs so you can update objects instead of creating new ones once you get their server-side equivalents).
If you plan to implement a reusable API for the tasks you mentioned, you should make sure you understand REST and Core Data, because that's where you will probably do the most coding. Existing APIs will do the rest of the job: ASIHttpRequest (or any other) for the network layer, and any good JSON lib (e.g. SBJSON) for parsing.
The key to make such an API simple is to have your server provide a RESTful Service, and your Entities holding the required attributes (dateCreated, dateLastModified, etc.) so you can create Requests (easily done with ASIHttpRequest, be they GET, PUT, POST, DELETE) and add the appropriate Http-Headers, e.g. for a Conditional GET: If-Modified-Since.
If you already feel comfortable with Core Data, can handle JSON, and can easily do HTTP requests and handle responses (again, ASIHttpRequest helps a lot here, but there are others, or you can stick to the lower-level Apple NS classes and do it yourself), then all you need is to set the correct HTTP headers for your requests and handle the HTTP response codes appropriately (assuming your server is REST-ful).
If your primary goal is to avoid re-updating a Core Data entity from its server-side equivalent, just make sure you have a "last-modified" attribute in your entity, and do a conditional GET to the server (setting the "If-Modified-Since" HTTP header to your entity's "last-modified" date). The server will respond with status code 304 (Not Modified) if that resource didn't change (assuming the server is REST-ful). If it changed, the server will set the "Last-Modified" HTTP header to the date the last change was made, respond with status code 200, and deliver the resource in the body (e.g. in JSON format).
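To illustrate just the HTTP mechanics of that exchange (Python with the requests library; the URL is hypothetical, and the same logic applies to whatever networking layer you use on iOS):

    import requests

    def fetch_if_modified(url: str, cached_last_modified: str | None):
        headers = {}
        if cached_last_modified:
            headers["If-Modified-Since"] = cached_last_modified  # from the entity

        resp = requests.get(url, headers=headers, timeout=10)

        if resp.status_code == 304:
            return None                          # not modified: keep the local copy
        resp.raise_for_status()
        # 200: update the entity and remember the new Last-Modified value.
        return resp.json(), resp.headers.get("Last-Modified")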
So the answer to your question is, as always, probably "it depends".
It mostly depends what you'd like to put in your reusable do-it-all core-data/rest layer.
To give you numbers: it took me 6 months (in my spare time, at a pace of 3-10 hours per week) to get mine where I wanted it to be, and honestly I'm still refactoring and renaming to let it handle special use cases (cancellation of requests, roll-backs, etc.) and provide fine-grained call-backs (reachability, network layer, serialization, Core Data saving...). But it's pretty clean, elaborate, optimized and hopefully fits my employer's general needs (an online marketplace for classifieds with multiple iOS apps). That time included learning, testing, optimizing, debugging and constantly changing my API (first adding functionality, then improving it, then radically simplifying it, and debugging it again).
If time-to-market is your priority, you're better off with a simple and pragmatic approach: never mind reusability, just keep the learnings in mind, and refactor in the next project, reusing and fixing code here and there. In the end, the sum of all experiences might materialize into a clear vision of HOW your API works and WHAT it provides. If you're not there yet, keep your hands off trying to make it part of the project budget, and just try to reuse as much as you can of the stable third-party APIs out there.
Sorry for the lengthy response; I felt you were stepping into something like building a generic API or even a framework. Those things take time, knowledge, housekeeping and long-term commitment, and most of the time they are a waste of time, because you never finish them.
If you just want to handle specific caching scenarios to allow offline usage of your app and minimize network traffic, then you can of course just implement those features. Just set If-Modified-Since headers in your requests, inspect Last-Modified headers or ETags, and keep that info in your persisted entities so you can resubmit it in later requests. Of course I'd also recommend caching (persistently) resources such as images locally, using the same HTTP headers.
If you have the luxury of modifying (in a REST-ful manner) the server-side service, then you're fine, provided you implement it well. From experience, you can save as much as 3/4 of the network/parsing code on the iOS side if the service behaves well (returns appropriate HTTP status codes, avoids the need for nil checks and for number/date transformations from strings, provides lookup IDs instead of implicit strings, etc.).
If you don't have that luxury, then either that service is at least REST-ful (which helps a lot), or you'll have to fix things client-side (which is a pain, often).
There is a solution out there that I couldn't try because I'm too far into my project to refactor the server-caching aspect of my app, but it should be useful for people out there who are still looking for an answer:
http://restkit.org/
It does exactly what I did, but it's much more abstracted than what I did. Very insightful stuff there. I hope it helps somebody!
I think it's a valid approach. I've done this a number of times. The tricky part is when you need to deal with synchronizing: if client and server can both change things at the same time. You almost always need app-specific merging logic for this.
