I'm going to attempt to create an open project which compares the most common MP3 download providers.
This will require a user to enter a track/album/artist name i.e. Deadmau5 this will then pull the relevant prices from the API's.
I have a few questions that some of you may have encountered before:
Should I have one server side page that requests all the data and it is all loaded simultaneously. If so, how would you deal with timeouts or any other problems that may arise. Or should the page load, then each price get pulled in one by one (ajax). What are your experiences when running a comparison check?
The main feature will to compare prices, but how can I be sure that the products are the same. I was thinking running time, track numbers but I would still have to set one source as my primary.
I'm making this a wiki, please add and edit any issues that you can think of.
Thanks for your help. Look out for a future blog!
I would check amazon first. they will give you a SKU (the barcode on the back of the album, I think amazon calls it an EAN) If the other providers use this, you can make sure they are looking at the right item.
I would cache all results into a database, and expire them after a reasonable time. This way when you get 100 requests for Britney Spears, you don't have to hammer the other sites and slow down your application.
You should also make sure you are multithreading whatever requests you are doing server side. Curl for instance allows you to pull multiple urls, and assigns a user defined callback. I'd have the callback send a some data so you can update your page with as the results come back. GETTUNES => curl callback returns some data for each url while connection is open that you parse it on the client side.
Related
When looking at how websites such as Facebook stores profile images, the URLs seem to use randomly generated value. For example, Google's Facebook page's profile picture page has the following URL:
https://scontent-lhr3-1.xx.fbcdn.net/hprofile-xft1/v/t1.0-1/p160x160/11990418_442606765926870_215300303224956260_n.png?oh=28cb5dd4717b7174eed44ca5279a2e37&oe=579938A8
However why not just organise it like so:
https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png
Clearly this would be much easier in terms of storage and simplicity. Am I missing something? Thanks.
Companies like Facebook have fairly intense CDNs. They may look like randomly generated urls but they aren't, each individual route is on purpose and programed to be handled in that manner.
They aren't after simplicity of storage like you would be if you were just using a FTP to connect to a basic marketing website server. While you may put all your images in a /images folder, Facebook is much too complex for this. Dozens of different types of applications accessing hundreds if not thousands of CDNs and servers world wide.
If you ever build a web app, such as a Ruby on Rails app, and you work with a services such as AWS (Amazon Web Services) you'll also encounter what seems like nonsensical urls. But it's all part of the fast delivery network provided within the architecture. Every time you "push" your app up to the server new urls are generated for each unique resource automatically, css files, JavaScript files, image files, etc all dynamically created. You don't have to type in each of these unique urls individually each time you publish the app, the code simply knows where to look for those as a part of the publishing process.
Example: you tell the web app to look for
//= require jquery
and it returns you http://example.com/assets/jquery-eb3e278249152b5b5d5170b73d9dbf52.js?body=1 in your header.
It doesn't matter that the url is more complex than it should be, the application recognizes it, and that's all that matters.
Simply put, I think it can boil down to two main reasons: Security and Cache:
Security - Adding these long unpredictable hashes prevent others from guessing photo URLs and makes it pretty hard to download photos you aren't supposed to.
Consider what would happen if I could easily guess your profile photo URL and download it, even when you explicitly chose to share it only with friends.
Cache - by adding "random" query params to each photo, you make sure each photo instance gets its own URL. Thus you can store the photo in browser's cache for a long time, knowing that whenever you replace it with a new one, the new photo will have a fresh URL and the browser won't keep showing you the old photo.
If you were to keep the same URL for each user's profile photo (e.g. https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png), and then upload a new photo, either one of these can happen:
If you stored the photo in browser's cache for a long time, the browser will keep showing you the cached version (as long as URL is the same, and cache hasn't expired, there's no need to re-download the image).
If, instead, you only keep the image in cache for short period of time, you end up hitting your server much more then actually needed, increasing the load and hurting performance.
I hope this clarifies it.
With your route scheme, how would you avoid strangers to access the pictures of a private account? The hash also prevent bots to downloads all the pictures.
I get your pain :-) I might not stay with describing how this problem could appear more, but rather let me speak of a solution. Well it is normal that in general code while dealing with hashed value or even base64ed value it seems likes mess to deal with, but with an identifier to explain along, it does not remain much!
I use to work in a company where we use to collate Facebook post, using Graph API get its Insights Object and extract information from it for easy passing around within UI and sending back to our Redis cache store; and once we defined a data-structure in TaffyDB how an object organization is going to look like, everything just made sense with its ability to query the useful finite from long junk looking stream of minified Javascript stream
Refer: http://www.taffydb.com/
The extra values in the URL are useful to:
Track access. This is like when a newspaper appends "&homepage" vs. "&email" to an article URL, so their system knows how a reader found the page.
Avoid abuse and control access. Imagine that a user loaded a small, popular pornographic image into a profile image. They could then hijack the CDN to be a free web host for their porn site. But that code is used internally by the CDN to limit the number of views.
I've built a Rails app, basically a CRUD app for memos/notes.
A notes title must be unique. If a user enters a name already taken a warning message is shown prompting them to chose another.
My question is how to make this latency for this feedback as close to zero as possible. When creating a note little UX speed bumps like this will get annoying for user quickly.
Of course the main bottleneck is the network. Inspired by Meteor (and mini-mongo) I was thinking some kind of local storage could be a solution?
I.E. When app first loads, send ALL JSON to the client with ALL note titles. The app (front end is Angular JS) could check LocalStorage (or App Cache, Web SQL?) instead of incurring a network round trip. The feedback would be instant.
I've used LocalStorage in the past to augment an app, but in the scenario it'd really seriously depend on it. I'm not sure how confident I'd be building on something that user might not have. Also as the number of user Notes/Memos I have doubts how feasible it is to send a JSON object down the wire with ALL the note titles. That might get pretty big. On the other hand MeteorJS seems to do this with no probs.
Has anyone done something similar or have any pointers? Thanks!
I don't know how Meteor works here, but you're right that storing all note titles in localStorage is not a good idea. Actually, you don't need localStorage here, you can just put it in a JS array, because you need this data only once (when checking new note title).
I think, there could be 2 possible solutions:
You can change your business requirements and allow non-unique title. Is there really a necessity for titles to be unique?
You can verify note title when user submits form. In this case you can provide suggestions for users, so they not spend time guessing vacant title.
Or, if titles must be unique only within a user (two users can have same title for their notes), you can really load all note titles in JS array and check uniqueness while users types in a title.
Or you can send an AJAX request checking title uniqueness as soon as user finished typing the title. In this case you can win some seconds.
Or you can send an AJAX request as soon as user typed in 3 symbols. The request will return all titles that begin with these 3 symbols, so you don't need to load all the titles.
Im developing a azure website where users can upload blob and metadata. I want uploaded stuff too be deleted after some time.
The only way i can think off is going for a cloudapp instead of a website with a worker role that checks like every hour if the uploaded file has expired and continue and delete it. However im going for a simple website here without workerroles.
I have a function that checks if the uploaded item should be deleted and if the user do something on the page i can easily call this function, BUT.. If the user isnt doing anything and the time runs out it wont delete it because the user never calls the function.. The storage will never be deleted. How would you solve this?
Thanks
Too broad to give one right answer, as you can solve this in many ways. But... from an objective perspective because you're using Web Sites I do suggest you look at Web Jobs and see if this might be the right tool for you (as this gives you the ability to run periodic jobs without the bulk of extra VMs in web/worker configuration). You'll still need a way to manage your metadata to know what to delete.
Regarding other Azure-specific built-in mechanisms, you can also consider queuing delete messages, with an invisibility time equal to the time the content is to be available. After that time expires, the queue message becomes visible, and any queue consumer would then see the message and be able to act on it. This can be your Web Job (which has SDK support for queues) or really any other mechanism you build.
Again, a very broad question with no single right answer, so I'm just pointing out the Azure-specific mechanisms that could help solve this particular problem.
Like David said in his answer, there can be many solutions to your problem. One solution could be to rely on blob itself. In this approach you can periodically fetch the list of blobs in the blob container and decide if the blob should be removed or not. The periodic fetching could be done through a Azure WebJob (if application is deployed as a website) or through a Azure Worker Role. Worker role approach is independent of how your main application is deployed. It could be deployed as a cloud service or as a website.
With that, there are two possible approaches you can take:
Rely on Blob's Last Modified Date: Whenever a blob is updated, its Last Modified property gets updated. You can use that to identify if the blob should be deleted or not. This approach would work best if the uploaded blob is never modified.
Rely on Blob's custom metadata: Whenever a blob is uploaded, you could set the upload date/time in blob's metadata. When you fetch the list of blobs, you could compare the upload date/time metadata value with the current date/time and decide if the blob should be deleted or not.
Another approach might be to use the container name to be the "expiry date"
This might make deletion easier, as you then could just remove expired containers
I am using the Twitter public stream API to search for some keywords. I am writing my script in Java and therefore I use twitter4j. Now I stumbled over the information about status deletion notices:
Status deletion notices (delete)
These messages indicate that a given Tweet has been deleted. Client
code must honor these messages by clearing the referenced Tweet from
memory and any storage or archive, even in the rare case where a
deletion message arrives earlier in the stream that the Tweet it
references.
https://dev.twitter.com/docs/streaming-apis/messages#Status_deletion_notices_delete
So I created methods to remove records from my database when such a notice occurs. Unfortunately such a notice never occurs. I searched to find out what I am doing wrong and found some posts in the twitter developer section concerning the same problem:
https://dev.twitter.com/discussions/17393
https://dev.twitter.com/discussions/19943
https://dev.twitter.com/issues/1355
https://dev.twitter.com/discussions/12836
but unfortunately all these discussions got no answer. So for me it seems like I did no mistake with my code but twitter4j never sends me an deletion notice.
I want to respect the privacy of the twitter users - at least for legal reasons. So my question is:
What can I do to respect the privacy of the users ?
What do I have to do to satisfy my legal duties ?
One alternative seems to be to periodically iterate through all saved Tweets in my Database and request them from twitter to see whether I get a result back or not (so they were deleted). But this doesn't seem to be a practicable way because the data will get more and more and therefore at some point of time I will have limitations (in time, allowed twitter requests, ...). So what should I do?
Thanks in advance! Your help is greatly appreciated.
Ludwig
twitter4j v.3.0.6
Given the nature of the volume of tweets, it's unreasonable to assume that you would check to see if all the tweets are still there. You should make sure that you properly act on a delete notice from twitter. The onus is on them to actually send the delete notification.
That being said, I receive delete notifications from twitter. However, we aren't using the public stream, we are using sitestreams, which relies on authorizing specific social accounts and streaming all updates for those accounts (e.g. favorites, follows, blocks, tweets, retweets, etc) to us in realtime.
If you are doing a stream with filters, for example, it's probably not feasible (or at least very taxing) to run all deleted items through the same pipeline as new items. Or perhaps, to guess at which you were sent based on the times that you were running your filter.
As noted in the issue you linked to, the public streaming API will not necessarily send them out. I'd endeavor to handle them, and possibly provide a tool to manually remove any if a request comes in through another channel, but not worry too much about it, given that twitter doesn't provide the proper facility to be notified of such instances.
I need to develop an application which should help me in getting all the status,messages from different servers like Twitter,facebook etc in my application and also when i post a message it should gets updated in all the services. I am using authlogic for authentication. Can anyone suggest me what gems/plug-ins i can use..
I need API help to get all the tweets/messages to be displayed in my application and also ways to post the messages to the corresponding services by posting it from my application. Can anyone help me from design point.
Walk through what you'd want to do in your head. Imagine the working site, imagine your webapp working before you start. So your user logs in (handled by authlogic) and sees a textbox called "What are you doing right now?". The user fills in a status message and clicks "post". The status message appears at the top of their previously posted messages.
Start with the easy part. Create a class that posts to two services. Use the twitter gem and rfacebook to post to two already defined services. In the future, you'll want to let the user associate services to their account and you would iterate through the associated services and post the message to each. Once you have this working, you can refactor or polish the UI a bit to round out this feature. I personally would do the "add a social media account to my profile" feature towards the end.
Harder is the reading of the data (strangely enough) because you're going to have to figure out how to store it. You could store nothing but I suspect you'd run into API limits just searching all the time (could design around this). I would keep a little cache of posts associated to the user's social media account. In this way, the data model would look like this:
A user has many social media accounts.
A social media account has many posts. (cache)
Of course, now you need to schedule the caching of the posts. This could be done manually, based on an event (like when they login) or time based. So when the update happens, you load up the posts for that social media account and the user will see the posts the next time they hit the page. For real-time push to the client's browser while they stare at the screen, use faye (non-trivial) and ajax to pull the new posts to the top of the social media stream view.
The time based one is tricky because you'd either have to have a cron job run or have rails handle it all with a gem like clockwork. But then you have to leave rails running. I've also solved this by having a class in /lib do all the work and a simple web call kicks off the update. But it wasn't in a multi-user use case. So that might not work. In any case, you'll want to have some nice reusable code for these problems since update requests can come from many different sources.
You'll also have to deal with the API limits. When pulling down content from twitter, you won't get everything. That will just have to be known by the user or you'll have to indicate a "break in time" somehow.
The UI should be pretty easy (functionally anyway), because you know which source the post/content is coming from. It'd be easy to throw a little icon next to the post to display which social media site it's coming from.
Anyway, good luck, sounds like a fun project.