How do image hosting sites enforce content policies?

I'm trying to figure out how to best implement a public data hosting service.
How do websites that let users upload pictures enforce their terms of service regarding obscene images? Do they use image-processing algorithms to flag potential violations (e.g. too many skin-colored pixels)? I think ImageShack looks at the websites that its pictures are hotlinked on, and checks for keywords. If it detects anything porn-related, it removes the picture and bans the account. Are there other methods?
Is enforcement largely automated or is it based more on user reports?

I suppose it depends on the scale of your "public data hosting service".
If it's something small, with maybe a couple hundred pictures per day flowing in, you can moderate them on your own.
If it's a couple hundred thousand, you'll need a crowd of human beings sorting the weeds out: either a moderator team, or the users themselves submitting abuse reports.
Which way to go can depend on your budget and the financial success of your service, as well as on the type of service. If it's something simple like Rapidshare, where one user does not see what the others do, the chances that users will come across each other's content and thereby notice and hopefully report unacceptable material are small. If it's something very social like Flickr, you can bet reports will be flowing in.
I suppose you could automate something, but it's an almost impossible task. You can't reliably detect porn automatically. You can't automatically detect copyright violations either - building fingerprints of copyrighted material in order to compare them with uploaded files is a real challenge even for companies with the resources of Rapidshare or YouTube. For now this kind of work can only be done effectively by humans.
There are also legal issues. In some countries the service owner is not liable for what users contribute (as long as he is cooperative enough to delete certain content on request); in others he can face charges himself for not having pre-moderated all incoming content. Think about this with regard to what you are going to launch, and where.
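As an aside on the "skin-colored pixels" idea from the question: below is a toy sketch of such a filter using the chunky_png gem, with RGB thresholds that are illustrative guesses rather than a tuned classifier. It also shows why this can only ever queue images for human review: a beach photo, a close-up portrait, or a wooden table will all trip it.

    require 'chunky_png'

    # Toy skin-tone filter: flag an image when too many pixels fall inside a
    # crude RGB "skin" range. Thresholds are illustrative, not production-tuned.
    def probably_flesh_heavy?(path, ratio_threshold = 0.4)
      image = ChunkyPNG::Image.from_file(path)
      skin  = image.pixels.count do |px|
        r = ChunkyPNG::Color.r(px)
        g = ChunkyPNG::Color.g(px)
        b = ChunkyPNG::Color.b(px)
        # Reddish pixels with moderate green and little blue.
        r > 95 && g > 40 && b > 20 && r > g && r > b && (r - g) > 15
      end
      skin.to_f / image.pixels.size > ratio_threshold
    end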

I don't have links, but while it's certainly a difficult task prone to errors, software to detect improper content does exist. Or at least that's what the Security Manager at NASA told me - whether it was just a means to scare me, I don't know ;-)

Related

How trustworthy are polls by pinpoll

I was recently looking at security issues with online polls and the problem that online elections can sometimes be tampered with very easily.
Then I noticed that a lot of websites I visit, and even local newspapers in my area, use "pinpoll" for their online polls.
So I wanted to know: how trustworthy and secure are these polls?
Tobias here, Founder and CEO of Pinpoll.
I agree with @GreyFairer, let's not discuss this on SO (unless you want to know why fingerprinting libraries shouldn't be used to identify individual clients, or how Pinpoll is applying WebSockets to broadcast live updates across the globe).
Just send me an e-mail at privacy@pinpoll.com and I'm happy to explain to you in more detail what we do (and cannot do) to protect polls against bots and fake votes.
And let me make one thing clear: we're one of the most trustworthy providers in Europe, especially when it comes to complying with the EU's strict data protection laws.
One example: You won't find a single request to a server other than our own (located in the EU) in our interactive elements.
And one last thing: what might be annoying to you (and that's fully accepted) is interesting and entertaining to others. So let's agree to disagree when it comes to online polls in news portals ;)

Achieving 25K Concurrent connections in RubyOnRails Application

I am trying to build a suggestion board application where a user raises a query and multiple people post at the same time. It is expected to support at least 25k concurrent users. The question format also has checkboxes or radio buttons; in that case the answers will be written to the DB.
Please let me know how I can achieve this in Ruby on Rails, in terms of:
- hardware support (a specific hardware load balancer)
- software support (DB clustering / app server clustering / web traffic resolution)
I think your best plan is to worry about scaling to this level once you have that many users. There's nothing stopping you from achieving this in Rails, or indeed any other framework/language.
The problem with trying to design your architecture up-front to scale to this level is that, at this point, you have absolutely no idea where the pain points are going to be. Are there specific pages which are going to hit the database harder? Are some of your pages heavy on HTML and images? There are so many questions to ask that simply cannot be answered effectively until you've gotten something out there.
This doesn't mean that you shouldn't worry about scaling - by all means, try to design your data structures in such a way as to allow you to scale later. But put off any major decisions, and think about them later when you have some hard data to work with.

How do I safely and inexpensively allow images on my site?

I have developed a social networking site for gardeners, and am interested in giving users the ability to add images to their "tweets".
If I allow them to upload images to the actual site, it seems like this will quickly become expensive (this is a side project, funded by no one other than myself and my own obsessions). Let's say the site becomes moderately popular, with 100K users each posting one image a week at only 250 KB apiece. That's (100,000 * 250 KB * 52) ≈ 1.2 TB/year in storage (and that doesn't take increased bandwidth into account). Plus I'd have to add server capacity to scale the images. I'm not sure if I should just go ahead with this, or if there are better possibilities.
Linking to other sites seems better in some ways. You do risk broken links, but a larger concern for me is security: XSS.
The application is on Rails 3, using MongoDB / Mongoid as the backend, if that matters.
I'm looking for solutions such as:
APIs that store images on external sites. What would be ideal is the ability to upload an image to my site and then make an API call to store it on an external site.
APIs (perhaps Javascript APIs) that make it easy to link to one or more external image hosting sites securely.
Markdown or similar markup that allow linking to external images securely. I am interested in giving users the ability to format their posts in limited ways, so this might solve two problems at the same time. I notice that this is what Stack Overflow does.
Security libraries that whitelist image URL patterns
Advice on why I am thinking about this problem wrong. For example, maybe I should just store the images. A terabyte or so a year is really not all that expensive, and it does allow me to create a very clean user experience.
My objectives are (in order):
- Secure, both for my own site, and to not allow XSS attacks against other sites
- Best possible user experience
- Easy to maintain and implement
What have you done to allow user-supplied images on your site?
You're thinking about the problem wrong ;) or rather not at the right time.
Don't worry about the bandwidth now, when you don't have that many users yet. Concentrate on making the site user-friendly and popular first. Performance, bandwidth, disk space - these are the things you'll work on when they become problems. By the time you have 100k users, the cost of buying that space and bandwidth on, say, Amazon S3 may not be an issue anymore.
Why not use a service like Amazon S3? It's cheap, very cheap (with Reduced Redundancy Storage), and most importantly, plugins like Paperclip support it out of the box...
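For illustration, a minimal sketch of what that looks like with Paperclip's S3 storage (Rails 3 era syntax; the bucket name and styles are placeholders, the :s3_storage_class option for Reduced Redundancy is only in newer Paperclip versions, and with the question's Mongoid backend you would go through the mongoid-paperclip shim rather than ActiveRecord):

    # app/models/tweet.rb -- requires the paperclip and aws-sdk gems.
    class Tweet < ActiveRecord::Base
      has_attached_file :image,
        :storage          => :s3,
        :bucket           => 'my-gardening-site-images',   # placeholder
        :s3_credentials   => {
          :access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
          :secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
        },
        :s3_storage_class => :reduced_redundancy,     # the cheaper RRS tier
        :styles           => { :thumb => '150x150>' } # resized at upload time
    end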
You will need to look at the T&Cs of picture hosts (Flickr etc.) and see whether your usage is acceptable. Flickr has an API; not sure about the others, just search for "<host> api".
Flickr's API is at:
http://www.flickr.com/services/api/
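To make the question's "whitelist image URL patterns" idea concrete, here is a minimal sketch using the sanitize gem, run over the HTML produced from the user's Markdown; the allowed-host list is a placeholder:

    require 'sanitize'
    require 'uri'

    # Placeholder whitelist -- substitute whatever hosts you decide to trust.
    ALLOWED_IMAGE_HOSTS = %w[i.imgur.com farm6.staticflickr.com]

    # Drop any <img> whose src does not point at a trusted host.
    strip_untrusted_images = lambda do |env|
      next unless env[:node_name] == 'img'
      host = URI.parse(env[:node]['src'].to_s).host rescue nil
      env[:node].unlink unless ALLOWED_IMAGE_HOSTS.include?(host)
    end

    SAFE_CONFIG = {
      :elements     => %w[p em strong a img],
      :attributes   => { 'a' => %w[href], 'img' => %w[src alt] },
      :protocols    => { 'a'   => { 'href' => %w[http https] },
                         'img' => { 'src'  => %w[http https] } },
      :transformers => [strip_untrusted_images]
    }

    # Sanitize.clean in the 2.x gem; newer versions rename it Sanitize.fragment.
    def sanitize_post_html(html)
      Sanitize.clean(html, SAFE_CONFIG)
    end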

How to prevent hackers from scraping our database? [duplicate]

Possible Duplicate:
How do you stop scripters from slamming your website hundreds of times a second?
I am building a web application in RubyOnRails, which is based on a large body of data. The application makes for powerful navigation and intersection of the data, as well as a community model for adding more data.
In that respect one could compare it with StackOverflow.com: a big bunch of data, structured in a fairly simple way.
I intend to offer the content under a CreativeCommons license, but if the site takes off, I need to discourage copycats. My biggest fear is screen-scraping scripters, not only leeching away the raw data, but also incurring huge usage peaks on my servers.
I wonder if RubyOnRails offers any way to throttle (obviously automated) requests, e.g. to reduce their response time to the benefit of regular users. Perhaps this requires Apache or Phusion Passenger settings?
EDIT: My goal is not to recognize user types, but to reduce responsiveness for overly active users, e.g. to cap the number of requests handled per IP address per unit of time(?)
My suggestion would be to limit any easy iterative navigation of your website, which is the primary way I have seen harvesting programs work. Simply encrypting or signing the ID numbers you use as GET variables would make strip-mining your info more difficult. You can only try to make getting your information onerous; you won't be able to prevent it completely.
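A minimal sketch of that idea using ActiveSupport's MessageVerifier (the secret is a placeholder); signed tokens defeat the trivial "increment the ID" style of iteration, though a determined scraper can still follow your links:

    require 'active_support/message_verifier'

    # Keep the real secret out of source control.
    VERIFIER = ActiveSupport::MessageVerifier.new('replace-with-a-long-random-secret')

    token = VERIFIER.generate(42)  # opaque, tamper-proof token for record 42
    VERIFIER.verify(token)         # => 42
    # Verifying a forged or edited token raises InvalidSignature,
    # so simply guessing neighboring IDs no longer works.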
You could present a captcha to the "overly active users", just like SO does when you edit too fast. That should effectively hinder automatic spider-like scraping.
You might also want to look into using some Rack middleware to do rate limiting, like this recent article covered for doing API limiting (such as what you'd want at Twitter or similar).
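For instance, a minimal sketch with the rack-attack middleware (one of several Rack rate limiters; rack-throttle is similar, and the limits below are arbitrary):

    # config/initializers/rack_attack.rb -- requires the rack-attack gem;
    # on Rails 3 also add `config.middleware.use Rack::Attack` to application.rb.
    require 'rack/attack'

    class Rack::Attack
      # Cap each client IP at 60 requests per minute; throttled requests get
      # a "too many requests" response instead of tying up an app process.
      throttle('requests_per_ip', :limit => 60, :period => 60) do |req|
        req.ip
      end
    end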
I believe all you can do is put up hoops for the user to jump through. Ultimately there is no foolproof way to distinguish a regular user from a bot.

Best practices for customer involvement in Agile development? [closed]

We need to involve our customer development partners in our development process. We're more or less following Agile methodologies. Some customer partners are remote, others closer. We need to minimize travel costs.
Our customers are in health care and tend to be busy, expensive, and hard to schedule.
What practices and technologies have worked to support customer involvement? We're using phone calls, phone conferences and email. We're curious about leveraging wiki techniques and would love to hear what's worked for others.
It doesn't matter whether the customer is in the same cubicle or halfway around the planet, except for communication delays - the critical factor is availability.
A customer that is too busy to answer your emails for several days is going to cause your iteration to be late, or to fail.
The customer has two critical commitments for agile:
- be available to answer questions in a timely manner
- not change their mind/priorities during an iteration
The customer must commit to a reasonable service-level agreement (SLA) on availability, e.g. 1-hour response time, or 24-hour response time, etc., and you will need to adjust all estimates and schedules by the lag factor. If the customer will not commit or does not follow through, cancel the iteration and re-plan, bringing the customer's commitment to the forefront again. Do not just "guess" at what you think the customer might want.
Bottom line: without a customer commitment, agile will not work.
My experience with Agile methods is mostly for desktop applications. When our customers are remote, we've spent time to get an engineer to the customer site to configure/install a demo rig. The engineer works with the customer on a test and demo setup/plan that will provide an environment that the customer believes replicates the important aspects of the deployment environment but isolates the demo system from existing infrastructure (so that we can push updates whenever we need to). The engineer also sets up deployment systems to move our applications into production, so that we can "deploy" without being on site. Our applications can self-update (either for each release or each build) and we carefully instrument the releases to log all errors and submit all crashes as bugs to our bug tracker. This way we at least know what went wrong, even if we don't know what's going right.
For each release/build that shows up on the customer's test rig, we provide a (short) screencast, narrated by the project lead or primary developer, demo-ing any new features. The release notes contain any long-term issues or questions we want the customer to think about (i.e. issues that can't be resolved immediately by a phone call or email), and the application displays these notes for the user.
Finally, and possibly most importantly, we get the customer and/or the customer's liaison an account on our calendar server and configure their calendar app to make use of that account. This then goes both ways--we can schedule time (on site, phone, email, etc.) with the customer and they can do the same with our developers.
One option: Install a customer proxy at the "customer partner" site who can extract the information that you need when those customers are available. Have these proxies build the solid relationships that allow them to represent the customer view. Their time is all yours. And when questions arise that they cannot answer, they have ready access to your customer partners - even if in the coffee line.
The whole point of the customer in agile is to have open and free discourse with the developers (i.e. immediate feedback). If your actual customers cannot provide this, then you need an intermediary/proxy that can fill this role. You don't need actual customers, you just need someone who can represent the customers' interests well enough to meet your customers' needs.
Just a few ideas:
If you do choose to use a wiki, make sure it supports a whole-wiki-wide "recent changes" list, and preferably one that is specific to each user. The more distant from development people are, the more likely they are to have email as the metaphor for their computer use. If they can't immediately tell when there's something new for them to see, they will never explore the wiki. You also need ways to signal to them that a matter needs their attention, or they will treat changes like CCs.
I'm a big believer in creating video screen captures of interactions (narrated) and distributing them to users. Unlike a real demo, customers don't feel like they need to interrupt, and they can rewind and re-watch the same interaction over and over, paying attention to little details.
Finally, if you do distribute prototypes, make sure to send someone (or at least a screen sharing session) to see how the prototypes are used. Contextual design is effective. You can count on people using your prototype differently from the way you expect, and you have to understand how they use it to really understand where the issues are, even if they don't report them.
Have you considered something like LogMeIn?
This would allow customers either to log in to a PC on your network already running your application, or alternatively allow you to install/update the application on one of their computers.
This would solve the remote customer issue and would also support the ongoing customer feedback required by the agile process.
I used it at a previous company for technical support, and there is no reason (except maybe cost) that it would not work for your situation.
It is also a great way to actually see how users are using your application and therefore find out what works and what doesn't.
First of all, make sure that you have a product manager or a product owner close to the developers. This person will manage the relationship with the customer.
Then the product manager can demonstrate the product to the customer at the end of each iteration, and can also ask customers questions when the developers need feedback to implement a user story.
It is amazing the positive feedback you can get from customers when you involve them.
We did not use a wiki, and most of the communication was done via e-mail, phone, and a screen sharing application (we used GoToMeeting, but there are tons of alternatives out there).
You should probably do a kick-off once with everyone at one place. Face-to-face time is invaluable. That includes all developers. Prepare some metaplan questions, but also have enough time to just mingle.
I think by most definitions of Agile processes that have high dependence on customer involvement you've already missed "best practice", which would be for an on-site, and preferably "in-team" customer present at all times. So I suppose we're looking for a "next-best practice". :)
There's the possibility of introducing a "proxy customer" on-site. I have to admit to being very sceptical about the value of a proxy customer. I'm concerned about the risk of introducing some sort of second-rate and otherwise unnecessary business analyst function to the mix, with the reduced signal-to-noise ratio and potential for garbled messages. It also carries the risk of allowing busy real customers to reduce their involvement in the process, which is likely to lead to dissatisfaction. I wonder if there might be someone with good domain knowledge who has recently retired and might be available to act in this capacity as a consultant?
Communication bandwidth with remote customers is astonishingly lower than face-to-face, something I had not fully realised until I started dealing with users in another country. Even with video the loss is significant.
How long are your iterations? How hard is planning an iteration? Might it be easier to go for longer iterations and get more planning done less frequently, or to reduce iteration length and go to smaller but more frequent planning sessions? Is more than one customer involved?
Do you have a usable and available build at the end of each iteration? Is there time for involved users to have hands-on time before the next planning session? Keeping users engaged by shipping frequently would seem on the surface to be a Good Idea, which perhaps argues for small, frequent iterations (a week? two weeks?)
The wiki idea might work: have you looked at the FIT Framework? It's a sort of integrated acceptance test/wiki, which might help in getting acceptance tests from remote customers. I think I'd also look to provide some sort of (separate or integrated) "project dashboard", possibly pushed regularly to key customers as well as available on demand. Use it as a substitute for things like post-its on whiteboards, Big Visible Charts and the like. There are a number of open-source or low-cost options that may serve - writing your own simple alternative need not be too time-consuming or costly, either.
Above all, remember that "Agile" is a kind-of catch-all label for developments that are carried out with an emphasis on the values and principles espoused in the Agile manifesto. What is considered "best" in one situation may not be so in another. If you understand the principles and regularly review your methods with a critical eye then you're probably going to be close enough to the best practice application to your situation.
I haven't looked at it for some time but with Beck and Fowler on the author list, there should be something useful in Planning Extreme Programming.
In my previous position at drchrono.com I aggregated data/feedback/iteration requests from 20,000 clinicians across the country. The best way to do this is to evangelize a site like uservoice.com. I held "daily live web demonstrations" with sometimes 50 to 100 doctors (doctors signed up right from our website). In these demos I would demonstrate our current product and evangelize UserVoice to drive their feedback into a useful tool for our development team. All of this was done remotely and led to a 1,400% overall increase in recurring revenue growth.

Resources