This question has already been asked at "htc files: Why not to use them?", but the answer there didn't really answer anything.
The question is: why isn't something like CSS3 PIE
in use on many sites? I'd expect smaller sites not to know about it, but the one that caught my eye was Twitter, which doesn't use it.
Is it because it's not standard? Or does it cause a noticeable slow-down of the site?
Thank you for any responses.
I can't speak for everyone, but my sense is that you don't see tools like these in use on large sites because:
1) They do incur a certain performance cost. CSS3 PIE in particular starts to create a noticeable rendering delay once it is applied to more than about two dozen elements (in my experience; YMMV). For that reason its use on large pages might cause a larger rendering delay than the time saved by not having to download image assets.
2) They start to show bugs with complex DOM changes. Lots of animation, showing/hiding, etc. can sometimes cause PIE to get out of sync.
3) Related to #2, the added layer of abstraction (and its associated bugs) can become a detriment on large development teams with a complex codebase. If you start spending more time debugging the abstraction than it would take to simply create rounded corner images, then the tool is getting in the way.
I'm speaking specifically about CSS3 PIE here because it's near and dear to me (I'm its creator), but similar caveats apply to other polyfills like Selectivizr. This goes for any tool: you always have to evaluate the pros/cons for your specific needs. For example I wouldn't recommend PIE for a high-traffic, performance-critical, highly interactive site like Twitter for the reasons stated above, but it really shines on simpler more static designs.
...Another thought is that it's perfectly valid in many cases to simply let IE degrade to square corners etc. This is always the preferred approach IMO, if possible given your particular situation. So in that case it's not due to any evaluation of the tool, but just a decision that what the tool provides is simply not needed in the first place. :)
We are using Jasper Reports to generate reports from our application, based on Oracle DBMS.
It works fine, but it is likely we will need different paper formats, languages and orientations for the same document, or to add columns and other elements, or to have the elements' contents change size.
Doing this in iReport/Jasper isn't easy AFAIU.
If something doesn't work you have to move or resize elements by hand, checking that they're of appropriate size and position.
When I was a student I used LaTeX for typesetting, and it handled this kind of "reshaping" easily. Isn't there something like that?
I heard BIRT doesn't follow the "pixel position" paradigm of Jasper and Pentaho, and as such it strives to handle positioning, and possibly sizing, on its own once the user has specified the document's abstract structure, i.e. which elements are there and their relative positions.
EDIT
Forgot to mention: we are looking for a solution that involves as little code as possible. The reasons are manifold, but the most important are:
first: to avoid learning another library (we managed to stay away from Jasper's library and liked it that way).
second: to provide a tool that even people who aren't programmers, or at least aren't hardcore ones, can manage.
The lower the entry barrier the better.
For example I know people in the humanities that can pick up LaTeX decently. They could even digest iReport. I don't know of anyone who can do the same with real-world Java.
Wicket uses the Session heavily, which could mean a “large memory footprint” (as stated by some developers) for larger apps with lots of pages. If you were to explain to a bunch of Fortune 500 CTOs that they should adopt Apache Wicket for their large web application deployments, and that their fears about Wicket's scaling problems are just bad assumptions, what would you argue?
PS: The question concerns only scaling. Technical details and real-world examples are very welcome.
IMO, credibility for Apache Wicket in very-large-scale deployments is established by the following URL: http://mobile.walmart.com (view the source).
See also http://mexico.com, http://vegas.com and http://adscale.de, and look those domains up on Alexa to see their rankings.
So, yes it is quite possible to build internet scale applications using Wicket. But whether or not you are using Wicket, Struts, SpringMVC, or just plain old JSPs: internet scale software development is hard. No framework can make that easy for you. No framework can give you software with a next-next-finish wizard that services 5M users.
Well, first of all, explain where the footprint comes from: it is mainly the PageMap.
The next step would be to explain what a page map does, what it is for and what problems it solves (the back button and popup dialogs, for example): problems which would otherwise have to be solved manually, at similar memory cost but at a much bigger development cost and risk.
And finally, tell them how you can control what goes into the page map and the secondary page cache, and thus how its size can be kept under control.
Obviously you can also show them benchmarks, but probably an even better bet is to drop a line to Martijn Dashorst (although I believe he's reading this post anyway :)).
In any case, I'd try to put two points across:
There's nothing Wicket stores in memory that you wouldn't have to store in memory anyway. It's just better organised, and easier to develop, keep consistent and test.
Java itself means that you're carrying some inevitable excess baggage around all the time. If they are so worried about footprint, maybe Java isn't the language they want to use at all. There are hundreds of large traffic websites written in other languages, so that's a perfectly workable solution. The worst thing they can do is to go with Java, take on the excess baggage and then not use the advantages that come with an advanced framework.
Wicket saves the last N pages in the session. This is done so a page can be loaded faster when it is needed, which happens mostly in two cases: when the browser back button is used, and in Ajax applications.
The back button is clear, no need to explain, I think.
As for Ajax: each Ajax request needs the current page (the last page in the session cache) in order to find a component in it and call its callback method, update some model, and so on.
From there on, the session size depends entirely on your application code; it would be the same for any web framework.
The number of pages to cache (N above) is configurable, i.e. depending on the type of your application you may tweak it as you find appropriate. Even when there is no in-memory cache (N = 0), pages are stored on disk (again configurable) and can still be found; it is just a bit slower.
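As a rough sketch of that configuration (not part of the original answer): assuming the Wicket 1.5 store-settings API, which exposes setInmemoryCacheSize() and setMaxSizePerSession() (names may differ in other Wicket versions), and a hypothetical HomePage class, you could cap the page store in your application class roughly like this:

```java
import org.apache.wicket.Page;
import org.apache.wicket.protocol.http.WebApplication;
import org.apache.wicket.util.lang.Bytes;

public class MyApplication extends WebApplication {

    @Override
    public Class<? extends Page> getHomePage() {
        return HomePage.class; // hypothetical home page class of your app
    }

    @Override
    protected void init() {
        super.init();
        // Keep only the most recent page version per session on the heap;
        // older versions fall back to the disk-based page store.
        getStoreSettings().setInmemoryCacheSize(1);
        // Cap how much disk space the page store may use per session.
        getStoreSettings().setMaxSizePerSession(Bytes.megabytes(10));
    }
}
```

With settings along these lines, the in-memory footprint per session stays small and bounded, which is exactly the knob the answer is describing.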
As for references:
http://fabulously40.com/ - a social network with many users,
several education sites - I know of two in the USA and one in the Netherlands, which also have quite a lot of users,
and currently I work on a project that expects to be used by several million users. Wicket 1.5 will be improved wherever we find hotspots.
Send this to your CTO ;-)
We have developed a mildly sophisticated web application using jQuery UI and themes. We chose this approach because we could do it ourselves, using ThemeRoller to build a theme and jQuery UI classes or jQuery-UI-aware plugins in JavaScript, and we have had very limited need to iterate over color schemes, fonts and other styling elements.
I've just started to receive input from our design staff, and want to create a workflow to allow for fluid changes in styling. What works for you?
Sadly, good graphical/UI designers who can design and code in Flash are rare jewels... and for JavaScript they are non-existent (at least in my country).
So the workflow I use (for designers) takes this into account, and stays the same across my Flash/JavaScript projects.
Roles
First, from the designer's point of view, there are three roles, as follows.
1) The designer: in charge of artistic direction and graphical production.
2) The interfacer: works with the others in implementing the graphics and animation.
3) The logic coder: codes classes and functionality, with logic isolated from the interface.
The logic coder can have further sub-roles, but that is beyond the designer's point of view. Roles should not be 100% enforced: it's good to learn from and help one another. The designer is not required to know coding, and the coder does not need to know design. The interfacer, however, needs to know coding and a bit of design: not necessarily well, but enough to get by (especially animation).
Workflow
1) Base functionality is worked out by everyone. While this is actually a coder role, getting everyone involved helps the idea-generation process (programmers: the designers can sometimes come up with really good wild ideas; your job is to see logically how they could be implemented and whether they are worth doing).
2) Mock-up UI and class interface. This may be the worst UI you have ever seen, but it gives a general direction to work towards. This is done by everyone, with the coder working out the isolated logic (no graphics at all; e.g. server logic) while the interfacer and designer do the mock-up.
3.1) Graphics and animation. The graphic designer works on their wonderful designs/graphics, while the interfacer translates the designs into an actual interface (from Photoshop to Flash/HTML). If you are lucky, the designers even know how to do this themselves (slicing etc.) and the interfacer can focus on implementation and animation. Any additional graphical animation (dynamic stuff, like something following or reacting to the mouse) is to be discussed and developed by these two. This step rarely requires the coder to step in.
3.2) Logic. The coder works on the logic while ensuring it corresponds to the coding interface agreed with the interfacer, focusing on getting the mock-up fully functional (not the best-looking). This is usually done via class interfaces and/or global declarations (the latter avoided if possible).
4) Interface merge. The interfacer then merges the two together to form the final app. XD
Ending note
In reality, after stage 2 the workflow goes into a continuous cycle of 3 and 4. The main advantage of having an interfacer is to ensure that neither the designer nor the coder slows the other down. Hence, limited slowdowns :) The interfacer, however, has a tough role: he needs to be extremely flexible, and more often than not doubles as project lead in small teams, for only he will understand both sides and their limitations, though he may be neither the only one in those roles nor the best at either.
Note that this is used extensively in RIA work, where both sides are important roles. However, if you have projects that emphasise one over the other, you will need to balance the manpower accordingly (e.g. a photographer's interactive blog may mainly require the designer and interfacer to wow the audience, while most of the photo-database code may be reused from an existing or open-source project; this is one of the most common jobs I encounter).
I'm not talking about HTML tags, but tags used to describe blog posts, YouTube videos or questions on this site.
If I were crawling just a single website, I'd just use an XPath expression to extract the tags, or even a regex if it's simple. But I'd like to be able to throw any web page at my extract_tags() function and get the tags listed.
I can imagine using some simple heuristics, like finding all HTML elements with id or class of 'tag', etc. However, this is pretty brittle and will probably fail for a huge number of web pages. What approach do you guys recommend for this problem?
Also, I'm aware of Zemanta and Open Calais, which both have ways to guess the tags for a piece of text, but that's not really the same as extracting tags real humans have already chosen. But I would still love to hear about any other services/APIs to guess the tags in a document.
EDIT: Just to be clear, a solution that already works for this would be great. But I'm guessing there's no open-source software that already does this, so I really just want to hear from people about possible approaches that could work for most cases. It need not be perfect.
EDIT2: For people suggesting that a general solution which usually works is impossible, and that I must write custom scrapers for each website/engine: consider the arc90 readability tool. That tool is able to extract the article text for any given article on the web with surprising accuracy, using some sort of heuristic algorithm, I believe. I have yet to dig into their approach, but it fits into a bookmarklet and does not seem too involved. I understand that extracting an article is probably simpler than extracting tags, but it should serve as an example of what's possible.
Systems like the arc90 example you give work by looking at things like tag/text ratios and other heuristics, since there is sufficient difference between the text content of the pages and the surrounding ads/menus etc. Other examples include tools that scrape emails or addresses: there are patterns that can be detected and locations that can be recognized. In the case of tags, though, you don't have much to help you uniquely distinguish a tag from normal text; it's just a word or phrase like any other piece of text. A list of tags in a sidebar is very hard to distinguish from a navigation menu.
Some blogs, like Tumblr, do have tags whose URLs contain the word "tagged", which you could use. WordPress similarly has ".../tag/..."-style URLs for its tags. Solutions like this would work for a large number of blogs regardless of their individual page layout, but they won't work everywhere.
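As a hedged illustration of that URL heuristic (jsoup is my assumption here, and the class and method names are hypothetical), one could collect the anchor text of links whose URLs contain "/tag/" or "/tagged/":

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class UrlPatternTagScraper {

    /**
     * Collects the anchor text of links whose URL looks like a tag page
     * (WordPress-style "/tag/" or Tumblr-style "/tagged/").
     */
    public static Set<String> extractTags(String pageUrl) throws IOException {
        Document doc = Jsoup.connect(pageUrl).get();
        Set<String> tags = new LinkedHashSet<>();
        for (Element link : doc.select("a[href]")) {
            String href = link.attr("abs:href"); // resolve relative URLs
            if (href.contains("/tag/") || href.contains("/tagged/")) {
                tags.add(link.text().trim());
            }
        }
        return tags;
    }
}
```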
If the sources expose their data as a feed (RSS/Atom) then you may be able to get the tags (or labels/categories/topics etc.) from this structured data.
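A minimal sketch of the feed approach, assuming a recent version of the ROME library for feed parsing (not mentioned in the answer; the class name here is hypothetical):

```java
import java.net.URL;
import java.util.LinkedHashSet;
import java.util.Set;

import com.rometools.rome.feed.synd.SyndCategory;
import com.rometools.rome.feed.synd.SyndEntry;
import com.rometools.rome.feed.synd.SyndFeed;
import com.rometools.rome.io.SyndFeedInput;
import com.rometools.rome.io.XmlReader;

public class FeedTagReader {

    /** Collects the category/tag names attached to the entries of an RSS/Atom feed. */
    public static Set<String> categoriesOf(String feedUrl) throws Exception {
        SyndFeed feed = new SyndFeedInput().build(new XmlReader(new URL(feedUrl)));
        Set<String> tags = new LinkedHashSet<>();
        for (SyndEntry entry : feed.getEntries()) {
            for (SyndCategory category : entry.getCategories()) {
                tags.add(category.getName());
            }
        }
        return tags;
    }
}
```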
Another option is to parse each web page and look for tags formatted according to the rel=tag microformat.
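And a small sketch of the rel=tag approach, again assuming jsoup (class and method names hypothetical):

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RelTagScraper {

    /** Extracts the text of links marked up with the rel="tag" microformat. */
    public static Set<String> extractRelTags(String pageUrl) throws IOException {
        Document doc = Jsoup.connect(pageUrl).get();
        Set<String> tags = new LinkedHashSet<>();
        for (Element link : doc.select("a[rel=tag]")) {
            tags.add(link.text().trim());
        }
        return tags;
    }
}
```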
Damn, I was just going to suggest Open Calais. There's going to be no "great" way to do this. If you have some target platforms in mind, you could sniff for WordPress and look at its link structure, then do the same for Flickr...
I think your only option is to write custom scripts for each site. To make things easier, though, you could look at AlchemyAPI. They have similar entity-extraction capabilities to OpenCalais, but they also have a "Structured Content Scraping" product which makes it a lot easier than writing XPaths, by using simple visual constraints to identify pieces of a web page.
This is impossible because there isn't a well-known, widely followed specification. Even different versions of the same engine can produce different output; with WordPress, a user can even create his own markup.
If you're really interested in doing something like this, you should know it's going to be a really time-consuming and ongoing project: you're going to create a lib that detects which "engine" is being used on a page and parses it accordingly. If you can't handle a page for some reason, you create new rules to parse it and move on.
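One way such engine detection could start (my own illustration, not from the answer; jsoup assumed, names hypothetical) is by checking the generator meta tag that many CMSs emit, falling back to other fingerprints when it's missing:

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class EngineSniffer {

    /**
     * Very rough engine detection: many CMSs advertise themselves in a
     * <meta name="generator"> tag (e.g. "WordPress 5.8"). Themes can strip
     * it, so fall back to other fingerprints (URL paths, cookies) if absent.
     */
    public static String detectEngine(String pageUrl) throws IOException {
        Document doc = Jsoup.connect(pageUrl).get();
        String generator = doc.select("meta[name=generator]").attr("content").toLowerCase();
        if (generator.contains("wordpress")) {
            return "wordpress";
        }
        if (generator.contains("blogger")) {
            return "blogger";
        }
        return generator.isEmpty() ? "unknown" : generator;
    }
}
```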
I know this isn't the answer you're looking for, but I really can't see another option. I'm into Python, so I would use Scrapy for this, since it's a complete framework for scraping: well documented and really extensible.
Try making a Yahoo Pipe and running the source pages through the Term Extractor module. It may or may not give great results, but it's worth a try. Note - enable the V2 engine.
Looking at arc90, it seems they are also asking publishers to use semantically meaningful mark-up [see https://www.readability.com/publishers/guidelines/#view-exampleGuidelines] so they can parse it rather easily. Presumably, though, they must either have developed generic rules, such as the tag/text ratios #dunelmtech suggested, which can work for article detection, or they might be using a combination of text-segmentation algorithms (from the Natural Language Processing field) such as TextTiling and C99, which could be quite useful for article detection; see http://morphadorner.northwestern.edu/morphadorner/textsegmenter/ and Google for more info on both [published in the academic literature; try Google Scholar].
It seems, however, that detecting "tags" as you require is a difficult problem (for the reasons already mentioned in the comments above). One approach I would try is to use one of the text-segmentation algorithms (C99 or TextTiling) to detect the article start/end, and then look for DIVs / SPANs / ULs with CLASS or ID attributes containing "tag" in them; since, in terms of page layout, tags tend to sit just underneath the article and above the comment feed, this might work surprisingly well.
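A rough sketch of that class/id heuristic (jsoup assumed, names hypothetical); in practice you would first narrow the search to the region around the detected article boundaries:

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TagBlockScraper {

    /**
     * Looks for containers whose class or id contains "tag" and collects the
     * link texts inside them. Combine with article start/end detection to
     * limit false positives from navigation menus.
     */
    public static Set<String> extractFromTagBlocks(String pageUrl) throws IOException {
        Document doc = Jsoup.connect(pageUrl).get();
        Set<String> tags = new LinkedHashSet<>();
        for (Element container : doc.select("div[class*=tag], span[class*=tag], ul[class*=tag], [id*=tag]")) {
            for (Element link : container.select("a")) {
                tags.add(link.text().trim());
            }
        }
        return tags;
    }
}
```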
Anyway, would be interesting to see whether you got somewhere with the tag detection.
Martin
EDIT: I just found something that might really be helpful. The algorithm is called VIPS [see: http://www.zjucadcg.cn/dengcai/VIPS/VIPS.html] and stands for Vision-based Page Segmentation. It is based on the idea that page content can be visually split into sections. Compared with DOM-based methods, the segments obtained by VIPS are much more semantically aggregated. Noisy information, such as navigation, advertisements and decoration, can be easily removed because it is often placed in certain positions on a page. This could help you detect the tag block quite accurately!
There is a term extractor module in Drupal (http://drupal.org/project/extractor), but it's only for Drupal 6.
I am trying to figure out the difference between taking a process-centric approach to software development as opposed to a data-centric one. What are the pros and cons of the two approaches, etc.?
I've Googled around, but I haven't found a definitive answer on whether process-centric is better or not.
My understanding would be that it is a question of focus in the development:
Process-centric would center around using a specific form of process as the core of the methodology. For example, there may be some place that loves Waterfall and that is what they use regardless of other factors.
Data-centric would center around the data and may involve different methodologies for different sets of data. So, a reporting group may use a data-centric approach as most of what they do revolves around data and using it. In contrast, a group customizing a CMS may choose something more Agile to handle changing requirements which may happen as a company starts to use a CMS.
As for which is better, there are a few factors to consider:
People - Do they seem to have a preference for which they'd like in control? Some people may prefer a process-centric approach so that everything is done the same way and the methodology is upheld no matter what, while in other cases some may say it just depends on the data.
Precedence - Is this going to be something a team will use over and over again? If so, the process-centric approach may give better results, as the process can be refined over time, in contrast to the data-centric approach, where one may change methodologies regularly.
Final product - If the projects are consulting engagements, then it may be better to focus on the process rather than the data, unless the consulting is in something data-intensive like data warehousing or business intelligence.
Executive buy-in - Either approach can be useless if the people controlling the money don't understand why one style is better than the other for a specific situation.