I taught myself PHP, so I don't know the advantages and disadvantages of many programming styles. Recently, I have been looking at large PHP projects, like webERP, WordPress, Drupal, etc., and I have noticed they all have a main PHP page that is very large (1000+ lines of code) and performs many different functions. My projects' pages, by contrast, all seem to be very specific in function and are usually less than 1000 lines. What is the reasoning behind the large page, and are there any advantages over smaller, more specific pages?
It's partly about style and partly about readability and relationships. Ideally, everything in a single file is related (e.g. a class, related operation functions, etc.), and unrelated items belong in another file.
Obviously, if you are writing something to be included by others, a single file can have its advantages, such as the condensed (minified) version of jQuery.
Let's say that a developer has created a general-purpose Dart library, aimed at the client, that contains a large number of class and functionality definitions, of which only a small subset is ever expected to be used in a single web page.
As a JavaScript example, MathJax provides files that contain a large amount of functionality related to displaying mathematical expressions in the browser, though any given web page displaying mathematical expressions is likely to use only a very small amount of the functionality MathJax defines. Thus, what page authors tend to do is link each page they create to a single copy of the large, general-purpose JS file (in the page header, for example) and then write their own, usually relatively small, JS files defining behaviour particular to the page. They can thus potentially have hundreds of pages, each linking to a single general-purpose JS file.
When we transpile Dart to JS, however, the output is a single, large JS file that appears to contain the functionality of any dependencies along with the behaviour desired for the particular page. If we had hundreds of pages, each linked to their own dart2js-produced JS file, we appear to have a tremendous amount of redundancy in the code. Further, if we need to make a change to the general-purpose library, it seems that we would have to regenerate the JS for each of the pages. (Or maybe I completely misunderstand things.)
My question is: is it possible to transpile Dart code to multiple JS files? For example, Dart code for a given page might be transpiled into one JS file containing the dependency functionality (that many pages can link to) and one small JS file defining the behaviour particular to the page?
Sounds like you should rethink your website as a single-page app. Then you get one payload that can process any of your pages, but is still tree-shaken to include only exactly what you need across all of them.
I am trying to parse entities from web pages that contain a time, a place, and a name. I read a little about natural language processing, and entity extraction, but I am not sure if I am heading down the wrong path, so I am asking here.
I haven't started implementing anything yet, so if certain open source libraries are only suitable for a specific language, that is ok.
A lot of the time the data would not be found in sentences, but instead in HTML structures like lists (e.g. 2013-02-01 - Name of Event - Arena Name).
The structure of the webpages will be vastly different (some might use lists, some might put them in a table, etc.).
What topics can I research to learn more about how to achieve this?
Are there any open source libraries that take the structure of the HTML into account when doing entity extraction?
Would extracting these (name, time, place) entities from HTML be better (or even possible) with machine vision, where the CSS styling might make it easier to differentiate the important parts (name, time, location) of the unstructured text?
Any guidance on topics or open source projects that I can research would help, I think.
Many programming languages have libraries that parse dates in various formats into canonical timestamps (e.g. in Java, using SimpleDateFormat). As you say, the structure of the web pages will vary widely, but dates can be expressed in only a small number of variations, so writing down the regular expressions for a few (say, half a dozen) formats will enable extraction of dates from most, if not all, HTML pages.
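As a minimal sketch of that idea (in Python rather than Java; the list of formats here is just an example set, which you would extend as you encounter new date styles on real pages):

```python
from datetime import datetime

# Candidate formats; extend this list as you meet new date styles.
FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%d %b %Y"]

def parse_date(text):
    """Try each known format and return the first successful parse."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None  # none of the known formats matched

parse_date("2013-02-01")        # -> datetime(2013, 2, 1, 0, 0)
parse_date("February 1, 2013")  # same date, different surface form
```

The same try-each-format loop works with any language's date parser; only the format strings differ.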
Extraction of places and names is harder, however. This is where natural language processing will have to come in. What you are looking for is a Named Entity Recognition (NER) system. One of the best open source NER systems is the Stanford NER. Before using it, you should check out their online demo. The demo has three classifiers (for English) that you can choose from. For most of my tasks, I find their english.all.3class.distsim classifier to be quite accurate.
Note that an NER system performs well when the places and names you extract occur in sentences. If they are going to occur in HTML labels, this approach is probably not going to be very helpful.
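For the structured, non-sentence cases like the list items in the question, a simple pattern-based extractor can carry a lot of weight before (or instead of) NER. A rough sketch; the " - " separator and the ISO date shape are assumptions taken from the question's example line:

```python
import re

# Matches lines shaped like "2013-02-01 - Name of Event - Arena Name".
# The " - " separator is an assumption based on the question's example.
EVENT_LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s*-\s*(?P<name>[^-]+?)\s*-\s*(?P<place>.+)"
)

def extract_event(line):
    """Return (date, name, place) for a matching line, else None."""
    m = EVENT_LINE.match(line.strip())
    if m:
        return m.group("date"), m.group("name"), m.group("place")
    return None

extract_event("2013-02-01 - Name of Event - Arena Name")
# -> ('2013-02-01', 'Name of Event', 'Arena Name')
```

In practice you would write one such pattern per page layout you scrape, and fall back to NER only for free-running text.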
When I got to this project, there were Cucumber tests in "features/enhanced", which ran with JavaScript, and a few in "features/plain" which did not require JS. With the later development of the per-scenario @javascript tag, this split no longer makes sense. And as the number of feature files we have grows and grows, it would be awesome if this stayed tidy.
So, in best-practice land:
1) How long should .feature files be? I try to keep each narrow and specific, with one or two scenarios.
2) What folder/file structure should one keep them in?
2a) How might one group similar features?
1) Once you've written them for a few months, you'll soon find what works best for you. My advice is to keep them smallish. We have often split our earlier features into smaller chunks, but have never ended up combining any. It's handy for making use of backgrounds, etc.
2) We had a big problem with this and spent ages doing it one way, then another. In the end we opted to group them by the services our company provides, e.g. payments, customer registration, stock management.
Inconveniently, features don't always conform to a hierarchical tree view of the world, so make liberal use of tagging, and your primary grouping of features becomes less important.
Have you tried yard? There's an example here. We've just built it into our CI; it lets you pull together sets of scenarios based on tags, and you can do unions, intersections, etc. Well worth it :)
I would keep the JavaScript and non-JavaScript versions of a scenario together, since they should be very similar.
Anything more than 8 scenarios in a feature file is probably too much.
A useful approach is to have a folder to represent the high-level features (sometimes called epics or themes), and separate feature files within those folders for the different aspects of the behaviour.
For example, you may have a feature "Employee Directory" which would have separate feature files containing scenarios for a photograph, office location, job title, etc.
Depending on the size and complexity of your app, you could group those folders into other folders.
(Note that none of the above is specific to Rails apps).
My web site (on Linux servers) needs to support multiple languages.
What is the best practice to have/store multiple languages versions of the same site?
Some I can think of:
store in DB
different view file for each language
gettext
hard coded words in PHP files (like in phpBB)
With web sites, you really have several categories of content to consider for localization:
1. The article-type content elements that you would in many cases create, edit and publish in a CMS.
2. The smaller content blocks that are common to every page (or a sub-group of pages), such as a tagline, blurb, or the text around a contact form, but also imported content such as a news ticker or ads and affiliate links. Some of these may only appear for one language (for example, if you don't offer some services in some regions, or don't have, say, language-appropriate imported content for a particular language: it can be better to remove an element than to offer English to people who may not speak it).
3. The purely functional elements, like "Click here to comment", "More...", high-level navigation, etc., which are sometimes part of your template. Some of these may be inside images.
For 1., the main decision is whether to use a CMS. If yes, you absolutely need to choose one that supports multiple languages. I'm not up to date with recent developments in PHP CMSes, but several of the Django CMS apps (Django-CMS-2, FeinCMS) support multi-language content. Don't forget that date stamps, for example, need to be localized too (or you can get around this by choosing ISO dates, though that may not always be possible). If you don't use a CMS, and everything is in your HTML files, then gettext is the way to go; keep the .mo files (and your offline .po files) in folders by language.
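The gettext workflow looks much the same in any language. Here is a minimal sketch of the folder-per-language layout using Python's standard gettext module (the "locale" directory name and the "messages" domain are assumptions for illustration, not part of the PHP setup discussed here):

```python
import gettext

# Assumed layout: locale/<lang>/LC_MESSAGES/messages.mo,
# one folder per language, as described above.
def get_translator(lang):
    return gettext.translation(
        "messages",          # domain, i.e. the messages.mo catalog (assumed name)
        localedir="locale",  # root of the per-language folders (assumed name)
        languages=[lang],
        fallback=True,       # serve the untranslated string if no catalog exists
    )

_ = get_translator("de").gettext
print(_("Hello World"))  # German text if locale/de/... exists, else "Hello World"
```

The same layout works with PHP's gettext functions; only the binding calls (bindtextdomain, textdomain, gettext) differ.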
For 2., if you have a CMS with good multi-lingual support, get as much as possible inside the CMS. The reason is that these bits do change, and you want to edit your template as little as possible. If you write code yourself, think of ways to export all in-CMS strings per language, to hand them to translators. Otherwise, again, gettext. The main issue is that these elements may require hard-coded language-selection code (if $language == X, display content1, ...).
For 3., if it's in your template, use gettext. For images, the per-language folders will come in handy, and for heaven's sake choose images whose generation can be automated, or you (or your graphic artist) will go mad editing hundreds of custom images with strings in languages you don't understand.
For both 2. and 3., abstracting over the language selection may help with selecting the appropriate blocks or content directory (where localized images or .mo files are kept).
What you definitely want to avoid is keeping a pile of HTML files with extensive text content in them that would be a nightmare to maintain.
EDIT: Everything about gettext, .po and .mo files is in the GNU gettext manual (more than you ever wanted to know), or a slightly dated but friendlier tutorial. For PHP, there are the PHP gettext functions, and also the Zend Locale documentation.
I recommend using Zend_Translate's Gettext adapter, which parses .mo files. Very efficient, with caching. Your calls would look like:
echo $translation->_("Hello World");
This looks up the locale-specific translation for the given string.
Check out i18n support for php: http://php-flp.sourceforge.net/getting_started_english.htm
Using the Grails YUI plugin, I've noticed that my GUI tags are replaced with some JavaScript code that is inserted in the HTML page.
Does this behavior contradict the Yahoo rule of making JavaScript and CSS external?
In other words, how do I separate the script code from the HTML page in order to allow external JavaScript script caching?
Should I use the Grails UI performance plugin for that matter? Is there another way to do it?
Everything in software design is a trade-off.
It depends on whether the performance benefit outweighs the importance of having well-segregated, maintainable code.
In your case, I wouldn't mind having some extra JavaScript code automatically added if it dramatically improves performance.
Complete code and UI separation always comes at a price. More levels of abstraction and intermediate code often translates into slower performance, but better maintainability.
Sometimes, the only way to reach maximum efficiency is to throw away all those abstractions and write optimized code for your platform: minimizing the number of functions and function calls, trying to do as much work as possible in one loop instead of having two meaningful loops, etc. (which is often characterized as ugly code).
Well, this is one of the features of the UI Performance plugin, among other things:
The UI Performance Plugin addresses some of the 14 rules from Steve Souders and the Yahoo performance team.
[...]
Features
minifies and gzips .js and .css files
configures .js, .css, and image files (including favicon.ico) for caching by renaming with an increasing build number and setting a far-future expires header
[...]
So I'd use it indeed.