What does "lg" mean in the following phrase?
"... we ignore the least significant lg t bits of x when referring to Mt[x]." (Knuth, 2005, pp. 4-5).
From the context, it seems like "lg t" means "t - 1", so that lg 2 would be 1 and lg 5 would be 4. That said, what is the strict meaning of "lg" here?
References
Knuth, D. E. (2005). The art of computer programming: Volume 1, fascicle 1 : MMIX, a RISC computer for the new millennium. Upper Saddle River, New Jersey: Addison-Wesley.
lg means log to the base 2.
i.e. lg(4) = 2, lg(2) = 1.
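To tie this back to the quoted sentence: assuming t is a power of two (1, 2, 4 or 8, the access sizes MMIX uses), ignoring the least significant lg t bits of x amounts to rounding x down to a multiple of t. A minimal Python sketch, where the function name aligned_address is made up for illustration:

```python
def aligned_address(x: int, t: int) -> int:
    """Clear the least significant lg t bits of x (t must be a power of two)."""
    if t <= 0 or t & (t - 1):
        raise ValueError("t must be a positive power of two")
    lg_t = t.bit_length() - 1      # lg t = log base 2 of t, e.g. lg 8 = 3
    return (x >> lg_t) << lg_t     # drop the low lg t bits, then shift back


for t in (1, 2, 4, 8):
    print(t, aligned_address(1003, t))
```

With t = 8, three bits are dropped (lg 8 = 3), not seven, which is why the "t - 1" reading only happens to work for t = 2.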
"lg" is commonly used to represent base 2 logarithms but this is a misuse is propagated in a few computer science texts.
Logarithm abbreviations are governed by standards. The abbreviation "lg" is reserved under DIN (DIN 1302) and ISO standards (ISO-31-11, ISO 80000-2) for a logarithm base 10. Since "lg" is widely used in other science and engineering fields in this manner, no one should use "lg" to refer to a base 2 logarithm.
The correct abbreviation for the base 2 logarithm, logarithmus binaris (the binary logarithm), is "lb," though some Germans still use "ld" (for logarithmus dualis).
One of the most popular texts misusing the abbreviation (Cormen et al.: Introduction to Algorithms) commits several other mathematical sins (such as misusing "asymptotic") that make it more difficult for students to connect the material to their precalculus and calculus courses.
References:
Wikipedia - Binary Logarithm: Notation
Guide for the Use of the International System of Units (SI) — NIST Special Publication 811, 2008 Edition — Second Printing
Quantities and units – Part 2: Mathematical signs and symbols to be used in the natural sciences and technology
Probably logarithm of t with base 2.
The quoted passage refers to Knuth Vol. 1 [1]. Section 1.2.2 of this monumental work is entitled "Numbers, Powers, and Logarithms". Here's how Knuth explains his notation:
"One might expect that in computer work binary logarithms (to the base 2) would be more useful, since most computers do binary arithmetic. Actually, we will see that binary logarithms are indeed very useful, but not only for that reason; the reason is primarily that a computer algorithm often makes two-way branches. Binary logarithms arise so frequently, it is wise to have a shorter notation for them. Therefore we shall write , following a suggestion of Edward M. Reingold."
1: Knuth, The Art of Computer Programming, third edition, Addison-Wesley, 1997.
I'm new to Named Entity Recognition and I'm having some trouble understanding what/how features are used for this task.
Some papers I've read so far mention features used, but don't really explain them, for example in
Introduction to the CoNLL-2003 Shared Task:Language-Independent Named Entity Recognition, the following features are mentioned:
Main features used by the sixteen systems that participated in the
CoNLL-2003 shared task sorted by performance on the English test data.
Aff: affix information (n-grams); bag: bag of words; cas: global case
information; chu: chunk tags; doc: global document information; gaz:
gazetteers; lex: lexical features; ort: orthographic information; pat:
orthographic patterns (like Aa0); pos: part-of-speech tags; pre:
previously predicted NE tags; quo: flag signing that the word is
between quotes; tri: trigger words.
I'm a bit confused by some of these, however. For example:
isn't bag of words supposed to be a method to generate features (one for each word)? How can BOW itself be a feature? Or does this simply mean we have a feature for each word as in BOW, besides all the other features mentioned?
how can a gazetteer be a feature?
how can POS tags exactly be used as features ? Don't we have a POS tag for each word? Isn't each object/instance a "text"?
what is global document information?
what is the feature trigger words?
I think all I need here is just to look at an example table with each of these features as columns and see their values to understand how they really work, but so far I've failed to find an easy-to-read dataset.
Could someone please clarify or point me to some explanation or example of these features being used?
Here's a shot at some answers (and by the way the terminology on all this stuff is super overloaded).
isn't bag of words supposed to be a method to generate features (one for each word)? How can BOW itself be a feature? Or does this simply mean we have a feature for each word as in BOW, besides all the other features mentioned?
how can a gazetteer be a feature?
In my experience BOW feature extraction is used to produce word features out of sentences. So IMO BOW is not one feature; it is a method of generating features out of a sentence (or a block of text you are using). Using n-grams can help with accounting for sequence, but BOW features amount to unordered bags of strings.
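As a rough illustration of "unordered bags of strings" (a minimal sketch using whitespace tokenization, which is an assumption and not what any particular CoNLL system did): two sentences with opposite meanings get the same bag of words, while word bigrams keep them apart.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """One count per distinct word; word order is thrown away."""
    return Counter(sentence.lower().split())


def word_ngrams(sentence: str, n: int = 2) -> Counter:
    """Counts of n consecutive words, which keeps some local order."""
    tokens = sentence.lower().split()
    return Counter(zip(*(tokens[i:] for i in range(n))))


a = "the dog bit the man"
b = "the man bit the dog"
print(bag_of_words(a) == bag_of_words(b))   # same bag
print(word_ngrams(a) == word_ngrams(b))     # different bigrams
```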
how can POS tags exactly be used as features ? Don't we have a POS tag for each word?
POS tags are used as features because they can help with "word sense disambiguation" (at least on a theoretical level). For instance, the word "May" can be the name of a person, a month of the year, or a poorly capitalized verb, and the POS tag can be the feature that tells these readings apart. And yes, you can get a POS tag for each word, but unless you explicitly use those tags in your "feature space", the words themselves have no idea what they are in terms of their POS.
Isn't each object/instance a "text"?
If you mean what I think you mean, then this is true only if you have extracted object-instance "pairs" and stored them as features (an array of them derived from a string of tokens).
what is global document information?
I take this one to mean the following: most NLP tasks operate on a sentence. Global document information is data from all the surrounding text in the entire document. For instance, if you are trying to extract geographic placenames and disambiguate them, and you find the word Paris, which one is it? Well, if France is mentioned 5 sentences above, that could increase the likelihood of it being Paris, France rather than Paris, Texas or, worst case, the person Paris Hilton. It's also really important in what is called "coreference resolution", which is when you correlate a name with a pronoun reference (mapping a name mention to "he" or "she" etc.).
what is the feature trigger words?
Trigger words are specific tokens or sequences that, on their own, reliably indicate a specific meaning. For instance, in sentiment analysis, curse words with exclamation marks often indicate negativity. There can be many permutations of this.
Anyway, my answers here are not perfect, and are prone to all manner of problems in human epistemology and inter-subjectivity, but this is the way I've been thinking about these things over the years I've been trying to solve problems with NLP.
Hopefully someone else will chime in, especially if I'm way off.
You should probably keep in mind that NER classifies each word/token separately, using features that are internal or external clues. Internal clues take into account the word itself (morphology such as uppercase letters, whether the token is present in a dedicated lexicon, POS) and external ones rely on contextual information (previous and next word, document features).
isn't bag of words supposed to be a method to generate features (one
for each word)? How can BOW itself be a feature? Or does this simply
mean we have a feature for each word as in BOW, besides all the other
features mentioned?
Yes, BOW generates one feature per word, sometimes with feature selection methods that reduce the number of features taken into account (e.g. a minimum word frequency).
how can a gazetteer be a feature?
A gazetteer may also generate one feature per word, but in most cases it enriches the data by labelling words or multi-word expressions (such as full proper names). It is an ambiguous step: "George Washington" will lead to two features: the entire "George Washington" as a celebrity and "Washington" as a city.
how can POS tags exactly be used as features ? Don't we have a POS tag
for each word? Isn't each object/instance a "text"?
For classifiers, each instance is a word. This is why sequence labelling methods (e.g. CRF) are used: they make it possible to leverage the previous and next words as additional contextual features when classifying the current word. Labelling a text is then a matter of finding the most likely NE types for each word in the sequence.
what is global document information?
This could be metadata (e.g. date, author), topics (full text categorization), coreference, etc.
what is the feature trigger words?
Triggers are external clues, contextual patterns that help disambiguation. For instance "Mr" will be used as a feature that strongly suggests that the following tokens are a person.
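Since the question asks for an example table, here is a minimal sketch of what one row per token could look like. The column names reuse the CoNLL abbreviations, but the gazetteer, the trigger list, the POS tags and the exact feature definitions are all made up for illustration; real systems differ.

```python
import re

GAZETTEER = {"paris", "washington"}   # hypothetical list of place names
TRIGGERS = {"mr", "mrs", "dr"}        # hypothetical person trigger words


def shape(word: str) -> str:
    """Orthographic pattern: 'Smith' -> 'Aa', '2003' -> '0'."""
    s = re.sub(r"[A-Z]", "A", word)
    s = re.sub(r"[a-z]", "a", s)
    s = re.sub(r"[0-9]", "0", s)
    return re.sub(r"(.)\1+", r"\1", s)   # collapse repeated symbols


def token_features(tokens, pos_tags, i):
    """Features for token i: internal clues plus a little context."""
    word = tokens[i]
    prev = tokens[i - 1].lower().rstrip(".") if i > 0 else "<s>"
    return {
        "lex": word.lower(),                # the word itself
        "aff": word[-3:].lower(),           # suffix (affix information)
        "ort": word[0].isupper(),           # capitalized?
        "pat": shape(word),                 # orthographic pattern
        "pos": pos_tags[i],                 # POS tag of this token
        "gaz": word.lower() in GAZETTEER,   # found in the gazetteer?
        "tri": prev in TRIGGERS,            # preceded by a trigger word?
    }


tokens = ["Mr.", "Smith", "visited", "Paris", "in", "2003"]
pos_tags = ["NNP", "NNP", "VBD", "NNP", "IN", "CD"]   # assumed to come from a tagger
for i, tok in enumerate(tokens):
    print(tok, token_features(tokens, pos_tags, i))
```

Each dictionary is one instance handed to the classifier, which is why a POS tag per word fits naturally: the instance is the token, not the whole text.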
I recently implemented an NER system in Python and I found the following features helpful:
character-level ngrams (using CountVectorizer)
previous word features and labels (i.e. context)
Viterbi or beam search on label sequence probability
part of speech (pos), word-length, word-count, is_capitalized, is_stopword
I am confused between the words syntax and grammar. Is there a reason that for computer languages we always use the word syntax, and not the word grammar, to describe the word order?
The terms "syntax" and "grammar" both come from the field of linguistics. In linguistics, syntax refers to the rules by which sentences are constructed. Grammar refers to how the rules of the language relate to one another.
Grammar actually covers syntax, morphology and phonology. Morphology is the set of rules for how words can be modified to add meaning or context. Phonology is the set of rules for how words should sound (which in turn govern how spelling works in that language).
So, how did concepts from linguistics get adopted by programmers?
If you look at really old papers and publications related to computing, for example Turing's seminal work on computability (Turing machines) or even older, Babbage's publications describing his Analytical Engine and Ada Lovelace's publications on programming, you'll find that they don't refer to computer programs as languages. Instead, they were just referred to as instructions or, if you want to get fancy, algorithms.
It was partly, perhaps mostly, the work of Noam Chomsky that related languages to programming.
Looking for a new way to study languages and how to extract meaning from sentences, Chomsky created the concept of the Chomsky hierarchy. His idea was to start with the most general system that could process a string of "stuff" (sounds, letters, words), a Turing machine, and categorize the grammars such a machine can handle as type-0 grammars. Then he went on to define the progressively more restricted grammar types 1, 2 and 3 (context-sensitive, context-free and regular), hoping that understanding how complexity gets introduced would eventually lead to a parser for human languages.
The syntax of most programming languages is type 2 (context-free), at least approximately. Indeed we have built parsers for types 0, 1 and 2 in the form of language interpreters and CPU designs.
Inheriting Chomsky's work, we have defined "syntax" in computing to mean how symbols are arranged to implement a language feature and "grammar" to mean the collection of syntax rules.
Because a language has only "one" syntax (the set of strings it will accept), and probably very many grammars even if we exclude trivial variants.
This may be clearer if you think about the phrase, "the language syntax allows stuff". This phrase is independent of any grammars that might be used to describe the syntax.
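A small sketch of "one syntax, many grammars": the two rule sets below (both invented for this example) describe comma-separated lists of x, one with right recursion and one with left recursion. Enumerating every string up to a length bound shows that both accept the same set of strings.

```python
def expand(rules, start="L", max_len=7):
    """Enumerate all terminal strings of length <= max_len derivable from start.

    `rules` lists the right-hand sides for the single nonterminal "L".
    """
    done, frontier = set(), {start}
    while frontier:
        next_frontier = set()
        for form in frontier:
            i = form.find("L")
            if i < 0:                      # no nonterminal left: a finished string
                done.add(form)
                continue
            for rhs in rules:              # rewrite the leftmost nonterminal
                new = form[:i] + rhs + form[i + 1:]
                if len(new) <= max_len:
                    next_frontier.add(new)
        frontier = next_frontier
    return done


RIGHT_RECURSIVE = ["x", "x,L"]   # L -> x | x , L
LEFT_RECURSIVE = ["x", "L,x"]    # L -> x | L , x

print(sorted(expand(RIGHT_RECURSIVE), key=len))
print(expand(RIGHT_RECURSIVE) == expand(LEFT_RECURSIVE))
```

The grammars differ (and would produce differently shaped parse trees), yet a phrase like "the syntax allows x,x,x" is true regardless of which one you pick.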
I am working on developing a tool for language identification of a given text i.e. given a sample text, identify the language (for e.g. English, Swedish, German, etc.) it is written in.
Now the strategy I have decided to follow (based on a few references I have gathered) is as follows -
a) Create a character n-gram model (The value of n is decided based on certain heuristics and computations)
b) Use a machine learning classifier (such as naive Bayes) to predict the language of the given text.
Now, the doubt I have is: is creating a character n-gram model necessary? That is, what disadvantage does a simple bag-of-words strategy have, i.e. if I use all the words possible in the respective language to create a prediction model, what could be the possible cases where it would fail?
The reason why this doubt arose was the fact that any reference document/research paper I've come across states that language identification is a very difficult task. However, just using this strategy of using the words in the language seems to be a simple task.
EDIT: One reason why N-gram should be preferred is to make the model robust even if there are typos as stated here. Can anyone point out more?
if I use all the words possible in the respective language to create a prediction model, what could be the possible cases where it would fail
Pretty much the same cases where a character n-gram model would fail. The problem is that you're not going to find appropriate statistics for all possible words.(*) Character n-gram statistics are easier to accumulate and more robust, even for text without typos: words in a language tend to follow the same spelling patterns. E.g. had you not found statistics for the Dutch word "uitbuiken" (a pretty rare word), then the occurrence of the n-grams "uit", "bui" and "uik" would still be strong indicators of this being Dutch.
(*) In agglutinative languages such as Turkish, new words can be formed by stringing morphemes together and the number of possible words is immense. Check the first few chapters of Jurafsky and Martin, or any undergraduate linguistics text, for interesting discussions on the possible number of words per language.
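To make the "uitbuiken" point concrete, here is a minimal sketch. The set of "known" trigrams is hypothetical and stands in for statistics collected from other Dutch words.

```python
def char_ngrams(word: str, n: int = 3) -> list[str]:
    """All overlapping character n-grams of a word, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


# Hypothetical trigrams for which Dutch statistics already exist.
known_dutch_trigrams = {"uit", "bui", "uik"}

trigrams = char_ngrams("uitbuiken")
hits = [g for g in trigrams if g in known_dutch_trigrams]
print(trigrams)
print(hits)
```

Even though the word itself was never seen, several of its trigrams were, which is the evidence a word-level model would not have.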
Cavnar and Trenkle proposed a very simple yet efficient approach using character n-grams of variable length. Maybe you should try to implement it first and move to a more complex ML approach if the C&T approach doesn't meet your requirements.
Basically, the idea is to build a language model using only the X (e.g. X = 300) most frequent n-grams of variable length (e.g. 1 <= N <= 5). Doing so, you are very likely to capture most functional words/morphemes of the considered language... without any prior linguistic knowledge on that language!
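A rough sketch of that idea, written from the description above rather than from the reference implementation: profiles are rank-ordered lists of the most frequent n-grams, and a document is assigned to the language whose profile is closest in rank order. The underscore padding, the penalty for missing n-grams and the two tiny training sentences are simplifying assumptions.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top=300):
    """Map each of the `top` most frequent character n-grams to its rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"                      # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams unknown to the language get a fixed penalty."""
    penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )


def identify(text, lang_profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))


# Toy training data, far too small for real use.
samples = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "nl": "de snelle bruine vos springt over de luie hond en de kat",
}
profiles = {lang: ngram_profile(text) for lang, text in samples.items()}
print(identify("the dog and the fox", profiles))
```

With realistic amounts of training text per language, the top-ranked n-grams end up being mostly function words and common morphemes, which is what makes the method work without linguistic knowledge.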
Why would you choose character n-grams over a BoW approach? I think the notion of a character n-gram is pretty straightforward and applies to every written language. A word is a much more complex notion, which differs greatly from one language to another (consider languages with almost no spacing marks).
Reference: http://odur.let.rug.nl/~vannoord/TextCat/textcat.pdf
The performance really depends on your expected input. If you will be classifying multi-paragraph text all in one language, a functional words list (which your "bag of words" with pruning of hapaxes will quickly approximate) might well serve you perfectly, and could work better than n-grams.
There is significant overlap between individual words -- "of" could be Dutch or English; "and" is very common in English but also means "duck" in the Scandinavian languages, etc. But given enough input data, overlaps for individual stop words will not confuse your algorithm very often.
My anecdotal evidence is from using libtextcat on the Reuters multilingual newswire corpus. Many of the telegrams contain a lot of proper names, loan words etc. which throw off the n-gram classifier a lot of the time; whereas just examining the stop words would (in my humble estimation) produce much more stable results.
On the other hand, if you need to identify short, telegraphic utterances which might not be in your dictionary, a dictionary-based approach is obviously flawed. Note that many North European languages have very productive word formation by free compounding -- you see words like "tandborstställbrist" and "yhdyssanatauti" being coined left and right (and Finnish has agglutination on top -- "yhdyssanataudittomienkinkohan") which simply cannot be expected to be in a dictionary until somebody decides to use them.
I obtained several statistics from runs of Z3. I need to understand what these mean.
I am rather rusty and not up to date with the recent developments in SAT and SMT solving; for this reason I tried to find explanations myself, and I might be dead wrong.
So my questions are mainly:
1) What do the measures' names mean?
2) If my interpretation is wrong, can you give me pointers to better understand what they refer to?
Other observations are made below and conceptually belong to the two questions above.
Thanks in advance!
My interpretation follows.
DPLL. All the metrics below refer to the jargon of the DPLL algorithm which is the foundation of most solvers.
:decisions
Number of decisions
:propagations
Number of propagations (I guess unit propagations)
:binary-propagations, :ternary-propagations
Propagations of two and three literals at once
:conflicts
Number of conflicts
RESOLUTION. Operations made interpreting clauses as sets, roughly speaking; techniques taken from resolution which is another paradigm for solving SAT.
:subsumed
:subsumption-resolution
What is the difference between the above two?
:dyn-subsumption-resolution
Should be described here: Learning for Dynamic Subsumption, by Hamadi et al.
OTHER TECHNIQUES
:minimized-lits
No clear idea. Is it perhaps related to clause learning?
:probing-assigned
I guess it counts the number of assignments made when "probing", which I guess is some kind of lookahead technique.
:del-clause
Number of deleted clauses (for what reason? Redundant?)
:elim-literals :elim-clauses :elim-bool-vars :elim-blocked-clauses
Number of eliminated entities of the kind named after the elim- prefix.
These metrics refer to particular SAT solving techniques
(see for reference Blocked Clause Elimination, by M.Järvisalo et al.)
:restarts
Number of restarts.
OTHER ASPECTS
:mk-bool-var :mk-binary-clause :mk-ternary-clause :mk-clause
Number of boolean variables and binary,ternary and generic clauses created.
:memory
Maximum amount of memory used.
:gc-clause
Garbage-collected clauses ...?
This interpretation is plausible according to my experiments since it's always the case that
:gc-clause <= :del-clause ; in my case the inequality is strict.
It is not always the case that
:gc-clause<=:elim-clauses; it can also be :gc-clause > :elim-clauses
I am afraid this is an open-ended question.
Z3 exposes many counters that are collected in many different ways.
While many capture abstract concepts, their meanings are ultimately
based on implementation behaviors of the code.
There is no single document that tracks the meaning of the counters,
but fortunately the source code is available and provides the full
context for understanding the behavior of each counter.
Notation in question is described by example below:
T for type
P for pointer
F for field
A for argument
L for local
et cetera; there is at least S missing from the list, but I'm not sure which string it designates.
The first 3 prefixes have been in Delphi since the very beginning; the last 2 I've noticed relatively recently. I'd like to know the name of the notation (if any), and read some normative whitepaper (and then maybe adopt it).
Zarko Gajic has a pretty good Delphi-specific list here:
http://delphi.about.com/od/standards/l/bldnc.htm
Personally, I find some conventions like this useful. I still remember my first language, FORTRAN, where the convention for integers was to start them with any letter from I to N, and it was easy to remember because those are the first two letters of INteger.
Section "3.3 Field Naming" of the Object Pascal Style Guide by Charles Calvert gives a brief but good guide as to when to use Hungarian notation, and also what single character identifier names are appropriate. My FORTRAN background (8 character names max) also made me use "N" as the count of items and led to code such as:
DO 10 I = 1, N
DO 20 J = I, N
...
20 CONTINUE
10 CONTINUE
Ouch! The memories hurt.
My personal favorite of all these standards, is to obey the standards already established in the code you're in, and not try to impose a different standard 50% of the way through, and to religiously avoid bikeshed discussions.
But if you press me really hard, I'll admit, I prefer Charlie Calvert's standards as used by JVCL devs, same as "section 3.3" link by LKessler above.
Hungarian notation.
With modern IDEs (including Delphi's) many people (myself included) feel it is no longer necessary.
EDIT: Technically this is not true Hungarian notation, as sometimes the prefix indicates the scope rather than the type.