Word game advice - XNA

My background is as a linguist, so for New Year's I decided to learn a programming language, C#, out of interest and so that I can make small word games for my children and students.
I have started looking at word games and have been reading about char types and char arrays; playing with them, I have been able to generate the alphabet.
What I really want to do is have a word appear with random letters missing; letters of the alphabet then appear, and the player needs to select the correct letter to complete the word.
I am not after code (as an educator I am not fond of cheating), just advice on where I could start and what I should be reading about so that I can achieve what I described.
Many thanks in advance for your help and advice.

If I understand your question correctly, you're looking for an algorithm (or even pseudocode) rather than code or anything else. If I were to implement a game as you described, I would go about it in the following fashion (a minimal sketch follows the steps):
Select a word from a list. This "dictionary" could be as simple as a text file containing different words, or a more complex database of all words in the English language.
Pick a letter from the word and remove it.
Ask the user for the missing letter. Keep asking until they guess correctly or they run out of guesses.
Rinse and repeat.
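Purely to make the shape of those steps concrete (not the game itself), here is a minimal command-line sketch in Ruby, one of the beginner-friendly languages suggested below; the words.txt file name and the limit of three guesses are placeholder assumptions:
# Minimal fill-in-the-missing-letter sketch. Assumes a plain-text word list,
# one word per line, in a file called words.txt (placeholder name).
words  = File.readlines("words.txt", chomp: true)
word   = words.sample                  # 1. select a word from the list
index  = rand(word.length)             # 2. pick a letter position to hide
answer = word[index].downcase
puzzle = word.dup
puzzle[index] = "_"
guesses = 3                            # placeholder guess limit
until guesses.zero?
  puts "Complete the word: #{puzzle}"
  print "Your guess: "
  guess = gets.chomp.downcase          # 3. ask the user for the missing letter
  if guess == answer
    puts "Correct! The word was #{word}."
    break
  end
  guesses -= 1
  if guesses.zero?
    puts "Out of guesses. The word was #{word}."
  else
    puts "Not quite; try again (#{guesses} guesses left)."
  end
end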
This is a pretty simple game, which uses pretty basic concepts. I believe XNA would be complete overkill in this situation. As Mustafa mentioned in the comment on the original post, XNA provides a framework that makes game programming easier because it provides templates, but it also adds a lot of overhead and needless complexity (especially for a novice programmer). Since you're coming from a non-programming background, I would suggest Python or Ruby as a good starting language, and suggest looking into the following topics:
Reading from a file (the "dictionary" mentioned above)
Loops, specifically for-loops and while-loops or the language equivalent (to allow the user to keep guessing until they run out of guesses or guess correctly)
Command-line input/output (IO) -- print to screen and read input from the console.
Arrays and Strings
Once you've built out a working command-line application, then I would suggest looking into things like Graphical User Interfaces (GUIs) and making it look "pretty."

Related

Spell checker that uses a language model

I am looking for a spell checker that can use a language model.
I know there are a lot of good spell checkers, such as Hunspell; however, as far as I can see, it doesn't take context into account, so it is only a token-based spell checker.
For example:
I lick eating banana
At the token level there are no misspellings at all: every word is correct, but the sentence has no meaning. A "smart" spell checker would recognize that "lick" is a correctly written word, but that the author probably meant "like", which would give the sentence meaning.
I have a bunch of correctly written sentences in a specific domain, and I want to train a "smart" spell checker on them to learn a language model, so that it recognizes that even though "lick" is written correctly, the author meant "like".
I don't see that Hunspell has such a feature. Can you suggest any other spell checker that can do this?
See "The Design of a Proofreading Software Service" by Raphael Mudge. He describes both the data sources (Wikipedia, blogs etc) and the algorithm (basically comparing probabilities) of his approach. The source of this system, After the Deadline, is available, but it's not actively maintained anymore.
One way to do this is via a character-based language model (rather than a word-based n-gram model). See my answer to Figuring out where to add punctuation in bad user generated content?. The problem you're describing is different, but you can apply a similar solution. And, as I noted there, the LingPipe tutorial is a pretty straightforward way of developing a proof-of-concept implementation.
One important difference: to capture more context, you may want to train a larger n-gram model than the one I recommended for punctuation restoration. Maybe 15-30 characters? You'll have to experiment a little there.
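To make the "comparing probabilities" idea concrete, here is a rough character n-gram sketch in plain Ruby (not LingPipe); the corpus.txt file, the n-gram order of 5, the 256-symbol smoothing constant, and the lick/like confusion set are all assumptions for illustration:
N = 5  # n-gram order; as noted above, you would experiment with larger values
counts  = Hash.new(0)   # how often each n-gram occurs in the domain corpus
context = Hash.new(0)   # how often each (n-1)-character prefix occurs
File.foreach("corpus.txt", chomp: true) do |line|
  padded = (" " * (N - 1)) + line.downcase
  (0..padded.length - N).each do |i|
    counts[padded[i, N]]      += 1
    context[padded[i, N - 1]] += 1
  end
end
# Log-probability of a sentence under the model (crude add-one smoothing).
def score(sentence, counts, context)
  padded = (" " * (N - 1)) + sentence.downcase
  (0..padded.length - N).sum do |i|
    gram = padded[i, N]
    Math.log((counts[gram] + 1.0) / (context[gram[0, N - 1]] + 256.0))
  end
end
# Keep whichever member of the confusion set makes the sentence more probable.
sentence  = "I lick eating banana"
confusion = %w[lick like]   # assumed confusion set
best = confusion.max_by { |w| score(sentence.sub("lick", w), counts, context) }
puts "Suggested reading: #{sentence.sub('lick', best)}"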

Find almost-duplicate strings in Objective-C on iOS

I have a list of song tracks that I pulled from the iTunes API. Some of them are duplicates, but not perfect duplicates. For example, one might say "All 4 u" vs "All for you", or "Some song" vs "Some song feat. some other artist"
I want to be able to identify the duplicates. Is the best way to compute the Levenshtein distance for all pairs? That seems excessive.
I'm working in the Cocoa Touch framework for iOS programming, so if anyone knows of any libraries, that would help a lot.
Why do you consider computing the Levenshtein distance excessive? What algorithm would you use if you were sitting down to a list with pencil and paper?
That said, Levenshtein is likely necessary, but not sufficient. I would start by normalizing the strings. In some cases, a string might normalize a couple of different ways, and you'll need to keep both. Normalization would look like the following (a rough sketch follows the list):
Convert to lowercase
Strip any leading numbers followed by punctuation ( "1.", "1 - ", etc.)
Tentatively strip anything after "feat." or "with"
This is an example of special knowledge about your problem set. You're going to have to use a lot of special knowledge like this.
"Tentatively" means you should probably keep both the stripped and non-stripped versions of the string
Keep in mind that things including "feat." might be remixes, so you have to be careful about assuming duplicates. This is of course true of almost any attempt at de-duping. There are often multiple versions.
Tentatively expand common abbreviations (u=>you, 4=>for, 2=>two, w/=>with, etc. etc.)
Tentatively strip anything in parentheses
Strip English articles (a, an, the). Maybe even strip all very short words (3 or fewer characters) as a first pass.
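Nothing below is Cocoa-specific, but to show the shape of that pipeline, here is a rough Ruby sketch of the normalization plus a plain Levenshtein; the abbreviation table and the distance threshold of 3 are illustrative guesses you would tune, and in your app you would port the same steps to Objective-C or use an existing string-distance implementation:
ABBREVIATIONS = { "u" => "you", "4" => "for", "2" => "two", "w/" => "with" }  # illustrative
# Returns the normalized variants of a title (stripped and non-stripped).
def normalize(title)
  t = title.downcase                                # convert to lowercase
  t = t.sub(/\A\d+\s*[.\-]\s*/, "")                 # strip leading "1.", "1 - ", etc.
  variants = [t]
  variants << t.sub(/\s+(feat\.|with)\s+.*/, "")    # tentatively strip "feat."/"with"
  variants << t.sub(/\s*\(.*?\)/, "")               # tentatively strip parentheses
  variants.map do |v|
    words = v.split.map { |w| ABBREVIATIONS.fetch(w, w) }    # expand common abbreviations
    words.reject { |w| %w[a an the].include?(w) }.join(" ")  # strip English articles
  end.uniq
end
# Classic dynamic-programming Levenshtein distance.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca == cb ? 0 : 1)].min
    end
    prev = curr
  end
  prev.last
end
# Two titles look like duplicates if any of their normalized variants are close.
def probable_duplicate?(t1, t2, threshold = 3)
  normalize(t1).product(normalize(t2)).any? { |a, b| levenshtein(a, b) <= threshold }
end
puts probable_duplicate?("All 4 u", "All for you")   # => true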
Doing this well is complicated and will require a lot of trial and error. I've done a lot of contact de-duping in the past, and one piece of advice: start conservatively. It is very easy to accidentally de-dupe way too much. Build a big list of test data that you've de-duped by hand and test, test, test after every algorithm change. Make sure your UI can present the user with anything you're uncertain about, because there are going to be many, many records that you can't be certain about. (This is true even when you do it by hand. Look at a big list of human-entered titles and tell me which ones are duplicates with 100% certainty without listening to the tracks. A computer isn't going to do better than you at this.)
I'm not aware of any publicly available library for this. It's been solved by many people many times (search for "dedupe song titles" or anything similar). But it's generally commercial software.
One more piece of advice for this, since it's a huge O(n^2) or worse problem. Look for bucketing opportunities. If you can match artists first, then albums, then tracks, you can divide and conquer in much less time.
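For instance, reusing the hypothetical probable_duplicate? helper from the sketch above, bucketing by artist first might look like this (tracks is an assumed array of hashes with :artist and :title keys):
# Only compare titles within the same artist bucket instead of every pair.
def duplicate_candidates(tracks)
  tracks.group_by { |t| t[:artist].downcase.strip }
        .flat_map do |_artist, bucket|
          bucket.combination(2).select { |a, b| probable_duplicate?(a[:title], b[:title]) }
        end
end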

What's the fastest way to read a large file in Ruby?

I've seen answers to this question but I couldn't figure out which of the answers would perform the fastest. These are the answers I've seen; which is best?
Read one line at a time using each or each_line
Read one line at a time using gets
Save it all into an array of lines using readlines and then use each
Use grep (not sure what exactly to do with grep...)
Use sed (not sure what exactly to do with sed...)
Something else?
Also, would it be better to just use another language or should Ruby be fine?
EDIT:
More details: Each line contains something like "id1 attr1_1 attr2_1 id2 attr1_2 attr2_2... idn attr1_n attr2_n" (n is very big) and I need to insert those into a database. For that example line, I would need to insert n rows into the database.
Ruby will likely be using the same or very similar low-level code (written in C) to do the actual reading from disk for the first three options, so they should perform similarly. Given that, you should choose whichever is most convenient for you; the ability to do that is what makes languages like Ruby so useful! You will be reading a lot of data from disk, so I would suggest using each_line and processing each line as you read it.
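For example, here is a line-at-a-time sketch of the workflow described in your edit; the file name, the three-fields-per-record grouping, and the insert_row call are placeholders for your actual data and database client:
# Stream the file with each_line instead of slurping it all in with readlines.
File.open("data.txt") do |f|
  f.each_line do |line|
    # Each line looks like "id1 attr1_1 attr2_1 id2 attr1_2 attr2_2 ...".
    line.split.each_slice(3) do |id, attr1, attr2|
      insert_row(id, attr1, attr2)   # stand-in for your real database insert
    end
  end
end
In practice the database inserts, not the file reading, will dominate the run time, so batching them (for example, wrapping chunks of rows in a single transaction) is likely where the real speed-up lives.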
I would not recommend bringing grep, sed, or any other such external utilities into the picture unless you have a very good reason, as they will make your code less portable and expose you to failures that may be difficult to diagnose.
If you're using Ruby then there's no need to worry about performance. The language is such that it suits an iterative approach to reading a file, line by line, and works very nicely. So long as you're using the language the way it's designed you can let the interpreter people worry about performance. Job done.
If one particular readLargeFileFast method is needed, then it should be because it's really hindering the program somehow. In that case, you could write a C program to do it and popen it as a separate process from within your Ruby code. You could call it read_large.c and (perhaps) use command-line arguments to tell it how to behave.
This is championing the idea that a scripting language is used for fast development rather than a fast run time. As such, a developer can be very productive by swiftly 'prototyping' a program in something like Ruby and only later rewriting the components that warrant some low-level code. Often, however, once it's working in script, it's not necessary to do anything else at all.
The Ruby docs describe launching a separate process and treating it as a file. It's easy-peasy! A good start is The Art of Unix Programming's introductory paragraph on program modularity. That book also makes a great example of using Linux's standard stream editor, sed, which you could probably use from Ruby right now.
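For instance, a minimal sketch of that approach, where ./read_large stands in for the compiled version of the hypothetical read_large.c above and data.txt is a placeholder input file:
# Launch the external reader as a child process and treat its output as a file.
IO.popen(["./read_large", "data.txt"]) do |pipe|
  pipe.each_line do |line|
    # process each pre-digested line here
  end
end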
If you need to parse or edit a lot of text, note that many interpreters and editors have been written around sed's functionality. Further, it may save you a lot of effort over writing something super efficient yourself if you don't know C. A good starting point is the Introduction to SED by Bruce Barnett.

File Path Name or URL analysis

I am looking for information on tools, methods, and techniques for the analysis of file path names. I am not talking about file size, read/write times, or file types, but analysis of the path or URL itself.
I am only aware of basic word-frequency text tools or methods, but I am wondering if there is something more advanced that people use or apply to this, to try and mine extra information out of the paths.
Thanks!
UPDATE:
Here is the most narrow example of what I would want. OK, so I have some full path names as strings like this:
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File1.doc
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File2.doc
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File3.doc
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File4.doc
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File5.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File1.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File2.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File3.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File4.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File5.doc
What I want to know is that the folder MapShedMaps appears "uniquely" 2 times. If I did a frequency count on the strings I would get 10 appearances. The issue is that I don't know at what level of the directory tree this is important, so I would like a unique count at each level of the directory based on what I am describing.
This is an extremely broad question, so it is difficult for me to give you a per se "answer", but I will give you my first thoughts on this.
First,
the regular expression class of .NET is extremely useful for parsing large amounts of information. It is so powerful that it will easily confuse the impatient; however, once mastered, it can be used across text editors, .NET, and pretty much any other respectable language, I believe. This would allow you to search strings and separate them into directories. This could be overkill depending on how you use it, but it's a thought. Here is a favorite link of mine to try out some regular expressions.
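To make the "separate them into directories" idea concrete for the MapShedMaps example, here is a rough sketch (in Ruby rather than .NET, but the splitting and counting carry over directly); paths.txt is an assumed file holding one full path per line:
require "set"
paths = File.readlines("paths.txt", chomp: true)
# For each directory depth, count the distinct parent paths that contain a
# folder of that name, so MapShedMaps comes out as 2 rather than 10.
levels = Hash.new { |h, depth| h[depth] = Hash.new { |hh, name| hh[name] = Set.new } }
paths.each do |path|
  parts = path.split("\\")                  # split on the backslash separator
  parts.each_with_index do |folder, depth|
    levels[depth][folder] << parts[0...depth].join("\\")   # the parent path above it
  end
end
levels.sort.each do |depth, folders|
  folders.each do |folder, parents|
    puts "level #{depth}: #{folder} appears uniquely #{parents.size} time(s)"
  end
end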
Second,
You will need a database; I prefer to use SQL. Look into how to connect to databases and how to create them. With this database you can store all the fields extracted from each original path entered, such as the parent directory, child directories, and common file types accessed. Just have a field for each of these, and through queries you can form a hypothesis as to redundancy.
Third,
I don't know if it's easily accessible, but you might look into whether Windows stores a history of accessed files. It seems to have some inkling as to which files have been opened in the past, so there may be a resource in Windows which already stores much of the information you would be keeping in your database. If you could find a way to access this information, you could parse it with regular expressions and resubmit it to your application's database. You could control the WORLD! j/k... You could get a pretty good prediction as to user access patterns, though.
Fourth,
I always try to stick with what I have available. If .NET is sitting in front of you, hammer away at what you're trying to do. Even if you reach a wall, at least you're making forward progress. In today's move towards object-oriented programming, you can usually change data collected by one program into an acceptable format for another. You just gotta dig a little.
Oh and btw, Coursera is actually doing a free class on machine learning and algorithms. You might want to check it out or reference it for prediction formulas.
Good Luck.
I wanted to post this as a comment, but SO kept editing the double \\ down to a single \, and it is important that there are two: \ is a special character in regex, and without another \ to escape it, the regex engine will interpret it as the start of a command.
Hey, I just wanted to let you know I've been playing with some regex... I know a pretty easy way to code this up in VB.net and I'll post that as my second answer, but I wanted you to check out back-references. If the part between parentheses matches, it captures that text and moves on to the next group. For instance:
F:\\(directory1)?(directory2)?(directory3)?
You could use these matches to find out how many directories each parent directory has under it. Are you following me? Here is a reference.
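For what it's worth, here is roughly what those optional capture groups look like in action, using generic "any folder name" groups instead of the literal directory1/directory2 placeholders (Ruby regex syntax here, but the .NET pattern is nearly identical):
# Capture the first two directory levels (when present) from a full path.
pattern = /\AF:\\([^\\]+)?\\?([^\\]+)?/
path    = 'F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File1.doc'
match   = pattern.match(path)
puts match[1]   # => "Task_Order_Projects"
puts match[2]   # => "TO_01_NYS"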

Designing a Non-Specific Language Application, e.g. planning for localization

Made this community wiki :3
I'm developing a basic RPG, and one of my goals from the beginning is to make sure that my program is language non-specific. Basically, before I design or start programming any menus, I want to make sure that I can load and display them in any supported language, so I am not hard-coding in values.
(It would save me from many migraines down the road.)
For this example, let's use Western Left-to-Right languages. English, Spanish, German, French, Italian.
This is a basic example of what I have.
One XML file contains a mapping and design of a conversation.
<conversation>
<dialog>line1</dialog>
<dialog>line2</dialog>
</conversation>
Other XML files contains the definitions.
<mappings language="English">
<line1>This is line 1 in English!</line1>
<line2>Other lines are contained in language-separated xml files</line2>
</mappings>
Heh. This would work great, except for the fact that I forgot that English doesn't assign genders to its words, whereas other languages do. So, where one sentence might be enough in English, I might need two sentences in other languages: one to cover the masculine form and the other to cover the feminine form.
What would be the most conducive way of solving this problem? Right now, I've considered coming up with different mapping tables, one exclusively for masculine sentences while the other table would cover just feminine ones, or just reading from different definition tables.
And another kicker lies within my game data design. I never thought about it, but I might need to store the sexes of my game items and characters so that I can use the correct sentence. However, other languages might have their own specific quirks that I would need to consider as well (though thankfully, from what I know, Italian and Spanish are relatively similar, and possibly French as well).
So, obviously this is a huge task ahead of me. What other design considerations should I think of? Right now, I'm thinking a static class would be easiest: configure the selected language at startup, throw in inputs, and hopefully get a string back.
Any ideas? (Looking to throw ideas around :P)
There are two general ways to approach this: brute force and trying to be clever. Brute force means writing each possible line and including it with your XML files. It's a lot of work, but it will work.
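On the code side, the brute-force route might reduce to something like this once the XML is parsed; the SPANISH hash, the symbol keys, and localized_line are made-up names to illustrate keying a line by gender, not your actual format:
# Assumed in-memory form of one language's mappings after parsing its XML file:
# a line id maps either to a single string or to per-gender variants.
SPANISH = {
  "line1" => "Esta es la línea 1 en español.",
  "line2" => { masculine: "El héroe está listo.",
               feminine:  "La heroína está lista." },
}
# Look a line up, falling back to the single string when no variants exist.
def localized_line(mappings, id, gender: nil)
  entry = mappings.fetch(id)
  entry.is_a?(Hash) ? entry.fetch(gender) : entry
end
puts localized_line(SPANISH, "line1")
puts localized_line(SPANISH, "line2", gender: :feminine)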
Trying to be clever gets into deep water, fairly fast, particularly if you're trying to cover a whole lot of languages.
You need to keep more information about characters than gender. In Russian, for example, there are different words meaning "you" depending on whether you're being informal or formal (or talking to multiple people), and the verb endings are also different. There are different translations of "please pass the bread" depending on the formality. In other languages, getting the translation right depends on social status.
There are issues, as pawel_dyda pointed out, with singular, plural, and possibly dual forms. Other languages also use different word orders: "The arrows are X coppers each, so to buy Y arrows you'll need Z silver" may require you to keep track of the order of the numbers.
Visual C++ and MFC come with internationalization facilities that are actually pretty good. You'd keep the strings in a resource file, and it's possible to substitute numbers and the like in while keeping the order correct for different languages.
Look up "internationalization" (often abbreviated to "i18n") on the web. There's plenty of stuff out there.
As for genders, you may try to encourage translators to use non-gender-specific translations (which is usually possible in business applications but might be impossible here).
You may also encounter the problem somewhere else: other (non-English) languages have multiple plural forms. For example: "Your team has acquired 2 swords." No matter how many swords you actually receive, be it 5 or 1000, in English you will always end up with the same plural form. But this is not the case in many languages.
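To make the plural-forms point concrete, here is a small sketch of per-language plural rules; the Polish forms of "sword" (miecz / miecze / mieczy) and the selection rule are the standard ones, but treat the structure, names, and hard-coded strings as illustrative only:
# Each language supplies a rule mapping a count to the index of a plural form.
PLURAL_RULES = {
  "en" => ->(n) { n == 1 ? 0 : 1 },
  "pl" => ->(n) {                    # Polish distinguishes three plural forms
    if n == 1
      0
    elsif (2..4).include?(n % 10) && !(12..14).include?(n % 100)
      1
    else
      2
    end
  },
}
SWORD_FORMS = {
  "en" => %w[sword swords],
  "pl" => %w[miecz miecze mieczy],
}
def sword_phrase(lang, count)
  form = SWORD_FORMS[lang][PLURAL_RULES[lang].call(count)]
  "#{count} #{form}"   # in a real game the whole sentence template would be localized too
end
puts sword_phrase("en", 2)   # => "2 swords"
puts sword_phrase("pl", 2)   # => "2 miecze"
puts sword_phrase("pl", 5)   # => "5 mieczy"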
