Validate User Inputted Text is Family-Friendly - ios

I'm working on an iOS app that involves user input, and I'd like to keep it kid-friendly. One of the main features of the app is that user inputted titles and phrases can be shown to everyone who uses the app.
When a user creates a new title I want to verify that it is safe-for-work. My initial thought was just to have a list of all profane words and verify that none of them exist in the title:
for bad_word in list_of_bad_words:
    if bad_word in user_inputted_title:
        // Complain to user!
// Title is okay.
I imagine that there must be libraries or best practices for doing this. People could easily substitute numbers for letters, and I'm sure there are sequences of SFW words that create inappropriate phrases.
Can anyone suggest a better way of doing this? Specifically, if there are any Swift tools that would be awesome!

There are some CocoaPods for this:
https://github.com/IslandOfDoom/IODProfanityFilter
https://github.com/MaxKramer/SCRProfanityChecker
I haven't used either of these personally, but I hope these can be a good starting point. The first one replaces any profanity with asterisks, and the second can give you the range of the profanity so you can replace it with your own filler. Good luck.
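To make the word-list idea from the question concrete, here is a minimal sketch. It is written in Ruby rather than Swift, it is not the API of either pod above, and the word list and method name are placeholders made up for illustration. It compares whole words instead of substrings, so a clean word that merely contains a listed word is not flagged.

```ruby
require "set"

# Placeholder list (an assumption); a real app would load a maintained list.
BLOCKED_WORDS = Set.new(%w[darn heck]).freeze

# Returns true when none of the title's words are on the blocked list.
# Whole lowercased words are compared, not substrings.
def title_clean?(title, blocked = BLOCKED_WORDS)
  words = title.downcase.scan(/[a-z0-9']+/)
  words.none? { |word| blocked.include?(word) }
end

puts title_clean?("What the heck")   # false
puts title_clean?("Checkers night")  # true ("checkers" only contains "heck")
```

This still misses number-for-letter substitutions and multi-word phrases, which is exactly the weakness the question anticipates.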

Related

iOS DDMathParser Get Any Occurrences of (...) In String

I am using DDMathParser in my app, and have recently come across the need to get occurrences of any group of numbers within a () parentheses bracket thingy (very highly technical!). For example, I would need to get (6+5) out of 6+7/8(6+5). Specifically, I would like to be able to do this so that I can make (56+9)sqrt compile just as well as sqrt(56+9). Any help?
P.S. I know that the maker of DDMathParser is often sighted in this neck of the woods. I am secretly hoping that he will come to the rescue and either help me fix my problem so I can implement it myself, or make it part of DDMathParser! :)
So, I've thought a lot about this question since you posted it a month ago. From what I understand, you're constructing a string as the user clicks/taps buttons.
I think this is your problem.
As the user taps buttons, you should be constructing (or modifying) DDExpression objects. This is the "pure" format of a math expression, whereas a string is lossy and difficult to manipulate. The string you show to the user should be generated from the DDExpression tree you're building.
This is a complex problem, and I'm still not entirely sure how I would go about implementing this, but this is the root of how I'd do it. I would not just construct a string based on what the user types.
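A rough illustration of the tree-first idea, assuming nothing about DDMathParser itself: the node types below (Num, BinOp, Func) are hypothetical stand-ins written in Ruby, not DDExpression classes. The point is only that tapping sqrt after entering 56+9 becomes "wrap the current node", and the display string is derived from the tree.

```ruby
# Hypothetical node types for illustration; these are not DDMathParser classes.
Num = Struct.new(:value) do
  def bare
    value.to_s
  end
  alias_method :to_s, :bare

  def evaluate
    value
  end
end

BinOp = Struct.new(:op, :left, :right) do
  # Rendering without the outer parentheses, used inside function calls.
  def bare
    "#{left} #{op} #{right}"
  end

  def to_s
    "(#{bare})"
  end

  def evaluate
    left.evaluate.public_send(op, right.evaluate)
  end
end

Func = Struct.new(:name, :arg) do
  def bare
    "#{name}(#{arg.bare})"
  end
  alias_method :to_s, :bare

  def evaluate
    Math.public_send(name, arg.evaluate)
  end
end

expr = BinOp.new(:+, Num.new(56), Num.new(9))  # user entered 56 + 9
expr = Func.new(:sqrt, expr)                   # user tapped sqrt afterwards
puts expr           # sqrt(56 + 9)
puts expr.evaluate
```

Because the function is applied to a node rather than spliced into text, the order in which the user tapped the buttons no longer matters.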

Implementing Autocomplete in iOS

I am creating an application where I need to implement autocompletion when a user is typing into a text input, with the 10 nearest/highest ranking words appearing below the text field.
I've been given a fairly big list of around 80,000 words and their respective 'priority' - a number which determines how high up they appear in the autocomplete depending on the size of the number, like this:
"transport international";19205
"taxi";18462
"location de voitures";18160
"police";18126
"formation";17858
I am kinda new to iOS development and was wondering what is the best way to do this - should I split the 80,000 phrases into smaller files, or just keep it in one? What would be faster?
I have seen autocompletion used in an example for iOS but it was for a very small amount of suggestions - I haven't seen it done using a file this large before, and obviously I would like to make it as fast as possible for added user experience.
Any suggestions as to examples, tutorials or code suggestions would be greatly appreciated, thanks.
If you prefer something that does autocomplete but is a direct subclass of UITextField, then MLPAutoCompleteTextField may be of interest to you.
MLPAutoCompleteTextField works by simply asking its autocomplete datasource for an array of autocomplete suggestions each time the text in the textfield changes. It can even automatically sort words so that the ones closest to what the user is typing will appear at the top of the autocomplete list (using a Levenshtein Distance algorithm). Autocomplete suggestions can be simple strings, or objects that implement MLPAutoCompletionObject protocol.
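For readers unfamiliar with Levenshtein distance, the following is a generic sketch of the technique in Ruby. It is not MLPAutoCompleteTextField's actual code, and the method names are made up for illustration.

```ruby
# Classic dynamic-programming edit distance: the minimum number of
# single-character insertions, deletions, or substitutions turning a into b.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1,          # deletion
                  current[j - 1] + 1,       # insertion
                  previous[j - 1] + cost].min # substitution or match
    end
    previous = current
  end
  previous.last
end

# Returns up to `limit` candidates with the smallest distance to the typed text.
def closest_suggestions(typed, candidates, limit = 10)
  candidates.min_by(limit) { |word| levenshtein(typed, word) }
end

puts levenshtein("kitten", "sitting")  # 3
```

Each comparison costs time proportional to the product of the two string lengths, which is why narrowing the candidate list first matters for large datasets.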
Tip: For a large dataset of autocomplete terms, you'll probably want to break up your list based on starting letters. (Example: When the user enters the letter F, you give the autocomplete textfield only a list of words that start with F.)
MLPAutoCompleteTextField can efficiently sort several thousand suggestions in a reasonable amount of time, and will never block the UI while it sorts.
At the moment, weighted suggestions (that override the default sorting) aren't possible but it's a planned feature.
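The tip about splitting the list by starting letter can be sketched independently of any library. The Ruby below assumes the `"phrase";priority` line format shown in the question; the method names are hypothetical. It builds one bucket per first letter, sorts each bucket by priority once, and then filters only the relevant bucket on each keystroke.

```ruby
# Sample lines copied from the question.
SAMPLE_LINES = [
  '"transport international";19205',
  '"taxi";18462',
  '"location de voitures";18160',
  '"police";18126',
  '"formation";17858'
].freeze

# Builds { "t" => [["transport international", 19205], ["taxi", 18462]], ... }
# with every bucket sorted by descending priority.
def build_index(lines)
  index = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    match = line.match(/\A"(.*)";(\d+)\s*\z/)
    next unless match && !match[1].empty?
    index[match[1][0].downcase] << [match[1], match[2].to_i]
  end
  index.each_value { |bucket| bucket.sort_by! { |_, priority| -priority } }
  index
end

# Returns up to `limit` phrases starting with the typed text, highest priority first.
def suggest(index, typed, limit = 10)
  prefix = typed.downcase
  return [] if prefix.empty?
  index.fetch(prefix[0], [])
       .select { |phrase, _| phrase.downcase.start_with?(prefix) }
       .first(limit)
       .map(&:first)
end

index = build_index(SAMPLE_LINES)
p suggest(index, "t")   # ["transport international", "taxi"]
p suggest(index, "ta")  # ["taxi"]
```

With 80,000 phrases a single file loaded once into such an index is likely sufficient; the bucketing happens in memory, so splitting the file itself is not required for this approach.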
You may want to use this repo HTAutocompleteTextField, perfect solution.
https://github.com/TarasRoshko/TRAutocompleteView
Just conform TRAutocompleteItemsSource protocol and that's it. Protocol is designed with async support in mind. Demo app and sample TRGoogleMapsAutocompleteItemsSource should greatly help you with it.
This link worked well for me. Depending on your code, just don't miss the difference between UITextField and UITextView.
No extra libraries, just an easy custom UITableView and search function.

Profanity filter import

I am looking to write a basic profanity filter in a Rails based application. This will use a simple search and replace mechanism whenever the appropriate attribute gets submitted by a user. My question is, for those who have written these before, is there a CSV file or some database out there where a list of profanity words can be imported into my database? We are submitting the words that we will replace the profanities with on our own. We more or less need a database of profanities, racial slurs and anything that's not exactly rated PG-13 to get triggered.
As the Tin Man suggested, this problem is difficult, but it isn't impossible. I've built a commercial profanity filter named CleanSpeak that handles everything mentioned above (leet speak, phonetics, language rules, whitelisting, etc). CleanSpeak is capable of filtering 20,000 messages per second on a low end server, so it is possible to build something that works well and performs well. I will mention that CleanSpeak is the result of about 3 years of on-going development though.
There are a few things I tell everyone that is looking to try and tackle a language filter.
Don't use regular expressions unless you have a small list and don't mind a lot of things getting through. Regular expressions are relatively slow overall and hard to manage.
Determine if you want to handle conjugations, inflections and other language rules. These often add a considerable amount of time to the project.
Decide what type of performance you need and whether or not you can make multiple passes on the String. The more passes you make, the slower your filter will be.
Understand the Scunthorpe and clbuttic problems and determine how you will handle these. This usually requires some form of language intelligence and whitelisting.
Realize that whitespace has a different meaning now. You can't use it as a word delimiter any more (b e c a u s e of this)
Be careful with your handling of punctuation because it can be used to get around the filter (l.i.k.e th---is)
Understand how people use ASCII art and Unicode to replace characters (\/ = v - those are slashes). There are a lot of Unicode characters that look like English characters and you will want to handle those appropriately.
Understand that people make up new profanity all the time by smashing words together (likethis) and figure out if you want to handle that.
You can search around StackOverflow for my comments on other threads as I might have more information on those threads that I've forgotten here.
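A few of the points above (punctuation, spaced-out letters, simple character substitution, and whitelisting) can be illustrated with a small Ruby sketch. This is a toy under stated assumptions and has nothing to do with how CleanSpeak works: the word lists, the substitution table, and the method names are all placeholders, and Unicode lookalikes, phonetics, and inflections are not handled.

```ruby
# Placeholder data (assumptions for illustration only).
BLOCKED_STEMS = %w[darn heck].freeze
ALLOWED_WORDS = %w[checkers].freeze   # clean words that contain a blocked stem
LEET_FROM = '4@3105$'.freeze
LEET_TO   = 'aaeioss'.freeze

# Lowercases, maps common look-alike characters to letters, drops the rest.
def normalize(token)
  token.downcase.tr(LEET_FROM, LEET_TO).gsub(/[^a-z]/, "")
end

def flagged?(text)
  tokens = text.split.map { |token| normalize(token) }.reject(&:empty?)
  # Rejoin runs of single letters so "h e c k" is checked as "heck".
  words = tokens.chunk_while { |a, b| a.length == 1 && b.length == 1 }.map(&:join)
  words.any? do |word|
    !ALLOWED_WORDS.include?(word) && BLOCKED_STEMS.any? { |stem| word.include?(stem) }
  end
end

puts flagged?("what the h e c k")  # true
puts flagged?("h.3.c.k")           # true
puts flagged?("Checkers night")    # false
```

The whitelist is what keeps "Checkers" from tripping the substring match, which is a small-scale version of the Scunthorpe problem mentioned above.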
Here's one you could use: Offensive/Profane Word List from CMU site
Based on personal experience, you do understand that it's an exercise in futility?
If someone wants to inject profanity, there's a slew of words that are innocent in one context, and profane in another so you'll have to write a context parser to avoid black-listing clean words. A quick glance at CMU's list shows words I'd never consider rude/crude/socially unacceptable. You'll see there are many words that could be proper names or nouns, countries, terms of endearment, etc. And, there are myriads of ways to throw your algorithm off using L33T speak and such. Search Wikipedia and the internets and you can build tables of variations of letters.
Look at CMU's list and imagine how long the list would be if, in addition to the correct letter, every a could also be 4, o could be 0 or p, e could be 3, s could be 5. And, that's a very, very, short example.
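To put a number on that growth, here is a small Ruby sketch using only the four substitutions named above (a to 4, o to 0 or p, e to 3, s to 5). The constant and method names are made up for illustration. The number of spellings of a word is the product, over its letters, of one plus the number of alternatives for that letter.

```ruby
# The substitutions named in the text; real tables are far larger.
SUBSTITUTIONS = { "a" => %w[4], "o" => %w[0 p], "e" => %w[3], "s" => %w[5] }.freeze

# Counts the spellings without generating them.
def variant_count(word)
  word.downcase.each_char.reduce(1) do |count, char|
    count * (1 + SUBSTITUTIONS.fetch(char, []).length)
  end
end

# Generates every spelling; only sensible for short words.
def variants(word)
  choices = word.downcase.each_char.map { |char| [char] + SUBSTITUTIONS.fetch(char, []) }
  return [""] if choices.empty?
  first, *rest = choices
  first.product(*rest).map(&:join)
end

p variants("so")                 # ["so", "s0", "sp", "5o", "50", "5p"]
puts variant_count("seashore")   # 96
```

A single harmless eight-letter word already expands to 96 entries with this tiny table, which is consistent with the million-row experience described in the next paragraph.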
I was asked to do a similar task and wrote code to generate L33T variations of the words, and generated a hit-list of words based on several profanity/offensive lists available on the internet. After running the generator, and being a little over 1/4 of the way through the file, I had over one million entries in my DB. I pulled the plug on the project at that point, because the time spent searching, even using Perl's Regex::Assemble, was going to be ridiculous, especially since it'd still be so easy to fool.
I recommend you have a long talk with whoever requested that, and ask if they understand the programming issues involved, and low-likelihood of accuracy and success, especially over the long-term, or the possible customer backlash when they realize you're censoring them.
I have one that I've added to (obfuscated a bit) but here it is: https://github.com/rdp/sensible-cinema/blob/master/lib/subtitle_profanity_finder.rb

UITextChecker is what dictionary?

Does anybody know what dictionary UITextChecker pulls from? I use it to verify that a word is in fact a valid word in an app. I have some questions from users about why specific words are available in other games (Boggle/Scrabble) but not in mine.
Examples: ai, qi, qat, xu, ae, tae, ait, ain, lav, aa, shh, za
I checked against /usr/share/dict/words and none of these words are in Webster's Second International, so maybe UITextChecker uses this same source? They do show up in other dictionaries online (but this is really beside the point of the post).
Thanks for any insight!
UITextChecker may be using the same dictionary that UIReferenceLibraryViewController uses. In which case, you could use something like [UIReferenceLibraryViewController dictionaryHasDefinitionForTerm:@"term"] and if it returns true the word exists. I'm not sure how complete the built-in dictionary is, however.
I guess it uses the iPhone dictionary of the user, which depends on the current language/NSLocale the user is using (which is set in the "International" Settings on the iPhone). This is the behavior we observe when typing some text anywhere on the iPhone: which words are underlined in red (because they were flagged by the internal UITextChecker) depends on the locale used.
If the user has activated multiple keyboards, each with a different language (e.g. a French AZERTY keyboard and a US QWERTY keyboard), it obviously depends on the current language, namely the keyboard that is active at that moment.
If you refer to the wordfeud dictionary... (that would be the only game I know those words from). They check their words from an online dictionary on their own server. Must be a list parsed from another spelling site or something.
I sometimes doubt the validity of some words though....

Validate words against an English dictionary in Rails?

I've done some Google searching but couldn't find what I was looking for.
I'm developing a scrabble-type word game in rails, and was wondering if there was a simple way to validate what the player inputs in the game is actually a word. They'd be typing the word out.
Is validation against some sort of English language dictionary database loaded within the app the best way to solve this problem? If so, are there any libraries that offer this kind of functionality? If not, what would you suggest?
Thanks for your help!
You need two things:
a word list
some code
The word list is the tricky part. On most Unix systems there's a word list at /usr/share/dict/words or /usr/dict/words -- see http://en.wikipedia.org/wiki/Words_(Unix) for more details. The one on my Mac has 234,936 words in it. But they're not all valid Scrabble words. So you'd have to somehow acquire a Scrabble dictionary, make sure you have the right license to use it, and process it so it's a text file.
(Update: The word list for LetterPress is now open source, and available on GitHub.)
The code is no problem in the simple case. Here's a script I whipped up just now:
# Load every entry of the system word list into a hash for fast lookup.
words = {}
File.open("/usr/share/dict/words") do |file|
  file.each do |line|
    words[line.strip] = true
  end
end
p words["magic"]    # a real word
p words["saldkaj"]  # not a word
This will output
true
nil
I leave it as an exercise for the reader to make it into a proper Words object. (Technically it's not a Dictionary since it has no definitions.) Or to use a DAWG instead of a hash, even though a hash is probably fine for your needs.
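One possible shape for that Words object, as a sketch only: the class below uses a Set instead of a hash, and its constructor, `from_file` helper, and case-folding behaviour are assumptions of this example rather than anything prescribed above.

```ruby
require "set"

# Hypothetical wrapper around a word list; lookups ignore case and
# surrounding whitespace.
class Words
  def initialize(list)
    @set = Set.new(list.map { |word| word.strip.downcase }.reject(&:empty?))
  end

  # Reads one word per line, e.g. from the Unix word list.
  def self.from_file(path = "/usr/share/dict/words")
    new(File.foreach(path))
  end

  def include?(word)
    @set.include?(word.strip.downcase)
  end

  def size
    @set.size
  end
end

words = Words.new(%w[magic taxi])
p words.include?("Magic")    # true
p words.include?("saldkaj")  # false
```

Folding everything to lowercase means proper nouns in the list become indistinguishable from ordinary words, so a Scrabble-style game may prefer to filter capitalised entries out before loading.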
A piece of language-agnostic advice: if you only care about whether a word exists (which you do here), and you plan to load the entire word list into the application (which your question suggests you're considering), then a DAWG lets you check for existence in O(n) time, where n is the length of the word. Dictionary size has no effect, so for bounded word lengths the lookup is essentially O(1). A DAWG is also a relatively minimal structure in terms of memory; indeed, some insertions actually reduce the size of the structure: a DAWG for "top, tap, taps, tops" has fewer nodes than one for "tops, tap".
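A full DAWG needs a minimization step that merges shared suffixes, which is too long to show here. The sketch below is a plain trie in Ruby, the unminimized relative of a DAWG, and the class name and methods are made up for illustration. It has the same lookup behaviour (one step per character of the word) but does not deliver the memory savings a real DAWG would.

```ruby
# A plain trie: shares prefixes only. A DAWG would additionally share suffixes.
class Trie
  Node = Struct.new(:children, :terminal)

  def initialize(words = [])
    @root = Node.new({}, false)
    words.each { |word| insert(word) }
  end

  def insert(word)
    node = @root
    word.each_char do |char|
      node = (node.children[char] ||= Node.new({}, false))
    end
    node.terminal = true
  end

  # Follows one edge per character, so the cost depends on the word's
  # length and not on how many words are stored.
  def include?(word)
    node = @root
    word.each_char do |char|
      node = node.children[char]
      return false unless node
    end
    node.terminal
  end
end

trie = Trie.new(%w[top tap taps tops])
p trie.include?("taps")  # true
p trie.include?("ta")    # false (a prefix, not a stored word)
```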

Resources