I'm currently using this formula to highlight duplicates in my spreadsheet.
=ARRAYFORMULA(COUNTIF(A$2:$A2,$A2)>1)
Quite simple, it allows me to skip the first occurrence and only highlight 2nd, 3rd, ... occurrences.
I would like the formula to go a bit further and highlight near duplicates as well.
Meaning if there is only one character difference between 2 cells, then it should be considered as a duplicate.
For instance: "Marketing", "Marketng", "Marketingg" and "Market ing" would all be considered the same.
I've made a sample sheet in case my requirement is not straightforward to understand.
Thanks in advance.
Answer
Unfortunately, it is not possible to do this only through Formulas. Apps Scripts are need as well. The process for achieving your desired results is described below.
In Google Sheets, go to Extensions > Apps Script, paste the following code1 and save.
function TypoFinder(range, word) { // created by https://stackoverflow.com/users/19361936
if (!Array.isArray(range) || word == "") {
return false;
}
distances = range.map(row => row.map(cell => Levenshtein(cell, word))) // Iterate over range and check Levenshtein distance.
var accumulator = 0;
for (var i = 0; i < distances.length; i++) {
if (distances[i] < 2) {
accumulator++
} // Keep track of how many times there's a Levenshtein distance of 0 or 1.
}
return accumulator > 1;
}
function Levenshtein(a, b) { // created by https://stackoverflow.com/users/4269081
if (a.length == 0) return b.length;
if (b.length == 0) return a.length;
// swap to save some memory O(min(a,b)) instead of O(a)
if (a.length > b.length) {
var tmp = a;
a = b;
b = tmp;
}
var row = [];
// init the row
for (var i = 0; i <= a.length; i++) {
row[i] = i;
}
// fill in the rest
for (var i = 0; i < b.length; i++) {
var prev = i;
for (var j = 0; j < a.length; j++) {
var val;
if (b.charAt(i) == a.charAt(j)) {
val = row[j]; // match
} else {
val = Math.min(row[j] + 1, // substitution
prev + 1, // insertion
row[j + 1] + 1); // deletion
}
row[j] = prev;
prev = val;
}
row[a.length] = prev;
}
return row[a.length];
}
In cell B1, enter =TypoFinder($A$2:$A2,$A2). Autofill that formula down the column by draggin.
Create a conditional formatting rule for column A. Using Format Rules > Custom Formula, enter =B2:B.
At this point, you might wish to hide column B. To do so, right click on the column and press Hide Column.
The above explanation assumes the column you wish to highlight is Column A and the helper column is column B. Adjust appropriately.
Note that I have assumed you do not wish to highlight repeated blank columns as duplicate. If I am incorrect, remove || word == "" from line 2 of the provided snippet.
Explanation
The concept you have described is called Levenshtein Distance, which is a measure of how close together two strings are. There is no built-in way for Google Sheets to process this, so the Levenshtein() portion of the snippet above implements a custom function to do so instead. Then the TypoFinder() function is built on top of it, providing a method for evaluating a range of data against a specified "correct" word (looking for typos anywhere in the range).
Next, a helper column is used because Sheets has difficulties parsing custom formulas as part of a conditional formatting rule. Finally, the rule itself is implemented to check the helper column's determination of whether the row should be highlighted or not. Altogether, this highlights near-duplicate results in a specified column.
1 Adapted from duality's answer to a related question.
Let's say I want to process a variadic function which alternately gets passed start and end values of 1 or more intervals and it should return a range of random values in those intervals. You can imagine the input to be a flattened sequence of tuples, all tuple elements spread over one single range.
import std.meta; //variadic template predicates
import std.traits : isFloatingPoint;
import std.range;
auto randomIntervals(T = U[0], U...)(U intervals)
if (U.length/2 > 0 && isFloatingPoint!T && NoDuplicates!U.length == 1) {
import std.random : uniform01;
T[U.length/2] randomValues;
// split and iterate over subranges of size 2
foreach(i, T start, T end; intervals.chunks(2)) { //= intervals.slide(2,2)
randomValues[i] = uniform01 * (end - start) + start,
}
return randomValues.dup;
}
The example is not important, I only use it for explanation. The chunk size could be any finite positive size_t, not only 2 and changing the chunk size should only require changing the number of loop-variables in the foreach loop.
In this form above it will not compile since it would only expect one argument (a range) to the foreach loop. What I would like is something which rather automatically uses or infers a sliding-window as a tuple, derived from the number of given loop-variables, and fills the additional variables with next elements of the range/array + allows for an additional index, optionally. According to the documentation a range of tuples allows destructuring of the tuple elements in place into foreach-loop-variables so the first thing, I thought about, is turning a range into a sequence of tuples but didn't find a convenience function for this.
Is there a simple way to loop over destructured subranges (with such a simplicity as shown in my example code) together with the index? Or is there a (standard library) function which does this job of splitting a range into enumerated tuples of equal size? How to easily turn the range of subranges into a range of tuples?
Is it possible with std.algorithm.iteration.map in this case (EDIT: with a simple function argument to map and without accessing tuple elements)?
EDIT: I want to ignore the last chunk which doesn't fit into the entire tuple. It just is not iterated over.
EDIT: It's not, that I couldn't program this myself, I only hope for a simple notation because this use case of looping over multiple elements is quite useful. If there is something like a "spread" or "rest" operator in D like in JavaScript, please let me know!
Thank you.
(Added as a separate answer because it's significantly different from my previous answer, and wouldn't fit in a comment)
After reading your comments and the discussion on the answers thus far, it seems to me what you seek is something like the below staticChunks function:
unittest {
import std.range : enumerate;
size_t index = 0;
foreach (i, a, b, c; [1,2,3,1,2,3].staticChunks!3.enumerate) {
assert(a == 1);
assert(b == 2);
assert(c == 3);
assert(i == index);
++index;
}
}
import std.range : isInputRange;
auto staticChunks(size_t n, R)(R r) if (isInputRange!R) {
import std.range : chunks;
import std.algorithm : map, filter;
return r.chunks(n).filter!(a => a.length == n).map!(a => a.tuplify!n);
}
auto tuplify(size_t n, R)(R r) if (isInputRange!R) {
import std.meta : Repeat;
import std.range : ElementType;
import std.typecons : Tuple;
import std.array : front, popFront, empty;
Tuple!(Repeat!(n, ElementType!R)) result;
static foreach (i; 0..n) {
result[i] = r.front;
r.popFront();
}
assert(r.empty);
return result;
}
Note that this also deals with the last chunk being a different size, if only by silently throwing it away. If this behavior is undesirable, remove the filter, and deal with it inside tuplify (or don't, and watch the exceptions roll in).
chunks and slide return Ranges, not tuples. Their last element can contain less than the specified size, whereas tuples have a fixed compile time size.
If you need destructuring, you have to implement your own chunks/slide that return tuples. To explicitly add an index to the tuple, use enumerate. Here is an example:
import std.typecons, std.stdio, std.range;
Tuple!(int, int)[] pairs(){
return [
tuple(1, 3),
tuple(2, 4),
tuple(3, 5)
];
}
void main(){
foreach(size_t i, int start, int end; pairs.enumerate){
writeln(i, ' ', start, ' ', end);
}
}
Edit:
As BioTronic said using map is also possible:
foreach(i, start, end; intervals
.chunks(2)
.map!(a => tuple(a[0], a[1]))
.enumerate){
Your question has me a little confused, so I'm sorry if I've misunderstood. What you're basically asking is if foreach(a, b; [1,2,3,4].chunks(2)) could work, right?
The simple solution here is to, as you say, map from chunk to tuple:
import std.typecons : tuple;
import std.algorithm : map;
import std.range : chunks;
import std.stdio : writeln;
unittest {
pragma(msg, typeof([1,2].chunks(2).front));
foreach(a, b; [1,2,3,4].chunks(2).map!(a => tuple(a[0], a[1]))) {
writeln(a, ", ", b);
}
}
At the same time with BioTronic, I tried to code some own solution to this problem (tested on DMD). My solution works for slices (BUT NOT fixed-size arrays) and avoids a call to filter:
import std.range : chunks, isInputRange, enumerate;
import std.range : isRandomAccessRange; //changed from "hasSlicing" to "isRandomAccessRange" thanks to BioTronics
import std.traits : isIterable;
/** turns chunks into tuples */
template byTuples(size_t N, M)
if (isRandomAccessRange!M) { //EDITED
import std.meta : Repeat;
import std.typecons : Tuple;
import std.traits : ForeachType;
alias VariableGroup = Tuple!(Repeat!(N, ForeachType!M)); //Tuple of N repititions of M's Foreach-iterated Type
/** turns N consecutive array elements into a Variable Group */
auto toTuple (Chunk)(Chunk subArray) #nogc #safe pure nothrow
if (isInputRange!Chunk) { //Chunk must be indexable
VariableGroup nextLoopVariables; //fill the tuple with static foreach loop
static foreach(index; 0 .. N) {
static if ( isRandomAccessRange!Chunk ) { // add cases for other ranges here
nextLoopVariables[index] = subArray[index];
} else {
nextLoopVariables[index] = subArray.popFront();
}
}
return nextLoopVariables;
}
/** returns a range of VariableGroups */
auto byTuples(M array) #safe pure nothrow {
import std.algorithm.iteration : map;
static if(!isInputRange!M) {
static assert(0, "Cannot call map() on fixed-size array.");
// auto varGroups = array[].chunks(N); //fixed-size arrays aren't slices by default and cannot be treated like ranges
//WARNING! invoking "map" on a chunk range from fixed-size array will fail and access wrong memory with no warning or exception despite #safe!
} else {
auto varGroups = array.chunks(N);
}
//remove last group if incomplete
if (varGroups.back.length < N) varGroups.popBack();
//NOTE! I don't know why but `map!toTuple` DOES NOT COMPILE! And will cause a template compilation mess.
return varGroups.map!(chunk => toTuple(chunk)); //don't know if it uses GC
}
}
void main() {
testArrayToTuples([1, 3, 2, 4, 5, 7, 9]);
}
// Order of template parameters is relevant.
// You must define parameters implicitly at first to be associated with a template specialization
void testArrayToTuples(U : V[], V)(U arr) {
double[] randomNumbers = new double[arr.length / 2];
// generate random numbers
foreach(i, double x, double y; byTuples!2(arr).enumerate ) { //cannot use UFCS with "byTuples"
import std.random : uniform01;
randomNumbers[i] = (uniform01 * (y - x) + x);
}
foreach(n; randomNumbers) { //'n' apparently works despite shadowing a template parameter
import std.stdio : writeln;
writeln(n);
}
}
Using elementwise operations with the slice operator would not work here because uniform01 in uniform01 * (ends[] - starts[]) + starts[] would only be called once and not multiple times.
EDIT: I also tested some online compilers for D for this code and it's weird that they behave differently for the same code. For compilation of D I can recommend
https://run.dlang.io/ (I would be very surprised if this one wouldn't work)
https://www.mycompiler.io/new/d (but a bit slow)
https://ideone.com (it works but it makes your code public! Don't use with protected code.)
but those didn't work for me:
https://tio.run/#d2 (didn't finish compilation in one case, otherwise wrong results on execution even when using dynamic array for the test)
https://www.tutorialspoint.com/compile_d_online.php (doesn't compile the static foreach)
Is there a fast (native) method to search for a sequence in a Uint8List?
///
/// Return index of first occurrence of seq in list
///
int indexOfSeq(Uint8List list, Uint8List seq) {
...
}
EDIT: Changed List<int> into Uint8List
No. There is no built-in way to search for a sequence of elements in a list.
I am also not aware of any dart:ffi based implementations.
The simplest approach would be:
extension IndexOfElements<T> on List<T> {
int indexOfElements(List<T> elements, [int start = 0]) {
if (elements.isEmpty) return start;
var end = length - elements.length;
if (start > end) return -1;
var first = elements.first;
var pos = start;
while (true) {
pos = indexOf(first, pos);
if (pos < 0 || pos > end) return -1;
for (var i = 1; i < elements.length; i++) {
if (this[pos + i] != elements[i]) {
pos++;
continue;
}
}
return pos;
}
}
}
This has worst-case time complexity O(length*elements.length). There are several more algorithms with better worst-case complexity, but they also have larger constant factors and more expensive pre-computations (KMP, BMH). Unless you search for the same long list several times, or do so in a very, very long list, they're unlikely to be faster in practice (and they'd probably have an API where you compile the pattern first, then search with it.)
You could use dart:ffi to bind to memmem from string.h as you suggested.
We do the same with binding to malloc from stdlib.h in package:ffi (source).
final DynamicLibrary stdlib = Platform.isWindows
? DynamicLibrary.open('kernel32.dll')
: DynamicLibrary.process();
final PosixMalloc posixMalloc =
stdlib.lookupFunction<Pointer Function(IntPtr), Pointer Function(int)>('malloc');
Edit: as lrn pointed out, we cannot expose the inner data pointer of a Uint8List at the moment, because the GC might relocate it.
One could use dart_api.h and use the FFI to pass TypedData through the FFI trampoline as Dart_Handle and use Dart_TypedDataAcquireData from the dart_api.h to access the inner data pointer.
(If you want to use this in Flutter, we would need to expose Dart_TypedDataAcquireData and Dart_TypedDataReleaseData in dart_api_dl.h https://github.com/dart-lang/sdk/issues/40607 I've filed https://github.com/dart-lang/sdk/issues/44442 to track this.)
Alternatively, could address https://github.com/dart-lang/sdk/issues/36707 so that we could just expose the inner data pointer of a Uint8List directly in the FFI trampoline.
Is there any way to generate random numbers without duplication?
For instance I want to generate 50 random numbers from 1 to 100 no duplication, any way to do this or do I have to check every time incoming number is already created or not?
you can use shuffle as following code.
import 'dart:math';
var list = new List<int>.generate(10, (int index) => index); // [0, 1, 4]
list.shuffle();
print(list);
You can use Set. Each object can occur only once when using it. Just try this:
Set<int> setOfInts = Set();
while (setOfInts.length < 50) {
setOfInts.add(Random().nextInt(range) + 1);
}
You can read the documentation here: Set Doc
Here is an alternative that avoids creating an array of all the possible values, and avoids repeatedly looping until no collision occurs. It may be useful when there is a large range to select from.
import 'dart:math';
class RandomList {
static final _random = new Random();
static List<int> uniqueSample({int limit, int n}) {
final List<int> sortedResult = [];
final List<int> result = [];
for (int i = 0; i < n; i++) {
int rn = _random.nextInt(limit - i); // We select from a smaller list of available numbers each time
// Increment the number so that it picks from the remaining list of available numbers
int j = 0;
for (; j < sortedResult.length && sortedResult[j] <= rn; j++) rn++;
sortedResult.insert(j, rn);
result.add(rn);
}
return result;
}
}
I haven't tested it exhaustively but it seems to work.
In Dart, I am using print() to write content to the console.
Is it possible to use print() without it automatically adding a newline to the output?
You can use stdout:
import "dart:io";
stdout.write("foo");
will print foo to the console but not place a \n after it.
You can also use stderr. See:
How can I print to stderr in Dart in a synchronous way?
You cannot customize the print() to prevent printing a newline in Dart (as in Python by using 'end' argument).
To achieve this in Dart, use stdout.write() function, like
stdout.write("<your_message>");
Have a look are the dcli package you can use the echo function.
1. By using the dart io package
import 'dart:io';
void main() {
for(int r = 0; r < 5; r++){
for(int c = 0; c < r; c++){
stdout.write('Text ');
}
print('');
}
}
Output:
Text
Text Text
Text Text Text
Text Text Text Text
2. By using the direct print() method (Support to Dartpad)
void main() {
for(int r = 0; r < 5; r++){
print('Text ' * r);
}
}
Output:
Text
Text Text
Text Text Text
Text Text Text Text
Example for when multiplying by 0, 1, 2
void main() {
print('String' * 1);
print('String' * 0);
print('String' * 2);
}
Output:
String
StringString