How to keep only duplicate values from another list?

How to keep only duplicate values from another list? - google-sheets

Basically I am trying to keep only duplicate values that appear in another list, so that I know which products are fast moving.
For example, I have the following original list:
Confectionery
Haribo
Twix
Maoam
Refreshers
Bounty
Twix
Malteasers
Snickers
Dairymilk
Wispa
Galaxy
Twirl
Fruit-tella
Then I have another list:
Fruit-tella
7Up
Coca-Cola
Sprite
Haribo
Ribena
Twix
Bounty
Snickers
Boost
Red Bull
Mountain Dew
Effectively, I only want to keep values that appear in both lists, so in my case only the following should be taken:
Fruit-tella
Haribo
Twix
Bounty
Snickers
How would I go about doing it?
Thanks

use:
=FILTER(A:A, COUNTIF(B:B, A:A))

Related

Splunk Streamlined search for specific fields only

I've ran out of GoogleFu, so if anyone can point me in the right direction or a better term or two to Google... I'm trying to figure out Splunk SPL syntax to search 4 different fields for the same value, any match in the four fields wins, with out searching every field for the TERM(<IP>).
index="main" packets_out>0 action="allowed" TERM(192.168.2.1)
| fields src_ip, dest_ip, dest_translated_ip, src_translated_ip,packets_out
| head 10
These will always be constant: index="main" packets_out>0 action="allowed"
The IP will be the only variable that will change and I'm trying to make it as simple as possible for others to "open search, change 1 IP, click go".
This works as is, but once I try to search against prod with 2000 devices.. I'm expecting my query time will not be 1 second anymore, even with using "Fast Mode" search. I've reduced the 4 second query time to 1. Along with the size of data queried with this already, in my home lab, but I don't think this is going to scale very well.
Is there a better way to do this, besides plugging in 10-20 device names into the query like this? I would rather not have static device names, so if someone "forgets" to update the query; I'll get blamed for the external IP overlap issue.
index="main" packets_out>0 action="allowed" TERM(192.168.2.1) dvc_name="firewall1" OR dvc_name="firewall2" <*18>
| fields src_ip, dest_ip, dest_translated_ip, src_translated_ip,packets_out
| head 10
Raw log if needed:
Apr 7 23:59:55 192.168.2.1 Apr 7 23:59:55 wall 1,2021/04/07 23:59:54,012801092758,TRAFFIC,end,2560,2021/04/07 23:59:54,192.168.2.189,173.194.219.94,10.10.10.2,173.194.219.94,web_access_out-1,,,quic,vsys1,trust,untrust,ethernet1/8,ethernet1/2,splunk,2021/04/07 23:59:54,2004,1,53384,443,59427,443,0x400050,udp,allow,5528,2350,3178,15,2021/04/07 23:57:53,1,any,0,5261883,0x0,192.168.0.0-192.168.255.255,United States,0,6,9,aged-out,0,0,0,0,,wall,from-policy,,,0,,0,,N/A,0,0,0,0,f863e426-7e87-4999-b5cb-bc6dc38d788f,0,0,,,,,,,,0.0.0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,2021-04-07T23:59:55.282-04:00,,
Thanks,

Use OR:
index=ndx sourcetype=srctp (fieldA="myval" OR fieldB="myval" OR fieldC="myval")
Parenthesis added for clarity/readability

How to handle release year difference in movie recommendation

I have been part of the movie recommendation project. We have developed a doc2vec model using gensim.
You can have a look at gensim documentation if needed.
https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.WordEmbeddingsKeyedVectors.most_similar
Trained the model and when i took top 10 similar movies for a film based on cast it gives way back old movies with release_yr as (1960, 1950, ...). So i have tried including the release_yr as a parameter to gensim model but still it shows me old movies. How can i solve this release_yr difference? When I see top10 recommendations for a film I need those movies whose release_yr difference is less (like past 10 years movies not more than that). How can i do that?
code for doc2vec model
def d2v_doc(titles_df):
tagged_data = [TaggedDocument(words=_d, tags=[str(titles_df['id_titles'][i])]) for i, _d in enumerate(titles_df['doc'])]
model_d2v = Doc2Vec(vector_size=300,min_count=10, dm=1)
model_d2v.build_vocab(tagged_data)
model_d2v.train(tagged_data,epochs=100,total_examples=model_d2v.corpus_count)
return model_d2v
titles_df dataframe contains columns(id_titles, title, release_year, actors, director, writer, doc)
col_names = ['actors', 'director','writer','release_year']
titles_df['doc'] = titles_df[col_names].apply(lambda x: ' '.join(x.astype(str)), axis=1).str.split()
Code for Top 10 similar movies
def titles_lookup(similar_doc,titles_df):
df = pd.DataFrame(similar_doc, columns =['id_titles', 'simialrity'])
df = pd.merge(df, titles_df[['id_titles','title','release_year']],on='id_titles',how='left')
print(df)
def demo_d2v_title(model,titles_df, id_titles):
similar_doc = model.docvecs.most_similar(id_titles)
titles_lookup(similar_doc,titles_df)
def demo(model,titles_df):
print('hunt for red october')
demo_d2v_title(model,titles_df, 'tt0099810')
The output for Top 10 similar movies for film - "hunt for red october"
id_titles similarity title release_year
0 tt0105112 0.541722 Patriot Games 1992.0
1 tt0267626 0.524941 K19: The Widowmaker 2002.0
2 tt0112740 0.496758 Crimson Tide 1995.0
3 tt0052151 0.471951 Run Silent Run Deep 1958.0
4 tt1922685 0.464007 Phantom 2013.0
5 tt0164184 0.462187 The Sum of All Fears 2002.0
6 tt0058962 0.459588 The Bedford Incident 1965.0
7 tt0109444 0.456760 Clear and Present Danger 1994.0
8 tt0063121 0.455807 Ice Station Zebra 1968.0
9 tt0146309 0.452572 Thirteen Days 2001.0
you can see from the output that i'm still getting old movies. Please help me how to solve that.
Thanks in advance.

Doc2Vec only knows text-similarity; it doesn't have the idea of other fields.
So if you want to discard matches according to some criteria other than text-similarity, that's only represented external to the Doc2Vec model, you'll have to do that in a separate step.
So, you could use .most_similar() with a topn=len(model.docvecs) parameter - to get back all moviews, ranked. Then, filter that result-set by discarding any whose year is too-far from your desired year. Then, trim that result-set to the top N that you really want.

How to order a list of URLs in Google Spreadsheet in a cascade structure?

Let’s say I have a list of URLs like this:
https://moz.com/
https://moz.com/about
https://moz.com/about/contact
https://moz.com/about/jobs
https://moz.com/beginners-guide-to-seo
https://moz.com/blog
https://moz.com/blog/category/advanced-seo
https://moz.com/blog/advanced-seo/technical
https://moz.com/blog/advanced-seo/content
https://moz.com/blog/googles-walled-garden
https://moz.com/blog/local-search-ranking-factors-survey-results-2017
https://moz.com/explorer
https://moz.com/help
https://moz.com/help/guides
https://moz.com/help/guides/moz-pro-overview
And I wanted it to be displayd in different columns according to the depth of the structure. Like each part of the URL is a level in the sites hierarchy and I want to visualize the hierarchy as such:
https://moz.com/
https://moz.com/about
https://moz.com/about/contact
https://moz.com/about/jobs
https://moz.com/beginners-guide-to-seo
https://moz.com/blog
https://moz.com/blog/advanced-seo
https://moz.com/blog/advanced-seo/technical
https://moz.com/blog/advanced-seo/content
https://moz.com/blog/googles-walled-garden
https://moz.com/blog/local-search-ranking-factors-survey-results-2017
https://moz.com/explorer
https://moz.com/help
https://moz.com/help/guides
https://moz.com/help/guides/moz-pro-overview
How can I do this? I have already tried utilizing the split function for this but that does not work because it just splits the different parts of the URL into different columns and not the whole URL into the accordant column.
Thanks in advance :)

Suppose the range with links is A:A:
Put the formula =ArrayFormula(COUNTIF(REGEXMATCH(A2,$A$1:A1), true))*1 in B2 and drag it down
Put the formula =REPT(" ",B1)&A1 in C1 and drag it down.
Edit1
Here's the single formula to do the same:
=ARRAYFORMULA(rept(" ",MMULT(
--(REGEXMATCH(A1:A15,TRANSPOSE(OFFSET(A1:A15,1,)))),SIGN(A1:A15<>""))-1)&A1:A15)
Edit2
this is a brilliant solution thank you alot. However it seems I run
into problems with sites that include .html at the very end
(moz.com/about.html but moz.com/about/contact.html and so on). Any
ideas how to bypass that?
=ARRAYFORMULA(rept(" ",MMULT(--(REGEXMATCH(A1:A15,REGEXREPLACE(TRANSPOSE(OFFSET(A1:A15,1,)),"\.html$",""))),
SIGN(A1:A15<>""))-1)&A1:A15)
Notes:
the formula also replaces ".html" from the end of a string.

NetLogo - exchanging variable values across links or agentsets (not within breeds)

I am building a NetLogo model that tries to explain how agents find information they need to make a decision by bumping into other agents as they move about a space. There are three types of agents, each type has their own behavior rules and interact with their environment in different ways. The three types of agents, however, are all part of the same organization [Organization A].
The code below shows the breed names and the kinds of variables I'm using.
breed [Implementers Implementer];; Member of Organization A
breed [IMDeployeds IMDeployed];; Member of Organization A
breed [IMremotes IMremote];; Member of Organization A
... [other breeds]
Turtles-own [exchangeinfo holdinfo inforelevant infoarray
taskcomplexity done];; is an info exchange going to happen, does the
turtle have info, is info relevant, and then an array
extensions [array]
globals [complete routinemeeting]
I want to do three things:
1&2 Create a mechanism that joins the IMRemotes to the IMDeployeds and the IMDeployeds to the Implementers.
(I've already tried creating links - I'm not sure that the mechanism does what I want the third thing to do:)
3: Periodically check in with the agents that are linked together to cross-check variable values so that "information" can be "exchanged". Code I have for when agents are in the same space and can use "turtles-here" is below:
ask Implementers [
ifelse any? other Implementers [have-info-same-team] [change-location]
ifelse any? IMDeployeds-here [have-info-same-team] [change-location]
end
to have-info-same-team
ifelse any? turtles-here with [holdinfo > 0] [checkarray9] [change-
location]
end
to checkarray9
ifelse any? other turtles-here with [array:item infoarray 9 > 0]
[array:set infoarray 9 1 set holdinfo 1 checkarray8][checkarray8]
end
[etc, checking each position from 9 down to 0 in the array until you've gotten all the new information from that agent that you need]
When I try to ask my-links to do any of these things [so that the agents in the same organization, but different "functions, if you will, can have purposeful meetings rather than relying on being in the same space with each other to communicate], I'm told that the procedure is a "turtle only" procedure or that infoarray is a turtle-only variable.
Any help or suggestions greatly appreciated!

Rather than asking the links to do these things, you want to ask the turtles at the other end of the links. I don't know if you have created directed or undirected links, but something like
ask turtle-set [other-end] of my-out-links [do something]
or
ask my-out-links [ask other-end [do something]]
will make the ask of the turtles at the other end of the links to this turtle. (Note that [other-end] of my-out-links yields a list of turtles rather than a turtleset, thus the use of turtle-set to turn the list into a turtleset. my-out-links seems to work with both directed and undirected links. See http://ccl.northwestern.edu/netlogo/docs/dictionary.html#my-out-links.)
Hope this helps,
Charles

How can I iterate through a hash created with Rails' Hash#to_xml?

I'm working with thetvdb.com API to get episode listings for a show. In general, the XML format is something like:
<Data>
<Series>...</Series>
<Episode><EpisodeName>Foo</EpisodeName><EpisodeNumber>1</EpisodeNumber></Episode>
<Episode><EpisodeName>Bar</EpisodeName><EpisodeNumber>2</EpisodeNumber></Episode>
</Data>
What I do is to parse the XML using Hash.from_xml and then process the data. To iterate through the episodes, I do something like:
hash_data['Data']['Episode'].each do ...
This works great if there are multiple episodes. But if there's only one episode, the each method actually iterates through the hash entries for that single particular episode, rather than just running the each method once. That breaks all of my code following it.
I tried:
hash_data['Data']['Episode'].to_a.each do ...
with the same results. There must be a "right" way to do this?
UPDATE: I thought this question was fairly clear, but it appears people are confused by it. To clarify, I'm really trying to just iterate through the episodes and look at the contents. The data is initially received as XML, so in order to examine it in Ruby, I convert it to a hash using Hash.from_xml(xml_response).
In terms of "expected behaviour", take this example:
hash_data['Data']['Episode'].each do |e| { puts e['EpisodeNumber'] }
I would expect that given this initial data:
<Data>
<Series>...</Series>
<Episode><EpisodeName>Foo</EpisodeName><EpisodeNumber>1</EpisodeNumber></Episode>
<Episode><EpisodeName>Bar</EpisodeName><EpisodeNumber>2</EpisodeNumber></Episode>
</Data>
The output would be:
1
2
That works. However, if I'm given input like this:
<Data>
<Series>...</Series>
<Episode><EpisodeName>Foo</EpisodeName><EpisodeNumber>1</EpisodeNumber></Episode>
</Data>
I get a crash, because e['EpisodeNumber'] is not valid. It's not valid because in the case of only one episode, the each actually iterates through each key of the Hash (so the first value coming into the each block is a key-value pair of EpisodeName) instead of being an array of Hashes as it was when there was more than one element.
In other words, when there are multiple episodes, hash_data['Data']['Episode'] is an Array of Hash types. When there is only one episode, it's just a Hash. My code would work properly if, when there was one episode, it was still an Array, but with only one item in it. But that's not the case. How can I deal with this properly?
I hope that clears it up?
UPDATE 2: It's been requested that I post Hash#inspect for the returned data. Here it is for a show with a single episode:
{"Data"=>{"Series"=>{"id"=>"263752", "Actors"=>"||", "Airs_DayOfWeek"=>"Thursday", "Airs_Time"=>"10pm", "ContentRating"=>nil, "FirstAired"=>"2013-01-17", "Genre"=>"|Game Show|Reality|", "IMDB_ID"=>"tt2401129", "Language"=>"en", "Network"=>"TBS Superstation", "NetworkID"=>nil, "Overview"=>"Hosted by Robert Carradine and Curtis Armstrong, King of the Nerds is the ultimate nerd-off. The series will follow eleven fierce competitors from across the nerd spectrum as they set out to win $100,000 and be crowned the greatest nerd of them all.\n\nKing of the Nerds will take the glory of geekdom to a whole new level as the eleven competitors live together in \"Nerdvana.\" Each week, they must face challenges that will test their intellect, ingenuity, skills and pop culture prowess. In each episode, the nerds will first compete as teams and then as individuals, facing challenges that range from live gaming to a dance-off to life-sized chess. One competitor will be eliminated each week until one nerd stands alone as the ultimate champion off all things nerdy.", "Rating"=>nil, "RatingCount"=>"0", "Runtime"=>"60", "SeriesID"=>nil, "SeriesName"=>"King of the Nerds", "Status"=>"Continuing", "added"=>"2012-10-31 21:53:29", "addedBy"=>"348252", "banner"=>"graphical/263752-g2.jpg", "fanart"=>"fanart/original/263752-1.jpg", "lastupdated"=>"1357501598", "poster"=>nil, "zap2it_id"=>nil}, "Episode"=>{"id"=>"4428487", "Combined_episodenumber"=>"1", "Combined_season"=>"1", "DVD_chapter"=>nil, "DVD_discid"=>nil, "DVD_episodenumber"=>nil, "DVD_season"=>nil, "Director"=>nil, "EpImgFlag"=>nil, "EpisodeName"=>"Welcome to the Nerdvana", "EpisodeNumber"=>"1", "FirstAired"=>"2013-01-17", "GuestStars"=>nil, "IMDB_ID"=>nil, "Language"=>"en", "Overview"=>nil, "ProductionCode"=>nil, "Rating"=>nil, "RatingCount"=>"0", "SeasonNumber"=>"1", "Writer"=>nil, "absolute_number"=>nil, "filename"=>nil, "lastupdated"=>"1357501766", "seasonid"=>"504427", "seriesid"=>"263752"}}}
Notice that Episode is a Hash type.
Here it is for a show with more than one episode:
{"Data"=>{"Series"=>{"id"=>"220441", "Actors"=>"||", "Airs_DayOfWeek"=>"Saturday", "Airs_Time"=>"8:30PM", "ContentRating"=>"TV-PG", "FirstAired"=>"2010-12-25", "Genre"=>"|Children|Drama|", "IMDB_ID"=>"tt1765510", "Language"=>"en", "Network"=>"The Hub", "NetworkID"=>nil, "Overview"=>nil, "Rating"=>"7.0", "RatingCount"=>"1", "Runtime"=>"30", "SeriesID"=>nil, "SeriesName"=>"R L Stine's The Haunting Hour", "Status"=>"Continuing", "added"=>"2011-01-10 15:59:43", "addedBy"=>"66501", "banner"=>"graphical/220441-g.jpg", "fanart"=>"fanart/original/220441-1.jpg", "lastupdated"=>"1354439519", "poster"=>"posters/220441-1.jpg", "zap2it_id"=>nil}, "Episode"=>[{"id"=>"3453441", "Combined_episodenumber"=>"1", "Combined_season"=>"1", "DVD_chapter"=>nil, "DVD_discid"=>nil, "DVD_episodenumber"=>nil, "DVD_season"=>nil, "Director"=>nil, "EpImgFlag"=>"2", "EpisodeName"=>"Really You (Part 1)", "EpisodeNumber"=>"1", "FirstAired"=>"2010-10-29", "GuestStars"=>"|Bailee Madison|Connor Price|", "IMDB_ID"=>nil, "Language"=>"en", "Overview"=>"A girl named Lilly (Bailee Madison) is given her very own life-sized \"Really You\" doll which is named Lilly D.; because she is good at manipulating her dad. Lilly remains a spoiled brat, bragging about Lilly D, even going as far as ripping the leg off a friends doll, after the friend informs Lilly that \"Lilly D hates Lilly\". Soon after, strange events begin to occur which Lilly's mother accuses Lilly of doing; despite how Lilly maintains she is innocent, and that Lilly D is alive.", "ProductionCode"=>nil, "Rating"=>"8.0", "RatingCount"=>"1", "SeasonNumber"=>"1", "Writer"=>nil, "absolute_number"=>nil, "filename"=>"episodes/220441/3453441.jpg", "lastupdated"=>"1350772755", "seasonid"=>"393441", "seriesid"=>"220441"}, ...
Notice Episode is now an Array of Hash types.

It sounds like Rails' Array#wrap method would solve your problem. I believe this should work:
Array.wrap(hash_data['Data']['Episode']).each do ...
Documentation here.

Use Hash Key/Value Methods
Rather than Enumerator#each, you probably want to use Hash#each_key, Hash#each_value, or Hash#each_pair to iterate through your hash.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

How to keep only duplicate values from another list? - google-sheets

use: =FILTER(A:A, COUNTIF(B:B, A:A))

Related

Splunk Streamlined search for specific fields only

How to handle release year difference in movie recommendation

How to order a list of URLs in Google Spreadsheet in a cascade structure?

NetLogo - exchanging variable values across links or agentsets (not within breeds)

How can I iterate through a hash created with Rails' Hash#to_xml?

Categories

Resources