Find stolen code in your code base - code-search-engine

We have a large code base that has grown over many years with many different developers. By accident, we found a code snippet that has obviously been taken from an open source project, but without the corresponding license and reference to the origin.
What options are there for finding snippets like these?
I don't want to upload all of our code base to some service on the web and don't want to check manually.
Uploading smaller snippets to some search engine would be acceptable. What search engines are there?
What are best practices?

Related

PHP: One large file or several small files

I taught myself PHP, so I don't know many of the advantages and disadvantages of different programming styles. Recently, I have been looking at large PHP projects like webERP, WordPress, Drupal, etc., and I have noticed they all have a main PHP page that is very large (1000+ lines of code) and performs many different functions. My projects' pages, on the other hand, all seem to be very specific in function and are usually less than 1000 lines. What is the reasoning behind the large page, and are there any advantages over smaller, more specific pages?
Thanks for the information.
It's partly about style and partly about readability/relationships. Ideally everything in a single file is related (e.g. a class, related operation functions, etc.) and unrelated items belong in another file.
Obviously, if you are writing something to be included by others, shipping a single file can have its advantages, as with a condensed version of jQuery.

Why do we need separate ".swift" files for each class?

Wondering if you might be able to answer a very basic beginner question for me. I’m working through the Cocoa + Swift tutorial on Lynda and I’m a little confused about classes/objects.
Basically, I want to know why we have to create a new swift file for each new class we create.
As far as I know, you can create a new class within any .swift file in the project. My question is, why do we have to keep creating a .swift file for each new class?
I’m wondering why there isn’t just one .swift file called AllClasses.swift that you can create all the classes in, for instance:
Within AllClasses.swift is the following code:
class FirstClass: NSObject {}
class SecondClass: NSObject {}
class ThirdClass: NSObject {}
class FourthClass: NSObject {}
As opposed to:
Within FirstClass.swift is the following code:
class FirstClass: NSObject {}
Within SecondClass.swift is the following code:
class SecondClass: NSObject {}
Within ThirdClass.swift is the following code:
class ThirdClass: NSObject {}
Within FourthClass.swift is the following code:
class FourthClass: NSObject {}
I just want to know why we need to separate different code into files if it can be called from within any area of the project. In the case of a Mac application, it seems like almost everything could be done from within the AppDelegate.swift file.
This is a moronic question, but it's another hurdle that may be making object orientation a hard concept for me to fully grasp.
Maybe I can explain it in a somewhat amusing way:
In the beginning there was no concept of files and all code was in a single entity. Code within such entities was referenced by line numbers. Because everything was in one place it was easy to find what you wanted, even though programs were small. It was better than punch tape and so there was much rejoicing. We gotta do something about loading from cassette though.
But then someone discovered you could break up the code into separate parts called modules, which was just as well, as software was getting bigger. Man, my 10MB hard drive is huge. Each module was a specialist and could call other specialists. It made your code easier to navigate. There was much rejoicing.
But then someone discovered object-orientation (OO) and files were cheap. Programs were now so large that people were having a hard time finding that class that modelled the airspeed of an African Swallow in that multiple-class-containing file of 10000+ lines, so maybe it was time to start putting each class in its own file. Needless to say there was much rejoicing.
Then software became so large that someone discovered source control, which was most important when a team of coding scribes all meditated on a piece of software. Madness ensued for the brotherhood whose careless endeavour to write a program in one file of 30,000+ lines (research on African Swallows had grown to include European Swallows), even with OO, only led to line conflict after line conflict during their attempts to check changes into the source control system. There was much burning at the stake. Later revelations led to the realization that breaking up the code into many texts, or files, was the way to avoid a lynching.
In summary, there is no rule that says you must have one file per class, but it's good practice to do so, mainly because once your program grows to any reasonable size or complexity, navigating and maintaining your code becomes an issue if you do not.
It becomes more important when working in a team: as the number of authors working concurrently on any given file grows, the probability of source code commit conflicts rises.
I believe the monks are studying their favourite colours and capital cities of countries now.
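The point about concurrent authors and commit conflicts can be made concrete with a toy model. The sketch below is in Python and rests on a loud assumption that is not from the answer above: each author edits exactly one file, picked uniformly at random and independently of the others. Under that assumption it computes the chance that at least two authors end up in the same file, which is a precondition for a line conflict (not a conflict in itself). The function name `overlap_probability` is made up for this illustration.

```python
from math import prod


def overlap_probability(authors: int, files: int) -> float:
    """Chance that at least two authors edit the same file.

    Toy model (an assumption): every author edits exactly one file,
    chosen uniformly at random and independently of the others.
    """
    if files < 1:
        raise ValueError("need at least one file")
    if authors > files:
        return 1.0  # pigeonhole: some file must be shared
    # Probability that all authors land on distinct files.
    all_distinct = prod((files - i) / files for i in range(authors))
    return 1.0 - all_distinct


if __name__ == "__main__":
    # Same team of five, code spread over more and more files.
    for files in (1, 10, 100):
        print(files, round(overlap_probability(5, files), 3))
```

With everything in one file, any two authors always overlap; spreading the same code over more files makes an overlap less and less likely in this model.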
Some reasons:
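Encapsulation / Access Control. It's bad practice to put several classes in the same file, because you'll be able to access every single variable / method in that source file even if it is marked as private, as stated in Apple's documentation (this describes Swift 1 and 2; since Swift 3 this file-wide level is spelled fileprivate, and private is limited to the enclosing declaration):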
Private access restricts the use of an entity to its own defining
source file. Use private access to hide the implementation details of
a specific piece of functionality.
Putting your classes in separate files helps the compiler build faster. The Swift 1.2 compiler introduced incremental builds to speed up build times; files that have not been edited are not compiled again on your next build:
Incremental builds — Source files that haven’t changed will no longer
be re-compiled by default, which will significantly improve build
times for most common cases. Larger structural changes to your code
may still require multiple files to be rebuilt.
Writing well-organized code. Together with correctly defining the responsibilities of your classes (divide and conquer), this will help you (and your teammates, if any) understand who does what and where. And, as commented in other answers, it makes changes easier to track in source control.
You don't have to define just one class per file, but I would suggest doing so. I recently worked on a project for a client where there were several classes in some source files, and where some classes were defined in files whose names didn't match the class names. (This was in Objective-C, so each "file" was really a pair of files, a .h header file and a .m implementation file, but logically they were one.)
It was confusing as h*ll, and I wasted a fair amount of time fumbling around trying to find things.
Defining one class per file and making your filenames and class names match exactly is a good convention. It's like having each school subject in a separate binder. When you need to find a class you know exactly what file to open to find it.
As good practice:
If the classes are unrelated, lengthy or used independently from other unrelated classes then they should be in separate files.
However, if the classes are tightly coupled with one another and are not lengthy then they could be in the same file.
This post also touches on this subject.
As a newbie, I really agree that it is difficult to grasp the concepts of classes and inheritance.
But I believe it is much better to keep code in separate documents, perhaps following the MVC concept, than to have it all in a single massive document.
In my own experience, it clears the clouds from your code.
I just want to add the observation that Swift's fileprivate access modifier actually sometimes requires putting many classes inside a single source file.
The "one class per source file" doesn't necessarily fit the design of Swift. For this reason, when I need to tightly control which properties I expose, I often have one very large source file for a single API.
The only alternative is to make a separate framework for each API and use internal fields.

Most efficient way to translate a Sitecore website to 4 other languages (Not having the translators in your Sitecore CMS)

I am looking for a good way to translate an existing Sitecore installation (English is available) into 4 other languages (Russian, Chinese, Portuguese, etc.). A dedicated translation company will translate all texts we deliver into the specified languages, but I'm curious how other companies set this up. I thought about just exporting all Sitecore items which have to be translated using the Database language Export function in Sitecore and having the translation company edit those files. By just replacing the language tags in the XML we should be able to import this file as the newly created other language. However, I'm afraid that this XML structure will be totally useless for a translation company and that they will drown in the codes inside this XML. How can we do this efficiently? Is there any other way than just giving the translators access to the Sitecore environment and having them edit the languages there? Is there any Shared Source Module to achieve this? I still have a lot of questions; is there anyone with some experience in achieving this?
Your primary options are either the language export/import functionality (as you mention), or a workflow-based solution that integrates with your translation agency's Translation Management System (if they have one -- hopefully they do).
The former is better for the initial translation. Typically, your agency should be able to handle translation of content within XML files; a good one can. If you create all needed language versions beforehand and copy the English content into them, the files will be easier to work with, as they'll already have tags for the new languages in them. I've seen the creation of these language versions done with Revolver (http://www.codeflood.net/revolver/), but it could also be done with custom code or workflow.
For ongoing maintenance of your translated content, you'll probably want to integrate through workflow. Clay Tablet Technologies (http://www.clay-tablet.com/) have a middleware component w/ Sitecore integration that can make this easier, depending on your translation agency. You can also do your own workflow-based integration, with workflow commands that allow your users to send content for translation. Then you'd need some sort of listener that pulls the translated content back in, and continues the workflow.
Hope this helps!
You could also check out Lionbridge (http://en-us.lionbridge.com/sitecore-and-lionbridge-announce-partnership-to-help-companies-thrive-across-borders.htm) as a solution.
From my own experience our customers normally use the Sitecore import/export function as a first step and then use Lionbridge or Clay Tablet as a service.
One important thing to think about with translations is the ongoing work. The initial translation is rather simple, but the second and later rounds might be more troublesome. What if different changes have been made in different languages? If local changes were made to the content of, say, the French version, you couldn't just send the English version for the second round of translation, since you would also have to accommodate the regional changes in the content.
Having worked with literally dozens of Sitecore clients worldwide, and helped get content to and from all the largest (and many smaller) translation firms, I can attest to the inefficiency of trying to do translation in situ, that is, in Sitecore. I liken it to asking an electrician to come over and rewire your house, but as they reach for their toolbox from the truck you tell them, "Nope, you need to do it by hand".
The very best way to manage anything more than a page or two of content for translation is to export it seamlessly. Deliver it to the LSP in a proper format (XML or XLIFF) and, when possible, auto import it to their TMS. Once translated, the content should then flow seamlessly back into Sitecore.
You can code this yourself, but the pitfalls are non-trivial just on the Sitecore side (if you want intuitive UIs, scalability, and all the features that meet the needs of translation), let alone the challenges of connecting to the systems LSPs use. (For example, who here knows the relative merits and risks of using SDL's Nexus connector versus their CTA for connecting to TMS?)
As kindly mentioned above, there are commercially available solutions that meet all these needs and more. So if you've got even a modest amount of content, and want to send it to any translation provider of your choice, I'd be happy to discuss how we can help.
The main issue with translation isn't technical at all: the XML export is a simple enough format, and all agencies should be able to deal with it with no problems. As others have suggested, maintenance after the initial translation is slightly more problematic, but they also point to tools that handle this.
The main issue we've found with translation is actually linguistic: how to achieve phrasing that is consistent and matches the original but is sufficiently adjusted to local requirements. Translation companies usually have software to aid this (libraries of the phrases they translate, etc.), but working with an exported XML file doesn't provide the context of seeing content in situ. A particular item may be translated correctly, and the site consistently, but as each page may be built from multiple items there can easily be conflicts between content as presented.
That makes working in the Sitecore backend (maybe with field security settings to limit what can be edited) or in the page editor (possibly pre-filling fields with English values) a viable idea.

Fluent mapping verification for Entity Framework 4

Note: This is a follow-up question for this previous question of mine.
Inspired by this blog post, I'm trying to construct a fluent way to test my EF4 Code-Only mappings. However, I'm stuck almost instantly...
To be able to implement this, I also need to implement the CheckProperty method, and I'm quite unsure on how to save the parameters in the PersistenceSpecification class, and how to use them in VerifyTheMappings.
Also, I'd like to write tests for this class, but I'm not at all sure on how to accomplish that. What do I test? And how?
Any help is appreciated.
Update: I've taken a look at the implementation in Fluent NHibernate's source code, and it seems like it would be quite easy to just take the source and adapt it to Entity Framework. However, I can't find anything about modifying and using parts of the source in the BSD licence. Would copy-pasting their code into my project, and changing whatever I want to suit my needs, be legal for non-commercial private or open source projects? Would it be for commercial projects?
I was going to suggest looking at how FluentNH does this, until I got to your update. Anyway, you're already investigating that approach.
As to the portion of your question regarding the BSD license, I'd say the relevant part of the license is this: Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: [conditions follow].
From my reading of that line, you can modify (which would include the removal of any code not relevant to your use cases) the code however you wish, and redistribute it as long as you meet the author's conditions.
Since there are no qualifications on how you may use or redistribute the code or binaries, then you are free to do that however you wish, for any and all applications.
Here and here are descriptions of the license in layman's terms.
I always write a simple set of integration tests for each entity. The tests persist, select, update and delete the entity. I think there is no better or easier way to test your mapping and other features of the model (like cascade deletes).
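The persist / select / update / delete round trip described above can be sketched as follows. This is not Entity Framework code: it is a minimal Python illustration with the standard-library sqlite3 module standing in for the ORM, and the `blog` / `post` tables and the `round_trip` helper are hypothetical names invented for the example. The shape of the test is what matters: write an entity, read it back, change it, delete it, and check that the cascade removed the dependent rows.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory schema with a cascading foreign key (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.execute("CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE post ("
        " id INTEGER PRIMARY KEY,"
        " blog_id INTEGER NOT NULL REFERENCES blog(id) ON DELETE CASCADE,"
        " body TEXT NOT NULL)"
    )
    return conn


def round_trip(conn: sqlite3.Connection) -> dict:
    """Persist, select, update and delete one entity; report what was observed."""
    seen = {}
    # Persist a parent row and one dependent row.
    blog_id = conn.execute("INSERT INTO blog (title) VALUES (?)", ("first",)).lastrowid
    conn.execute("INSERT INTO post (blog_id, body) VALUES (?, ?)", (blog_id, "hello"))
    # Select it back.
    query = "SELECT title FROM blog WHERE id = ?"
    seen["selected"] = conn.execute(query, (blog_id,)).fetchone()[0]
    # Update and re-read.
    conn.execute("UPDATE blog SET title = ? WHERE id = ?", ("renamed", blog_id))
    seen["updated"] = conn.execute(query, (blog_id,)).fetchone()[0]
    # Delete the parent; the cascade should remove the dependent row too.
    conn.execute("DELETE FROM blog WHERE id = ?", (blog_id,))
    seen["blogs_left"] = conn.execute("SELECT COUNT(*) FROM blog").fetchone()[0]
    seen["posts_left"] = conn.execute("SELECT COUNT(*) FROM post").fetchone()[0]
    return seen


if __name__ == "__main__":
    print(round_trip(make_db()))
```

In an EF4 project the same four steps would go through the context and the mapped entity classes instead of raw SQL, so a wrong mapping shows up as a failed assertion on one of the steps.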

Localization asset management

We have several large products we'd like to integrate with a consistent localization strategy.
We're already doing the right things from a code point of view, i.e. strings in resource files.
I'm looking for something that will organize localized strings in a database and generate the appropriate resource files (i.e. .RESX files for .NET, .js files, etc.) during the build process. Ideally, it would also be able to read the files back in (detecting strings that have been added or removed).
The database would allow us to reuse translations in different products, switch to different technologies, and track what translations are missing in each release.
Has anyone found a good product that handles these requirements? What have others done to manage localized assets?
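To make the database-backed idea in the question concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the single `translation` table, the column names, and the helper names `make_store`, `missing_names` and `to_resx` are hypothetical, and the generated XML contains only the `data` / `value` entries of a .resx file (a real .resx file also carries schema and header elements, which are omitted here).

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_store(rows):
    """Load (name, lang, value) rows into an in-memory translation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE translation ("
        " name TEXT NOT NULL, lang TEXT NOT NULL, value TEXT NOT NULL,"
        " PRIMARY KEY (name, lang))"
    )
    conn.executemany("INSERT INTO translation VALUES (?, ?, ?)", rows)
    return conn


def missing_names(conn, base_lang, lang):
    """String names present in base_lang that have no translation in lang."""
    rows = conn.execute(
        "SELECT name FROM translation WHERE lang = ?"
        " EXCEPT SELECT name FROM translation WHERE lang = ?",
        (base_lang, lang),
    )
    return sorted(row[0] for row in rows)


def to_resx(conn, lang):
    """Render one language as simplified .resx-style XML (entries only)."""
    root = ET.Element("root")
    rows = conn.execute(
        "SELECT name, value FROM translation WHERE lang = ? ORDER BY name", (lang,)
    )
    for name, value in rows:
        data = ET.SubElement(root, "data", {"name": name, "xml:space": "preserve"})
        ET.SubElement(data, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    store = make_store([
        ("greeting", "en", "Hello"),
        ("farewell", "en", "Goodbye"),
        ("greeting", "de", "Hallo"),
    ])
    print(missing_names(store, "en", "de"))
    print(to_resx(store, "de"))
```

A build step could call something like `to_resx` once per language and product, and a report based on `missing_names` would show which translations are still outstanding for a release.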
Found some good links in the answers for this question: Do you know of a good program for editing/translating resource (.rc) files?
There are a number of products which we're now evaluating:
http://www.lingobit.com/
http://www.sisulizer.com/
http://www.multilizer.com/
WinTrans - http://www.schaudin.com/
None of them has quite the database-based approach we were initially looking for, but they seem to have the core functionality. Lingobit is an early favorite, but we haven't trialed it in much detail yet. Does anyone have a recommendation among those products (or similar)?
Check out GlobalSight or Alchemy's Catalyst.
Catalyst is a standalone translation memory and localization engine that can be used in your build process (and is used by many large software companies). GlobalSight is a relatively new and open source translation database and workflow tool that looks very promising.

Resources