Every programmer has a distinctive coding style, but in practice it has been hard to identify a program's author from clues buried in many lines of code. With the advent of machine learning, that is no longer so difficult.
Scientists have developed a machine learning program capable of identifying programmers, even from thousands of lines of code. As reported by Wired, the approach trains an algorithm to recognize the structure of a programmer's code, based on samples of their earlier work, and uses that data to learn the habits the programmer follows when coding. It does not need a complete source file; a small fragment of code can be enough.
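To make the idea concrete, here is a minimal Python sketch of style-based attribution. It is not the researchers' method, which the article does not detail. The features (indentation, naming convention, line length, spacing around `=`), the nearest-profile matching, and the authors "alice" and "bob" are all assumptions chosen for illustration.

```python
import math
import re


def style_features(source: str) -> dict:
    """Turn a code snippet into a few crude layout and naming features.

    The feature set is an illustrative assumption, not the published one.
    """
    lines = [ln for ln in source.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    idents = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    m = max(len(idents), 1)
    return {
        # share of non-blank lines indented with a tab
        "tab_indent": sum(ln.startswith("\t") for ln in lines) / n,
        # average line length, scaled to roughly 0..1
        "avg_line_len": sum(len(ln) for ln in lines) / n / 80.0,
        # share of identifiers written in snake_case
        "snake_case": sum("_" in w for w in idents) / m,
        # share of identifiers written in camelCase
        "camel_case": sum(bool(re.search(r"[a-z][A-Z]", w)) for w in idents) / m,
        # average identifier length, scaled to roughly 0..1
        "avg_ident_len": sum(len(w) for w in idents) / m / 20.0,
        # occurrences of " = " per non-blank line
        "space_around_eq": source.count(" = ") / n,
    }


def profile(samples: list) -> dict:
    """Average the feature vectors of an author's earlier code samples."""
    vecs = [style_features(s) for s in samples]
    return {k: sum(v[k] for v in vecs) / len(vecs) for k in vecs[0]}


def distance(a: dict, b: dict) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def attribute(snippet: str, profiles: dict) -> str:
    """Return the author whose profile is closest to the snippet's style."""
    feats = style_features(snippet)
    return min(profiles, key=lambda name: distance(feats, profiles[name]))


# Hypothetical authors with deliberately different habits.
profiles = {
    "alice": profile([
        "def add_items(first_value, second_value):\n"
        "    total_sum = first_value + second_value\n"
        "    return total_sum\n",
    ]),
    "bob": profile([
        "def addItems(a,b):\n\ttotalSum=a+b\n\treturn totalSum\n",
    ]),
}

unknown = (
    "def scale_vector(input_vector, scale_factor):\n"
    "    scaled_result = [x * scale_factor for x in input_vector]\n"
    "    return scaled_result\n"
)
print(attribute(unknown, profiles))
```

A real system would need far richer features and a proper classifier to cope with hundreds of candidate authors. The sketch only shows the general shape of the process: learn a profile from earlier work, then match new code against it.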
In a test on submissions from Google Code Jam, the new algorithm proved fairly accurate. Given 600 coders and 8 code samples from each of them, it picked the right author 83% of the time, compared with roughly 0.17% (1 in 600) for random guessing.
The technology could be a valuable tool for investigators. It could help tell malware authors apart, especially when perpetrators try to make it look as though someone else wrote the code. It also applies to plagiarism cases, where the algorithm could help determine whether similarities between two pieces of code are pure coincidence or outright copying.
However, the technology also carries risks and has a dark side. While it is effective at tracing the origin of code, it will make it harder for programmers to write code anonymously.
Any deployment of the technique in the near future will need to draw a line between the need for privacy and the need for security.