Partial and exact matches result in a lot of wasted time chasing false positives
The open source external reference sites for partial and exact matches result in a lot of wasted time chasing false positives, since the actual original source is not included in the search results. The product needs to identify the true origin of the source rather than listing all possible matches from the universe of open source projects. From a usability perspective, any match list beyond 10-20 items is impractical to research manually! If the match list is too large, CodeInsight should offer methods of narrowing the list of search items, such as:
- Include the ability to sort by earliest occurrence date -- The earliest file containing the source match is more likely to be the original source code
- Establish code lineage (i.e. graph relationship model)
- Identify if any of the match items include attribution to the original source code
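To make the first and third suggestions concrete, here is a minimal sketch of how a long match list could be narrowed. It is not CodeInsight's API: the `Match` record, its fields (`first_seen`, `credits_origin`), and the `narrow_matches` helper are all hypothetical names used only for illustration, and it assumes each match already carries an earliest-occurrence date and an attribution flag.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Match:
    """Hypothetical record for one candidate match (not a CodeInsight type)."""
    project: str
    first_seen: Optional[date]   # earliest known occurrence of the file, if any
    credits_origin: bool         # True if the match attributes an original source


def narrow_matches(matches: List[Match], limit: int = 10) -> List[Match]:
    """Rank matches so the likeliest origins come first, then cut the list.

    Ordering (an assumption, not a vendor rule):
      1. matches that attribute an original source,
      2. matches with a known date before those without one,
      3. earliest occurrence date first.
    """
    def sort_key(m: Match):
        return (
            not m.credits_origin,
            m.first_seen is None,
            m.first_seen or date.max,
        )

    return sorted(matches, key=sort_key)[:limit]


# Made-up sample data, purely for illustration.
sample = [
    Match("project-c", date(2014, 6, 1), False),
    Match("project-a", date(2009, 3, 15), False),
    Match("project-b", date(2011, 1, 20), True),
    Match("project-d", None, False),
]

shortlist = narrow_matches(sample, limit=3)
print([m.project for m in shortlist])
```

Ranking attributed matches ahead of older ones is a judgment call: an attribution usually points straight at the author, while the earliest date only suggests who had the file first.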
Thanks for the feedback, we appreciate the input. We are always looking for ways to reduce false positives and make the review of scans faster. I sent you information on how you can fine tune snippet matches in my previous post. In v7 we have added a confidence level to our matches so that you can choose at which level you automatically publish items.
I will share your ideas with our product team.
Your comments did not address the issue. Below are some examples to further explain the problem.
Exact Matches Example: The scan detected the icon file "Magic_Wand.png" in source code and provided potential matches to 58 open source projects. None of these open source projects was the author of the file, which is located at https://www.iconfinder.com/icons/58574/magic_wand_icon along with details on author and license. Since all 58 projects contain the exact file, it would be good to establish a timeline showing who created the file first to help narrow the search. In this specific example, they likely all copied it off the internet. It would be a good enhancement for CodeInsight to index the publicly available images and icons from https://www.iconfinder.com and http://www.softicons.com
Partial Match Example: The scan detected a partial code match against >1000 open source projects, all of which had a 95% code match. When there is such a large number of potential matches, it would be good to establish a timeline showing who created the file first to help narrow the search. We eventually located the author's website https://www.wpftutorial.net/PasswordBox.html by searching Google using a code snippet of the first couple of lines.