How does Data Platform perform deduplication with two or more sources? Can you change which source has precedence, and can you change the precedence priority?
This is an email from my customer explaining what they found.
"
A question came up this morning in some discussions regarding how Data Platform performs deduplication that we would like to see if you all can help us get some clarification on.
We understand the concept of Data Source Precedence and in a scenario where asset data is coming in from two different data sources, the process is configured to assign a level of precedence to those sources. This means that if an asset record is available in the data source with the higher precedence, then the inventory data from that data source is used, while other data is left out of the deduplicated results.
The question is about whether there is any chance that the deduplication decision could be based on the currency of the data as well. For example, if “Data Source A” has a higher precedence but “Data Source B” has more current data, could the data from “Data Source B” be used for the deduplicated results?
The scenario is especially important due to what I’m referring to as a “loophole” in the deduplication logic. What we found was an asset that is showing in the Tanium discovery data, but the device inventory has not been updated in 2 years (most likely because the Tanium software client was either disabled or removed). The same asset is also showing in the SCCM data but is current within the past week. In this case, due to the higher precedence that we have set for Tanium, the asset appears not to have been on the network in 2 years. You can see the dilemma.
It just so happens that the server we found that has this problem is the Central Admin Server for the enterprise SCCM solution. So, when you look at our consolidated discovery data, it appears the SCCM primary server is not active on the network. I’m certain there are more in this same “loophole”. It would be EXTREMELY difficult to explain to someone who is looking at our data to get information about servers supporting enterprise solutions."
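To make the scenario in the email concrete, here is a minimal Python sketch. It is not Data Platform's actual implementation; the record layout, the asset name `sccm-cas-01`, the dates, and both function names are assumptions made up for illustration. The first function models precedence-only deduplication as the customer describes it, and the second models the recency-aware behavior they are asking about.

```python
from datetime import date, timedelta

# Hypothetical precedence list: earlier in the list means higher priority.
PRECEDENCE = ["Tanium", "SCCM"]


def dedupe_by_precedence(records, precedence=PRECEDENCE):
    """Keep, per asset, the record from the highest-precedence source."""
    rank = {source: i for i, source in enumerate(precedence)}
    best = {}
    for rec in records:
        current = best.get(rec["asset"])
        if current is None or rank[rec["source"]] < rank[current["source"]]:
            best[rec["asset"]] = rec
    return best


def dedupe_prefer_fresh(records, today, max_age_days, precedence=PRECEDENCE):
    """Hypothetical variant: apply precedence among records seen within
    max_age_days, and fall back to plain precedence when every record
    for an asset is stale."""
    cutoff = today - timedelta(days=max_age_days)
    fresh = [r for r in records if r["last_seen"] >= cutoff]
    result = dedupe_by_precedence(records, precedence)
    # Fresh winners overwrite stale winners for the same asset.
    result.update(dedupe_by_precedence(fresh, precedence))
    return result


# Made-up data mirroring the email: stale in Tanium, current in SCCM.
records = [
    {"asset": "sccm-cas-01", "source": "Tanium", "last_seen": date(2021, 5, 10)},
    {"asset": "sccm-cas-01", "source": "SCCM", "last_seen": date(2023, 5, 12)},
]
today = date(2023, 5, 15)

by_precedence = dedupe_by_precedence(records)
by_freshness = dedupe_prefer_fresh(records, today, max_age_days=90)
```

With this data, `by_precedence` keeps the two-year-old Tanium record, which is the "loophole" above, while `by_freshness` keeps the SCCM record. The 90-day cutoff is an arbitrary value chosen for the example.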
May 15, 2023 08:28 AM
Hi @terobinson
You can click the up and down triangle arrows to adjust the priority.
The closer a source is to the top of the priority list, the higher its priority.
May 15, 2023 02:53 PM - edited May 15, 2023 02:54 PM
I understand that, but is there a way to set the priority for a single setting, or only for the entire list?
May 15, 2023 03:09 PM
That's for the entire list. And that's the only way we currently have to adjust the precedence priority.
May 15, 2023 03:15 PM