Getting started with supply graph analysis
Accessing Big Data with Cyberfame
Supply chain security and dependency management are critical concerns in software development, because every modern software system rests on a complex network of dependencies. Cyberfame was built for exactly this purpose.
In today's intricate, agile software landscape, identifying scalable and data-driven methods to secure software supply chains is crucial.
In math and computer science, graphs are commonly used to represent complex relationships between different entities, such as neural networks, computer networks, or, in our case, supply chains.
Cyberfame is built on top of Cypher, a powerful query language specifically designed for graph databases like Neo4j. Learn more about Neo4j here.
Supply Chain Security Problem Today: Non-technical users often struggle to understand security ratings and complex supply chain relationships, hindering informed decision-making.
Solution: Cyberfame's graph interface visually represents supply chain relationships with color-coded nodes and user-friendly security ratings, making insights accessible to non-technical users.
You do not need to understand any graph-theoretical concepts to gain valuable knowledge from the Cyberfame WebApp today.
Supply Chain Security Problem Today: Organizations need to safeguard against cyber threats targeting the software supply chain to prevent potential damage.
Solution: Cyberfame offers enhanced visibility into supply chain security with its dynamic graph interface, allowing organizations to proactively detect and address potential risks. Users can apply graph theoretical concepts like Betweenness Centrality, Degree Centrality, PageRank Algorithm and more.
Below we will provide some practical examples of working with Cyberfame to assist in software supply chain security.
First, let's explore the concept of betweenness centrality to identify assets that serve as critical intermediaries between other assets in the network. By securing these key assets, organizations can minimize the risk of cyber threats targeting the software supply chain.
For example, attackers might compromise a critical intermediary asset to gain access to multiple downstream assets, bypassing more secure core components. Mitigating such threats is difficult without understanding the interconnections between dependencies. The query below approximates this by counting, for each dependency, how many of the organization's repositories depend on it.
Query:
MATCH (e:Entity)
WHERE e.name CONTAINS $owner + "/"
MATCH (e)-[r:HAS_DEPENDENCY]->(d:repo)
WITH d, COUNT(r) as ndeps
ORDER BY ndeps DESC
LIMIT 50
RETURN d.name as name, d.score as score, ndeps
Want a detailed line-by-line breakdown of this query? See "Centrality in a Supply Chain of Organization" Query Breakdown.
As you may have noticed, this Cypher query does not return the entire graph, a sub-graph, or graph nodes. Instead, it returns a set of rows, much like a result from a traditional SQL database. A tool suited to tabular data is therefore more appropriate here than a graph visualization tool. In our case, that tool is Neo4j Browser.
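To make the shape of that tabular result concrete, here is a minimal Python sketch of the same counting logic on a toy edge list. The repository names, scores, and the top_dependencies helper are hypothetical and exist only for illustration; they are not part of Cyberfame or its data.

```python
from collections import Counter

# Hypothetical (dependent, dependency) pairs standing in for HAS_DEPENDENCY
# relationships. All names and scores are invented for illustration.
edges = [
    ("ansible/project_a", "vendor/web_lib"),
    ("ansible/project_a", "vendor/http_lib"),
    ("ansible/project_b", "vendor/http_lib"),
    ("ansible/project_b", "vendor/yaml_lib"),
    ("ansible/project_c", "vendor/yaml_lib"),
    ("ansible/project_c", "vendor/http_lib"),
    ("other/project", "vendor/web_lib"),
]
scores = {"vendor/web_lib": 7.5, "vendor/http_lib": 6.8, "vendor/yaml_lib": 4.2}


def top_dependencies(edges, scores, owner, limit=50):
    """Count how many of the owner's repositories depend on each dependency."""
    needle = owner + "/"
    # Mirrors `e.name CONTAINS $owner + "/"`: a substring match, not a prefix match.
    counts = Counter(dep for src, dep in edges if needle in src)
    # One row per dependency: (name, score, ndeps), highest count first.
    return [(dep, scores.get(dep), n) for dep, n in counts.most_common(limit)]


rows = top_dependencies(edges, scores, "ansible")
for name, score, ndeps in rows:
    print(name, score, ndeps)
```

Because the match is a substring test, an owner such as "ansible" would also match a name containing "myansible/"; whether that matters depends on how names are stored in the graph.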
Now, let's explore centrality in the global open-source dependency chain. With this example query, we can identify the repositories that are most central in the dependency network by ranking them by their number of dependents, that is, the projects that use them. Strictly speaking, counting dependents is a simple proxy for betweenness centrality rather than the measure itself. By analyzing the ratings of these central repositories, we can gain insights into the overall health and security of the entire network.
Query:
MATCH ()-[r:HAS_DEPENDENCY]->(d:repo)
WITH d, COUNT(r) as ndeps
ORDER BY ndeps DESC
LIMIT 200
RETURN d.url as url, d.score as score, ndeps
Want a detailed line-by-line breakdown of this query? See "Centrality in Global Supply Chain" Query Breakdown.
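For readers who want to see what betweenness centrality itself measures, the following is a minimal sketch of Brandes' algorithm for an unweighted directed graph, written in plain Python. The toy graph and node names are assumptions made for illustration, and this is not how Cyberfame computes anything; it only shows how a node that sits on many shortest paths ends up with a high value.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted directed graph.

    graph maps each node to a list of the nodes it points to.
    """
    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    centrality = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    return centrality


# Hypothetical toy graph: edges point from a project to its dependency.
toy = {
    "app_a": ["helper"],
    "app_b": ["helper"],
    "helper": ["core_x", "core_y"],
}
bc = betweenness_centrality(toy)
print(sorted(bc.items(), key=lambda kv: -kv[1]))
```

In this toy graph every path from an application to a core library passes through helper, so helper is the only node with a non-zero value, even though it has fewer dependents than a widely used library might.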
The PageRank algorithm determines the most influential nodes within the dependency graph, highlighting critical dependencies that require special attention.
For example, an attacker may target a highly influential node to maximize the impact of their attack, affecting numerous downstream dependencies. PageRank helps identify such critical nodes, which are hard to spot without proper visibility into the entire dependency network.
Query:
CALL gds.pageRank.stream('deps-graph')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS d, score
RETURN d.name as name, score
ORDER BY score DESC
LIMIT 200
Want a detailed line-by-line breakdown of this query? See "PageRank Algorithm for a Global Supply Chain" Query Breakdown.
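As an illustration of the idea behind PageRank, here is a minimal power-iteration sketch in plain Python. The damping value, iteration count, toy graph, and the pagerank helper are assumptions for this example; a graph library may scale or normalize scores differently, so only the relative ordering should be compared.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph given as adjacency lists."""
    nodes = sorted(set(graph) | {w for targets in graph.values() for w in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = graph.get(v, ())
            for w in targets:
                # Each node passes its rank on in equal shares to its targets.
                new_rank[w] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy graph: edges point from a project to its dependency,
# so rank flows towards the libraries that others rely on.
toy = {
    "app_a": ["helper"],
    "app_b": ["helper"],
    "helper": ["core"],
    "core": [],
}
ranks = pagerank(toy)
for name, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(name, round(value, 3))
```

Unlike a plain dependents count, the ranking here places core above helper even though core has a single direct dependent, because the rank helper collects from both applications is passed on to core.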
Supply Chain Security Problem Today: Identifying vulnerabilities in dependencies throughout the extended supply chain is challenging, leading to increased risk.
Solution: Cyberfame supports comprehensive scanning of dependencies, providing tools to ensure that potential threats and vulnerabilities are detected. It leverages graph theory and concepts like Shortest Path Analysis and Degree Centrality to reveal potential attack vectors and help prioritize remediation efforts.
Attackers often follow the path of least resistance, targeting less important but interconnected auxiliary components. Shortest Path Analysis helps in detecting these paths and blocking potential entry points for attackers, which is difficult to achieve without a clear understanding of the dependency network.
For the purpose of this example, let's define "high-risk" dependencies as any dependency up to (and including) third order with a score value of less than 3/10. Since a higher-order dependency can have multiple connections to our target project, we will identify only the shortest path to any such dependency and connect it once.
To accomplish this, we will write the following Cypher query:
MATCH (e:Entity {url: $url})
OPTIONAL MATCH p=shortestPath((e)-[*..3]->(d:repo))
WHERE d.score < 3
RETURN p
Want a detailed line-by-line breakdown of this query? See "Isolating High-Risk Dependencies" Query Breakdown.
After saving the query, we are ready to run it. Our first target project will be the ansible/awx repository.
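The definition of "high-risk" used above can be sketched outside the database as a depth-limited breadth-first search. The following Python example is only an illustration under stated assumptions: the dependency names, scores, and the high_risk_paths helper are hypothetical, and the real dependency graph of ansible/awx is not reproduced here.

```python
from collections import deque


def high_risk_paths(graph, scores, start, max_depth=3, threshold=3):
    """Return one shortest path from start to each low-scoring dependency.

    Only dependencies within max_depth hops are considered, and a dependency
    without a known score is never reported.
    """
    parent = {start: None}
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for dep in graph.get(node, ()):
            if dep not in parent:  # first visit is along a shortest path
                parent[dep] = node
                depth[dep] = depth[node] + 1
                queue.append(dep)
    paths = {}
    for node in parent:
        if node != start and scores.get(node, float("inf")) < threshold:
            path = []
            cur = node
            while cur is not None:  # walk back to the start node
                path.append(cur)
                cur = parent[cur]
            paths[node] = path[::-1]
    return paths


# Hypothetical toy data; lib_* names and scores are invented.
toy = {
    "ansible/awx": ["lib_a", "lib_b"],
    "lib_a": ["lib_c"],
    "lib_b": ["lib_c"],
    "lib_c": ["lib_d"],
    "lib_d": ["lib_e"],
}
toy_scores = {"lib_a": 8.0, "lib_b": 2.5, "lib_c": 1.0, "lib_d": 6.0, "lib_e": 0.5}
paths = high_risk_paths(toy, toy_scores, "ansible/awx")
for dep, path in paths.items():
    print(dep, " -> ".join(path))
```

Here lib_c is reachable through both lib_a and lib_b, but it is connected only once, along the path found first. lib_e has a low score but sits four hops away, so it falls outside the third-order limit.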
Identifying high-risk dependencies can be very valuable for individual projects. However, on an organizational level, it may be beneficial to expand the scope and assess high-risk dependencies throughout the entire GitHub organization.
So, instead of writing a query for an individual repository, we can modify it to search through any repository related to our target organization, "ansible":
MATCH (e:Entity)
WHERE e.name CONTAINS $owner + "/"
OPTIONAL MATCH p=shortestPath((e)-[*..3]->(d:repo))
WHERE d.score < 3
RETURN p
Want a detailed line-by-line breakdown of this query? See "Risk Assessment for GitHub Organization" Query Breakdown.
Having saved the query, we can proceed and type in our target GitHub organization: ansible.
In this section, we will explore rating and perform granular querying on specific sub-parts of the score, in addition to the overall score. This approach can help identify weaknesses in dependencies or focus on particular areas for improving security.
As an example, we will use the vulnerabilities score as an additional filtering parameter, along with the overall score. This will allow for more precise analysis and identification of potential security threats in your supply chain.
Query:
MATCH (r:repo|Entity)
WHERE r.score < 4 AND r.sc_vulnerabilities_score < 2
RETURN r.url, r.score, r.sc_vulnerabilities_score
ORDER BY r.sc_vulnerabilities_score ASC
LIMIT 1000
Want a detailed line-by-line breakdown of this query? See "Querying by a Score Component" Query Breakdown.
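The filtering logic of this query can be sketched in a few lines of Python. The records, URLs, and the weak_on_vulnerabilities helper below are hypothetical; the property names simply follow the ones used in the query. As in Cypher, a record with a missing score is treated as not matching.

```python
# Hypothetical records; URLs and values are invented for illustration.
repos = [
    {"url": "github.com/example/a", "score": 3.5, "sc_vulnerabilities_score": 0},
    {"url": "github.com/example/b", "score": 2.0, "sc_vulnerabilities_score": 1},
    {"url": "github.com/example/c", "score": 3.9, "sc_vulnerabilities_score": 5},
    {"url": "github.com/example/d", "score": 6.0, "sc_vulnerabilities_score": 0},
    {"url": "github.com/example/e", "score": 1.0},  # component score missing
]


def weak_on_vulnerabilities(repos, max_score=4, max_vuln=2, limit=1000):
    """Keep repositories that are low overall AND low on the vulnerabilities component."""
    matches = [
        r for r in repos
        if r.get("score") is not None
        and r.get("sc_vulnerabilities_score") is not None
        and r["score"] < max_score
        and r["sc_vulnerabilities_score"] < max_vuln
    ]
    # Worst vulnerabilities score first, as in ORDER BY ... ASC.
    matches.sort(key=lambda r: r["sc_vulnerabilities_score"])
    return matches[:limit]


result = weak_on_vulnerabilities(repos)
for r in result:
    print(r["url"], r["score"], r["sc_vulnerabilities_score"])
```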
Again, this query returns rows rather than a graph, so Neo4j Browser is the appropriate tool for both running it and viewing the result.
In order to secure a specific repository, it might be necessary to query based on a particular score component. When focusing on a known repository of interest, additional information, such as the dependency degree (distance from the target repository), can be obtained through the query.
The overall score threshold can be set higher here because we are concentrating on a specific score component, and the correlation that inevitably exists between the overall score and the vulnerability score is sufficient. For this example, we will use the repository ansible/awx as our target.
Query:
MATCH (e:Entity {url: "github.com/ansible/awx"})
MATCH (e)-[r*..3]->(d:repo)
WHERE d.score < 5 AND d.sc_vulnerabilities_score < 2
RETURN DISTINCT d.url, size(r) as distance, d.score, d.sc_vulnerabilities_score
ORDER BY distance ASC
LIMIT 200
Want a detailed line-by-line breakdown of this query? See "Querying by a Score Component in a Specific Repository" Query Breakdown.
Again, this query returns rows rather than a graph, so Neo4j Browser is the appropriate tool for both running it and viewing the result.
Degree Centrality measures the number of direct connections an asset has, indicating its importance in the supply network. This helps pinpoint critical dependencies for focused attention.
For example, instead of attacking an organization's core asset, an attacker could target a widely used but less obvious helper library or utility. Degree Centrality helps identify such dependencies, allowing organizations to prioritize their security efforts.
Query:
CALL gds.degree.stream('deps-graph')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS d, score
RETURN d.name as name, score
ORDER BY score DESC
LIMIT 200
Want a detailed line-by-line breakdown of this query? See "Degree Centrality in a Global Supply Chain" Query Breakdown.
Again, this query returns rows rather than a graph, so Neo4j Browser is the appropriate tool for both running it and viewing the result.
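Degree centrality depends on which direction is counted, which is easy to see in a small Python sketch. The edge list and the degree_centrality helper are hypothetical. We assume edges point from a project to its dependency, so in-degree counts dependents and out-degree counts dependencies.

```python
from collections import Counter

# Hypothetical (dependent, dependency) pairs; names are invented.
edges = [
    ("app_a", "utils"),
    ("app_b", "utils"),
    ("app_c", "utils"),
    ("app_a", "core"),
    ("utils", "core"),
]


def degree_centrality(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    out_degree = Counter(src for src, _ in edges)  # number of dependencies
    in_degree = Counter(dst for _, dst in edges)   # number of dependents
    return in_degree, out_degree


in_deg, out_deg = degree_centrality(edges)
print("most depended upon:", in_deg.most_common(2))
print("most dependencies:", out_deg.most_common(2))
```

When running a degree algorithm on a projected graph, it is worth checking which orientation the projection and the algorithm use. If outgoing relationships are counted, the ranking reflects how many dependencies a project has, and finding widely used helper libraries requires the reverse orientation.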
In summary, graph theory provides a powerful framework for understanding and analyzing the complex network of dependencies that underlie modern software systems. By applying these concepts, security researchers and developers can identify weak points and vulnerabilities in the supply chain, and take steps to improve the security and resilience of their software systems.