Getting started with supply graph analysis

Accessing Big Data with Cyberfame

Supply chain security and dependency management are critical concerns for modern software development, as they involve managing the complex network of dependencies that underlie modern software systems. Cyberfame was built exactly for this purpose.

In today's intricate, agile software landscape, identifying scalable, data-driven methods to secure software supply chains is crucial.

Graph Theory and Dependency Supply Chain

In mathematics and computer science, graphs are commonly used to represent complex relationships between entities, such as neural networks, computer networks, or, in our case, supply chains.

Cyberfame is built on top of Cypher, a powerful query language designed specifically for graph databases such as Neo4j. Learn more about Neo4j here.

Understanding Complex Supply Chain Relationships and Security Ratings for Non-Technical Users

Supply Chain Security Problem Today: Non-technical users often struggle to understand security ratings and complex supply chain relationships, hindering informed decision-making.

Solution: Cyberfame's graph interface visually represents supply chain relationships with color-coded nodes and user-friendly security ratings, making insights accessible to non-technical users.

You do not need to understand any graph-theoretical concepts to gain valuable insights from the Cyberfame WebApp today.

Want to get started immediately? See our Pro and Unlimited Subscription Tiers.

Protecting the Organization from Cyber Threats Targeting the Software Supply Chain

Supply Chain Security Problem Today: Organizations need to safeguard against cyber threats targeting the software supply chain to prevent potential damage.

Solution: Cyberfame offers enhanced visibility into supply chain security with its dynamic graph interface, allowing organizations to proactively detect and address potential risks. Users can apply graph theoretical concepts like Betweenness Centrality, Degree Centrality, PageRank Algorithm and more.

Below we will provide some practical examples of working with Cyberfame to assist in software supply chain security.

1. Centrality in a Supply Chain of Organization

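First, let's look at centrality to identify assets that serve as critical intermediaries between other assets in the network. Betweenness centrality captures this idea directly; the query below uses a simpler proxy and counts how many HAS_DEPENDENCY relationships from the organization's entities point at each repository. By securing these key assets, organizations can minimize the risk of cyber threats targeting the software supply chain.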

For example, attackers might compromise a critical intermediary asset to gain access to multiple downstream assets, bypassing more secure core components. Mitigating such threats is difficult without understanding the interconnections between dependencies.
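To make the idea of a "critical intermediary" concrete, here is a minimal Python sketch of betweenness centrality (Brandes' algorithm) on a tiny, made-up dependency graph. The node names and the dictionary layout are assumptions for illustration only; this is not how Cyberfame computes anything internally.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, directed graph.

    graph maps each node to the list of nodes it depends on.
    """
    nodes = set(graph) | {d for deps in graph.values() for d in deps}
    score = {n: 0.0 for n in nodes}
    for source in nodes:
        # Breadth-first search that counts shortest paths from source.
        order = []                        # nodes in order of discovery
        preds = {n: [] for n in nodes}    # predecessors on shortest paths
        paths = {n: 0 for n in nodes}     # number of shortest paths from source
        dist = {n: -1 for n in nodes}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, []):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path shares.
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score

# Hypothetical graph: two applications reach two libraries through shared-lib.
graph = {
    "app-a": ["shared-lib"],
    "app-b": ["shared-lib"],
    "shared-lib": ["crypto", "parser"],
}
scores = betweenness_centrality(graph)
print(scores["shared-lib"])  # 4.0: all four app-to-library paths pass through it
```

In this toy graph only shared-lib sits between other nodes, so it is the only node with a non-zero score.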

Query:

MATCH (e:Entity)
WHERE e.name CONTAINS $owner + "/"
MATCH (e)-[r:HAS_DEPENDENCY]->(d:repo)
WITH d, COUNT(r) as ndeps
ORDER BY ndeps DESC
LIMIT 50
RETURN d.name as name, d.score as score, ndeps

Want a detailed line-by-line breakdown of this query? See "Centrality in a Supply Chain of Organization" Query Breakdown.
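As a rough illustration of what the query computes, the following Python sketch mirrors its steps on an in-memory edge list: filter entities by owner, count the dependency relationships pointing at each repository, sort in descending order, and apply a limit. The edge data and repository names are invented, and the score column is left out for brevity.

```python
from collections import Counter

def top_dependencies(edges, owner, limit=50):
    """Count how many of an owner's repositories depend on each dependency.

    edges is a list of (repository name, dependency name) pairs.
    """
    prefix = owner + "/"
    # Substring test, like CONTAINS in the Cypher query.
    counts = Counter(dep for repo, dep in edges if prefix in repo)
    # most_common sorts by count, highest first, like ORDER BY ndeps DESC.
    return counts.most_common(limit)

# Hypothetical edges; "other/tool" is ignored because its owner does not match.
edges = [
    ("acme/web", "libs/json"),
    ("acme/api", "libs/json"),
    ("acme/api", "libs/http"),
    ("other/tool", "libs/http"),
]
top = top_dependencies(edges, "acme")
print(top)  # [('libs/json', 2), ('libs/http', 1)]
```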

Running the query

As you may have noticed, this Cypher query does not return the entire graph, sub-graphs or graph nodes. Instead, it returns a set of data that resembles rows in a traditional SQL database. A tool suited to reviewing tabular data is therefore more appropriate than a graph visualization tool. In our case, that tool is Neo4j Browser.

As an example target organization, we will choose ansible.

Want to try it out for yourself? See Cyberfame Unlimited Plan.

2. Centrality in a Global Supply Chain

Now, let's apply the same centrality idea to the global open-source dependency chain. With this example query, we can identify the most central repositories in the dependency network by ranking repositories by their number of dependents, i.e. the projects that use them. This is a count of incoming dependency relationships rather than a full betweenness computation. By analyzing the ratings of these central repositories, we can gain insights into the overall health and security of the entire network.

Query:

MATCH ()-[r:HAS_DEPENDENCY]->(d:repo)
WITH d, COUNT(r) as ndeps
ORDER BY ndeps DESC
LIMIT 200
RETURN d.url as url, d.score as score, ndeps

Want a detailed line-by-line breakdown of this query? See "Centrality in Global Supply Chain" Query Breakdown.

Running the query

Here we will again use Neo4j Browser, as it is the most appropriate tool for this query.

Want to try it out for yourself? See Cyberfame Unlimited Plan.

3. PageRank Algorithm for a Global Supply Chain

The PageRank algorithm determines the most influential nodes within the dependency graph, highlighting critical dependencies that require special attention.

For example, an attacker may target a highly influential node to maximize the impact of their attack, affecting numerous downstream dependencies. PageRank helps identify such critical nodes, which are hard to spot without proper visibility into the entire dependency network.

Query:

CALL gds.pageRank.stream('deps-graph')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS d, score
RETURN d.name as name, score
ORDER BY score DESC
LIMIT 200

Want a detailed line-by-line breakdown of this query? See "PageRank Algorithm for a Global Supply Chain" Query Breakdown.
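For intuition about what "influential" means here, the sketch below runs a textbook PageRank power iteration on a small, made-up graph. It is an illustration only: the damping factor, iteration count, node names, and the even redistribution of rank from nodes without outgoing edges are assumptions, and the absolute scores will not match those returned by the graph library used in the query.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    graph maps each node to the list of nodes it depends on, so rank flows
    from dependents to their dependencies.
    """
    nodes = sorted(set(graph) | {d for deps in graph.values() for d in deps})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            deps = graph.get(node, [])
            for dep in deps:
                new_rank[dep] += damping * rank[node] / len(deps)
        rank = new_rank
    return rank

# Hypothetical graph: two applications use shared-lib, which uses crypto.
graph = {
    "app-a": ["shared-lib"],
    "app-b": ["shared-lib"],
    "shared-lib": ["crypto"],
}
ranks = pagerank(graph)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

Note that crypto ends up ranked above shared-lib even though it has only one direct dependent: it inherits influence from everything that depends on shared-lib, which is the property that distinguishes PageRank from a plain dependent count.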

Running the query

Here we will again use Neo4j Browser, as it is the most appropriate tool for this query.

Want to try it out for yourself? See Cyberfame Unlimited Plan.

Identifying and Assessing Vulnerabilities in Dependencies and Their Extended Supply Chain

Supply Chain Security Problem Today: Identifying vulnerabilities in dependencies throughout the extended supply chain is challenging, leading to increased risk.

Solution: Cyberfame supports comprehensive scanning of dependencies, providing tools to ensure that potential threats and vulnerabilities are detected. It leverages graph theory and concepts like Shortest Path Analysis and Degree Centrality to reveal potential attack vectors and help prioritize remediation efforts.

Attackers often follow the path of least resistance, targeting less important but interconnected auxiliary components. Shortest Path Analysis helps in detecting these paths and blocking potential entry points for attackers, which is difficult to achieve without a clear understanding of the dependency network.

1. Isolating High-Risk Dependencies

For the purpose of this example, let's define "high-risk" dependencies as any dependency up to (and including) third order with a score value of less than 3/10. Since a higher-order dependency can have multiple connections to our target project, we will identify only the shortest path to any such dependency and connect it once.

Want to learn more about our ratings? See Source Code Repository Scanning & Rating.

To accomplish this, we will write the following Cypher query:

MATCH (e:Entity {url: $url})
OPTIONAL MATCH p=shortestPath((e)-[*..3]->(d:repo))
WHERE d.score < 3
RETURN p

Want a detailed line-by-line breakdown of this query? See "Isolating High-Risk Dependencies" Query Breakdown.
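The logic of the query can be sketched in plain Python as a breadth-first search with a hop limit. The graph, the scores, and the function name below are hypothetical; the sketch only illustrates the "one shortest path per low-scoring dependency, at most three hops away" idea and is not Cyberfame's implementation.

```python
from collections import deque

def high_risk_paths(graph, scores, start, max_depth=3, threshold=3):
    """Breadth-first search up to max_depth hops from start.

    Returns one shortest path for every reachable dependency whose score is
    below threshold. Dependencies without a score are never reported.
    """
    paths = {start: [start]}
    queue = deque([start])
    risky = []
    while queue:
        node = queue.popleft()
        path = paths[node]
        if len(path) - 1 == max_depth:
            continue  # do not expand beyond the hop limit
        for dep in graph.get(node, []):
            if dep in paths:
                continue  # already reached by a path at least as short
            paths[dep] = path + [dep]
            queue.append(dep)
            if scores.get(dep, threshold) < threshold:
                risky.append(paths[dep])
    return risky

# Hypothetical project with dependencies up to four hops deep.
graph = {
    "app": ["web", "orm"],
    "web": ["parser"],
    "orm": ["parser", "driver"],
    "parser": ["regex"],
    "regex": ["deep"],
}
scores = {"web": 7.5, "orm": 6.0, "parser": 2.1, "driver": 8.0, "regex": 1.4, "deep": 0.5}
risky = high_risk_paths(graph, scores, "app")
for path in risky:
    print(" -> ".join(path))
```

Here parser is reachable through both web and orm but is reported once, and deep is skipped despite its low score because it is four hops away.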

Running the query

After saving the query, we are ready to run it. Our first target project will be the ansible/awx repository.

Want to try it out for yourself? See Cyberfame Unlimited Plan.

2. Risk Assessment for GitHub Organization

Identifying high-risk dependencies can be very valuable for individual projects. However, on an organizational level, it may be beneficial to expand the scope and assess high-risk dependencies throughout the entire GitHub organization.

So, instead of writing a query for an individual repository, we can modify it to search through every repository related to our target organization, "ansible":

MATCH (e:Entity)
WHERE e.name CONTAINS $owner + "/"
OPTIONAL MATCH p=shortestPath((e)-[*..3]->(d:repo))
WHERE d.score < 3
RETURN p

Want a detailed line-by-line breakdown of this query? See "Risk Assessment for GitHub Organization" Query Breakdown.

Running the query

Having saved the query, we can proceed and enter our target GitHub organization, ansible.

Want to try it out for yourself? See Cyberfame Unlimited Plan.

3. Querying by a Score Component

In this section, we will look more closely at the rating and query specific components of the score in addition to the overall score. This approach can help identify particular weaknesses in dependencies or focus on specific areas for improving security.

As an example, we will use the vulnerabilities score as an additional filtering parameter, along with the overall score. This will allow for more precise analysis and identification of potential security threats in your supply chain.

Want to learn more about how we rate? See Source Code Repository Scanning & Rating.

Query:

MATCH (r:repo|Entity)
WHERE r.score < 4 AND r.sc_vulnerabilities_score < 2
RETURN r.url, r.score, r.sc_vulnerabilities_score
ORDER BY r.sc_vulnerabilities_score ASC
LIMIT 1000

Want a detailed line-by-line breakdown of this query? See "Querying by a Score Component" Query Breakdown.

Running the query

Here we are again not returning the graph, so we will use Neo4j Browser both for running the query and for viewing the result.

Want to try it out for yourself? See Cyberfame Unlimited Plan.

4. Querying by a Score Component in a Specific Repository

In order to secure a specific repository, it might be necessary to query based on a particular score component. When focusing on a known repository of interest, additional information, such as the dependency degree (distance from the target repository), can be obtained through the query.

The overall score threshold can be set higher here because we are concentrating on a specific score component; the overall score correlates with the vulnerability score anyway, so a looser overall filter is sufficient. For this example, we will use the repository ansible/awx as our target.

Query:

MATCH (e:Entity {url: "github.com/ansible/awx"})
MATCH (e)-[r*..3]->(d:repo)
WHERE d.score < 5 AND d.sc_vulnerabilities_score < 2
RETURN DISTINCT d.url, size(r) as distance, d.score, d.sc_vulnerabilities_score
ORDER BY distance ASC
LIMIT 200

Want a detailed line-by-line breakdown of this query? See "Querying by a Score Component in a Specific Repository" Query Breakdown.

Running the query

Here we are again not returning the graph, so we will use Neo4j Browser both for running the query and for viewing the result.

Want to try it out for yourself? See Cyberfame Unlimited Plan.

5. Degree Centrality in a Global Supply Chain

Degree Centrality measures the number of direct connections an asset has, indicating its importance in the supply network. This helps pinpoint critical dependencies for focused attention.

For example, instead of attacking an organization's core asset, an attacker could target a widely used but less obvious helper library or utility. Degree Centrality helps identify such dependencies, allowing organizations to prioritize their security efforts.

Query:

CALL gds.degree.stream('deps-graph')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS d, score
RETURN d.name as name, score
ORDER BY score DESC
LIMIT 200

Want a detailed line-by-line breakdown of this query? See "Degree Centrality in a Global Supply Chain" Query Breakdown.
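Degree centrality is simple enough to sketch in a few lines of Python. In a directed dependency graph there are two degrees per node, and they answer different questions: the out-degree counts what a project depends on, while the in-degree counts how many projects depend on it. The graph below is made up for illustration. When running the query above, it is worth checking which direction the projected graph and the algorithm's orientation setting actually count.

```python
from collections import Counter

def degree_centrality(graph):
    """Return (out_degree, in_degree) counters for a dependency graph.

    graph maps each node to the list of nodes it depends on.
    """
    out_degree = Counter({node: len(deps) for node, deps in graph.items()})
    in_degree = Counter(dep for deps in graph.values() for dep in deps)
    return out_degree, in_degree

# Hypothetical graph: utils is a small helper used by everything else.
graph = {
    "app-a": ["utils", "http"],
    "app-b": ["utils"],
    "http": ["utils"],
}
out_degree, in_degree = degree_centrality(graph)
print(in_degree.most_common(1))  # [('utils', 3)]
print(out_degree["utils"])       # 0: utils has no dependencies of its own
```

The widely used helper library from the example above stands out only in the in-degree ranking; ranked by out-degree it would appear last.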

Running the query

Here we are again not returning the graph, so we will use Neo4j Browser both for running the query and for viewing the result.

Want to try it out for yourself? See Cyberfame Unlimited Plan.

Conclusions

In summary, graph theory provides a powerful framework for understanding and analyzing the complex network of dependencies that underlie modern software systems. By applying these concepts, security researchers and developers can identify weak points and vulnerabilities in the supply chain, and take steps to improve the security and resilience of their software systems.
