{"id":1908,"date":"2020-02-10T12:59:25","date_gmt":"2020-02-10T10:59:25","guid":{"rendered":"http:\/\/www.ludovicocaldara.net\/dba\/?p=1908"},"modified":"2020-08-18T15:58:37","modified_gmt":"2020-08-18T13:58:37","slug":"understand-through-graphs","status":"publish","type":"post","link":"https:\/\/www.ludovicocaldara.net\/dba\/understand-through-graphs\/","title":{"rendered":"Understand your Database through graphs"},"content":{"rendered":"<p>During the last months I have had to deal with highly consolidated databases, where there was no basic rule to achieve something maintainable. Over many years (some schemas are 30 years old) the number of schemas and their dependencies became a problem up to a point where basic operations like patching or upgrading were very complex to achieve.<\/p>\n<p>In order to try to split these big databases and move towards Oracle Multitenant, I have created some graphs that helped me and my colleagues to understand the connections between schemas (grants) and databases (db links).<\/p>\n<p>&nbsp;<\/p>\n<blockquote class=\"twitter-tweet\">\n<p dir=\"ltr\" lang=\"en\">Attacking the big beast with math algorithms to get a scientific approach on how to split it. Each color will likely be a separated PDB. Thanks <a href=\"https:\/\/twitter.com\/sandeshr?ref_src=twsrc%5Etfw\">@sandeshr<\/a> and <a href=\"https:\/\/twitter.com\/HeliFromFinland?ref_src=twsrc%5Etfw\">@HeliFromFinland<\/a> for the inspiration given by your ML talks \ud83d\ude42 <a href=\"https:\/\/t.co\/LQln0DwpQn\">pic.twitter.com\/LQln0DwpQn<\/a><\/p>\n<p>\u2014 Ludovico Caldara (@ludodba) <a href=\"https:\/\/twitter.com\/ludodba\/status\/1186610217141846017?ref_src=twsrc%5Etfw\">October 22, 2019<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<blockquote class=\"twitter-tweet\">\n<p dir=\"ltr\" lang=\"en\">All our &#8220;working&#8221; database links in a single image\ud83d\ude33 <a href=\"https:\/\/t.co\/uxjLwmjUpa\">pic.twitter.com\/uxjLwmjUpa<\/a><\/p>\n<p>\u2014 Ludovico Caldara (@ludodba) <a href=\"https:\/\/twitter.com\/ludodba\/status\/1189539992437252096?ref_src=twsrc%5Etfw\">October 30, 2019<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p><strong>How\u00a0 have I created the graphs?<\/strong><\/p>\n<p>I used <a href=\"https:\/\/gephi.org\/\">Gephi<\/a> , an open source software to generate graphs. Gephi is very powerful, I feel I have used just 1% of its capabilities.<\/p>\n<p>How to create a graph, depends mostly on what you want to achieve and which data you have.<\/p>\n<p>First, some basic terminology: <strong>Nodes<\/strong> are the &#8220;dots&#8221; in the graph, <strong>Edges<\/strong> are the lines connecting the dots. Both nodes and edges can have properties (e.g. edges have weight), but you might not need any.<\/p>\n<p><strong>Basic nodes and edges without properties<\/strong><\/p>\n<p>If you need just to show the dependencies between nodes, a basic edge list with source-&gt;target will be enough.<\/p>\n<p>For example, you can have a edge list like this one: <a href=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_1_edges.csv\">gephi_1_edges.csv<\/a><\/p>\n<p>Open Gephi, go to <em>New Project<\/em><strong>,\u00a0<\/strong><em>File -&gt; Import Spreadsheet<\/em>, select the file. If you already have a workspace and you want to add the edges to the same workspace, select\u00a0<em>Append to existing workspace<\/em>.<\/p>\n<p><a href=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/import_edges.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1912\" src=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/import_edges.png\" alt=\"\" width=\"782\" height=\"487\" srcset=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/import_edges.png 782w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/import_edges-300x187.png 300w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/import_edges-768x478.png 768w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/import_edges-482x300.png 482w\" sizes=\"auto, (max-width: 782px) 100vw, 782px\" \/><\/a>This will lead to something very basic:<a href=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_basic.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-1914\" src=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_basic-1024x616.png\" alt=\"\" width=\"584\" height=\"351\" srcset=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_basic-1024x616.png 1024w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_basic-300x180.png 300w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_basic-768x462.png 768w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_basic-500x300.png 500w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_basic.png 1468w\" sizes=\"auto, (max-width: 584px) 100vw, 584px\" \/><\/a><\/p>\n<p>In the\u00a0<em>Data Laboratory<\/em> tab, you might want to copy the value of the ID column to the label column, so that they match:<\/p>\n<p><a href=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_copy_column.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-1916\" src=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_copy_column-1024x812.png\" alt=\"\" width=\"584\" height=\"463\" srcset=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_copy_column-1024x812.png 1024w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_copy_column-300x238.png 300w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_copy_column-768x609.png 768w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_copy_column-378x300.png 378w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_copy_column.png 1114w\" sizes=\"auto, (max-width: 584px) 100vw, 584px\" \/><\/a><\/p>\n<p>Now you must care about two things:<\/p>\n<p>First, re-arranging the nodes. With few nodes this is often not required, but when there are many, the default visualization is not satisfying. In the tab <em>Overview<\/em> , pane <em>Layout<\/em> there are a few algorithms you can choose. For big graphs I prefer <em>Force Atlas<\/em>. There are a few parameters to tune, especially\u00a0 the attraction\/repulsion strengths and the gravity. Speed is also crucial if you have many nodes. For this small example I put<em> Repulsion Strengt<\/em>h to 2000, <em>Attraction Strength<\/em> to 1. Clicking on\u00a0<em>Run<\/em> starts the algorithm to rearrange the nodes, which with few edges will almost instantaneous (don&#8217;t forget to stop it afterwards).<\/p>\n<p>Here is what I get:<\/p>\n<p><a href=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_force_atlas.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-1915\" src=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_force_atlas-1024x812.png\" alt=\"\" width=\"584\" height=\"463\" srcset=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_force_atlas-1024x812.png 1024w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_force_atlas-300x238.png 300w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_force_atlas-768x609.png 768w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_force_atlas-378x300.png 378w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/01\/gephi_force_atlas.png 1114w\" sizes=\"auto, (max-width: 584px) 100vw, 584px\" \/><\/a>Now that the nodes are in place, in the preview pane I can adjust the settings, like showing labels and changing colors. Also, in the <em>Appearance <\/em>pane I can change the scheme to have for example colors based on ranking.<\/p>\n<p>In this example, I choose to color based on ranking (nodes with more edges are darker).<\/p>\n<p><a href=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_basic_result.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1917\" src=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_basic_result.png\" alt=\"\" width=\"800\" height=\"633\" srcset=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_basic_result.png 800w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_basic_result-300x237.png 300w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_basic_result-768x608.png 768w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_basic_result-379x300.png 379w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<p>I also set the <em>Preset\u00a0<\/em>Default Straight,\u00a0<em>Show labels <\/em>(with smaller size) , <em>proportional size<\/em>.<\/p>\n<p><strong>Adding nodes properties<\/strong><\/p>\n<p>Importing edges from CSV gives only a dumb list of edges, without any information about nodes properties. Having properties set might be important in order to change how the graph is displayed.<\/p>\n<p>By importing a node list containing the properties of each node , I can add important information. In this example file, I have columns Id, Label and Sex, that I will use to color the nodes differently: <a href=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_1_nodes.csv\">gephi_1_nodes.csv<\/a><\/p>\n<p><a href=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_basic_result_by_sex.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1922\" src=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_basic_result_by_sex.png\" alt=\"\" width=\"928\" height=\"774\" srcset=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_basic_result_by_sex.png 928w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_basic_result_by_sex-300x250.png 300w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_basic_result_by_sex-768x641.png 768w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_basic_result_by_sex-360x300.png 360w\" sizes=\"auto, (max-width: 928px) 100vw, 928px\" \/><\/a><\/p>\n<p>In the appearance node, I have just selected to partition by <em>sex<\/em> with a meaningful color.<\/p>\n<h2><strong>Using real metadata to understand schemas or dependencies&#8230;<\/strong><\/h2>\n<p>I will take, as an example, the dependencies in a database between objects of type VIEW, MATERIALIZED VIEW and TABLE. The database has quite a usage of materialized views and understanding the relation is not always easy.<\/p>\n<p>This is the query that interests me:<\/p>\n<pre class=\"lang:plsql decode:true \">SELECT\r\n    owner,\r\n    name,\r\n    type,\r\n    referenced_owner,\r\n    referenced_name,\r\n    referenced_type\r\nFROM\r\n    dba_dependencies\r\nWHERE\r\n    type IN ('TABLE', 'MATERIALIZED VIEW', 'VIEW')\r\n    AND referenced_type IN ('TABLE', 'MATERIALIZED VIEW', 'VIEW')\r\n    AND owner IN (\r\n        SELECT username FROM dba_users\r\n        WHERE oracle_maintained = 'N'\r\n    );<\/pre>\n<p>So I need the nodes, for that I need a UNION to get nodes from both sides of the dependency. The best tool to achieve this is SqlCl as it has the native CSV output format:<\/p>\n<pre class=\"lang:plsql decode:true \">set sqlformat CSV\r\nset feedback off\r\nspool nodes.csv\r\nSELECT\r\n    owner || '.' || name || ' (' || type || ')' AS \"Id\",\r\n    owner || '.' || name || ' (' || type || ')' AS \"Label\",\r\n    owner,\r\n    name,\r\n    type\r\nFROM\r\n    dba_dependencies\r\nWHERE\r\n    type IN ('TABLE', 'MATERIALIZED VIEW', 'VIEW')\r\n    AND referenced_type IN ('TABLE', 'MATERIALIZED VIEW', 'VIEW')\r\n    AND owner IN (\r\n        SELECT  username FROM  dba_users  WHERE oracle_maintained = 'N'\r\n    )\r\nUNION\r\nSELECT\r\n    referenced_owner || '.' || referenced_name || ' (' || referenced_type || ')' AS \"Id\",\r\n    referenced_owner || '.' || referenced_name || ' (' || referenced_type || ')' AS \"Label\",\r\n    referenced_owner,\r\n    referenced_name,\r\n    referenced_type\r\nFROM\r\n    dba_dependencies\r\nWHERE\r\n    type IN ('TABLE', 'MATERIALIZED VIEW', 'VIEW')\r\n    AND referenced_type IN ('TABLE', 'MATERIALIZED VIEW', 'VIEW')\r\n    AND owner IN (\r\n        SELECT  username FROM  dba_users  WHERE oracle_maintained = 'N'\r\n    )\r\n;\r\nspool off<\/pre>\n<p>The edge list:<\/p>\n<pre class=\"lang:plsql decode:true \">spool edges.csv\r\nSELECT\r\n    owner || '.' || name || ' (' || type || ')' AS \"Source\",\r\n    referenced_owner || '.' || referenced_name || ' (' || referenced_type || ')' AS \"Target\"\r\nFROM\r\n    dba_dependencies\r\nWHERE\r\n    type IN ('TABLE', 'MATERIALIZED VIEW', 'VIEW')\r\n    AND referenced_type IN ('TABLE', 'MATERIALIZED VIEW', 'VIEW')\r\n    AND owner IN (\r\n        SELECT username FROM dba_users WHERE oracle_maintained = 'N'\r\n    );\r\nspool off<\/pre>\n<p>Using the very same procedure as above, it is easy to generate the graph.<\/p>\n<p>I am interested in understanding what is TABLE, what is VIEW and what MATERIALIZED VIEW, so I partition the color by type. I also set the edge color to <em>source<\/em> so the edge will have the same color of the source node type.<\/p>\n<p>I am also interested in highlighting which tables have more incoming dependencies, so I rank the node size by <em>In-Degree<\/em>.<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_mv_graph.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-1923\" src=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_mv_graph-1024x827.png\" alt=\"\" width=\"584\" height=\"472\" srcset=\"https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_mv_graph-1024x827.png 1024w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_mv_graph-300x242.png 300w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_mv_graph-768x620.png 768w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_mv_graph-371x300.png 371w, https:\/\/www.ludovicocaldara.net\/dba\/wp-content\/uploads\/2020\/02\/gephi_mv_graph.png 1342w\" sizes=\"auto, (max-width: 584px) 100vw, 584px\" \/><\/a><\/p>\n<p>In the graph:<\/p>\n<ul>\n<li>All the red dots are MVIEWS<\/li>\n<li>All the blue dots are VIEWS<\/li>\n<li>All the black dots are TABLES<\/li>\n<li>All the red lines are dependencies between a MVIEW and a\u00a0 (TABLE|VIEW).<\/li>\n<li>All the blue lines are dependencies between a VIEW and a\u00a0 (TABLE|MVIEW).<\/li>\n<li>The bigger the dots, the more incoming dependencies.<\/li>\n<\/ul>\n<p>With the same approach I can get packages, grants, roles, db_links, client-server dependencies, etc. to better understand the infrastructure.<\/p>\n<p>I hope you like it!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>During the last months I have had to deal with highly consolidated databases, where there was no basic rule to achieve something maintainable. Over many years (some schemas are 30 years old) the number of schemas and their dependencies became &hellip; <a href=\"https:\/\/www.ludovicocaldara.net\/dba\/understand-through-graphs\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[321,335,326,3],"tags":[],"class_list":["post-1908","post","type-post","status-publish","format-standard","hentry","category-aced","category-graph-ai-ml","category-oracle","category-oracledb"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/posts\/1908","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/comments?post=1908"}],"version-history":[{"count":2,"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/posts\/1908\/revisions"}],"predecessor-version":[{"id":1925,"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/posts\/1908\/revisions\/1925"}],"wp:attachment":[{"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/media?parent=1908"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/categories?post=1908"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/tags?post=1908"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}