[Original] Complex network analysis case study: a social network analysis case report (with code and data)

Uploaded by: 痛*** | Document no.: 192216349 | Upload date: 2023-03-06 | Format: PDF | Pages: 6 | Size: 621.78 KB
[Original] R data analysis and visualization case report (with code and data). For questions, contact "大数据部落" (Big Data Tribe) on Taobao.

Social network and complex network analysis

Some of us are looking forward to Christmas, and some of us are looking forward to The Force Awakens, the new film in the Star Wars series. In the meantime, I decided to take a quantitative look at the whole six-movie cycle and extract the Star Wars social networks, both for each individual movie and for the entire Star Wars universe. The structure of the social networks reveals some surprising differences between the original trilogy and the prequels.

*Update: Read my analysis of Star Wars Episode VII: The Force Awakens. If you are interested in the technical details of how I extracted the data, have a look at the "How I did the analysis" section.

Let's start with some visualizations. Here is the social network of all six movies combined:

You can open the network in a full window, which shows an interactive visualization of the network where you can drag individual nodes around. If you hover over the individual nodes, you'll see the name of the corresponding character.

Here the nodes represent characters in the movies. Two characters are connected by a link if they both speak in the same scene, and the more scenes the characters speak in together, the thicker the link between them. The size of each node corresponds to the total number of scenes the character appears in.

I made a few arguable decisions, though. Anakin and Darth Vader are represented by two separate nodes, because this distinction is important to the story. On the other hand, the Emperor node jointly represents Palpatine and Darth Sidious. I also merged Amidala with Padme.

The original trilogy (episodes IV, V and VI) on the right is mostly separated in the network from the prequel trilogy on the left, because most characters appear in only one of the trilogies. The crucial nodes connecting the two networks are Obi-Wan Kenobi, R2-D2 and C-3PO. The robots in particular seem to play an important social function, because they appear frequently across all the movies.

The structures of the two sub-networks are also different. The original trilogy has fewer important nodes (Luke, Han, Leia, Chewbacca, Darth Vader), and they are densely interconnected. The prequel trilogy has more nodes overall, with many more connections. I'll look at the individual films in more detail later in the post.

Character timelines

Many of the characters feature in multiple movies, so I also created a comparison of their timelines across the individual episodes. The following graphic shows where the individual characters are mentioned in the film scripts. In order of appearance, these are the timelines of some of the main characters:

Here I included all mentions of each character, which includes other characters saying their name. It is interesting to see how Anakin appears simultaneously with Darth Vader during Episode III, and then Darth Vader takes over. Anakin reappears towards the end of Episode VI, when Darth Vader turns away from the Dark Side. The characters that appear most consistently across all the films are the same ones that are at the centre of the social network: Obi-Wan, C-3PO and R2-D2. Yoda and the Emperor also appear across all of the films, but they don't talk directly with many people in the original trilogy, which moves them away from the centre of the social network.

Networks in individual films

Now let's look at the networks in individual films. Notice how the number of nodes and the complexity of the networks change between the prequels and the original movies. Again, a link appears between two characters if they speak within the same scene.

Importance of characters

The individual networks again show that the prequel trilogy has more characters and more interactions overall. The original episodes have fewer characters, but they interact more with each other. George Lucas said: "It really is the story of the tragedy of Darth Vader, and it starts when he's nine, and it ends when he's dead." (source) But is Darth Vader/Anakin really the central character? Let's use some methods from network analysis to see who is really important in the stories and their social structures. I computed two measures of importance in the networks for each of the films:

Degree centrality: this is simply the number of connections the node has in the network. In the Star Wars networks, it is the number of different characters that each character speaks with in a shared scene.

Betweenness: this measure looks at how many shortest paths in the network lead through the node. For example, imagine you are Leia and you want to send a message to Greedo. The shortest way to send it is via Han Solo, because he interacted both with Leia and with Greedo. On the other hand, if you want to send a message to Luke, you don't have to go through Han, because Leia knows Luke directly. The betweenness centrality of Han is computed from the number of shortest paths between all other characters that pass through him.

Both measures show how important a character is in the network. Degree centrality shows how many people each character interacts with directly. Betweenness relates more to how integral each character is to the story: characters with high betweenness connect different areas of the social network. For both measures, higher values mean more importance. Here are the top 5 characters for each movie:

Based on his degree, Anakin seems to be the most connected character overall in the first three films. He is, however, not very integral to the relations in the films! His betweenness score is so small that he never makes it into the top 5. This means that the other characters interact directly with each other rather than through Anakin. How do the same measures look for the original trilogy?

Here both centrality measures show very similar results: Luke is the most central character across all the films, by both measures, and the order of characters is almost the same under the two measures.

The centrality analysis quantifies some of the things we could see in the social networks. The prequel trilogy has more complex social structures, with more interconnected characters. This also means that Anakin is not that central to the story: some of the storylines happen alongside Anakin's story, or involve Anakin only on the side. On the other hand, the original trilogy has a more tight-knit structure. There is a smaller number of central characters, and they bind the story together; this is why the degree and betweenness centrality measures agree. Perhaps this is part of the reason why the original trilogy is more popular: the plots are more consistent and driven by the main characters. The prequels have a more decentralized structure and no clear hero. Although the stories are linked by Anakin, he is not binding the other characters together.

How do the measures look for the full social network from all the episodes together? I looked at two variants of the network. In the first one, Anakin and Darth Vader appear as two separate individuals; in the second, I merged them into a single person.

If we look at Anakin and Darth Vader separately, Anakin is still the most connected character, but he's not central to the network. If we merge them, things improve a bit: Darth Vader/Anakin becomes the third most important character in terms of betweenness. Overall, the social networks seem to show that the Star Wars movies are actually linked together by Obi-Wan Kenobi rather than Darth Vader.

How I did the analysis

As this is part of the F# Advent calendar, I used F# for most of the analysis. I combined it with D3.js for the social network visualizations, and with R for the network centrality analysis. You can find all the source code on my GitHub. Because the whole code turned out to be relatively long, here I look only at some of the more interesting parts.

Parsing the screenplays

I started by downloading the scripts for all 6 Star Wars movies. They are freely available from The Internet Movie Script Database (IMSDb); for example, here's the script for Episode IV: A New Hope. The screenplays are only available as drafts, which sometimes differ from the actual films; however, the differences are not very big.
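The author's F# parser and R centrality code live on GitHub and are not reproduced here. As a rough, hypothetical illustration of the pipeline described above, the Python sketch below splits a screenplay into scenes, links characters who speak in the same scene (edge weight = number of shared scenes), and computes degree and betweenness. Several things are assumptions, not the author's method: the script layout (scene headings start with INT./EXT.; a speaker's name sits alone in upper case on a line indented by at least 10 spaces, which varies between IMSDb drafts), the hypothetical `ALIASES` table mirroring the Emperor and Padme merges, and the toy script. Betweenness uses Brandes' algorithm on the unweighted graph, which may differ from how the author configured it in R.

```python
from __future__ import annotations

import re
from collections import Counter, defaultdict, deque
from itertools import combinations

# Assumed screenplay layout (placeholder regexes; real IMSDb drafts vary).
SCENE_RE = re.compile(r"^\s*(INT\.|EXT\.)")
SPEAKER_RE = re.compile(r"^\s{10,}([A-Z][A-Z0-9 \-']+?)\s*$")

# Hypothetical alias table reflecting the merges described in the post.
ALIASES = {"PALPATINE": "EMPEROR", "DARTH SIDIOUS": "EMPEROR", "AMIDALA": "PADME"}


def scenes(script: str) -> list[set[str]]:
    """Split a script into scenes; return the set of speaking characters per scene."""
    result: list[set[str]] = []
    current: set[str] | None = None
    for line in script.splitlines():
        if SCENE_RE.match(line):  # a scene heading starts a new scene
            current = set()
            result.append(current)
        elif current is not None and (m := SPEAKER_RE.match(line)):
            current.add(m.group(1).strip())
    return result


def build_network(scene_list: list[set[str]], aliases: dict[str, str] | None = None):
    """Edge weight = scenes in which both characters speak; also count speaking scenes."""
    aliases = aliases or {}
    weights: Counter = Counter()
    appearances: Counter = Counter()
    for speakers in scene_list:
        names = sorted({aliases.get(s, s) for s in speakers})
        appearances.update(names)
        for a, b in combinations(names, 2):
            weights[(a, b)] += 1
    return weights, appearances


def adjacency(weights: Counter) -> dict[str, set[str]]:
    """Undirected adjacency sets; edge weights are ignored from here on."""
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b in weights:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def betweenness(adj: dict[str, set[str]]) -> dict[str, float]:
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # each pair was counted from both ends


# Hypothetical toy script recreating the Leia / Han / Greedo example.
INDENT = " " * 10
TOY_SCRIPT = "\n".join([
    "INT. MOS EISLEY CANTINA",
    INDENT + "GREEDO", "     ...",
    INDENT + "HAN", "     ...",
    "EXT. DEATH STAR - DOCKING BAY",
    INDENT + "LEIA", "     ...",
    INDENT + "HAN", "     ...",
    INDENT + "LUKE", "     ...",
])

if __name__ == "__main__":
    weights, appearances = build_network(scenes(TOY_SCRIPT), ALIASES)
    adj = adjacency(weights)
    print({c: len(n) for c, n in adj.items()})  # degree: Han has 3 neighbours
    print(betweenness(adj))  # only Han lies on shortest paths (to Greedo)
```

In this toy network Han is the only bridge to Greedo, so he is the only character with non-zero betweenness, matching the message-passing example in the post. Note that `appearances` counts only scenes in which a character speaks, which is one possible reading of "scenes the character appears in".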