Storm
A topology is a directed acyclic graph (DAG): datasource --> bolt --> bolt --> ...

Differences between Storm and traditional databases:
- A traditional database stores data first and computes later; Storm computes first and stores afterwards, or not at all.
- Real-time computation is hard to deploy on a traditional relational database; at best you schedule periodic jobs that analyze windows of data.
- Relational databases emphasize transactions and concurrency control; Storm's support for these is comparatively primitive.

Storm (speed), Hadoop (massive data), and Spark (in-memory computing framework) are the popular big-data solutions.

Storm's core code is Clojure, its utility programs are Python, and topologies are developed in Java.

Wordcount logic: sentence spout --> split sentence bolt --> word count bolt --> report bolt (a sketch of the two middle bolts appears after the sample topology code below).

[root@localhost target]# cd ~/soft
[root@localhost soft]# ls
maven  Python-2.7.2  storm-0.9.1  storm-starter  zookeeper-3.3.6
[root@localhost soft]# cd zookeeper-3.3.6
[root@localhost zookeeper-3.3.6]# ls
bin          contrib     ivysettings.xml  NOTICE.txt  zookeeper-3.3.6.jar
build.xml    data        ivy.xml          README.txt  zookeeper-3.3.6.jar.asc
CHANGES.txt  dist-maven  lib              recipes     zookeeper-3.3.6.jar.md5
conf         docs        LICENSE.txt      src         zookeeper-3.3.6.jar.sha1
[root@localhost zookeeper-3.3.6]# cd bin
[root@localhost bin]# ls
README.txt    zkCli.cmd  zkEnv.cmd  zkServer.cmd  zookeeper.out
zkCleanup.sh  zkCli.sh   zkEnv.sh   zkServer.sh
[root@localhost bin]# ./zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /
[storm, zookeeper]

Storm management commands (each takes a topology name):
storm rebalance   - redistribute work, e.g. after adding nodes
storm activate
storm deactivate
storm kill

Topology run process:
1. When a topology is submitted, its code is first stored in the inbox directory of the nimbus node; the current Storm run configuration is then serialized into a stormconf.ser file under the nimbus node's stormdist directory, which also holds the serialized topology code.
2. When wiring the spouts and bolts of a topology you can set each component's executor count and task count; by default the total number of tasks in a topology equals the total number of executors. The system then spreads the tasks as evenly as possible across the workers; which supervisor node a worker runs on is decided by the supervisor itself. (A sketch with explicit counts appears after the storm jar notes below.)
3. Once the tasks are assigned, the nimbus node writes the assignment information to the ZooKeeper cluster. ZooKeeper also has a workerbeats node, which stores the heartbeats of all worker processes.
4. Supervisor nodes keep polling the ZooKeeper cluster. ZooKeeper's assignment node stores the task assignments of every topology together with the mapping to the code storage directories; by polling this node a supervisor claims its own tasks and starts worker processes.
5. Once a topology is running, its spouts keep emitting streams and its bolts keep processing the streams they receive; a stream is unbounded, so processing runs continuously.

Local-mode submission:
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(TOPOLOGY_NAME, conf, builder.createTopology());
Thread.sleep(2000);
cluster.shutdown();

Distributed submission:
StormSubmitter.submitTopology(TOPOLOGY_NAME, conf, builder.createTopology());

Running a topology: after the Storm code is written, it must be packaged as a jar and run through nimbus. Do not bundle the storm dependency into the jar, otherwise configuration files will be duplicated, because Storm loads the local storm.yaml before running:
storm jar StormTopology.jar mainclass

Storm daemons: nimbus, supervisor, ui, drpc.

storm jar topology_jar topology_class [args...]
The jar command submits a topology to the cluster. It runs the main method of topology_class, uploads the jar to nimbus, and nimbus distributes it across the cluster. Once submitted, Storm activates the topology and starts processing. The main method calls StormSubmitter.submitTopology and must supply a unique topology name; if the name already exists, submission fails. A common practice is therefore to take the topology name from the command line.
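A minimal sketch tying these two points together, assuming the LearningStormSpout and LearningStormBolt classes defined in the next section: the bolt is given more tasks than executors, and the unique topology name comes from the command line. The class name SubmitWithNameFromArgs is hypothetical.

package cn.dataguru.storm;

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

// Hypothetical example class, not part of the course code.
public class SubmitWithNameFromArgs {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // 2 executors for the spout.
        builder.setSpout("LearningStormSpout", new LearningStormSpout(), 2);
        // 2 executors but 4 tasks for the bolt, so each executor runs 2 tasks.
        builder.setBolt("LearningStormBolt", new LearningStormBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("LearningStormSpout");
        Config conf = new Config();
        // args[0] is the topology name; submission fails if it already exists.
        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    }
}

Invoked, for example, as storm jar <jar> cn.dataguru.storm.SubmitWithNameFromArgs my-unique-name; resubmitting under the same name fails with AlreadyAliveException.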
Creating a new project with Maven:
mvn archetype:generate -DgroupId=storm.test -DartifactId=teststorm -DpackageName=cn.dataguru.storm
(older Maven versions use archetype:create)

pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>storm.test</groupId>
  <artifactId>teststorm</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>teststorm</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>storm</groupId>
      <artifactId>storm</artifactId>
      <version>0.9.0.1</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
  <repositories>
    <repository>
      <id>clojars.org</id>
      <url>http://clojars.org/repo</url>
    </repository>
  </repositories>
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.2.1</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <archive>
            <manifest>
              <mainClass />
            </manifest>
          </archive>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

LearningStormBolt.java:

package cn.dataguru.storm;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class LearningStormBolt extends BaseBasicBolt {
    private static final long serialVersionUID = 1L;

    public void execute(Tuple input, BasicOutputCollector collector) {
        // Fetch the field "site" from the input tuple.
        String test = input.getStringByField("site");
        // Print the value of the field "site" on the console.
        System.out.println("Name of input site is : " + test);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt is a sink; it emits nothing downstream.
    }
}

LearningStormSpout.java:

package cn.dataguru.storm;

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class LearningStormSpout extends BaseRichSpout {
    private static final long serialVersionUID = 1L;
    private SpoutOutputCollector spoutOutputCollector;
    private static final Map<Integer, String> map = new HashMap<Integer, String>();
    static {
        map.put(0, "google");
        map.put(1, "facebook");
        map.put(2, "twitter");
        map.put(3, "youtube");
        map.put(4, "linkedin");
    }

    public void open(Map conf, TopologyContext context,
            SpoutOutputCollector spoutOutputCollector) {
        // Open the spout.
        this.spoutOutputCollector = spoutOutputCollector;
    }

    public void nextTuple() {
        // The Storm cluster repeatedly calls this method to emit a continuous
        // stream of tuples.
        final Random rand = new Random();
        // Generate a random number from 0 to 4.
        int randomNumber = rand.nextInt(5);
        spoutOutputCollector.emit(new Values(map.get(randomNumber)));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("site"));
    }
}

LearningStormTopology.java:

package cn.dataguru.storm;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.topology.TopologyBuilder;

public class LearningStormTopology {
    public static void main(String[] args) throws AlreadyAliveException,
            InvalidTopologyException {
        // Create an instance of the TopologyBuilder class.
        TopologyBuilder builder = new TopologyBuilder();
        // Set the spout class.
        builder.setSpout("LearningStormSpout", new LearningStormSpout(), 2);
        // Set the bolt class.
        builder.setBolt("LearningStormBolt", new LearningStormBolt(), 4)
                .shuffleGrouping("LearningStormSpout");
        Config conf = new Config();
        conf.setDebug(true);
        // Create an instance of the LocalCluster class for
        // executing the topology in local mode.
        LocalCluster cluster = new LocalCluster();
        // LearningStormTopology is the name of the submitted topology.
        cluster.submitTopology("LearningStormTopology", conf,
                builder.createTopology());
        try {
            Thread.sleep(10000);
        } catch (Exception exception) {
            System.out.println("Thread interrupted exception : " + exception);
        }
        // Kill the LearningStormTopology.
        cluster.killTopology("LearningStormTopology");
        // Shut down the Storm test cluster.
        cluster.shutdown();
    }
}
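The wordcount pipeline from the top of these notes (sentence spout --> split sentence bolt --> word count bolt --> report bolt) uses the same base classes. A minimal sketch of the two middle bolts, assuming the spout emits tuples with a single "sentence" field (the class names SplitSentenceBolt and WordCountBolt are placeholders, not course code):

package cn.dataguru.storm;

import java.util.HashMap;
import java.util.Map;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Splits each incoming sentence into words and emits one tuple per word.
public class SplitSentenceBolt extends BaseBasicBolt {
    private static final long serialVersionUID = 1L;

    public void execute(Tuple input, BasicOutputCollector collector) {
        for (String word : input.getStringByField("sentence").split("\\s+")) {
            collector.emit(new Values(word));
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}

// Keeps a running in-memory count per word and emits (word, count) pairs
// downstream, e.g. to a report bolt.
class WordCountBolt extends BaseBasicBolt {
    private static final long serialVersionUID = 1L;
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    public void execute(Tuple input, BasicOutputCollector collector) {
        String word = input.getStringByField("word");
        Integer count = counts.get(word);
        count = (count == null) ? 1 : count + 1;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}

For the counts to be correct, the topology must route all occurrences of a word to the same task, e.g. builder.setBolt("count", new WordCountBolt(), 4).fieldsGrouping("split", new Fields("word")); shuffleGrouping would scatter the same word across counters.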
cluster.submitTopology("LearningStormToplogy", conf, builder.createTopology()); try { Thread.sleep(10000); } catch (Exception exception) { System.out.println("Thread interrupted exception : " + exception); } // kill the LearningStormTopology cluster.killTopology("LearningStormToplogy"); // shutdown the storm test cluster cluster.shutdown(); }}mvn install[root@localhost teststorm]# cd target[root@localhost target]# lsarchive-tmp test-classesclasses teststorm-0.0.1-SNAPSHOT.jarmaven-archiver teststorm-0.0.1-SNAPSHOT-jar-with-dependencies.jarmaven-status teststorm-1.0-SNAPSHOT.jarsurefire-reports teststorm-1.0-SNAPSHOT-jar-with-dependencies.jar[root@localhost teststorm]# mvn compile exec:java -Dexec:java -Dexec.classpathScope=compile -Dexec.mainClass=cn.dataguru.storm.LearningStormTopology cn.dataguru.storm.LearningStormTopology[root@localhost teststorm]# storm jar teststorm-0.0.1-SNAPSHOT-jar-with-dependencies.jar cn.dataguru.storm.LearningStormTopology 集群方式conf.setNumWorkers(3);StormSubmitter.submitTopology("name", conf, builder.createTopology());以下的注释掉 cluster.submitTopology("LearningStormToplogy", conf, builder.createTopology()); try { Thread.sleep(10000); StormSubmitter.submitTopology("name", conf, builder.createTopology()); } catch (Exception exception) { System.out.println("Thread interrupted exception : " + exception); } // kill the LearningStormTopology cluster.killTopology("LearningStormToplogy"); // shutdown the storm test cluster cluster.shutdown();