OVERVIEW
scalable
distributed
realtime DATA STORE [QUERY LAYER]
interactive data analysis
DESIGN
segment (both aggregation and querying go through segments)
node: broker (accepts query requests and routes them; see the query sketch after this block)
- coordinator (tells nodes which data to load)
- historical (notified by the coordinator, loads segments from the specified location)
- overlord (organizes segment construction, like a construction engineer)
- indexing service (creates Druid segments)
- realtime processing (ingests streaming data; see REALTIME below)
dependencies (deep storage, zookeeper (like the government office that accepts the building-permit filings), metadata storage)
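A minimal query sketch to show the broker's routing role: the client only talks to the broker, which fans the query out to historical and realtime nodes and merges their partial results. The broker address, port, and the "events" datasource are assumptions for illustration; the SQL endpoint path follows Druid's documented /druid/v2/sql/ API.

```python
import json
import urllib.request

# Hypothetical broker address and datasource name; adjust to your cluster.
BROKER_URL = "http://localhost:8082/druid/v2/sql/"

# The client only ever talks to the broker; the broker routes the query
# to historical and realtime nodes and merges their partial results.
sql = {
    "query": """
        SELECT __time, COUNT(*) AS cnt
        FROM "events"
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
        GROUP BY __time
    """
}

request = urllib.request.Request(
    BROKER_URL,
    data=json.dumps(sql).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    for row in json.loads(response.read()):
        print(row)
```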
REALTIME
QUERY TIME
DATA FRESHNESS
TIMELINE FOR REALTIME PROCESS (ingest [held in memory, queryable immediately] -> persist [when memory runs short, flush to small on-disk chunks] -> merge [once past the window threshold no new data is accepted; the persisted small fragments are merged into a large segment] -> handoff [the merged segment is handed off to deep storage for historical nodes to serve])
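A toy sketch of that lifecycle, not Druid's implementation: the buffer size, function names, and data structures are made up purely to make the ingest -> persist -> merge -> handoff sequence concrete.

```python
# Toy model of the lifecycle above (ingest -> persist -> merge -> handoff).
# Thresholds, names, and structures are illustrative only, not Druid internals.

MAX_ROWS_IN_MEMORY = 3   # pretend the in-memory buffer holds only 3 rows

in_memory = []           # freshly ingested rows, queryable immediately
persisted_chunks = []    # small on-disk fragments flushed from memory
deep_storage = []        # merged segments that have been handed off

def ingest(row):
    """Ingest: keep the row in memory so it is queryable right away."""
    in_memory.append(row)
    if len(in_memory) >= MAX_ROWS_IN_MEMORY:
        persist()

def persist():
    """Persist: memory is full, so flush the buffer into a small chunk."""
    persisted_chunks.append(list(in_memory))
    in_memory.clear()

def merge_and_handoff():
    """Merge + handoff: the time window has closed and no new data is
    accepted; merge the small persisted fragments into one segment and
    hand it off to deep storage, where historical nodes can load it."""
    if in_memory:
        persist()
    segment = [row for chunk in persisted_chunks for row in chunk]
    persisted_chunks.clear()
    deep_storage.append(segment)

for i in range(7):
    ingest({"ts": i, "bytes": i * 10})
merge_and_handoff()
print(len(deep_storage), len(deep_storage[0]))  # 1 segment containing all 7 rows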
COLUMN ORIENTATION
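A small illustration of why column orientation helps (plain Python, not Druid code): an aggregation that only touches one column can scan a single contiguous array instead of walking every field of every row.

```python
# Row-oriented vs column-oriented layout of the same data.
rows = [
    {"ts": 1, "country": "CN", "bytes": 120},
    {"ts": 2, "country": "US", "bytes": 300},
    {"ts": 3, "country": "CN", "bytes": 80},
]

# Row-oriented: whole rows are scanned even though only "bytes" is needed.
total_row_oriented = sum(row["bytes"] for row in rows)

# Column-oriented: each column is stored as its own array, so SUM(bytes)
# touches exactly one column.
columns = {
    "ts": [1, 2, 3],
    "country": ["CN", "US", "CN"],
    "bytes": [120, 300, 80],
}
total_column_oriented = sum(columns["bytes"])

assert total_row_oriented == total_column_oriented == 500
```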
DATA LOSS
- realtime
- patch your segments [all data that comes in via streaming should also come in via batch]
- using kafka [at-least-once reads; realtime data is retained in the kafka pipeline] (see the consumer sketch after this list)
- approximation is good enough [missing one or two records does not matter]
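A sketch of the at-least-once kafka pattern, assuming the kafka-python package; the "events" topic and the deliver_to_druid() helper are hypothetical. The key point is that offsets are committed only after the records have been handed to the ingestion path, so a crash causes a re-read (possible duplicates) rather than data loss.

```python
from kafka import KafkaConsumer

def deliver_to_druid(record_value: bytes) -> None:
    # Placeholder for whatever hands the event to the ingestion pipeline.
    print(record_value)

consumer = KafkaConsumer(
    "events",                       # hypothetical topic name
    bootstrap_servers=["localhost:9092"],
    group_id="druid-ingest",
    enable_auto_commit=False,       # commit manually, after processing
    auto_offset_reset="earliest",
)

for message in consumer:
    deliver_to_druid(message.value)
    consumer.commit()               # offset advances only after delivery
```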
PRACTICE
- gitlab kubernetes/enn-kubernetes/task/druid2
WHAT DRUID CAN DO
- use cases that tolerate some error (approximate results are acceptable)
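An example of such an error-tolerant query, using Druid SQL's APPROX_COUNT_DISTINCT, which returns an approximate distinct count in exchange for far less memory than an exact count; the "events" datasource and user_id column are assumptions.

```python
# POST this body to the broker's /druid/v2/sql/ endpoint as in the earlier sketch.
approx_sql = {
    "query": """
        SELECT APPROX_COUNT_DISTINCT(user_id) AS unique_users
        FROM "events"
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
    """
}
```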
WHAT DRUID CAN NOT DO
- use cases with zero tolerance for error (exact results required)