linxx's blog

Druid lecture, from siyun

OVERVIEW

scalable
distributed
realtime DATA STORE [QUERY LAYER]
interactive data analysis

DESIGN

segment (aggregation and queries both go through segments)
node: broker (accepts query requests and routes them)

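  • coordinator (tells nodes to load data)
  • historical (notified by the coordinator, loads segments from the specified location)
  • overlord (organizes segment building; the "construction engineer")
  • indexing service (creates Druid segments)
  • realtime processing()
    dependencies (deep storage, zookeeper (the "government" that accepts building-permit filings), metadata storage)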
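To make the broker's role concrete, here is a minimal sketch of building a native timeseries query and wrapping it in an HTTP request aimed at a broker. The broker address, the datasource name `events`, and the helper names are assumptions for illustration and did not come from the lecture; the request is only constructed, not sent.

```python
import json
import urllib.request

# Hypothetical broker address; 8082 is the usual default broker port.
BROKER_URL = "http://localhost:8082/druid/v2/"


def build_timeseries_query(datasource, interval, granularity="hour"):
    """Build a native Druid timeseries query that counts rows per time bucket."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": granularity,
        "intervals": [interval],  # ISO-8601 interval, e.g. "2016-01-01/2016-01-02"
        "aggregations": [{"type": "count", "name": "rows"}],
    }


def make_request(query, url=BROKER_URL):
    """Wrap the query in a POST request; the broker would route it to the nodes holding the segments."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    query = build_timeseries_query("events", "2016-01-01/2016-01-02")
    request = make_request(query)
    print(request.get_method(), request.full_url)
    print(json.dumps(query, indent=2))
    # To actually send it against a running cluster:
    # with urllib.request.urlopen(request) as resp:
    #     print(json.load(resp))
```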

REALTIME

QUERY TIME
DATA FRESHNESS
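TIMELINE FOR REALTIME PROCESS (ingest [held in memory, queryable immediately] -> persist [when memory runs short] -> merge [past the threshold, no more data is accepted; the small persisted fragments are combined into a larger one] -> handoff [])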
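The timeline above can be sketched as a toy state machine. This is not Druid's implementation: the class, the row-count threshold `max_rows_in_memory`, the `window_end` cutoff, and the list standing in for deep storage are all hypothetical. The notes leave the handoff step blank, so pushing the merged segment to deep storage here is an assumption.

```python
class RealtimeTask:
    """Toy model of ingest -> persist -> merge -> handoff for one time window."""

    def __init__(self, max_rows_in_memory, window_end):
        self.max_rows_in_memory = max_rows_in_memory
        self.window_end = window_end  # timestamps at or after this are rejected
        self.in_memory = []           # rows that are queryable right after ingest
        self.persisted = []           # small fragments spilled out of memory
        self.segment = None           # the merged segment, once built
        self.handed_off = False

    def ingest(self, timestamp, row):
        """Accept a row unless the window has closed or the segment is already merged."""
        if self.segment is not None or timestamp >= self.window_end:
            return False
        self.in_memory.append(row)
        if len(self.in_memory) >= self.max_rows_in_memory:
            self.persist()
        return True

    def persist(self):
        """Spill the in-memory rows into a new small fragment."""
        if self.in_memory:
            self.persisted.append(self.in_memory)
            self.in_memory = []

    def queryable_rows(self):
        """Rows stay queryable at every stage of the timeline."""
        if self.segment is not None:
            return list(self.segment)
        return [r for chunk in self.persisted for r in chunk] + list(self.in_memory)

    def merge(self):
        """Stop accepting data and combine the small fragments into one segment."""
        self.persist()
        self.segment = [r for chunk in self.persisted for r in chunk]
        self.persisted = []
        return self.segment

    def handoff(self, deep_storage):
        """Assumed handoff step: publish the merged segment to deep storage."""
        if self.segment is None:
            raise RuntimeError("merge must run before handoff")
        deep_storage.append(self.segment)
        self.handed_off = True


if __name__ == "__main__":
    deep_storage = []
    task = RealtimeTask(max_rows_in_memory=2, window_end=100)
    for ts, row in [(1, "a"), (2, "b"), (3, "c")]:
        task.ingest(ts, row)
    print(task.queryable_rows())
    task.merge()
    task.handoff(deep_storage)
    print(deep_storage)
```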

COLUMN ORIENTATION

DATA LOSS

  • realtime
  • patch your segments [all data that arrives via streaming should also arrive via batch]
  • using kafka [read at least once; realtime data is retained in the Kafka pipe]
  • approximation is good enough [losing a record or two does not matter]
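The "patch your segments" idea can be illustrated with a toy timeline in which a later batch rebuild replaces the realtime segment for the same interval. This is a simplified sketch, not Druid's code: the `publish` helper, the dictionary timeline, and the version strings are hypothetical, and the rule that the higher version wins is the assumption being demonstrated.

```python
def publish(timeline, interval, version, rows):
    """Register a segment; for the same interval, the higher version replaces the lower one."""
    current = timeline.get(interval)
    if current is None or version > current[0]:
        timeline[interval] = (version, list(rows))
    return timeline


if __name__ == "__main__":
    timeline = {}
    interval = "2016-01-01/2016-01-02"

    # The realtime path dropped one event ("e3") before handoff.
    publish(timeline, interval, "2016-01-01T00:00:00", ["e1", "e2"])

    # The nightly batch job re-reads the full source and republishes the interval.
    publish(timeline, interval, "2016-01-02T03:00:00", ["e1", "e2", "e3"])

    # Queries over the interval now see the patched, complete segment.
    print(timeline[interval])
```

Until the batch job runs, queries are answered from the realtime segment, which is where the "approximation is good enough" trade-off applies.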

PRACTICE

  • gitlab kubernetes/enn-kubernetes/task/druid2

WHAT DRUID CAN DO

  • workloads that can tolerate some error

WHAT DRUID CAN NOT DO

  • workloads that cannot tolerate any error