Managed State & Raw State

Flink divides state into two kinds according to who manages it: Managed State and Raw State. They differ as follows:

  • Management style
    • Managed State is managed by the Flink runtime: it is stored and recovered automatically, and its memory management is optimized. (Checkpoints and Savepoints store this type of state.)
    • Raw State is managed and serialized by the user. Flink knows nothing about its data structures and sees only raw bytes, so the serialization format and the logic for restoring the state must be implemented by the user.
  • Supported data structures
    • Managed State supports well-known data structures such as Value, List, and Map.
    • Raw State supports only byte arrays; all state data must be converted to binary byte arrays for storage.
  • Usage scenarios
    • Managed State is suitable for most scenarios.
    • Raw State is recommended only when Managed State is insufficient, for example when you need to implement a custom operator.
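
As a rough illustration of what "serialized by the user" means for Raw State, the sketch below turns a small piece of state into a byte array and back using only the Java standard library. It does not use any Flink API; the class RawCounterState, its fields, and the chosen binary layout are hypothetical and exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical user-defined state; this is not a Flink class.
class RawCounterState {
    final long count;
    final String lastKey;

    RawCounterState(long count, String lastKey) {
        this.count = count;
        this.lastKey = lastKey;
    }

    // The user chooses the binary layout: an 8-byte long followed by a length-prefixed string.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(count);
            out.writeUTF(lastKey);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restoring must read the fields in exactly the order they were written.
    static RawCounterState fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            long count = in.readLong();
            String lastKey = in.readUTF();
            return new RawCounterState(count, lastKey);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] snapshot = new RawCounterState(42L, "user-7").toBytes();
        RawCounterState restored = fromBytes(snapshot);
        System.out.println(restored.count + " " + restored.lastKey); // prints "42 user-7"
    }
}
```

With Managed State, this kind of layout and round-trip logic is handled by the framework; with Raw State it is the user's responsibility, which is why Raw State is usually reserved for custom operators.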

Keyed State & Operator State

Managed State is the more commonly used kind of state, and it is further divided into two types: Keyed State and Operator State. They differ as follows:

  • Which operators can use it

    • Keyed State can only be used on operators of a KeyedStream.
    • Operator State can be used with all operators; it is commonly used in source operators, for example FlinkKafkaConsumer.
  • State correspondence

    • In a KeyedStream, each key (note: per key, not per operator) corresponds to one Keyed State, and a Keyed State corresponds to only one key; the relationship is one-to-one.
      • If multiple keys in a KeyedStream are assigned to the same operator, that operator can access the Keyed State of each of those keys. An operator may therefore correspond to multiple Keyed States, but a Keyed State corresponds to exactly one operator.
    • One operator corresponds to one Operator State, and one Operator State corresponds to only one operator; the relationship is one-to-one.
  • When the parallelism changes

    • Keyed State migrates between instances together with its key.
    • Operator State is redistributed. There are two strategies:
      • Merge all Operator States, then distribute the merged state evenly across the operators.
      • Merge all Operator States, then give every operator the complete merged state.
  • Access

    • Keyed State is accessed through the RuntimeContext.
    • Operator State requires you to implement the CheckpointedFunction or ListCheckpointed interface yourself.
  • Supported data structures

    • Keyed State
      • ValueState<T>: a single-valued state of type T, bound to the key. This is the simplest state: update the value with the update method and read it with the value method.
      • ListState<T>: the state bound to the key is a list. Add a value to the list with the add method; the get() method returns an Iterable<T> for iterating over the state values.
      • ReducingState<T>: each time add() is called with a new value, the user-supplied ReduceFunction is invoked and the result is merged into a single state value.
      • MapState<UK, UV>: the state value is a Map. Add elements with the put or putAll method, look up the value for a given key with get(key), and retrieve the contents with entries(), keys(), and values().
      • AggregatingState<IN, OUT>: keeps a single value that represents an aggregation of all values added to the state. Unlike ReducingState, the aggregate type may differ from the type of the elements added to the state. Elements added with add(IN) are aggregated by the user-specified AggregateFunction.
      • FoldingState<T, ACC>: deprecated; AggregatingState is recommended instead. It keeps a single value that represents an aggregation of all values added to the state. Unlike ReducingState, the aggregate type may differ from the type of the elements added to the state. Elements added with add(T) are folded into the aggregate by the user-specified FoldFunction.
    • Operator State
      • ListState
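
The two Operator State redistribution strategies described above can be modelled in plain Java. This is a minimal sketch that uses no Flink API: the class OperatorStateRedistribution and its method names are hypothetical, and the round-robin assignment is only one possible way to split the merged list evenly, not necessarily the order Flink itself uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the two redistribution strategies; not a Flink class.
class OperatorStateRedistribution {

    // Merge the list state of every old operator instance into one list.
    static <T> List<T> mergeAll(List<List<T>> oldStates) {
        List<T> merged = new ArrayList<>();
        for (List<T> state : oldStates) {
            merged.addAll(state);
        }
        return merged;
    }

    // Strategy 1: split the merged list as evenly as possible across the new instances
    // (round-robin is an assumption made for this sketch).
    static <T> List<List<T>> evenSplit(List<List<T>> oldStates, int newParallelism) {
        List<T> merged = mergeAll(oldStates);
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < merged.size(); i++) {
            result.get(i % newParallelism).add(merged.get(i));
        }
        return result;
    }

    // Strategy 2: every new instance receives the complete merged list.
    static <T> List<List<T>> union(List<List<T>> oldStates, int newParallelism) {
        List<T> merged = mergeAll(oldStates);
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>(merged));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two old instances holding made-up offsets; rescale from parallelism 2 to 3.
        List<List<Integer>> old = Arrays.asList(Arrays.asList(10, 20, 30), Arrays.asList(40, 50));
        System.out.println(evenSplit(old, 3)); // [[10, 40], [20, 50], [30]]
        System.out.println(union(old, 3));     // three copies of [10, 20, 30, 40, 50]
    }
}
```

In Flink itself, the choice between the two strategies is made when the list state is obtained from the OperatorStateStore, through getListState for the even split and getUnionListState for the full copy. With the second strategy each instance has to pick out the entries it is responsible for after restoring.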

Relationships between several State data structures