A Study of the BitSet Source Code

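I have been reading about Bloom filters for the past few days. Java does not let you manipulate data at the bit level as directly as C/C++ does, so an alternative is needed:

1) use an array of integers;

2) use a BitSet.

A BitSet is essentially a Vector of bits. It is the right choice when you need to store a large amount of on/off information compactly. Its efficiency is in space only: if you need fast access, it is somewhat slower than an array of a primitive type.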

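The size of a BitSet is not necessarily the size you asked for: the value returned by size() is always a multiple of 64. This follows from how the storage is allocated. Suppose we instantiate a BitSet like this:

    BitSet set = new BitSet(129);

Let's look at how the storage is actually allocated. The source is: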

    /**
     * Creates a bit set whose initial size is large enough to explicitly
     * represent bits with indices in the range 0 through
     * nbits-1. All bits are initially false.
     *
     * @param     nbits   the initial size of the bit set.
     * @exception NegativeArraySizeException if the specified initial size
     *               is negative.
     */
    public BitSet(int nbits) {
        // nbits can't be negative; size 0 is OK
        if (nbits < 0)
            throw new NegativeArraySizeException("nbits < 0: " + nbits);

        initWords(nbits);
        sizeIsSticky = true;
    }

    private void initWords(int nbits) {
        words = new long[wordIndex(nbits-1) + 1];
    }

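The actual storage is controlled by the initWords method, which instantiates an array of long. So what does wordIndex do? Its source is:

    /**
     * Given a bit index, return word index containing it.
     */
    private static int wordIndex(int bitIndex) {
        return bitIndex >> ADDRESS_BITS_PER_WORD;
    }

This involves the constant ADDRESS_BITS_PER_WORD, which is defined in the source as:

`private final static int ADDRESS_BITS_PER_WORD = 6;`

Clearly 2^6 = 64. So when we pass 129 as the argument, the constructor allocates long[((129-1)>>6)+1], that is, long[3]. The inner parentheses matter, because in Java + binds more tightly than >>. At this point it is clear that alternatives 1) and 2) are very similar: both use integers (4 bytes for an int, 8 bytes for a long) to hold a group of bits, and then manipulate individual bits by combining them with bit masks through &, |, ~ and similar operators.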
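The word-count arithmetic above can be checked with a small sketch. The class WordCountSketch and the helper wordsFor are hypothetical names introduced here for illustration; only the constant value 6 comes from the BitSet source.

```java
import java.util.BitSet;

final class WordCountSketch {
    // Same value as the private constant in java.util.BitSet.
    static final int ADDRESS_BITS_PER_WORD = 6;

    // Hypothetical helper: number of 64-bit words needed to hold nbits bits.
    static int wordsFor(int nbits) {
        return ((nbits - 1) >> ADDRESS_BITS_PER_WORD) + 1;
    }

    public static void main(String[] args) {
        System.out.println(wordsFor(129));          // 3
        System.out.println(wordsFor(64));           // 1
        System.out.println((129 - 1) >> 6 + 1);     // 1, because + is applied before >>
        // size() reports the allocated words times 64.
        System.out.println(new BitSet(129).size());
    }
}
```

The third line shows why the unparenthesized form is misleading: it shifts by 7 rather than shifting by 6 and then adding 1.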

Next, a look at some of the other important methods.

1) The set method. Its source is:

    /**
     * Sets the bit at the specified index to true.
     *
     * @param     bitIndex   a bit index.
     * @exception IndexOutOfBoundsException if the specified index is negative.
     * @since     JDK1.0
     */
    public void set(int bitIndex) {
        if (bitIndex < 0)
            throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

        int wordIndex = wordIndex(bitIndex);
        expandTo(wordIndex);

        words[wordIndex] |= (1L << bitIndex); // Restores invariants

        checkInvariants();
    }

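This method sets the bit at bitIndex to true, whatever its previous value. Here is how it works.

Setting a bit means changing the value of one element of the long array. We first have to determine which element to change, and then change it with a bitwise operation. In the code above, bitIndex >> 6 identifies the element to modify. The call to expandTo is covered later; for now, look directly at this line:

`words[wordIndex] |= (1L << bitIndex);`

This is not obvious at first. The point to note is that Java reduces the shift distance modulo the width of the type, so for a long only the low six bits of the distance are used. For example, shifting a long left by 65 actually shifts it by 65 % 64 = 1. Since bitIndex is non-negative here, the line is equivalent to:

`int transferBits = bitIndex % 64; words[wordIndex] |= (1L << transferBits);`

Written this way, the intent is clear.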
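The shift-distance behaviour is easy to confirm in isolation. The following is a minimal sketch; ShiftMaskSketch and maskFor are hypothetical names, not part of BitSet.

```java
final class ShiftMaskSketch {
    // Hypothetical helper: the single-bit mask that set(bitIndex) ORs into its word.
    // For a long, Java uses only the low 6 bits of the shift distance.
    static long maskFor(int bitIndex) {
        return 1L << bitIndex;
    }

    public static void main(String[] args) {
        System.out.println(maskFor(65) == maskFor(1));          // true
        System.out.println(maskFor(65) == (1L << (65 % 64)));   // true

        long word = 0L;
        word |= maskFor(67);   // bit 67 sits at position 3 of its word
        System.out.println(Long.toBinaryString(word));          // 1000
    }
}
```

The equivalence with % 64 holds for non-negative indices only, which is all that set accepts.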

Its counterpart is:

    /**
     * Sets the bit specified by the index to false.
     *
     * @param     bitIndex   the index of the bit to be cleared.
     * @exception IndexOutOfBoundsException if the specified index is negative.
     * @since     JDK1.0
     */
    public void clear(int bitIndex) {
        if (bitIndex < 0)
            throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

        int wordIndex = wordIndex(bitIndex);
        if (wordIndex >= wordsInUse)
            return;

        words[wordIndex] &= ~(1L << bitIndex);

        recalculateWordsInUse();
        checkInvariants();
    }

This code works much like set: it ANDs the word with the inverted mask, which sets the bit at the given index to false. If the index lies beyond the words in use, the bit is already false and the method returns immediately.
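To see set and clear side by side, here is a minimal sketch of the integer-array alternative built on the same masks. TinyBits is a hypothetical class written for this article; it has a fixed size, does no bounds checking, and never grows, so it is not a substitute for BitSet.

```java
final class TinyBits {
    private final long[] words;

    TinyBits(int nbits) {
        // Same sizing rule as initWords: one long per 64 bits, rounded up.
        words = new long[((nbits - 1) >> 6) + 1];
    }

    // OR with a one-bit mask turns the bit on.
    void set(int i) { words[i >> 6] |= (1L << i); }

    // AND with the inverted mask turns the bit off and leaves the rest alone.
    void clear(int i) { words[i >> 6] &= ~(1L << i); }

    // AND with the mask isolates the bit.
    boolean get(int i) { return (words[i >> 6] & (1L << i)) != 0; }

    public static void main(String[] args) {
        TinyBits bits = new TinyBits(129);
        bits.set(67);
        System.out.println(bits.get(67));   // true
        bits.clear(67);
        System.out.println(bits.get(67));   // false
    }
}
```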

The set method above calls a helper that deserves a short explanation.

The expandTo method:

    /**
     * Ensures that the BitSet can accommodate a given wordIndex,
     * temporarily violating the invariants.  The caller must
     * restore the invariants before returning to the user,
     * possibly using recalculateWordsInUse().
     * @param   wordIndex the index to be accommodated.
     */
    private void expandTo(int wordIndex) {
        int wordsRequired = wordIndex+1;
        if (wordsInUse < wordsRequired) {
            ensureCapacity(wordsRequired);
            wordsInUse = wordsRequired;
        }
    }

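This method refers to another field, wordsInUse, defined as follows:

    /**
     * The number of words in the logical size of this BitSet.
     */
    private transient int wordsInUse = 0;

As its comment says, this field is the logical size of the BitSet, measured in words. When expandTo receives a wordIndex, it computes wordsRequired = wordIndex + 1 and compares it with wordsInUse. If wordsInUse is smaller, it calls ensureCapacity:

    private void ensureCapacity(int wordsRequired) {
        if (words.length < wordsRequired) {
            // Allocate larger of doubled size or required size
            int request = Math.max(2 * words.length, wordsRequired);
            words = Arrays.copyOf(words, request);
            sizeIsSticky = false;
        }
    }

In other words, when the array is too small it grows to the larger of twice its current length and the required length, the existing words are copied into the new array, and the flag sizeIsSticky is set to false. That flag is defined as:

    /**
     * Whether the size of "words" is user-specified.  If so, we assume
     * the user knows what he's doing and try harder to preserve it.
     */
    private transient boolean sizeIsSticky = false;

After ensureCapacity returns, expandTo sets wordsInUse to wordsRequired. (Put differently, a BitSet grows automatically.)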
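The growth rule can be sketched on its own, assuming the "larger of doubled size or required size" policy described above. GrowthSketch and grownLength are hypothetical names; the main method also shows a real BitSet growing when a bit far beyond its initial size is set.

```java
import java.util.BitSet;

final class GrowthSketch {
    // Hypothetical helper mirroring the sizing decision in ensureCapacity:
    // keep the array if it is big enough, otherwise at least double it.
    static int grownLength(int currentLength, int wordsRequired) {
        if (currentLength >= wordsRequired) {
            return currentLength;
        }
        return Math.max(2 * currentLength, wordsRequired);
    }

    public static void main(String[] args) {
        System.out.println(grownLength(1, 4));   // 4
        System.out.println(grownLength(3, 4));   // 6
        System.out.println(grownLength(4, 2));   // 4

        BitSet bits = new BitSet();
        int before = bits.size();
        bits.set(1000);                          // no exception: the set expands itself
        System.out.println(before + " -> " + bits.size());
    }
}
```

Doubling keeps the amortized cost of repeated set calls low, at the price of some unused words after a large jump.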

2) The get method:

    /**
     * Returns the value of the bit with the specified index. The value
     * is true if the bit with the index bitIndex
     * is currently set in this BitSet; otherwise, the result
     * is false.
     *
     * @param     bitIndex   the bit index.
     * @return    the value of the bit with the specified index.
     * @exception IndexOutOfBoundsException if the specified index is negative.
     */
    public boolean get(int bitIndex) {
        if (bitIndex < 0)
            throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

        checkInvariants();

        int wordIndex = wordIndex(bitIndex);
        return (wordIndex < wordsInUse)
            && ((words[wordIndex] & (1L << bitIndex)) != 0);
    }

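The key part is the final return statement:

`return (wordIndex < wordsInUse) && ((words[wordIndex] & (1L << bitIndex)) != 0);`

A bit is reported as true only when wordIndex is within bounds (less than wordsInUse) and the corresponding bit in words[wordIndex] is non-zero.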
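One consequence of that bounds check is that reading far past the end of a BitSet is harmless, while a negative index is rejected. A minimal sketch follows; GetSketch and throwsOnGet are hypothetical names used only for this demonstration.

```java
import java.util.BitSet;

final class GetSketch {
    // Hypothetical helper: true if bits.get(index) throws IndexOutOfBoundsException.
    static boolean throwsOnGet(BitSet bits, int index) {
        try {
            bits.get(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(10);
        bits.set(3);
        System.out.println(bits.get(3));                // true
        System.out.println(bits.get(4));                // false
        System.out.println(bits.get(1000));             // false, beyond the words in use
        System.out.println(throwsOnGet(bits, 1000));    // false
        System.out.println(throwsOnGet(bits, -1));      // true
    }
}
```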

3) The size() method:

    /**
     * Returns the number of bits of space actually in use by this
     * BitSet to represent bit values.
     * The maximum element in the set is the size - 1st element.
     *
     * @return  the number of bits currently in this bit set.
     */
    public int size() {
        return words.length * BITS_PER_WORD;
    }

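This also uses a constant, defined as follows:

    private final static int ADDRESS_BITS_PER_WORD = 6;
    private final static int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;

Clearly BITS_PER_WORD = 64. The important point is that the value size() returns is the length of the underlying array times 64, which is why it is always a multiple of 64.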

4) A method similar to size: length(). Its source is:

    /**
     * Returns the "logical size" of this BitSet: the index of
     * the highest set bit in the BitSet plus one. Returns zero
     * if the BitSet contains no set bits.
     *
     * @return  the logical size of this BitSet.
     * @since   1.2
     */
    public int length() {
        if (wordsInUse == 0)
            return 0;

        return BITS_PER_WORD * (wordsInUse - 1) +
            (BITS_PER_WORD - Long.numberOfLeadingZeros(words[wordsInUse - 1]));
    }

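The method is short but takes some effort to understand, so let's go through it carefully. According to its comment, it returns the logical size of the BitSet. For example, if you declare a 129-bit BitSet and set bits 23, 45 and 67, its logical size is 68. That is, the logical size is the index of the highest bit you have set, plus one.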
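The difference between size() and length() shows up directly in a short experiment. This is a sketch; SizeVsLengthSketch and sample are hypothetical names.

```java
import java.util.BitSet;

final class SizeVsLengthSketch {
    // Builds the example from the text: a 129-bit set with bits 23, 45 and 67 on.
    static BitSet sample() {
        BitSet bits = new BitSet(129);
        bits.set(23);
        bits.set(45);
        bits.set(67);
        return bits;
    }

    public static void main(String[] args) {
        BitSet bits = sample();
        System.out.println(bits.length());        // 68: highest set index (67) plus one
        System.out.println(bits.cardinality());   // 3: number of bits that are set
        System.out.println(bits.size());          // allocated bits, a multiple of 64
    }
}
```

So length() follows the data, cardinality() counts it, and size() only reflects how much storage has been allocated.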

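The method relies on Long.numberOfLeadingZeros, for which I could not find a good explanation online, so here is an experiment (the shift distances are the ones consistent with the output shown):

    long test = 1;
    System.out.println(Long.numberOfLeadingZeros(test << 3));
    System.out.println(Long.numberOfLeadingZeros(test << 40));
    System.out.println(Long.numberOfLeadingZeros(test << 104));

It prints:

60
23
23

In other words, the method returns the number of zero bits that precede the highest set bit in the 64-bit binary representation of the value. The last two results agree because a shift of 104 on a long is reduced to 104 % 64 = 40.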
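A few more data points make the link to length() explicit. This is a sketch; LeadingZerosSketch and bitsUsed are hypothetical names, and bitsUsed computes the second term of the length() formula for a single word.

```java
final class LeadingZerosSketch {
    // Hypothetical helper: position of the highest set bit plus one,
    // i.e. BITS_PER_WORD - Long.numberOfLeadingZeros(word) with BITS_PER_WORD = 64.
    static int bitsUsed(long word) {
        return 64 - Long.numberOfLeadingZeros(word);
    }

    public static void main(String[] args) {
        System.out.println(Long.numberOfLeadingZeros(0L));        // 64
        System.out.println(Long.numberOfLeadingZeros(1L));        // 63
        System.out.println(Long.numberOfLeadingZeros(1L << 3));   // 60
        System.out.println(Long.numberOfLeadingZeros(-1L));       // 0

        // Bit 67 is position 3 of word 1, so length() = 64 * 1 + bitsUsed(1L << 3).
        System.out.println(64 * 1 + bitsUsed(1L << 3));           // 68
    }
}
```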

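Summary:

The BitSet source is not complicated. Once you understand the principle behind it and are comfortable with integer shifts and similar bit operations, a careful reading is enough to follow it.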
